Matthijs Brouwer authored
# TEI
For indexing [ISO-TEI](http://www.tei-c.org/) resources, the *mtas.analysis.parser.MtasTEIParser*, which extends the abstract *MtasXMLParser*, is available. Full examples of configuration files are provided on [GitHub](https://github.com/textexploration/mtas/tree/master/conf/parser/mtas).
```xml
<!-- START CONFIGURATION MTAS PARSER -->
<parser name="mtas.analysis.parser.MtasTEIParser">
...
  <!-- START MAPPINGS -->
  <mappings>
  ...
  </mappings>
  <!-- END MAPPINGS -->
  ...
</parser>
<!-- END CONFIGURATION MTAS PARSER -->
```
Apart from the *name* attribute, the syntax of the parser part in the [configuration file](indexing_configuration.html#configuration) is almost identical to the configuration of the [FoLiA-parser](indexing_formats_folia.html). An additional feature is the definition and use of *variables*, which is again illustrated and explained with examples below.

**Variables**

Variable mappings can be derived from elements occurring in the resource. Just like *references*, these definitions are placed within a *variables* tag, outside the *mappings* tag but inside the *parser* configuration section. In the example below, the variable mapping *interval* is defined from each occurring *when* tag, mapping the *id* of the *when* tag to the value of its *interval* attribute.

```xml
<variables>
  <variable name="when" value="interval">
    <value>
      <item type="attribute" name="interval" />
    </value>
  </variable>
</variables>
```

This will define for a TEI resource containing

```xml
...
<timeline unit="s">
  <when xml:id="TLI_0"/>
  <when xml:id="TLI_1" interval="0.64" since="#TLI_0"/>
  <when xml:id="TLI_2" interval="9.7" since="#TLI_0"/>
  <when xml:id="TLI_3" interval="10.216" since="#TLI_0"/>
  <when xml:id="TLI_4" interval="13.052" since="#TLI_0"/>
  <when xml:id="TLI_5" interval="16.28" since="#TLI_0"/>
...  
```

a mapping *interval* that maps, for example, "TLI_3" to "10.216". When defining other elements, such as a word, we can now refer to this *variable*:

```xml
<mapping type="word" name="anchor">
  <token type="string" offset="false" realoffset="false" parent="false">
    <pre>
      <item type="name" />
      <item type="string" value=".time" />
    </pre>
    <post>
      <item type="variableFromAttribute" name="interval" value="synch" />
    </post>
  </token>
</mapping>
```

describing the mapping for resource elements like

```xml
<anchor synch="#TLI_3"/>
```

This sets the *postfix* of the generated token to the value that the variable mapping *interval* assigns to the element referenced by the *synch* attribute (here "#TLI_3") of the matching *anchor* tag. In the example above, this generates a token with *prefix* "anchor.time" and *postfix* "10.216".
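The following Python sketch illustrates the combined effect of the variable definition and the *anchor* mapping above. It is not part of Mtas: the helpers `build_variable` and `anchor_time_token` are hypothetical, the sample is a simplified fragment without the TEI namespace, and both skipping *when* elements that lack an *interval* attribute (such as TLI_0) and stripping the leading "#" from the *synch* reference are assumptions inferred from the example.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix, so xml:id becomes this key.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<TEI>
  <timeline unit="s">
    <when xml:id="TLI_0"/>
    <when xml:id="TLI_1" interval="0.64" since="#TLI_0"/>
    <when xml:id="TLI_3" interval="10.216" since="#TLI_0"/>
  </timeline>
  <anchor synch="#TLI_3"/>
</TEI>"""


def local_name(tag):
    """Drop a '{namespace}' prefix so namespaced TEI elements also match."""
    return tag.rsplit("}", 1)[-1]


def build_variable(root, element, attribute):
    """Hypothetical: map the xml:id of each `element` to its `attribute` value.

    Elements without an xml:id or without the attribute (like TLI_0) are
    skipped; this is an assumption, not documented Mtas behaviour.
    """
    mapping = {}
    for el in root.iter():
        if local_name(el.tag) == element:
            key, value = el.get(XML_ID), el.get(attribute)
            if key is not None and value is not None:
                mapping[key] = value
    return mapping


def anchor_time_token(anchor, variable, attribute="synch"):
    """Hypothetical: build (prefix, postfix) as the anchor mapping describes.

    prefix  = element name + ".time"  (the two <pre> items)
    postfix = variable value for the referenced id (variableFromAttribute);
              None if the reference is unknown (an assumption).
    """
    prefix = local_name(anchor.tag) + ".time"
    ref = anchor.get(attribute, "")
    key = ref[1:] if ref.startswith("#") else ref  # "#TLI_3" -> "TLI_3"
    return prefix, variable.get(key)


doc = ET.fromstring(SAMPLE)
interval = build_variable(doc, "when", "interval")
for el in doc.iter():
    if local_name(el.tag) == "anchor":
        print(anchor_time_token(el, interval))  # ('anchor.time', '10.216')
```

The tag comparison ignores namespaces because real TEI documents place their elements in the `http://www.tei-c.org/ns/1.0` namespace, while the examples on this page omit it.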

Furthermore, if *start* and *end* attributes are defined for an element in the mapping, for example

```xml
<mapping type="groupAnnotation" name="span" start="from" end="to">
...
</mapping>
```

the start and end positions of the elements referenced by these attributes are used for the position and offset of the generated tokens. So, if the source contains

```xml
...
<w xml:id="w115">hier</w>
<w xml:id="w116">sehn</w>
<w xml:id="w117">wir</w>
...
```

and

```xml
...
<span from="#w116" to="#w116">sehen</span>
...
```

the tokens generated by the *groupAnnotation* mapping on the *span* tag will get their position and offset from the *w* tag with *id* "w116".
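As a rough illustration of how *start* and *end* resolve to positions, the Python sketch below numbers the *w* elements in document order and looks up the elements referenced by *from* and *to*. The 0-based numbering, the helper names `word_positions` and `span_position`, and the restriction to positions (character offsets are not computed here) are assumptions for illustration only; in Mtas the positions are produced by the configured word mapping.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<body>
  <w xml:id="w115">hier</w>
  <w xml:id="w116">sehn</w>
  <w xml:id="w117">wir</w>
  <span from="#w116" to="#w116">sehen</span>
</body>"""


def local_name(tag):
    """Drop a '{namespace}' prefix so namespaced TEI elements also match."""
    return tag.rsplit("}", 1)[-1]


def strip_ref(ref):
    """Turn a local reference like '#w116' into the id 'w116'."""
    return ref[1:] if ref.startswith("#") else ref


def word_positions(root, word_element="w"):
    """Hypothetical: number word elements 0, 1, 2, ... in document order."""
    positions, next_position = {}, 0
    for el in root.iter():
        if local_name(el.tag) == word_element:
            word_id = el.get(XML_ID)
            if word_id is not None:
                positions[word_id] = next_position
            next_position += 1
    return positions


def span_position(el, positions, start="from", end="to"):
    """Resolve start/end attributes to an inclusive (first, last) position range.

    `start` and `end` play the role of start="from" end="to" in the mapping.
    A KeyError signals a reference to an unknown word.
    """
    first = positions[strip_ref(el.get(start))]
    last = positions[strip_ref(el.get(end))]
    return first, last


root = ET.fromstring(SAMPLE)
positions = word_positions(root)
for el in root.iter():
    if local_name(el.tag) == "span":
        print(el.text, span_position(el, positions))  # sehen (1, 1)
```

In the same way, a span with `from="#w115"` and `to="#w117"` would cover positions 0 to 2 under this numbering.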