# Formats
To configure the mapping from resources to the index structure, several parsers are available for different formats:
- MtasFoliaParser: mapping FoLiA resources
- MtasTEIParser: mapping ISO-TEI resources
- MtasChatParser: mapping CHAT transcription format resources converted to XML
- MtasSketchParser: mapping Sketch Engine resources
- MtasCRMParser: mapping resources with format Corpus Van Reenen-Mulder/Adelheid
For XML-based formats, these parsers often extend the abstract MtasXMLParser only slightly, by defining the correct namespaces and root tags.
The configuration file that defines the mapping has two parts: an index part with general settings, and a parser part that defines and configures the parser.
The index part may contain general default settings to be applied in the mapping; the content of the parser part is specific to the chosen Mtas parser.

    <?xml version="1.0" encoding="UTF-8" ?>
    <mtas>
      <!-- START MTAS INDEX CONFIGURATION -->
      <index>
        <!-- START GENERAL SETTINGS MTAS INDEX PROCESS -->
        <payload index="false" />
        <offset index="false" />
        <realoffset index="false" />
        <parent index="true" />
        <!-- END GENERAL SETTINGS MTAS INDEX PROCESS -->
      </index>
      <!-- END MTAS INDEX CONFIGURATION -->
      <!-- START CONFIGURATION MTAS PARSER -->
      <parser name="...">
        ...
        <!-- START MAPPINGS -->
        <mappings>
          ...
        </mappings>
        <!-- END MAPPINGS -->
        ...
      </parser>
      <!-- END CONFIGURATION MTAS PARSER -->
    </mtas>
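
To make the two-part layout of the configuration above concrete, the following is a minimal sketch that reads such a file with Python's standard `xml.etree.ElementTree` module. It is not part of Mtas and does not reflect how Mtas itself loads the configuration. The function name `read_config`, the parser name `hypothetical.ParserClass`, and the empty `<mappings />` element are placeholders chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Sample configuration in the shape described above. The parser name and the
# empty <mappings /> element are hypothetical placeholders, not Mtas values.
CONFIG = """<?xml version="1.0" encoding="UTF-8" ?>
<mtas>
  <index>
    <payload index="false" />
    <offset index="false" />
    <realoffset index="false" />
    <parent index="true" />
  </index>
  <parser name="hypothetical.ParserClass">
    <mappings />
  </parser>
</mtas>
"""


def read_config(xml_text):
    """Return (index settings as booleans, parser name) from a config string."""
    # Parsing fails with ParseError if the XML is not well-formed,
    # for example when a closing tag does not match its opening tag.
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag != "mtas":
        raise ValueError("expected <mtas> as the root element")

    # General settings: one child element per setting, each with an
    # index="true|false" attribute.
    settings = {}
    index = root.find("index")
    if index is not None:
        for child in index:
            settings[child.tag] = child.get("index") == "true"

    # Parser-specific part: only the name attribute is read here.
    parser = root.find("parser")
    parser_name = parser.get("name") if parser is not None else None
    return settings, parser_name


settings, parser_name = read_config(CONFIG)
print(settings)
print(parser_name)
```

A check like this can catch a malformed configuration, such as a mismatched closing tag, before it is handed to the indexing process.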