#Apache Solr

#set($metadata = $xml.read("http://central.maven.org/maven2/org/textexploration/mtas/mtas/maven-metadata.xml"))
#set($version = $metadata.versioning.latest)
#set($fullversion = $version.getText())
#set($versionnumbers = $StringUtils.split($fullversion,"."))
#foreach( $item in $loop.watch($versionnumbers) )
#set($itemindex = $loop.getIndex())
#if ($itemindex == 0)
#set($majorversion = $item)
#elseif ($itemindex == 1)
#set($minorversion = $item)
#elseif ($itemindex == 2)
#set($incrementalversion = $item)
#elseif ($itemindex == 3)
#set($mtasversion = $item)
#end
#end

Mtas can be used as a plugin for Apache Solr.

**Prerequisites**

- Installed [Apache Solr](https://lucene.apache.org/solr/)
- The currently supported and advised version is ${majorversion}.${minorversion}.${incrementalversion}

Start with a new Solr core.

**Libraries**

Add the `mtas-${fullversion}.jar` to the `lib` directory of the new Solr core. A prebuilt `mtas-${fullversion}.jar` is available from the [download](download.html) page.

Furthermore, add the [Apache Commons Mathematics Library](http://commons.apache.org/proper/commons-math/) to the `lib` directory of the new Solr core.

**Solrconfig.xml**

Some changes have to be made within the `solrconfig.xml` file: elements have to be added to the `<config/>` element, or existing elements have to be adjusted.

Define a new **mtas searchComponent**:

```console
<searchComponent name="mtas" class="mtas.solr.handler.component.MtasSolrSearchComponent">
  <str name="collectionCacheDirectory">${solr.core.instanceDir}/cache/collection</str>
  <long name="collectionLifetime">86400</long>
  <int name="collectionMaximumNumber">1000</int>
  <int name="collectionMaximumOverflow">10</int>
</searchComponent>
```

Add this component to the select requestHandler by inserting the following within the `<requestHandler/>` with name `"/select"`:

```console
<arr name="last-components">
  <str>mtas</str>
</arr>
```

Define a new **mtas_cql queryParser** and **mtas_join queryParser**:

```console
<queryParser name="mtas_cql" class="mtas.solr.search.MtasSolrCQLQParserPlugin"/>
<queryParser name="mtas_join" class="mtas.solr.search.MtasSolrJoinQParserPlugin"/>
```

Define a new **mtas requestHandler**:

```console
<requestHandler name="/mtas" class="mtas.solr.handler.MtasRequestHandler" />
```

Define a new updateRequestProcessorChain:

```console
<updateRequestProcessorChain name="mtasUpdateProcessor">
  <processor class="mtas.solr.update.processor.MtasUpdateRequestProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

Define or adjust the update requestHandler with this updateRequestProcessorChain:

```console
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">mtasUpdateProcessor</str>
  </lst>
</requestHandler>
```

Finally, in this instruction we will use a classic schema instead of the managed-schema.
So the configuration must contain:

```console
<schemaFactory class="ClassicIndexSchemaFactory"/>
```

**Schema.xml**

We extend a (classic) schema with one (or multiple) fields that may contain annotated text, e.g.

```console
<field name="text" type="mtas" required="false" multiValued="false" indexed="true" stored="true" />
```

We define the referred Mtas fieldType by

```console
<fieldType name="mtas" class="solr.TextField" postingsFormat="MtasCodec">
  <analyzer type="index">
    <charFilter class="mtas.analysis.util.MtasCharFilterFactory" type="url" prefix="http://localhost/demo/" postfix="" />
    <tokenizer class="mtas.analysis.util.MtasTokenizerFactory" configFile="folia.xml" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="mtas.analysis.util.MtasPrefixTokenFilterFactory" prefix="t" />
  </analyzer>
</fieldType>
```

The charFilter with class *mtas.analysis.util.MtasCharFilterFactory* in the index analyzer contains an obligatory attribute `type` and two optional attributes `prefix` and `postfix`. The *type* can be *url* or *file*, referring to either an external url or a file on the filesystem. On indexing, the optional *prefix* and *postfix* attributes will be added to the provided value, resulting in a full url or location of a file.

The tokenizer with class *mtas.analysis.util.MtasTokenizerFactory* in the index analyzer has an attribute `configFile` containing the name of the required tokenizer configuration.

The filter in the query analyzer contains an obligatory attribute `prefix` defining the assumed prefix when this field will be queried directly within Solr.

See [configuration](indexing_configuration.html) for more information about the definition of a tokenizer configuration.

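
To illustrate how `prefix` and `postfix` combine with the indexed value, here is a minimal sketch of a document that could be posted to the `/update` handler in Solr's XML update format. It assumes the single-configuration fieldType shown above. The unique key field `id` and the resource name `example.xml` are hypothetical and not part of Mtas. With `type="url"`, `prefix="http://localhost/demo/"` and an empty `postfix`, the value of the `text` field would be resolved to `http://localhost/demo/example.xml`.

```console
<add>
  <doc>
    <!-- "id" is an assumed unique key field; adjust to your own schema -->
    <field name="id">1</field>
    <!-- hypothetical value; resolved as prefix + value + postfix on indexing -->
    <field name="text">example.xml</field>
  </doc>
</add>
```

Only the short resource name is sent to Solr; the charFilter settings in the schema determine where the annotated text is actually read from.
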
**Multiple tokenizer configurations**

If multiple tokenizer configurations are required, the Mtas fieldType has to be defined slightly differently:

```console
<fieldType name="mtas_config" class="solr.TextField" postingsFormat="MtasCodec">
  <analyzer type="index">
    <charFilter class="mtas.analysis.util.MtasCharFilterFactory" config="mtas.xml" />
    <tokenizer class="mtas.analysis.util.MtasTokenizerFactory" config="mtas.xml" />
  </analyzer>
</fieldType>
<fieldType name="mtas" class="mtas.solr.schema.MtasPreAnalyzedField"
    followIndexAnalyzer="mtas_config"
    defaultConfiguration="default"
    configurationFromField="type"
    setNumberOfTokens="numberOfTokens"
    setNumberOfPositions="numberOfPositions"
    setSize="size"
    setError="error"
    setPrefix="prefixes"
    setPrefixNumbers="prefix_"
    postingsFormat="MtasCodec">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="mtas.analysis.util.MtasPrefixTokenFilterFactory" prefix="t" />
  </analyzer>
</fieldType>
```

An additional fieldType (here named `mtas_config`) is defined, containing only an index analyzer. Both the charFilter and the tokenizer within this analyzer have an attribute `config` referring to a Mtas configuration file. Depending on the required tokenizer configuration, this file defines *type*, *prefix* and *postfix* for the charFilter and the *configFile* for the tokenizer. An example of a Mtas configuration file is added below.

The Mtas fieldType is defined with class *mtas.solr.schema.MtasPreAnalyzedField* and an obligatory attribute `followIndexAnalyzer` referring to the additional fieldType we defined before. The optional attribute `defaultConfiguration` contains the name of the default configuration to be used, and the obligatory attribute `configurationFromField` contains the name of the field defining the required configuration.
The optional attributes `setNumberOfTokens`, `setNumberOfPositions`, `setSize`, `setPrefix` and `setError` define fields that may be filled with, respectively, the number of tokens, the number of positions, the file size, the prefixes and any errors. The optional attribute `setPrefixNumbers` defines a field-name prefix: for each prefix occurring in the document, a field is added whose name is this `setPrefixNumbers` value followed by the prefix, and whose value is the number of occurrences of that prefix.

**Example of a Mtas configuration file**

```console
<?xml version="1.0" encoding="UTF-8" ?>
<mtas>
  <configurations type="mtas.analysis.util.MtasTokenizerFactory">
    <configuration name="folia" file="folia.xml" />
    <configuration name="tei" file="tei.xml" />
  </configurations>
  <configurations type="mtas.analysis.util.MtasCharFilterFactory">
    <configuration name="folia" type="url" prefix="http://www.mycompany.com/archive/" postfix=".xml" />
    <configuration name="tei" type="file" prefix="/storage/tei/" postfix="" />
  </configurations>
</mtas>
```
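
As a hedged sketch of how a configuration is selected per document, the update request below assumes the multiple-configuration fieldType with `configurationFromField="type"` and the example configuration file above. The `id` field, the value `12345`, and the presence of a `type` field in the schema are assumptions made for illustration only. Under the `folia` configuration, the `text` value would be expanded to `http://www.mycompany.com/archive/12345.xml` and tokenized using `folia.xml`. With `type` set to `tei`, the file `/storage/tei/12345` would be read instead and tokenized using `tei.xml`.

```console
<add>
  <doc>
    <!-- "id" is an assumed unique key field; adjust to your own schema -->
    <field name="id">1</field>
    <!-- names a configuration from the Mtas configuration file -->
    <field name="type">folia</field>
    <!-- hypothetical value; combined with the prefix and postfix of the chosen configuration -->
    <field name="text">12345</field>
  </doc>
</add>
```

If the `type` field is left out, the configuration named by `defaultConfiguration` would presumably apply; note that the example configuration file above does not define a configuration called `default`.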