# Apache Solr
#set($metadata = $xml.read("http://repo1.maven.org/maven2/nl/knaw/meertens/mtas/mtas/maven-metadata.xml"))
#set($version = $metadata.versioning.latest)
#set($fullversion = $version.getText())
#set($versionnumbers = $StringUtils.split($fullversion,"."))
#foreach( $item in $loop.watch($versionnumbers) )
#set($itemindex = $loop.getIndex())
#if ($itemindex == 0)
#set($majorversion = $item)
#elseif ($itemindex == 1)
#set($minorversion = $item)
#elseif ($itemindex == 2)
#set($incrementalversion = $item)
#elseif ($itemindex == 3)
#set($mtasversion = $item)
#end
#end
Mtas can be used as a plugin for Apache Solr.
**Prerequisites**
- Installed [Apache Solr](https://lucene.apache.org/solr/)
- The currently supported and recommended version is ${majorversion}.${minorversion}.${incrementalversion}
Start with a new Solr core.
**Libraries**
Add the `mtas-${fullversion}.jar` to the `lib` directory of the new Solr core.
A prebuilt `mtas-${fullversion}.jar` is available from the [download](download.html) page.
Furthermore, add the [Apache Commons Mathematics Library](http://commons.apache.org/proper/commons-math/) to the `lib` directory of the new Solr core.
**Solrconfig.xml**
Some changes have to be made within the `solrconfig.xml` file: elements have to be added to the `<config/>` element, or existing elements have to be adjusted.
Define a new **mtas searchComponent**:
```console
<searchComponent name="mtas" class="mtas.solr.handler.component.MtasSolrSearchComponent"/>
```
Add this component to the select requestHandler by inserting the following within the
`<requestHandler/>` with name `"/select"`:
```console
<arr name="last-components">
  <str>mtas</str>
</arr>
```
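
For orientation, a `/select` handler with the component added might look as follows. This is a sketch rather than a required layout: `solr.SearchHandler` is Solr's standard search handler class, while the `defaults` block and its values (`echoParams`, `rows`) are assumed placeholders for whatever the core already defines.

```console
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- Assumed existing defaults; keep whatever your core already defines -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
  </lst>
  <!-- Run the mtas searchComponent after the standard components -->
  <arr name="last-components">
    <str>mtas</str>
  </arr>
</requestHandler>
```
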
Define a new **mtas_cql queryParser** and **mtas_join queryParser**:
```console
<queryParser name="mtas_cql" class="mtas.solr.search.MtasSolrCQLQParserPlugin"/>
<queryParser name="mtas_join" class="mtas.solr.search.MtasSolrJoinQParserPlugin"/>
```
Define a new **mtas requestHandler**:
```console
<requestHandler name="/mtas" class="mtas.solr.handler.MtasRequestHandler" />
```
Define a new updateRequestProcessorChain:
```console
<updateRequestProcessorChain name="mtasUpdateProcessor">
  <processor class="mtas.solr.update.processor.MtasUpdateRequestProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```
Define or adjust the update requestHandler with this updateRequestProcessorChain:
```console
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">mtasUpdateProcessor</str>
  </lst>
</requestHandler>
```
Finally, these instructions use a classic schema instead of the
managed schema, so the configuration must contain:
```console
<schemaFactory class="ClassicIndexSchemaFactory"/>
```
**Schema.xml**
We extend a (classic) schema with one (or multiple) fields that may contain
annotated text, e.g.
```console
<field name="text" type="mtas" required="false" multiValued="false" indexed="true" stored="true" />
```
The referenced Mtas fieldType is defined by:
```console
<fieldType name="mtas" class="solr.TextField" postingsFormat="MtasCodec">
  <analyzer type="index">
    <charFilter class="mtas.analysis.util.MtasCharFilterFactory" type="url" prefix="http://localhost/demo/" postfix="" />
    <tokenizer class="mtas.analysis.util.MtasTokenizerFactory" configFile="folia.xml" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="mtas.analysis.util.MtasPrefixTokenFilterFactory" prefix="t" />
  </analyzer>
</fieldType>
```
The charFilter with class *mtas.analysis.util.MtasCharFilterFactory* in the
index analyzer has an obligatory attribute `type` and two optional attributes,
`prefix` and `postfix`. The *type* can be *url* or *file*, referring to either an external
URL or a file on the filesystem. On indexing, the optional *prefix* is prepended and the optional
*postfix* is appended to the provided value, resulting in a full URL or file location.
The tokenizer with class *mtas.analysis.util.MtasTokenizerFactory* in the index analyzer
has an attribute `configFile` containing the name of the required tokenizer configuration.
The filter in the query analyzer has an obligatory attribute `prefix` defining the
prefix that is assumed when this field is queried directly within Solr.
See [configuration](indexing_configuration.html) for more information about
the definition of a tokenizer configuration.
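
As an illustration of how `prefix` and `postfix` combine with the field value, consider the following Solr XML update message for the `text` field defined above. It is a sketch under assumptions: the `id` field and the value `doc1.xml` are hypothetical and not part of the Mtas setup described here.

```console
<add>
  <doc>
    <!-- Hypothetical unique key field -->
    <field name="id">1</field>
    <!-- With type="url", prefix="http://localhost/demo/" and postfix="",
         this value should resolve to http://localhost/demo/doc1.xml -->
    <field name="text">doc1.xml</field>
  </doc>
</add>
```

The content found at the resolved location is then tokenized using the configuration named by `configFile`, here `folia.xml`.
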
**Multiple tokenizer configurations**
If multiple tokenizer configurations are required, the Mtas fieldType has to be
defined slightly differently:
```console
<fieldType name="mtas_config" class="solr.TextField" postingsFormat="MtasCodec">
  <analyzer type="index">
    <charFilter class="mtas.analysis.util.MtasCharFilterFactory" config="mtas.xml" />
    <tokenizer class="mtas.analysis.util.MtasTokenizerFactory" config="mtas.xml" />
  </analyzer>
</fieldType>
<fieldType name="mtas" class="mtas.solr.schema.MtasPreAnalyzedField"
    followIndexAnalyzer="mtas_config" defaultConfiguration="default"
    configurationFromField="type" setNumberOfTokens="numberOfTokens"
    setNumberOfPositions="numberOfPositions" setSize="size"
    setError="error" setPrefix="prefixes" postingsFormat="MtasCodec">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="mtas.analysis.util.MtasPrefixTokenFilterFactory" prefix="t" />
  </analyzer>
</fieldType>
```
An additional fieldType (here named `mtas_config`) is defined, containing
only an index analyzer. Both the charFilter and the tokenizer within this analyzer have
an attribute `config` referring to a Mtas configuration file. Depending on the required
tokenizer configuration, this file defines *type*, *prefix* and *postfix* for the charFilter,
and the *configFile* for the tokenizer. An example of a Mtas configuration file is given below.
The Mtas fieldType is defined with class *mtas.solr.schema.MtasPreAnalyzedField*
and has an obligatory attribute `followIndexAnalyzer` referring to the additional fieldType
defined before. The optional attribute `defaultConfiguration` contains the name
of the default configuration to be used, and the obligatory attribute
`configurationFromField` contains the name of the field defining the required
configuration. The optional attributes `setNumberOfTokens`,
`setNumberOfPositions`, `setSize`, `setPrefix` and `setError` name fields that may be filled
with, respectively, the number of tokens, the number of positions, the file size, the prefixes and any errors.
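
The fields named by these attributes presumably need to be declared in the schema as well. A minimal sketch, assuming fieldTypes called `string` and `int` are already defined in the schema; those type names, the `indexed`/`stored` settings and the `multiValued` setting on `prefixes` are assumptions, not documented Mtas requirements:

```console
<!-- Field read by configurationFromField -->
<field name="type" type="string" indexed="true" stored="true" />
<!-- Fields filled through the set* attributes (types are assumed) -->
<field name="numberOfTokens" type="int" indexed="true" stored="true" />
<field name="numberOfPositions" type="int" indexed="true" stored="true" />
<field name="size" type="int" indexed="true" stored="true" />
<field name="error" type="string" indexed="true" stored="true" />
<!-- Assumed multiValued, since a document may contain several prefixes -->
<field name="prefixes" type="string" indexed="true" stored="true" multiValued="true" />
```
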
**Example of a Mtas configuration file**
```console
<?xml version="1.0" encoding="UTF-8" ?>
<mtas>
  <configurations type="mtas.analysis.util.MtasTokenizerFactory">
    <configuration name="folia" file="folia.xml" />
    <configuration name="tei" file="tei.xml" />
  </configurations>
  <configurations type="mtas.analysis.util.MtasCharFilterFactory">
    <configuration name="folia" type="url" prefix="http://www.mycompany.com/archive/" postfix=".xml" />
    <configuration name="tei" type="file" prefix="/storage/tei/" postfix="" />
  </configurations>
</mtas>
```
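
With this configuration file and a fieldType using `configurationFromField="type"`, each document can select its configuration through its `type` field. The update message below is a sketch under assumptions: the `id` field and the value `novel` are hypothetical, and the Mtas field is assumed to be named `text`.

```console
<add>
  <doc>
    <!-- Hypothetical unique key field -->
    <field name="id">2</field>
    <!-- Selects the configuration named "tei" for both charFilter and tokenizer -->
    <field name="type">tei</field>
    <!-- With type="file", prefix="/storage/tei/" and postfix="",
         this value should resolve to /storage/tei/novel -->
    <field name="text">novel</field>
  </doc>
</add>
```

A document with `type` set to `folia` and the same value would instead be read from `http://www.mycompany.com/archive/novel.xml` and tokenized with `folia.xml`.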