Matthijs Brouwer authored
# Multi Tier Annotation Search
In recent years, multiple solutions have become available that provide search over huge amounts of plain text and metadata. Scalable search on annotated text, however, still appears to be problematic. Using Mtas, we not only take advantage of the strengths of Lucene and Solr, but also extend queries with [CQL](search_cql.html) (Corpus Query Language) conditions on annotated text, such as:

> `[pos="LID"] [pos="ADJ"]? [lemma="amsterdam"]`
>
> `<entity="location"/> within (<s/> containing [lemma="utrecht"])`

The first query matches an article (`LID`), optionally followed by an adjective (`ADJ`), followed by a token with lemma *amsterdam*. The second matches location entities inside sentences that contain a token with lemma *utrecht*.

Parsers are provided for several [document formats](indexing_formats.html), each with extensive [configuration](indexing_configuration.html) options. Advanced query [features](features.html) are also available, such as [statistics](search_component_stats.html), [termvectors](search_component_termvector.html) and [kwic](search_component_kwic.html) (keyword in context).

Source code and releases are available on [GitHub](https://github.com/textexploration/mtas/); see the [installation instructions](installation.html) to get started.
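To make the semantics of the two CQL examples at the top of this page more concrete, here is a toy, in-memory sketch in Python. It is purely illustrative and is not how Mtas evaluates queries. The `Token` class, the helper functions `match_sequence`, `within` and `containing`, the example sentences, and every tag except `LID` and `ADJ` (which come from the examples) are hypothetical. Spans are `(start, end)` token offsets with an exclusive end.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


@dataclass
class Token:
    # Hypothetical multi-tier token: each position carries several annotation layers.
    word: str
    lemma: str
    pos: str


def match_sequence(tokens: List[Token],
                   pattern: List[Tuple[Callable[[Token], bool], bool]]) -> List[Span]:
    """Find all spans matching a sequence of (predicate, optional) items."""
    matches: List[Span] = []
    for start in range(len(tokens)):
        positions = {start}  # positions reachable after the pattern items handled so far
        for predicate, optional in pattern:
            reachable = set()
            for p in positions:
                if p < len(tokens) and predicate(tokens[p]):
                    reachable.add(p + 1)  # the item consumes one token
                if optional:
                    reachable.add(p)  # an optional item may also be skipped
            positions = reachable
        matches.extend((start, end) for end in sorted(positions))
    return matches


def within(spans: List[Span], outer: List[Span]) -> List[Span]:
    """Keep spans that lie completely inside at least one outer span."""
    return [s for s in spans if any(o[0] <= s[0] and s[1] <= o[1] for o in outer)]


def containing(outer: List[Span], inner: List[Span]) -> List[Span]:
    """Keep outer spans that completely contain at least one inner span."""
    return [o for o in outer if any(o[0] <= i[0] and i[1] <= o[1] for i in inner)]


# Hypothetical annotated text: two sentences and three location entities.
rows = [("Het", "het", "LID"), ("mooie", "mooi", "ADJ"), ("Amsterdam", "amsterdam", "N"),
        ("ligt", "liggen", "WW"), ("niet", "niet", "BW"), ("ver", "ver", "ADJ"),
        ("van", "van", "VZ"), ("Utrecht", "utrecht", "N"), (".", ".", "LET"),
        ("Rotterdam", "rotterdam", "N"), ("is", "zijn", "WW"), ("groot", "groot", "ADJ"),
        (".", ".", "LET")]
tokens = [Token(*row) for row in rows]
sentences: List[Span] = [(0, 9), (9, 13)]
locations: List[Span] = [(2, 3), (7, 8), (9, 10)]

# [pos="LID"] [pos="ADJ"]? [lemma="amsterdam"]
article_adj_amsterdam = match_sequence(tokens, [
    (lambda t: t.pos == "LID", False),
    (lambda t: t.pos == "ADJ", True),
    (lambda t: t.lemma == "amsterdam", False),
])
print(article_adj_amsterdam)  # [(0, 3)]

# <entity="location"/> within (<s/> containing [lemma="utrecht"])
utrecht = match_sequence(tokens, [(lambda t: t.lemma == "utrecht", False)])
print(within(locations, containing(sentences, utrecht)))  # [(2, 3), (7, 8)]
```

The optional `ADJ` item is handled by keeping both the "consumed" and the "skipped" position, so a plain `het Amsterdam` would match as well. *Rotterdam* is excluded from the second result because its sentence contains no token with lemma *utrecht*.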

---

**Nederlab** 

One of the primary use cases for Mtas is the [Nederlab project](http://www.nederlab.nl/), which currently<sup>1</sup> provides access to over 15 million items, covering both metadata and annotated text, for search and analysis, as summarized below.

|                 | Total          | Mean per core | Min per core | Max per core |
|-----------------|---------------:|--------------:|-------------:|-------------:|
| Solr index size | 1,146 GB       | 49.8 GB       | 268 kB       | 163 GB       |
| Solr documents  | 15,859,099     | 689,526       | 201          | 3,616,544    |

Collections are regularly added and updated by creating new cores, replacing existing cores, or merging new cores into existing ones. The data is currently divided over 23 separate cores. For 14,663,457 of these documents, annotated text is included, ranging in size from 1 to over 3.5 million words:

|                 | Total          | Mean per document | Min per document | Max per document |
|-----------------|---------------:|------------------:|-----------------:|-----------------:|
| Words           | 9,584,448,067  | 654               | 1                | 3,537,883        |
| Annotations     | 36,486,292,912 | 2,488             | 4                | 23,589,831       |
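The mean values in these tables are plain averages. In the first table they are taken over the 23 cores, and in the second over the 14,663,457 documents that include annotated text. The quick Python check below uses only the totals reported above to reproduce those means. It also derives the average number of annotations per word, a figure that is not stated in the tables and is computed here only for illustration.

```python
# Totals copied from the Nederlab tables above (situation January 2017).
cores = 23
annotated_docs = 14_663_457

index_size_g = 1_146          # total Solr index size, in G as reported
solr_documents = 15_859_099   # total Solr documents
words = 9_584_448_067         # total words in annotated text
annotations = 36_486_292_912  # total annotations

print(f"{index_size_g / cores:.1f}")         # prints 49.8 (mean index size per core)
print(f"{solr_documents / cores:,.0f}")      # prints 689,526 (mean documents per core)
print(f"{words / annotated_docs:,.0f}")      # prints 654 (mean words per document)
print(f"{annotations / annotated_docs:,.0f}")  # prints 2,488 (mean annotations per document)
print(f"{annotations / words:.2f}")          # prints 3.81 (annotations per word, derived)
```

Because the two tables average over different units (cores versus annotated documents), their means are not directly comparable with each other.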

---

<sup><a name="footnote">1</a></sup> <small>Situation as of January 2017.</small>