# Document

Mtas can produce statistics on the terms used in each of the listed documents. To get this information in a Solr request, provide the following parameter in addition to the parameter that enables the [Mtas query component](search_component.html).

| Parameter     | Value | Obligatory |
|---------------|-------|------------|
| mtas.document | true  | yes        |

Multiple document results can be produced within the same request. To distinguish them, a unique identifier has to be provided for each of the required document results.

| Parameter                                  | Value      | Info                                                            | Obligatory |
|--------------------------------------------|------------|-----------------------------------------------------------------|------------|
| mtas.document.\<identifier\>.key           | \<string\> | key used in response                                            | no         |
| mtas.document.\<identifier\>.field         | \<string\> | Mtas field                                                      | yes        |
| mtas.document.\<identifier\>.prefix        | \<string\> | prefix of the terms to consider                                 | yes        |
| mtas.document.\<identifier\>.number        | \<double\> | create a list with the specified number of most frequent items | no         |
| mtas.document.\<identifier\>.type          | \<string\> | required [type of statistics](search_stats.html)                | no         |
| mtas.document.\<identifier\>.regexp        | \<string\> | regular expression condition on term                            | no         |
| mtas.document.\<identifier\>.ignoreRegexp  | \<string\> | regular expression condition for terms that have to be ignored  | no         |

## List

A list can be provided, specifying the set of terms to consider when computing the result.
| Parameter                                      | Value       | Info                                                       | Obligatory |
|------------------------------------------------|-------------|------------------------------------------------------------|------------|
| mtas.document.\<identifier\>.list              | \<string\>  | comma-separated list of values                             | yes        |
| mtas.document.\<identifier\>.listRegexp        | \<boolean\> | interpret the list values as regular expressions           | no         |
| mtas.document.\<identifier\>.listExpand        | \<boolean\> | expand matches on list values into the matching terms      | no         |
| mtas.document.\<identifier\>.listExpandNumber  | \<double\>  | number of expanded terms reported per list value           | no         |

## Ignore list

An ignore list can also be provided, specifying the set of terms to exclude when computing the result.

| Parameter                                      | Value       | Info                                                       | Obligatory |
|------------------------------------------------|-------------|------------------------------------------------------------|------------|
| mtas.document.\<identifier\>.ignoreList        | \<string\>  | comma-separated list of values                             | yes        |
| mtas.document.\<identifier\>.ignoreListRegexp  | \<boolean\> | interpret the ignore list values as regular expressions    | no         |

---

## Examples

1. [Basic](#basic) : Statistics on unique words for each document
2. [Regexp](#regexp) : Most frequent words consisting only of the letters a-z, with a minimum length of 5
3. [List](#list) : Statistics for a provided list of words
4. [Ignore](#ignore) : Statistics for a provided list of regular expressions, ignoring another list of regular expressions

---

<a name="basic"></a>

### Basic

**Example**<br/>
Statistics for the set of unique tokens with prefix *t* (words) for each listed document.
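The statistics in the response below appear to treat each unique term of a document as one observation, with the term's frequency as its value: for the first document, 3336 / 789 ≈ 4.228 matches the reported `mean`, and √(113964 / 789) ≈ 12.018 matches `quadraticmean`. A minimal Python sketch under that assumption, computing several of these values from a plain token list (the function name `term_statistics` is hypothetical, and `skewness` and `kurtosis` are left out):

```python
import math
import statistics
from collections import Counter


def term_statistics(tokens):
    """Hypothetical helper: statistics over the frequencies of the unique tokens.

    Assumes, as the example response suggests, that every unique token is one
    observation whose value is its number of occurrences. Needs at least two
    unique tokens, because the sample variance divides by n - 1.
    """
    freqs = list(Counter(tokens).values())
    n = len(freqs)
    total = sum(freqs)
    sumsq = sum(f * f for f in freqs)
    sumoflogs = sum(math.log(f) for f in freqs)
    mean = total / n
    population_variance = sumsq / n - mean ** 2
    # Sample variance: population variance scaled by n / (n - 1).
    variance = population_variance * n / (n - 1)
    return {
        "n": n,
        "sum": total,
        "sumsq": sumsq,
        "sumoflogs": sumoflogs,
        "min": min(freqs),
        "max": max(freqs),
        "median": statistics.median(freqs),
        "mean": mean,
        "quadraticmean": math.sqrt(sumsq / n),
        "geometricmean": math.exp(sumoflogs / n),
        "populationvariance": population_variance,
        "variance": variance,
        "standarddeviation": math.sqrt(variance),
    }


stats = term_statistics(["de", "de", "de", "kat", "zat", "zat"])
```

With this input the frequencies are 3, 1 and 2, so `n` is 3, `sum` is 6 and `mean` is 2.0.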
**Request and response**<br/>
`fq=%7B%21mtas_cql+field%3D%22text%22+query%3D%22%5B%5D%22+++%7D&q=%2A%3A%2A&mtas=true&mtas.document=true&mtas.document.0.field=text&mtas.document.0.prefix=t&mtas.document.0.key=words&mtas.document.0.type=all&fl=*&start=0&rows=2&wt=json&indent=true`

```json
"mtas":{
  "document":[{
      "key":"words",
      "list":[{
          "documentKey":"4115a95c-011c-11e4-b0ff-51bcbd7c379f",
          "sumsq":113964.0,
          "populationvariance":126.5639231447591,
          "max":166.0,
          "sum":3336.0,
          "kurtosis":92.19837080635624,
          "standarddeviation":11.257199352433314,
          "n":789,
          "quadraticmean":12.01836364230935,
          "min":1.0,
          "median":1.0,
          "variance":126.72453726042504,
          "mean":4.228136882129286,
          "geometricmean":1.9285975498109995,
          "sumoflogs":518.209740627951,
          "skewness":8.377350653392202},
        {
          "documentKey":"4115aac4-011c-11e4-b0ff-51bcbd7c379f",
          "sumsq":25489.0,
          "populationvariance":35.695641666666134,
          "max":77.0,
          "sum":1563.0,
          "kurtosis":72.57030420433823,
          "standarddeviation":5.979568021426876,
          "n":600,
          "quadraticmean":6.517796151051877,
          "min":1.0,
          "median":1.0,
          "variance":35.75523372287092,
          "mean":2.6050000000000004,
          "geometricmean":1.5249529474773036,
          "sumoflogs":253.1781332820801,
          "skewness":7.70682353088895}]}]}
```

<a name="regexp"></a>

### Regexp

**Example**<br/>
Most frequent tokens with prefix *t* (words) consisting only of the letters a-z, with a minimum length of 5, for each listed document.
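The example below combines `regexp` with `number`. Its effect can be sketched locally in Python, assuming (as for Lucene term regular expressions) that the pattern has to match the whole term, hence `re.fullmatch`. The helper name `top_matching_terms` is hypothetical, and Python's `re` syntax only coincides with Lucene's for simple patterns such as `[a-z]{5,}`. In this sketch `n`, `sum` and `mean` are taken over all matching terms, while the list holds only the `number` most frequent ones, which is consistent with the response (for the first document, 30319 / 6660 ≈ 4.552):

```python
import re
from collections import Counter


def top_matching_terms(tokens, regexp, number):
    """Hypothetical helper mimicking regexp + number on a plain token list.

    Assumes the regular expression must match the complete term.
    """
    pattern = re.compile(regexp)
    counts = Counter(t for t in tokens if pattern.fullmatch(t))
    n = len(counts)
    total = sum(counts.values())
    return {
        "n": n,  # number of distinct matching terms
        "sum": total,  # total number of occurrences of those terms
        "mean": total / n if n else float("nan"),
        # Only the `number` most frequent matching terms are listed.
        "list": [{"sum": freq, "key": term} for term, freq in counts.most_common(number)],
    }


result = top_matching_terms(
    ["hebben", "zegde", "hebben", "kip", "welke", "Zijne", "hebben", "zegde"],
    r"[a-z]{5,}",
    2,
)
```

Here `kip` is too short and `Zijne` contains an uppercase letter, so only `hebben`, `zegde` and `welke` count: `n` is 3, `sum` is 6, and the list holds `hebben` (3) and `zegde` (2).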
**Regexp**<br/>
`[a-z]{5,}`

**Request and response**<br/>
`fq=%7B%21mtas_cql+field%3D%22text%22+query%3D%22%5B%5D%22+++%7D&q=%2A%3A%2A&mtas=true&mtas.document=true&mtas.document.0.field=NLContent_mtas&mtas.document.0.prefix=t&mtas.document.0.key=list+of+words&mtas.document.0.type=n%2Csum%2Cmean&mtas.document.0.regexp=%5Ba-z%5D%7B5%2C%7D&mtas.document.0.number=5&fl=%2A&start=0&rows=2&wt=json&indent=true`

```json
"mtas":{
  "document":[{
      "key":"list of words",
      "list":[{
          "documentKey":"c0c4200c-1eee-11e5-b891-f48ce0be173a",
          "list":[{
              "sum":471,
              "key":"zijne"},
            {
              "sum":317,
              "key":"eenen"},
            {
              "sum":304,
              "key":"zegde"},
            {
              "sum":249,
              "key":"hebben"},
            {
              "sum":229,
              "key":"welke"}],
          "mean":4.552402402402403,
          "sum":30319,
          "n":6660},
        {
          "documentKey":"c0c453d8-1eee-11e5-b891-f48ce0be173a",
          "list":[{
              "sum":348,
              "key":"heeft"},
            {
              "sum":243,
              "key":"hebben"},
            {
              "sum":199,
              "key":"prins"},
            {
              "sum":173,
              "key":"vader"},
            {
              "sum":161,
              "key":"komen"}],
          "mean":4.641632967456191,
          "sum":24104,
          "n":5193}]}]}
```

<a name="list"></a>

### List

**Example**<br/>
Statistics for a provided list of words for each listed document.
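The request parameters of the example below can also be assembled in code rather than written by hand. A minimal Python sketch using the standard `urllib.parse.urlencode`; the helper `mtas_document_params` is hypothetical, the `fq` filter of the example is omitted for brevity, and booleans are assumed to be sent as lowercase `true`/`false`, as in the example requests:

```python
from urllib.parse import urlencode


def mtas_document_params(identifier, field, prefix, **options):
    """Hypothetical helper: build the mtas.document.<identifier>.* parameters."""
    base = f"mtas.document.{identifier}"
    params = {
        "mtas": "true",
        "mtas.document": "true",
        f"{base}.field": field,
        f"{base}.prefix": prefix,
    }
    for name, value in options.items():
        # Optional settings (key, type, list, listRegexp, number, ...) map directly
        # onto parameter names; booleans are written as lowercase true/false.
        params[f"{base}.{name}"] = str(value).lower() if isinstance(value, bool) else value
    return params


params = {"q": "*:*", "fl": "*", "start": 0, "rows": 2, "wt": "json"}
params.update(
    mtas_document_params(
        0,
        "text",
        "t_lc",
        key="list of words",
        type="n,sum,mean",
        list="koe,paard,schaap,geit,kip",
        listRegexp=False,
        listExpand=False,
        number=100,
    )
)
query_string = urlencode(params)
```

The same helper covers the other examples by passing, for instance, `regexp`, `ignoreRegexp` or `ignoreList` as keyword arguments.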
**List**<br/>
`koe,paard,schaap,geit,kip`

**Request and response**<br/>
`fq=%7B%21mtas_cql+field%3D%22text%22+query%3D%22%5Bt_lc%3D%5C%22koe%5C%22%7Ct_lc%3D%5C%22paard%5C%22%7Ct_lc%3D%5C%22schaap%5C%22%5D%22+++%7D&q=%2A%3A%2A&mtas=true&mtas.document=true&mtas.document.0.field=text&mtas.document.0.prefix=t_lc&mtas.document.0.key=list+of+words&mtas.document.0.type=n%2Csum%2Cmean&mtas.document.0.list=koe%2Cpaard%2Cschaap%2Cgeit%2Ckip&mtas.document.0.listRegexp=false&mtas.document.0.listExpand=false&mtas.document.0.number=100&fl=%2A&start=0&rows=2&wt=json&indent=true`

```json
"mtas":{
  "document":[{
      "key":"list of words",
      "list":[{
          "documentKey":"c0c46b7a-1eee-11e5-b891-f48ce0be173a",
          "list":[{
              "sum":3,
              "key":"paard"},
            {
              "sum":2,
              "key":"schaap"}],
          "mean":2.5,
          "sum":5,
          "n":2},
        {
          "documentKey":"c0c453d8-1eee-11e5-b891-f48ce0be173a",
          "list":[{
              "sum":31,
              "key":"paard"},
            {
              "sum":1,
              "key":"kip"}],
          "mean":16.0,
          "sum":32,
          "n":2}]}]}
```

<a name="ignore"></a>

### Ignore

**Example**<br/>
Statistics for a provided list of regular expressions for each listed document, ignoring the terms that match another list of regular expressions.
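The example below combines all conditions. How they could interact is sketched in Python on a dictionary of term frequencies for a single document. This is an illustration under assumptions, not the Mtas implementation: every pattern must match the whole term; a term is dropped when it fails `regexp`, matches `ignoreRegexp`, or matches any `ignoreList` value; list values without matching terms are left out (as unmatched words are absent from the List example's response); the expansion keeps the `listExpandNumber` most frequent terms, as with `listExpand=true`; `n`, `sum` and `mean` are taken over the distinct terms matched by the list; groups are ordered by their sum. The name `grouped_list_stats` is hypothetical.

```python
import re


def grouped_list_stats(counts, regexp, ignore_regexp, list_values, ignore_list,
                       expand_number, number):
    """Hypothetical helper: group one document's term frequencies by list patterns."""

    def matches(pattern, term):
        # Assumption: patterns must match the complete term.
        return re.fullmatch(pattern, term) is not None

    kept = {
        term: freq
        for term, freq in counts.items()
        if matches(regexp, term)
        and not matches(ignore_regexp, term)
        and not any(matches(p, term) for p in ignore_list)
    }
    groups, used = [], set()
    for value in list_values:
        members = {t: f for t, f in kept.items() if matches(value, t)}
        if not members:
            continue  # list values without matching terms are not reported
        used.update(members)
        top = sorted(members.items(), key=lambda item: item[1], reverse=True)
        groups.append({
            "sum": sum(members.values()),
            # Expansion: the expand_number most frequent matching terms.
            "list": {t: {"sum": f} for t, f in top[:expand_number]},
            "key": value,
        })
    groups.sort(key=lambda g: g["sum"], reverse=True)
    total = sum(kept[t] for t in used)
    return {
        "list": groups[:number],
        "mean": total / len(used) if used else float("nan"),
        "sum": total,
        "n": len(used),
    }


result = grouped_list_stats(
    {"eenheid": 5, "eenigen": 4, "waarheid": 3, "godheid": 2,
     "eenzaamheid": 7, "eensklaps": 1, "een": 9, "droefheid": 6},
    regexp=r"[a-z]{7,}",
    ignore_regexp=r"[a-z]{10,}",
    list_values=["een.*", ".*heid"],
    ignore_list=["een.*heid", "ee.*nheid"],
    expand_number=2,
    number=10,
)
```

In this sketch `een` fails the regexp, `eenzaamheid` is dropped by the ignore regexp and `eenheid` by the ignore list, so `.*heid` sums to 11 and `een.*` to 5 over five distinct terms.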
**Regexp**<br/>
`[a-z]{7,}`

**Ignore regexp**<br/>
`[a-z]{10,}`

**List**<br/>
`een.*,.*heid`

**Ignore list**<br/>
`een.*heid,ee.*nheid`

**Request and response**<br/>
`fq=%7B%21mtas_cql+field%3D%22text%22+query%3D%22%5Bt_lc%3D%5C%22eenheid%5C%22%5D%22+++%7D&q=%2A%3A%2A&mtas=true&mtas.document=true&mtas.document.0.field=text&mtas.document.0.prefix=t_lc&mtas.document.0.key=advanced+list+of+words&mtas.document.0.type=n%2Csum%2Cmean&mtas.document.0.regexp=%5Ba-z%5D%7B7%2C%7D&mtas.document.0.list=een.%2A%2C.%2Aheid&mtas.document.0.listRegexp=true&mtas.document.0.listExpand=true&mtas.document.0.listExpandNumber=3&mtas.document.0.ignoreRegexp=%5Ba-z%5D%7B10%2C%7D&mtas.document.0.ignoreList=een.%2Aheid%2Cee.%2Anheid&mtas.document.0.ignoreListRegexp=true&mtas.document.0.number=10&fl=text_numberOfPositions%2CNLCore_NLIdentification_nederlabID%2CNLProfile_name%2CNLTitle_title&start=0&rows=2&wt=json&indent=true`

```json
"mtas":{
  "document":[{
      "key":"advanced list of words",
      "list":[{
          "documentKey":"c0c41486-1eee-11e5-b891-f48ce0be173a",
          "list":[{
              "sum":166,
              "list":{
                "droefheid":{
                  "sum":36},
                "godheid":{
                  "sum":22},
                "waarheid":{
                  "sum":22}},
              "key":".*heid"},
            {
              "sum":93,
              "list":{
                "eenigen":{
                  "sum":46},
                "eensklaps":{
                  "sum":32},
                "eenigste":{
                  "sum":3}},
              "key":"een.*"}],
          "mean":5.886363636363637,
          "sum":259,
          "n":44},
        {
          "documentKey":"c0c453d8-1eee-11e5-b891-f48ce0be173a",
          "list":[{
              "sum":36,
              "list":{
                "afscheid":{
                  "sum":12},
                "hoogheid":{
                  "sum":4},
                "bezigheid":{
                  "sum":3}},
              "key":".*heid"},
            {
              "sum":24,
              "list":{
                "eenvoudig":{
                  "sum":15},
                "eenzame":{
                  "sum":3},
                "eenmaal":{
                  "sum":2}},
              "key":"een.*"}],
          "mean":3.1578947368421053,
          "sum":60,
          "n":19}]}]}
```

---

**Lucene**<br/>
To compute statistics on the terms used in the listed documents [directly in Lucene](installation_lucene.html), use *ComponentDocument* together with the provided *collect* method.