Commit 9ec8175f8912235f32b58074f30c0d0682b52a59

Authored by Szymon Rutkowski
1 parent a0752add

update the readme for the tagged frequency list

resources/NKJP1M/NKJP-tagged-frequency-tagset.txt
... ... @@ -39,9 +39,10 @@ NON-SGJP - all the others.
39 39 Possible values are:
40 40 NCH - not checked (automatic label for all entries which are NOT NON-SGJP), may
41 41 be actually any of the others.
  42 +ACRO - acronyms.
42 43 SYMB - a number, abbreviation, other symbol (eg. signs of articles in legal
43 44 acts).
44   -COMP - a compound form, with the second part after a hyphen.
  45 +COMPD - a compound form, with the second part after a hyphen, or apostrophe.
45 46 PN - proper name. This includes the following cases, spelled in lowercase acco-
46 47 rding to the standard ortography: 1) names of inhabitants and adjectives deri-
47 48 ved from the names of places, nations etc. 2) words that are part of encyclo-
... ... @@ -61,6 +62,8 @@ CORR - correctly spelled word.
61 62 ERR - incorrectly spelled word (including forms considered non-standard Polish
62 63 *but not* dialectal).
63 64 CERR - common error, often not perceived as such in less official contexts.
  65 +PHON - word spelled incorrectly, but in a way clearly intended to reflect its
  66 +pronunciation.
64 67 TAGD - word spelled correctly, but there's a disagreement between NKJP1M and
65 68 SGJP in its part of speech classification (of type "chory" - adj or subst?,
66 69 "cierpiący" - adj or pact?).
... ...