Datasets

License Agreement

Treebank Synopsis

[sample sentence]
  • # Sentences: 5635
  • # (Orthographic) Words: 56422
  • # (Syntactic) Tokens: 63066
  • # Single-headed Tokens: 61585
  • # Multi-headed Tokens: 1481
  • # Surface Dependencies (incl. DERIV): 63066
  • # Surface Dependencies (excl. DERIV): 56424
  • # Deep Dependencies: 1746
POS Tag #
Adj 7582 (11,70%)
Adverb 3096 (4,78%)
Conj 2260 (3,49%)
Det 1006 (1,55%)
Dup 23 (0,04%)
Interj 99 (0,15%)
Noun 24539 (37,86%)
Postp 1726 (2,66%)
Pron 2320 (3,58%)
Punc 10425 (16,08%)
Verb 11736 (18,11%)

Feature #
A1pl 565 (0,51%)
A1sg 1733 (1,56%)
A2pl 371 (0,33%)
A2sg 639 (0,58%)
A3pl 3888 (3,51%)
A3sg 27029 (24,39%)
Abl 1095 (0,99%)
Able 500 (0,45%)
Abr 1 (0,00%)
Acc 2678 (2,42%)
Aor 1117 (1,01%)
Caus 679 (0,61%)
Cond 155 (0,14%)
Cop 372 (0,34%)
Dat 2833 (2,56%)
Desr 134 (0,12%)
Dist 12 (0,01%)
Equ 38 (0,03%)
Fitfor 9 (0,01%)
Fut 315 (0,28%)
Gen 2884 (2,60%)
Hastily 12 (0,01%)
Imp 410 (0,37%)
Ins 810 (0,73%)
Loc 2218 (2,00%)
Narr 760 (0,69%)
Neces 58 (0,05%)
Neg 1054 (0,95%)
Nom 14303 (12,91%)
Noun 42 (0,04%)
Opt 148 (0,13%)
Ord 65 (0,06%)
P1pl 261 (0,24%)
P1sg 824 (0,74%)
P2pl 118 (0,11%)
P2sg 242 (0,22%)
P3pl 652 (0,59%)
P3sg 6345 (5,73%)
Pass 1198 (1,08%)
Past 3209 (2,90%)
Pnon 19059 (17,20%)
Pos 9920 (8,95%)
Pres 682 (0,62%)
Prog1 1329 (1,20%)
Prog2 35 (0,03%)
Prop 1 (0,00%)
Stay 3 (0,00%)

DepRel #
APPOSITION 79 (0,12%)
ARGUMENT 1782 (2,75%)
CONJUNCTION 1325 (2,04%)
COORDINATION 3062 (4,72%)
DERIV 6642 (10,25%)
DETERMINER 2159 (3,33%)
INTENSIFIER 1033 (1,59%)
MODIFIER 15058 (23,23%)
MWE:COMP 582 (0,90%)
MWE:CONJ 101 (0,16%)
MWE:DUP 228 (0,35%)
MWE:ENAMEX:LOC 83 (0,13%)
MWE:ENAMEX:ORG 255 (0,39%)
MWE:ENAMEX:PERS 313 (0,48%)
MWE:FORMEX 27 (0,04%)
MWE:IDEX 882 (1,36%)
MWE:LVC 541 (0,83%)
MWE:NCOMP 370 (0,57%)
MWE:NUMEX 61 (0,09%)
MWE:NUMEX:DATE 1 (0,00%)
MWE:NUMEX:MONEY 115 (0,18%)
MWE:NUMEX:PCT 46 (0,07%)
MWE:PROVERB 8 (0,01%)
MWE:SIMEX 15 (0,02%)
MWE:TIMEX:DATE 73 (0,11%)
MWE:TIMEX:TIME 17 (0,03%)
OBJECT 4530 (6,99%)
POSSESSOR 4081 (6,30%)
PREDICATE 5743 (8,86%)
PUNCTUATION 10374 (16,01%)
RELATIVIZER 128 (0,20%)
SUBJECT 4889 (7,54%)
VOCATIVE 209 (0,32%)
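The counts above can be cross-checked with a little arithmetic. The sketch below copies a few figures from the synopsis and tables and confirms that they agree with one another. The choice of denominator (surface dependencies plus deep dependencies) is an inference from the numbers, not something the page states, and the helper name `pct` is made up for this illustration.

```python
# Figures copied from the synopsis and tables above.
tokens = 63066
single_headed = 61585
multi_headed = 1481
surface_incl_deriv = 63066
surface_excl_deriv = 56424
deep = 1746
deriv = 6642   # DERIV row of the DepRel table
adj = 7582     # Adj row of the POS table


def pct(count, total):
    """Format a share as the tables do: two decimals, comma as separator."""
    return f"{100 * count / total:.2f}".replace(".", ",") + "%"


# Single-headed and multi-headed tokens add up to the token count.
assert single_headed + multi_headed == tokens

# Removing the DERIV relations gives the "excl. DERIV" figure.
assert surface_incl_deriv - deriv == surface_excl_deriv

# Assumption: percentages are taken over surface + deep dependencies.
total = surface_incl_deriv + deep

print(pct(deriv, total))  # 10,25%
print(pct(adj, total))    # 11,70%
```

Both printed values match the DERIV and Adj rows, which suggests (but does not prove) that the POS and DepRel percentages share this denominator.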

Terms of Use

  • If you use this treebank in any form of publication, please make sure you cite the following papers:
    • Umut Sulubacak, Tuğba Pamay and Gülşen Eryiğit. IMST: A Revisited Turkish Dependency Treebank. In Proceedings of the 1st International Conference on Turkic Computational Linguistics (TurCLing) at CICLing, Konya, Turkey, 2016.
    • Kemal Oflazer, Bilge Say, Dilek Zeynep Hakkani-Tür, Gökhan Tür. Building a Turkish Treebank. In Building and Exploiting Syntactically-Annotated Corpora. Anne Abeille (ed.), Kluwer Academic Publishers, 2003.
    • Nart B. Atalay, Kemal Oflazer, Bilge Say. The Annotation Process in the Turkish Treebank. In Proceedings of the EACL Workshop on Linguistically Interpreted Corpora (LINC), Budapest, Hungary, 2003.
  • The IMST is licensed under Creative Commons (BY-NC-SA 4.0). A summary of the license terms is given below (see the full license text for more information). Under the terms of the license,

    You are free to:

    • Share — copy and redistribute the material in any medium or format
    • Adapt — remix, transform, and build upon the material
    • The licensor cannot revoke these freedoms as long as you follow the license terms.

    Under the following terms:

    • Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
    • NonCommercial — You may not use the material for commercial purposes.
    • ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
    • No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
  • Upon accepting the terms, you must manually fill in and sign the provided requisition form, and then scan and send it by e-mail to the address specified in the form. You will receive an e-mail response.


License Agreement

Treebank Synopsis

[sample sentence]
  • # Sentences: 5009
  • # (Orthographic) Words: 43191
  • # (Syntactic) Tokens: 47226
  • # Single-headed Tokens: 46080
  • # Multi-headed Tokens: 1144
  • # Surface Dependencies (incl. DERIV): 47226
  • # Surface Dependencies (excl. DERIV): 43192
  • # Deep Dependencies: 1271
POS Tag #
Adj 5148 (10,62%)
Adverb 3572 (7,37%)
Conj 1709 (3,52%)
Det 974 (2,01%)
Dup 11 (0,02%)
Interj 309 (0,64%)
Noun 18556 (38,26%)
Num 2 (0,00%)
Postp 1519 (3,13%)
Pron 1905 (3,93%)
Punc 5983 (12,34%)
Verb 8809 (18,16%)

Feature #
A1pl 556 (0,66%)
A1sg 1882 (2,23%)
A2pl 546 (0,65%)
A2sg 835 (0,99%)
A3pl 2595 (3,08%)
A3sg 20759 (24,63%)
A3spl 2 (0,00%)
Abl 778 (0,92%)
Able 42 (0,05%)
Acc 2003 (2,38%)
Aor 1081 (1,28%)
Card 1 (0,00%)
Caus 9 (0,01%)
Cond 223 (0,26%)
Cop 406 (0,48%)
Dat 1907 (2,26%)
Desr 154 (0,18%)
Equ 49 (0,06%)
Fitfor 19 (0,02%)
Fut 326 (0,39%)
Gen 1223 (1,45%)
Imp 683 (0,81%)
Ins 342 (0,41%)
Loc 1459 (1,73%)
Mention 1 (0,00%)
Narr 492 (0,58%)
Neces 61 (0,07%)
Neg 1009 (1,20%)
Nom 12697 (15,07%)
Opt 162 (0,19%)
Ord 39 (0,05%)
P1pl 194 (0,23%)
P1sg 823 (0,98%)
P2pl 318 (0,38%)
P2sg 417 (0,49%)
P3pl 191 (0,23%)
P3sg 3259 (3,87%)
Pass 36 (0,04%)
Past 1707 (2,03%)
Pnom 3 (0,00%)
Pnon 15613 (18,53%)
Pos 7131 (8,46%)
Pres 909 (1,08%)
Prog1 1296 (1,54%)
Prog2 37 (0,04%)
Prop 1 (0,00%)

DepRel #
APPOSITION 16 (0,03%)
ARGUMENT 1555 (3,21%)
CONJUNCTION 921 (1,90%)
COORDINATION 2751 (5,67%)
DERIV 4034 (8,32%)
DETERMINER 1846 (3,81%)
INTENSIFIER 799 (1,65%)
MODIFIER 11808 (24,35%)
MWE:COMP 579 (1,19%)
MWE:CONJ 76 (0,16%)
MWE:DUP 139 (0,29%)
MWE:ENAMEX:LOC 22 (0,05%)
MWE:ENAMEX:ORG 122 (0,25%)
MWE:ENAMEX:PERS 136 (0,28%)
MWE:FORMEX 278 (0,57%)
MWE:IDEX 669 (1,38%)
MWE:LVC 660 (1,36%)
MWE:NCOMP 251 (0,52%)
MWE:NUMEX 36 (0,07%)
MWE:NUMEX:MONEY 50 (0,10%)
MWE:NUMEX:PCT 5 (0,01%)
MWE:PROVERB 23 (0,05%)
MWE:SIMEX 8 (0,02%)
MWE:TIMEX:DATE 41 (0,08%)
MWE:TIMEX:TIME 18 (0,04%)
OBJECT 3002 (6,19%)
POSSESSOR 2213 (4,56%)
PREDICATE 5027 (10,37%)
PUNCTUATION 5959 (12,29%)
RELATIVIZER 99 (0,20%)
SUBJECT 4124 (8,50%)
VOCATIVE 1230 (2,54%)

Terms of Use

  • If you use this treebank in any form of publication, please make sure you cite the following paper:
    • Tuğba Pamay, Umut Sulubacak, Dilara Torunoğlu-Selamet, Gülşen Eryiğit. The Annotation Process of the ITU Web Treebank. In Proceedings of the 9th Linguistic Annotation Workshop (LAW) at NAACL, Denver, CO, USA, 2015.
  • The ITU Web Treebank is licensed under Creative Commons (BY-NC-SA 4.0). A summary of the license terms is given below (see the full license text for more information). Under the terms of the license,

    You are free to:

    • Share — copy and redistribute the material in any medium or format
    • Adapt — remix, transform, and build upon the material
    • The licensor cannot revoke these freedoms as long as you follow the license terms.

    Under the following terms:

    • Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
    • NonCommercial — You may not use the material for commercial purposes.
    • ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
    • No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.


License Agreement

Treebank Synopsis

[sample sentence]
  • # Sentences: 5635
  • # (Orthographic) Words: 58085
  • # (Syntactic) Tokens: 58146
  • # Single-headed Tokens: 58146
  • # Multi-headed Tokens: 0
  • # Surface Dependencies: 58146
  • # Deep Dependencies: 0
  • # Projective Dependencies: 56472 (97,12%)
  • # Non-Projective Dependencies: 1674 (2,88%)
POS Tag #
ADJ 5720 (9,84%)
ADP 2292 (3,94%)
ADV 2179 (3,75%)
AUX 979 (1,68%)
CCONJ 2252 (3,87%)
DET 1003 (1,72%)
INTJ 99 (0,17%)
NOUN 15738 (27,07%)
NUM 2052 (3,53%)
PRON 2148 (3,69%)
PROPN 2178 (3,75%)
PUNCT 10423 (17,93%)
VERB 11060 (19,02%)
X 23 (0,04%)

Feature #
Abbr=Yes 140 (0,09%)
Aspect=DurPerf 3 (0,00%)
Aspect=Imp 1176 (0,73%)
Aspect=Perf 9488 (5,86%)
Aspect=Prog 1360 (0,84%)
Aspect=ProgRapid 1 (0,00%)
Aspect=Rapid 11 (0,01%)
Case=Abl 1072 (0,66%)
Case=Acc 2571 (1,59%)
Case=Dat 2766 (1,71%)
Case=Equ 38 (0,02%)
Case=Gen 2333 (1,44%)
Case=Ins 777 (0,48%)
Case=Loc 2149 (1,33%)
Case=Nom 13509 (8,34%)
Echo=Rdp 23 (0,01%)
Evident=Nfh 637 (0,39%)
Mood=AbilCnd 5 (0,00%)
Mood=AbilDes 4 (0,00%)
Mood=AbilGen 1 (0,00%)
Mood=AbilGenNec 1 (0,00%)
Mood=AbilImp 1 (0,00%)
Mood=AbilNec 1 (0,00%)
Mood=Cnd 145 (0,09%)
Mood=CndPot 5 (0,00%)
Mood=Des 125 (0,08%)
Mood=DesPot 4 (0,00%)
Mood=Gen 359 (0,22%)
Mood=GenNec 4 (0,00%)
Mood=GenNecPot 1 (0,00%)
Mood=GenPot 1 (0,00%)
Mood=Imp 406 (0,25%)
Mood=ImpPot 1 (0,00%)
Mood=Ind 10299 (6,36%)
Mood=Nec 51 (0,03%)
Mood=NecPot 1 (0,00%)
Mood=Opt 147 (0,09%)
Mood=Pot 491 (0,30%)
Mood=Prs 2 (0,00%)
Negative=Neg 126 (0,08%)
NumType=Card 2007 (1,24%)
NumType=Dist 12 (0,01%)
NumType=Ord 33 (0,02%)
Number=Plur 4274 (2,64%)
Number=Sing 26094 (16,12%)
Number[psor]=Plur 986 (0,61%)
Number[psor]=Sing 7099 (4,38%)
Person=1 2246 (1,39%)
Person=2 982 (0,61%)
Person=3 27140 (16,76%)
Person[psor]=1 1052 (0,65%)
Person[psor]=2 341 (0,21%)
Person[psor]=3 6692 (4,13%)
Polarity=Neg 1120 (0,69%)
Polarity=Pos 9814 (6,06%)
Polite=Form 35 (0,02%)
Polite=Infm 1326 (0,82%)
PronType=Dem 359 (0,22%)
PronType=Ind 158 (0,10%)
PronType=Prs 1189 (0,73%)
Reflex=Yes 178 (0,11%)
Tense=Aor 1017 (0,63%)
Tense=AorPast 162 (0,10%)
Tense=Fut 592 (0,37%)
Tense=FutPast 37 (0,02%)
Tense=Past 4447 (2,75%)
Tense=Pqp 267 (0,16%)
Tense=Pres 5519 (3,41%)
VerbForm=Conv 794 (0,49%)
VerbForm=Part 2562 (1,58%)
VerbForm=Vnoun 1417 (0,88%)
Voice=Cau 537 (0,33%)
Voice=CauPass 136 (0,08%)
Voice=Pass 1061 (0,66%)

DepRel #
acl 1647 (2,83%)
advmod 1882 (3,24%)
advmod:emph 973 (1,67%)
amod 3353 (5,77%)
appos 40 (0,07%)
aux:q 209 (0,36%)
case 2248 (3,87%)
cc 868 (1,49%)
ccomp 36 (0,06%)
compound 1968 (3,38%)
compound:lvc 532 (0,91%)
compound:redup 215 (0,37%)
conj 3695 (6,35%)
cop 813 (1,40%)
csubj 7 (0,01%)
det 1972 (3,39%)
discourse 154 (0,26%)
fixed 96 (0,17%)
flat 970 (1,67%)
mark 78 (0,13%)
nmod 3389 (5,83%)
nmod:poss 3618 (6,22%)
nsubj 3754 (6,46%)
nummod 580 (1,00%)
obj 4309 (7,41%)
obl 4870 (8,38%)
parataxis 10 (0,02%)
punct 10225 (17,59%)
root 5635 (9,69%)

Availability

The IMST-UD Treebank is available under the official LINDAT repository for the Universal Dependencies initiative.
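Universal Dependencies treebanks are distributed as CoNLL-U files, so a tag table like the one above can be recomputed from a downloaded copy. The following is a minimal sketch rather than the tooling used to build this page. The function name `count_upos` is hypothetical, and the three-token sample is a hand-written toy sentence, not a sentence from the treebank.

```python
from collections import Counter


def count_upos(conllu_text):
    """Count UPOS tags (4th column) over the syntactic words of a CoNLL-U text."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank separators and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword token ranges (e.g. 3-4) and empty nodes (e.g. 5.1)
        counts[cols[3]] += 1
    return counts


# Hand-written toy input, for illustration only.
SAMPLE = (
    "# text = Kedi uyudu.\n"
    "1\tKedi\tkedi\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tuyudu\tuyu\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)

counts = count_upos(SAMPLE)
total = sum(counts.values())
for tag, n in sorted(counts.items()):
    print(tag, n, f"({100 * n / total:.2f}%)")
```

To run it on a real file, read the file as UTF-8 text and pass the contents to `count_upos`; the file name depends on the release you download.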


License Agreement

Treebank Synopsis

[sample sentence]
  • # Sentences: 5009
  • # (Orthographic) Words: 44463
  • # (Syntactic) Tokens: 44545
  • # Single-headed Tokens: 44545
  • # Multi-headed Tokens: 0
  • # Surface Dependencies: 44545
  • # Deep Dependencies: 0
  • # Projective Dependencies: 43855 (98,45%)
  • # Non-Projective Dependencies: 690 (1,55%)
POS Tag #
ADJ 4495 (10,09%)
ADP 1682 (3,78%)
ADV 2940 (6,60%)
AUX 1003 (2,25%)
CCONJ 1705 (3,83%)
DET 967 (2,17%)
INTJ 309 (0,69%)
NOUN 12129 (27,23%)
NUM 1403 (3,15%)
PRON 1773 (3,98%)
PROPN 1530 (3,43%)
PUNCT 5776 (12,97%)
SYM 536 (1,20%)
VERB 8286 (18,60%)
X 11 (0,02%)

Feature #
Abbr=Yes 238 (0,19%)
Aspect=Imp 1148 (0,93%)
Aspect=Perf 6808 (5,54%)
Aspect=Prog 1333 (1,08%)
Case=Abl 745 (0,61%)
Case=Acc 1881 (1,53%)
Case=Dat 1849 (1,50%)
Case=Equ 48 (0,04%)
Case=Gen 986 (0,80%)
Case=Ins 331 (0,27%)
Case=Loc 1404 (1,14%)
Case=Nom 12052 (9,80%)
Echo=Rdp 11 (0,01%)
Evident=Nfh 502 (0,41%)
Mood=AbilCnd 1 (0,00%)
Mood=AbilGen 3 (0,00%)
Mood=AbilImp 1 (0,00%)
Mood=Cnd 220 (0,18%)
Mood=CndPot 1 (0,00%)
Mood=Des 153 (0,12%)
Mood=Gen 376 (0,31%)
Mood=GenNec 13 (0,01%)
Mood=GenPot 3 (0,00%)
Mood=Imp 670 (0,55%)
Mood=ImpPot 1 (0,00%)
Mood=Ind 7597 (6,18%)
Mood=Nec 48 (0,04%)
Mood=Opt 161 (0,13%)
Mood=Pot 39 (0,03%)
Mood=Prs 10 (0,01%)
Negative=Neg 152 (0,12%)
NumType=Card 1362 (1,11%)
NumType=Ord 39 (0,03%)
Number=Plur 3369 (2,74%)
Number=Sing 21543 (17,53%)
Number[psor]=Plur 676 (0,55%)
Number[psor]=Sing 4304 (3,50%)
Person=1 2394 (1,95%)
Person=2 1350 (1,10%)
Person=3 21168 (17,22%)
Person[psor]=1 971 (0,79%)
Person[psor]=2 715 (0,58%)
Person[psor]=3 3294 (2,68%)
Polarity=Neg 1075 (0,87%)
Polarity=Pos 7057 (5,74%)
Polite=Form 37 (0,03%)
Polite=Infm 1297 (1,06%)
PronType=Dem 361 (0,29%)
PronType=Ind 189 (0,15%)
PronType=Prs 860 (0,70%)
Reflex=Yes 108 (0,09%)
Tense=Aor 1091 (0,89%)
Tense=AorPast 57 (0,05%)
Tense=Fut 442 (0,36%)
Tense=FutPast 19 (0,02%)
Tense=Past 2705 (2,20%)
Tense=Pqp 61 (0,05%)
Tense=Pres 4923 (4,01%)
VerbForm=Conv 532 (0,43%)
VerbForm=Part 1442 (1,17%)
VerbForm=Vnoun 651 (0,53%)
Voice=Cau 8 (0,01%)
Voice=CauPass 1 (0,00%)
Voice=Pass 35 (0,03%)

DepRel #
acl 990 (2,22%)
advmod 2784 (6,25%)
advmod:emph 782 (1,76%)
amod 2471 (5,55%)
appos 7 (0,02%)
aux:q 359 (0,81%)
case 1666 (3,74%)
cc 651 (1,46%)
ccomp 15 (0,03%)
compound 1792 (4,02%)
compound:lvc 642 (1,44%)
compound:redup 104 (0,23%)
conj 3908 (8,77%)
cop 783 (1,76%)
csubj 6 (0,01%)
det 1449 (3,25%)
discourse 327 (0,73%)
fixed 93 (0,21%)
flat 431 (0,97%)
mark 74 (0,17%)
nmod 2262 (5,08%)
nmod:poss 2025 (4,55%)
nsubj 3324 (7,46%)
nummod 492 (1,10%)
obj 2804 (6,29%)
obl 3510 (7,88%)
punct 5785 (12,99%)
root 5009 (11,24%)
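The synopsis splits dependencies into projective and non-projective ones without defining the terms. Under the common definition, assumed here, an arc from a head to a dependent is projective when every token lying between them is a descendant of that head. The sketch below applies that test to a small hand-made tree (not treebank data) and then reproduces the projective share reported in the synopsis; the function name `is_projective` is hypothetical.

```python
def is_projective(heads, dep):
    """Check the arc into token `dep`.

    `heads[i]` is the head of token i (1-based); 0 marks the artificial root
    and `heads[0]` is an unused placeholder. Assumes a well-formed tree.
    """
    head = heads[dep]
    lo, hi = sorted((head, dep))
    for k in range(lo + 1, hi):
        node = k
        # Climb from the intervening token until we meet the head or the root.
        while node != 0 and node != head:
            node = heads[node]
        if node != head:
            return False  # an intervening token is not dominated by the head
    return True


# Toy tree: token 1 depends on 3, token 2 is the root, tokens 3 and 4 depend on 2.
heads = [0, 3, 0, 2, 2]
non_projective = [d for d in range(1, len(heads)) if not is_projective(heads, d)]
print(non_projective)  # [1]

# Projective share from the synopsis above.
projective, total = 43855, 44545
share = f"{100 * projective / total:.2f}".replace(".", ",") + "%"
print(share)  # 98,45%
```

In the toy tree, the arc from token 3 to token 1 spans token 2, which is not a descendant of token 3, so that arc is the only non-projective one.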

Availability

The IWT-UD Treebank is not available for distribution at this time.
