Development of a Persian Syntactic Dependency Treebank
Mohammad Sadegh Rasooli, Manouchehr Kouhestani and Amirsaeid Moloodi
This paper describes the annotation process and linguistic properties of the
Persian syntactic dependency treebank. The treebank consists of approximately
30,000 sentences annotated with syntactic roles in addition to morpho-syntactic
features. One of the unique features of this treebank is that its sentences contain almost
4,800 distinct verb lemmas, making it a valuable resource for
educational goals. The treebank is constructed with a bootstrapping approach,
using available tagging and parsing tools and manually correcting their
annotations. The data is split into standard training, development, and test sets
in the CoNLL dependency format and is freely available to researchers.
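
As an illustration of how data in the CoNLL dependency format might be consumed, the following is a minimal Python sketch that reads sentences and collects distinct verb lemmas. It assumes the ten tab-separated columns of the CoNLL-X layout and a coarse verb tag of "V"; the function names, the tag value, the dependency labels, and the toy sample are all hypothetical and are not taken from the treebank itself.

```python
from typing import Dict, Iterator, List, Set

# Column names of the CoNLL-X dependency format (assumed layout).
CONLL_FIELDS = ["id", "form", "lemma", "cpostag", "postag",
                "feats", "head", "deprel", "phead", "pdeprel"]

Token = Dict[str, str]


def read_conll(text: str) -> Iterator[List[Token]]:
    """Yield one sentence at a time as a list of token dictionaries.

    Sentences are separated by blank lines; each token line holds
    tab-separated fields.
    """
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        values = line.split("\t")
        sentence.append(dict(zip(CONLL_FIELDS, values)))
    if sentence:
        # Handle a final sentence with no trailing blank line.
        yield sentence


def distinct_verb_lemmas(sentences: List[List[Token]],
                         verb_tag: str = "V") -> Set[str]:
    """Collect lemmas of tokens whose coarse POS tag equals verb_tag.

    The default tag "V" is an assumption, not the treebank's documented tag.
    """
    return {tok["lemma"]
            for sent in sentences
            for tok in sent
            if tok["cpostag"] == verb_tag}


# Toy sample for illustration only; not real treebank data.
SAMPLE = (
    "1\tshe\tshe\tPR\tPR\t_\t2\tSBJ\t_\t_\n"
    "2\tcame\tcome\tV\tV\t_\t0\tROOT\t_\t_\n"
    "\n"
    "1\the\the\tPR\tPR\t_\t2\tSBJ\t_\t_\n"
    "2\tcomes\tcome\tV\tV\t_\t0\tROOT\t_\t_\n"
)

if __name__ == "__main__":
    sents = list(read_conll(SAMPLE))
    print(len(sents), sorted(distinct_verb_lemmas(sents)))
```

Applied to the released training file, a count of this kind would be one way to check the verb-lemma figure, provided the verb tag is adjusted to the one actually used in the data.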