A short description of the Morphologically Annotated Part of BulTreeBank (BulTreeBank-Morph)
This distribution contains only the morphological information encoded in BulTreeBank, an
HPSG-based treebank of Bulgarian. It contains about 214,000 tokens and was used to train the
TreeTagger for Bulgarian.
The sentences come from Bulgarian grammar textbooks, newspapers, literature, and other text
sources.
Full documentation of the treebank (style book, tagset description) can be found at:
http://www.bultreebank.org/TechRep.html
Data Format
The morphological annotation is described in:
Tagset
The tagset is described in:
Acquiring the Data
If you are interested in using BulTreeBank-Morph, please fill in the user
agreement form, print it, scan it, and send it to Kiril Simov. If you cannot send it
electronically, please send it by regular mail to:
Kiril Simov
BulTreeBank Project
Linguistic Modelling Laboratory, IPP,
Bulgarian Academy of Sciences
Acad. G.Bonchev St. 25A
1113 Sofia, Bulgaria
After receiving the completed form, we will send you the data.
Acknowledgements
The BulTreeBank is developed under the BulTreeBank Project, which
is a joint project of the Linguistic Modelling Laboratory (LML), Institute for Parallel Processing, Bulgarian Academy of Sciences
and Seminar für Sprachwissenschaft (SfS), Eberhard-Karls-Universität, Tübingen, Germany. The project is funded by the
Volkswagen Stiftung, Federal Republic of Germany under the Programme
"Cooperation with Natural and Engineering Scientists in Central and Eastern Europe".
We would like to thank our colleagues from Tübingen!