

CLaRK System

CLaRK System Online Manual

Bulgarian dialects' electronic archive









BulTreeBank Group

The BulTreeBank Group works on projects related to Computational Linguistics and the Semantic Web. Our main task is to create Language Resources and Tools for Bulgarian. We have worked on the creation of a Bulgarian Treebank, a POS tagger, a Partial Grammar, a Text Archive, Domain Ontologies, Lexicons, and XML Tools. The BulTreeBank Group is part of the Linguistic Modelling Laboratory (LML), Institute of Information and Communication Technologies, Bulgarian Academy of Sciences. The group originates from the BulTreeBank Project. The project was funded by the Volkswagen Stiftung, Federal Republic of Germany, under the Programme "Cooperation with Natural and Engineering Scientists in Central and Eastern Europe". The project was carried out mainly at LML in close cooperation with researchers at the Seminar für Sprachwissenschaft (SfS), Eberhard-Karls-Universität, Tübingen, Germany.

The core members of the BulTreeBank Group

  • Kiril Simov - an Associate Professor at LML, IICT, BAS,
  • Petya Osenova - an Associate Professor at Sofia University and a Senior Researcher at LML, IICT, BAS.

Special Issue of Cybernetics and Information Technologies Journal on Semantic Models for NLP

WebCLaRK – Bulgarian Portal for Language Services on the web

Current Projects

We are involved in the following projects and initiatives:

Past Projects

We were involved in the following projects:

  • EuroMatrixPlus - Bringing Machine Translation for European Languages to the User - Bulgarian-English Resources,
  • FLaReNet - Fostering Language Resources Network. (National Representative),
  • LTfLL - Language Technologies for Lifelong Learning. We were responsible for the Common Semantic Framework, Ontologies and Semantic Annotation,
  • AsIsKnown - A Semantic-Based Knowledge Flow System for the European Home Textiles Industry. We were responsible for the Ontologies, Lexicons and Semantic Annotation,
  • LT4eL - Language Technology for eLearning. We were responsible for the Ontologies, Lexicons, Semantic Annotation and Bulgarian Resources,
  • BulTreeBank - HPSG-based Syntactic Treebank of Bulgarian. We have created a Bulgarian Treebank, Text Archive, Morphosyntactic Corpus, Partial Grammars, and other tools for Bulgarian,
  • CLaRK - Tübingen-Sofia International Graduate Programme in Computational Linguistics and Represented Knowledge. We have implemented the CLaRK System.


Petya Osenova, Kiril Simov. Формална граматика на българския език (Formal Grammar of the Bulgarian Language). Institute for Parallel Processing (IPP), Bulgarian Academy of Sciences. Sofia, 18 December 2007.

Here is a draft of Petya Osenova's habilitation thesis (in Bulgarian): Именните фрази в българския език (с оглед на Опорната фразова граматика), in English "Bulgarian Noun Phrases in HPSG". A summary in English is also available. Any comments are welcome.

CLaRK system - XML-based system for corpora development

The core of CLaRK is an XML Editor, which is the main interface to the system. Besides the XML language itself, the system implements the XPath language for navigation in documents and the XSLT language for transformation of XML documents. All information inside the system is encoded in Unicode. The basic mechanism of CLaRK for linguistic processing of text corpora is the cascaded regular grammar processor. Several mechanisms for imposing constraints over XML documents are also available; these constraints go beyond what standard XML technology can express.
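To give a rough idea of what a cascaded regular grammar over XML does, here is a minimal sketch in Python. It is not CLaRK code and does not use CLaRK's grammar notation: the toy document, the tag names (`w`, `NP`, `S`), the `pos` attribute and the helpers `label`, `apply_rule` and `run_cascade` are all hypothetical, chosen only for illustration. Each level of the cascade matches a regular expression over the labels of sibling elements and wraps every match in a new element, so later levels can build on the output of earlier ones. The last lines navigate the result with the limited XPath subset that Python's `xml.etree.ElementTree` supports.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: one sentence whose words carry toy part-of-speech labels.
DOC = ('<s><w pos="D">the</w><w pos="A">old</w>'
       '<w pos="N">treebank</w><w pos="V">grows</w></s>')

# One list of (category, pattern) rules per cascade level. Patterns are
# regular expressions over a string of labels, each label followed by a space.
CASCADE = [
    [("NP", r"(?:D )?(?:A )*N ")],  # level 1: simple noun phrases
    [("S", r"NP V ")],              # level 2: a clause built on level-1 output
]

def label(node):
    """Words are labelled by their pos attribute, other elements by their tag."""
    return node.get("pos", "w") if node.tag == "w" else node.tag

def apply_rule(parent, category, pattern):
    """Wrap each greedy left-to-right match among parent's children."""
    regex = re.compile(pattern)
    children = list(parent)
    labels = [label(child) for child in children]
    text = "".join(lab + " " for lab in labels)
    # starts[i] is the character offset at which the label of child i begins.
    starts, offset = [], 0
    for lab in labels:
        starts.append(offset)
        offset += len(lab) + 1
    rebuilt, i = [], 0
    while i < len(children):
        match = regex.match(text, starts[i])
        if match and match.end() > match.start():
            j = i
            while j < len(children) and starts[j] < match.end():
                j += 1
            wrapper = ET.Element(category)
            wrapper.extend(children[i:j])
            rebuilt.append(wrapper)
            i = j
        else:
            rebuilt.append(children[i])
            i += 1
    for child in children:
        parent.remove(child)
    parent.extend(rebuilt)

def run_cascade(root, cascade=CASCADE):
    """Apply every level in order; each level sees the previous level's output."""
    for level in cascade:
        for category, pattern in level:
            apply_rule(root, category, pattern)
    return root

root = run_cascade(ET.fromstring(DOC))
# XPath-style navigation: the words inside every noun phrase.
np_words = [w.text for w in root.findall(".//NP/w")]
print(np_words)
print(ET.tostring(root, encoding="unicode"))
```

In this sketch the second level can only recognise `S` because the first level has already introduced the `NP` element, which is the point of cascading the grammars. A real system would also have to handle mixed content, overlapping rules and matching below the top level, all of which are left out here.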

Technical Reports

Available Language Resources

These include the dependency format of the treebank, a morphosyntactically annotated corpus, a stopword list for Bulgarian, a frequency list, and other resources.

Linux courses.


Kiril Simov
BulTreeBank Group
Linguistic Modelling Laboratory, IICT,
Bulgarian Academy of Sciences
Acad. G.Bonchev St. 25A
1113 Sofia, Bulgaria
Fax: (+359 2) 870 72 73