Treebanks and Linguistic Theories
20th and 21st September 2002, Sozopol, Bulgaria
Workshop motivation and aims
Treebanks are a language resource that provides annotations of natural
languages at various levels of structure: at the word level, the
phrase level, the sentence level, and sometimes also at the level of
function-argument structure. Treebanks have become crucially
important for the development of data-driven approaches to natural
language processing, human language technologies, grammar extraction
and linguistic research in general. A number of ongoing projects are
compiling representative treebanks for languages that still lack them
(Spanish, Bulgarian, Portuguese, Turkish), and others are compiling
treebanks for specific purposes for languages that already have them
(English).
The practice of building syntactically processed corpora has shown
that the more detailed the description of the data, the more
theory-dependent it becomes (Prague Dependency Treebank and other
dependency-based treebanks such as the Italian treebank (TUT) or the
Turkish treebank (METU); Verbmobil HPSG Treebanks, Polish HPSG
Treebank, Bulgarian HPSG-based Treebank, etc.). The development of
treebanks and the development of formal linguistic theories therefore
need to be more tightly connected in order to ensure the necessary
flow of information between them.
The workshop aims to be a forum for researchers and advanced students
working in one or both of these areas. It will be held in conjunction with the
summer school "Empirical Linguistics and Natural Language Processing",
Flagman hotel, Sozopol.
Frantisek Cermak, Charles University Prague, Czech Republic
Today's Corpus Linguistics: Some Open Questions (a preliminary
Hans Uszkoreit, DFKI, Saarbruecken, Germany
(Title to be
Elisaveta Balabanova and Krassimira Ivanova.
a machine-readable version of Bulgarian valence dictionary: (A case study of
CLaRK system application).
Philippe Blache and Marie-Laure Guénot.
Corpus Annotation with Property Grammar.
Sabine Brants, Stefanie Dipper, Silvia Hansen, Wolfgang
Lezius, George Smith.
Aoife Cahill, Mairéad McCarthy, Josef Van Genabith and Andy
Evaluating Automatic F-Structure Annotation for the Penn-II
Montserrat Civit, Mª Antònia Martí and Núria Bufí.
Design Principles for a Spanish Treebank.
Erhard Hinrichs and Julia Trushkina.
Forging Agreement: Morphological Disambiguation of Noun
Krassimira Ivanova, Dimitar Doikoff.
Grammars and Constraints over Morphologically Annotated Data for Ambiguity
Jiri Mirovsky, Roman Ondruska, and Daniel Prusa.
Searching through Prague Dependency Treebank
Conception and Architecture.
What kinds of trees grow
in Swedish soil? A Comparison of Four Annotation Schemes for
Stephan Oepen, Dan Flickinger, Kristina Toutanova, Christopher
LinGO Redwoods: A Rich and Dynamic
Treebank for HPSG.
Bulgarian Nominal Chunks and Mapping
Strategies for Deeper Syntactic Analyses.
Petya Osenova and Sia Kolkovska.
Combining the named-entity recognition task and NP
chunking strategy for robust pre-processing.
Kiril Simov, Alexander Simov, Milen Kouylekov, Krassimira
CLaRK System: Construction of
Segmentation Layers in the Group of
the Predicate: a Case Study of Bulgarian within the BulTreeBank
Development with Deductive and Abductive Explanation-based Learning: Exploratory
Yovka Tisheva and Marina Dzhonova.
Structure Level in TreeBanks.
Kristina Toutanova, Christopher D. Manning, Stephan Oepen.
Parse Ranking for a Rich HPSG Grammar.
Bilingual corpora as a platform for
cross-linguistic treebank development
Two round-table discussions will be organized on the following topics:
- the relationship between the syntactic properties of a given language
and the choice of linguistic theory for annotation purposes
- the utility of treebanks for linguistic theorizing
Erhard Hinrichs, Germany (co-chair)
Tilman Berger, Germany
Marek Swidzinski, Poland
Adam Przepiórkowski, Poland
Kiril Simov, Bulgaria (co-chair)
Vladimir Petkevic, Czech Republic
Anatolij N. Baranov, Russia
Sandra Kuebler, Germany
Kemal Oflazer, Turkey
Michael Barlow, USA
Tomaz Erjavec, Slovenia
Robert Engels, Norway
Andreas Wagner, Germany
Frank Richter, Germany
Manfred Sailer, Germany
Walter Daelemans, Belgium
Karel Oliva, Austria
Laurent Romary, France
The registration fee for the workshop is:
For participants from Central and Eastern Europe
the fee is reduced to 25 Euro.
The fee should preferably be paid in cash at the workshop
venue.
The fees cover the following services: a copy of the proceedings of
the attended workshop, coffee-breaks and refreshments.
People interested in attending the workshop have to send a
letter of interest.
Deadline for participants' applications (registration): 20 August
Notification of acceptance: 25 August
Participation in the workshop is limited by the capacity of the venue.
Requests for participation will be processed on a first-come,
first-served basis.
The workshop will take place in the town of Sozopol, Bulgaria.
Sozopol is one of the best summer resorts on the Black Sea coast,
famous for its unique mixture of ancient Greek culture, Bulgarian
traditional atmosphere (18th century), excellent climate and
entertainment facilities. In addition, it is the favourite place
of Bulgarian artists, both for performances and for relaxation.
It is situated to the south of Bourgas, which can be reached by
plane (Bourgas Airport), train (Bourgas Railway Station), or
intercity bus. It takes 40 minutes to reach Sozopol from Bourgas
with the buses and minibuses that run regularly.
The workshop will take place in hotel "Flagman". For
accommodation, participants can choose between hotel "Flagman",
which is relatively expensive but luxurious, and a number of
other options available in Sozopol: cheaper (but still good)
hotels, and rooms in private houses. The organizers can arrange
reservations for hotel "Flagman" and provide assistance with the
rest of the options.
Prices for hotel Flagman (1.95 BGL = 1 Euro):
Bed in a double room:
  For foreigners: 43 BGL per bed
  For Bulgarians: 29 BGL per bed
Single room:
  For foreigners: 65 BGL
  For Bulgarians: 44 BGL
Approx. prices for other hotels/private rooms: 10-30 BGL per night.
Approx. expenses for meals, etc.: 8-20 BGL per day.
In the application, specify one of the following options:
- "Flagman, single room" - for foreigners this will cost 65 BGL/night
- "Flagman, bed in a double room" - for foreigners this
will cost 43 BGL/night
- "cheaper hotel, single room" - about 25-30 BGL/night
- "cheaper hotel, double room" - about 30-35 BGL/night
- "cheaper hotel, bed in double room" - about 15-20 BGL/night
- "bed in a private house" - about 10-15 BGL/night, to be
arranged at the moment of arrival
Linguistic Modelling Laboratory, CLPP,
Bulgarian Academy of Sciences
Acad. G.Bonchev St. 25A
1113 Sofia, Bulgaria
Tel: (+359 2) 979 2825
Fax: (+359 2) 70 72 73