Language Technology
for Cultural Heritage Data
(LaTeCH 2007)

June 28, 2007

9:00-13:00

Prague, Czech Republic

The Workshop on Language Technology for Cultural Heritage Data was held in conjunction with the 45th Annual Meeting of the Association for Computational Linguistics, which took place June 23-30, 2007, in Prague, Czech Republic. The ACL Anthology offers the complete proceedings online (also see the bibtex file).

About the Workshop

Museums, archives, and libraries around the world maintain large collections of cultural heritage objects, such as archaeological artefacts, sound recordings, historic manuscripts, or preserved animal specimens. Large-scale digitisation projects are currently underway to make these collections more accessible. Of equal importance, however, is the development of powerful tools to search, link, enrich, and mine the digitised data. Language technology has an important role to play in this, even for collections which are primarily non-textual, since text is the pervasive medium used for metadata. At the same time, the cultural heritage domain poses special challenges for the NLP community, including the use of historic or non-standard language, the presence of OCR or transcription errors in the data, and the need to deal with data from various media. The cultural heritage domain is therefore also an interesting and challenging testbed for the robustness of existing language technology.

The half-day workshop aimed to bring together researchers working on all aspects of applying language technology to the cultural heritage domain. The format was a mixture of oral presentations, poster presentations, and an invited talk by Douglas W. Oard, University of Maryland, on the MALACH project. See the workshop programme.

Accepted Papers

We received 22 submissions for the workshop. Each was reviewed by three members of the programme committee. The following 11 papers were selected for inclusion in the workshop programme:

Avi Arampatzis, Jaap Kamps, Marijn Koolen and Nir Nussbaum
"Deriving a Domain Specific Test Collection from a Query Log"
David Bamman and Gregory Crane
"The Latin Dependency Treebank in a Cultural Heritage Digital Library"
Lars Borin, Dimitrios Kokkinakis and Leif-Jöran Olsson
"Naming the past: Named entity and animacy recognition in 19th century Swedish literature"
Marieke van Erp
"Retrieving lost information from textual databases: rediscovering expeditions from an animal specimen database"
Michel Généreux
"Cultural Heritage digital resources: from Extraction to Querying"
Karl Grieser, Timothy Baldwin and Steven Bird
"Dynamic visitor path prediction and recommendation in a Museum Environment"
Gareth Jones, Ying Zhang, Eamonn Newman, Fabio Fantino and Franca Debole
"Multilingual Search for Cultural Heritage Archives via Combining Multiple Translation Resources"
Véronique Malaisé, Antoine Isaac, Luit Gazendam and Hennie Brugman
"Anchoring Dutch Cultural Heritage Thesauri to WordNet: two case studies"
Tandeep Sidhu, Judith Klavans and Jimmy Lin
"Disambiguation for Improved Subject Search Using SenseRelate, WordNet, and the Art and Architecture Thesaurus"
Idan Szpektor, Ido Dagan, Alon Lavie, Danny Shacham and Shuly Wintner
"Cross Lingual and Semantic Retrieval for Cultural Heritage Appreciation"
Alejandro Hector Toselli, Verónica Romero and Enrique Vidal
"Viterbi Based Alignment between Text Images and their Transcripts"

Sponsor

The ACL 2007 Workshop on Language Technology for Cultural Heritage Data is supported by the MultiMatch project.

Last update: July 17, 2007; csporled (at) uvt.nl
