ACL 2007 Workshop on Language Technology
for Cultural Heritage Data
June 28, 2007
Prague, Czech Republic
The Workshop on Language Technology for Cultural Heritage
Data was held in conjunction with the 45th Annual Meeting of the Association for Computational Linguistics, which took place June 23-30, 2007, in Prague, Czech Republic. The ACL Anthology offers the complete
proceedings online (also see the bibtex file).
About the Workshop
Museums, archives, and libraries around the world maintain large
collections of cultural heritage objects, such as archaeological
artefacts, sound recordings, historic manuscripts, or preserved animal
specimens. Large-scale digitisation projects are currently underway to
make these collections more accessible. Of equal importance, however,
is the development of powerful tools to search, link, enrich, and mine
the digitised data. Language technology has an important role to play
in this, even for collections which are primarily non-textual, since
text is the pervasive medium used for metadata. At the same time, the
cultural heritage domain poses special challenges for the NLP
community, including the use of historic or non-standard language, the
presence of OCR or transcription errors in the data, and the need to handle
data from various media. The cultural heritage domain is
therefore also an interesting and challenging testbed for the
robustness of existing language technology.
The half-day workshop aims to bring together researchers working on all aspects
of applying language technology to the cultural heritage domain. The
format will be a mixture of oral presentations, poster presentations,
and an invited talk by Douglas W. Oard, University of Maryland, on the
MALACH project.
We received 22 submissions for the workshop. Each was reviewed by
three members of the programme committee. The following 11 papers were
selected for inclusion in the workshop programme:
- Avi Arampatzis, Jaap Kamps, Marijn
Koolen and Nir Nussbaum
"Deriving a Domain Specific Test Collection from a Query Log"
- David Bamman and Gregory Crane
"The Latin Dependency
Treebank in a Cultural Heritage Digital Library"
- Lars Borin, Dimitrios Kokkinakis and
"Naming the past: Named entity and animacy recognition in
19th century Swedish literature"
- Marieke van Erp
"Retrieving lost information from textual
databases: rediscovering expeditions from an animal specimen
- Michel Généreux
"Cultural Heritage digital resources: from
Extraction to Querying"
- Karl Grieser, Timothy Baldwin and Steven Bird
visitor path prediction and recommendation in a Museum
- Gareth Jones, Ying Zhang, Eamonn Newman, Fabio Fantino and
"Multilingual Search for Cultural Heritage
Archives via Combining Multiple Translation Resources"
- Véronique Malaisé, Antoine Isaac, Luit Gazendam and Hennie
"Anchoring Dutch Cultural Heritage Thesauri to
WordNet: two case studies"
- Tandeep Sidhu, Judith Klavans and Jimmy Lin
for Improved Subject Search Using SenseRelate, WordNet, and
the Art and Architecture Thesaurus"
- Idan Szpektor, Ido Dagan, Alon Lavie, Danny Shacham and
"Cross Lingual and Semantic Retrieval for
Cultural Heritage Appreciation"
- Alejandro Hector Toselli, Verónica
Romero and Enrique Vidal
"Viterbi Based Alignment between Text Images and their
The ACL 2007 Workshop on Language Technology for Cultural Heritage
Data is supported by the MultiMatch project.