Shared Task: Automatic Arabic Error Correction
Subscribe to the shared task discussion group
For Frequently Asked Questions, please visit the FAQ page for the shared task.
Deadline for system output collection extended to July 25, 2014.
Deadline for paper submission extended to July 28, 2014.
Submission instructions are available here.
NEW: List of Accepted Papers
As part of the Arabic Natural Language Processing
Workshop at EMNLP
2014 (to be held in Doha, Qatar), we will conduct a shared task on
Automatic Arabic Error Correction. We designed this task in the
tradition of high-profile shared tasks in natural language processing
such as CoNLL's grammar/error detection and correction shared tasks in
2011-2013 and numerous machine translation campaigns by
NIST/WMT/MEDAR, among others. The task relies on resources created
under the Qatar Arabic Language Bank (QALB) project (currently over 1M
words of manually corrected Arabic text).
A participating system in
this shared task will be given Modern Standard Arabic texts, which are
to be automatically corrected. The input will be provided in
Arabic script and in a standard Romanization scheme, and will be
annotated for part-of-speech (in three different granularities), inflectional features,
clitics (which appear in 20% of Arabic words), lemmas, and English
glosses. All of the input text will be
preprocessed in a common way to make sure all participants have access
to all of these features at no additional overhead cost. The task is
focused on correction as opposed to identification. There will not be
an error identification task per se.
Participants need to register; the registration link is on the Shared Task Website (see below).
Once registered, all participating teams will be provided with a
common training data set, which includes common preprocessed input and
corrected output.
A common development set will also be provided. A
blind test data set will be used to evaluate the output of the
participating teams. An evaluation script will be provided to all the
teams. Each participating team can submit up to three systems.
Participants are welcome to use additional resources and tools that are not part of the released data set.
However, all such additions must be fully disclosed. Participants are expected to author a short paper (4 pages + 2
for references) describing their approach, resources, and experiments. The paper needs to follow the standard EMNLP conference format.
Important Dates
Shared task registration period: April 8, 2014 through July 1, 2014
Shared task test data release: July 7, 2014
Shared task system output collection: July 25, 2014
Submission deadline (Workshop and shared task papers): July 28, 2014
Author notification: August 26, 2014
Camera Ready: September 15, 2014
Workshop: October 25, 2014
Registering to acquire the QALB Corpus
Please complete the QALB corpus release form in order to receive a link to the training data.
Shared Task Committee
Behrang Mohit (co-chair), Carnegie Mellon University Qatar
Alla Rozovskaya (co-chair), Columbia University
Wajdi Zaghouani, Carnegie Mellon University Qatar
Ossama Obeid, Carnegie Mellon University Qatar
Nizar Habash (advisor), New York University Abu Dhabi