Extracting the Native Language Signal for Second Language Acquisition
Ben Swanson and Eugene Charniak
We develop a method for effective extraction of linguistic patterns that are
differentially expressed based on the native language of the author. This
method uses multiple corpora to allow for the removal of data set specific
patterns, and addresses both feature relevancy and redundancy. We
evaluate different relevancy ranking metrics and
show that common measures of relevancy can be inappropriate for
data with many rare features. Our feature set is a broad class of
syntactic patterns, and to better capture the signal
we extend the Bayesian Tree Substitution
Grammar induction algorithm to a supervised mixture of latent grammars.
We show that this extension can be used to extract a larger set of relevant
features.
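
To illustrate why a relevancy measure can mislead when many features are rare, here is a minimal sketch that ranks two features by pointwise mutual information (PMI) with the native-language label. PMI is chosen only as a familiar example and is not necessarily one of the metrics evaluated in the paper. The feature names, language codes, and counts are hypothetical toy values.

```python
import math

# Hypothetical toy counts (assumption, not data from the paper):
# feature -> {native language: number of documents containing the feature}
counts = {
    "common_pattern": {"es": 60, "zh": 40},
    "rare_pattern": {"es": 1, "zh": 0},
}
docs_per_class = {"es": 100, "zh": 100}


def max_pmi(feature_counts, docs_per_class):
    """Return the largest PMI between a feature and any single class."""
    total_docs = sum(docs_per_class.values())
    feature_total = sum(feature_counts.values())
    best = float("-inf")
    for label, n in feature_counts.items():
        if n == 0:
            continue  # PMI is undefined (minus infinity) for unseen pairs
        # PMI = log2( P(feature, label) / (P(feature) * P(label)) )
        p_joint = n / total_docs
        p_feature = feature_total / total_docs
        p_label = docs_per_class[label] / total_docs
        best = max(best, math.log2(p_joint / (p_feature * p_label)))
    return best


scores = {name: max_pmi(c, docs_per_class) for name, c in counts.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

In this toy setting the feature seen in a single document outranks the feature seen in one hundred, even though one occurrence is weak evidence of a real association with the class. Any metric that lacks a correction for low counts can show this kind of behavior.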