Drug Extraction from the Web: Summarizing Drug Experiences with Multi-Dimensional Topic Models
Michael Paul and Mark Dredze
Multi-dimensional latent text models, such as factorial LDA (f-LDA), capture
multiple factors of corpora, creating structured output for researchers to
better understand the contents of a corpus. We consider such models for
clinical research of new recreational drugs and trends, an important
application for mining current information for healthcare workers. We use a
"three-dimensional" f-LDA variant to jointly model combinations of drug
(marijuana, salvia, etc.), aspect (effects, chemistry, etc.) and route of
administration (smoking, oral, etc.). Since a purely unsupervised topic model is
unlikely to discover these specific factors of interest, we develop a novel
method of incorporating prior knowledge by leveraging user-generated tags as
priors in our model. We demonstrate that this model can be used as an
exploratory tool for learning about these drugs from the Web by applying it to
the task of extractive summarization. In addition to providing useful output
for this important public health task, our prior-enriched model provides a
framework for the application of f-LDA to other tasks.