Recent work in computational music research, including my own, has been greatly influenced by methods from computational linguistics. But I believe the influence could also go the other way: Music may offer some interesting lessons for language research, particularly with regard to the modeling of cognition.
In this talk I will focus on an important problem in music cognition: the problem of key identification. I will argue that this problem is in some ways analogous to the problem of syntactic parsing in language. I will present a simple Bayesian model that performs well at the key-finding task, and will then consider some broader implications of the model. The model represents moment-to-moment changes in key over time and captures "reanalysis" effects in key perception. The model can be used to estimate the tonal ambiguity of a musical passage, and can also be used to estimate the probability of note patterns (just as a probabilistic grammar can be used to estimate the probability of word strings). An interesting question here concerns expectation: In forming expectations for the next surface element (note or word), do we consider all possible structures (syntactic structures or keys) or just the most probable one? Finally, the model sheds light on the concept of "information flow." It has been suggested that language reflects a tendency towards uniform density of information, in that less probable elements are spread out or elongated; I will suggest that the same may be true in music.
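As a minimal sketch of this kind of Bayesian key-finding (not the specific model presented in the talk; the key-profile values, the uniform prior over the 24 keys, and the assumption of conditionally independent notes are all illustrative assumptions), each key is scored by the likelihood of the observed notes under its pitch-class profile, and Bayes' rule turns these scores into a posterior over keys:

import numpy as np

# Illustrative pitch-class profiles: P(pitch class | key), indexed from the tonic.
# The numbers are placeholders in the spirit of key-profile models, not published values.
MAJOR_PROFILE = np.array([0.75, 0.06, 0.49, 0.08, 0.67, 0.46,
                          0.10, 0.72, 0.10, 0.37, 0.06, 0.40])
MINOR_PROFILE = np.array([0.71, 0.08, 0.48, 0.62, 0.05, 0.46,
                          0.11, 0.75, 0.40, 0.07, 0.13, 0.33])

def key_posterior(pitch_classes):
    """P(key | notes) over the 24 major/minor keys, assuming a uniform prior
    over keys and conditionally independent notes given the key."""
    log_like = []
    for tonic in range(12):
        for profile in (MAJOR_PROFILE, MINOR_PROFILE):
            p = np.roll(profile, tonic)            # shift the profile to this tonic
            log_like.append(sum(np.log(p[pc]) for pc in pitch_classes))
    log_like = np.array(log_like)
    post = np.exp(log_like - log_like.max())       # uniform prior cancels out
    return post / post.sum()                       # normalize (Bayes' rule)

# Example: the notes C-E-G-C (pitch classes 0, 4, 7, 0) favour C major (index 0).
posterior = key_posterior([0, 4, 7, 0])
print(np.argmax(posterior), round(float(posterior.max()), 3))

Evaluating such a posterior over a moving window of notes gives moment-to-moment key judgments of the kind mentioned above, and the spread (e.g., entropy) of the posterior offers one natural measure of the tonal ambiguity of a passage.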
David Temperley is Associate Professor of Music Theory at the Eastman School of Music. After attending Swarthmore College, he worked for several years as a freelance accompanist and composer in New York City. Temperley earned his PhD in music theory from Columbia University in 1996, and subsequently was a postdoctoral fellow in music cognition at Ohio State University. Temperley's music research has spanned a wide range of areas, including rhythm and meter, rock, and African music, but his primary focus has been computational modeling of music cognition. His first book, The Cognition of Basic Musical Structures (MIT Press, 2001), which won the Society for Music Theory's Emerging Scholar Award, proposed computational models of the perception of five basic kinds of musical structure: meter, grouping, contrapuntal structure, harmony, and key. His second book, Music and Probability (MIT Press, 2007), revisits some of these same issues from a probabilistic (Bayesian) perspective.
For many years, Temperley has had a strong secondary interest in language. In the early 1990s, along with Daniel Sleator and John Lafferty, he developed the link grammar parser, a wide-coverage English parser based on an original theory of dependency syntax; the parser has been used in a wide variety of applications, and currently serves as the grammar-checker for the AbiWord word-processing system. More recently, Temperley's language research has focused on corpus research and computational models of language perception and production. Recent articles in Cognition and Cognitive Science (in collaboration with Dan Gildea) explore the idea that languages reflect a preference to minimize dependency length (that is, to locate closely related words close together in the sentence) and that this principle shapes both grammars and syntactic choices. Other language studies have focused on ambiguity avoidance in language production and on the regularity of linguistic stress patterns.
This talk is about interpreting human communication in meetings using audio, video and other signals. Meetings are an interesting and challenging problem, since the communication in a meeting is conversational and involves multiple speakers and multiple modalities.
This results in significant research problems in signal processing (identifying and segregating the different speakers), in speech recognition (recognizing spontaneous and overlapped speech), and in meeting interpretation (taking account of both individual and group behaviours).
Addressing these problems requires an interdisciplinary effort. In this talk, I'll discuss the capture and annotation of multimodal meeting recordings, resulting in the AMI Meeting Corpus, and how we have built on this to develop techniques and applications for the recognition, indexing, and interpretation of meetings.
Steve Renals is director of the Centre for Speech Technology Research (CSTR) and Professor of Speech Technology in the School of Informatics at the University of Edinburgh. He received a BSc in Chemistry from the University of Sheffield in 1986, an MSc in Artificial Intelligence from the University of Edinburgh in 1987, and a PhD in Speech Recognition and Neural Networks, also from Edinburgh, in 1990. From 1991 to 1992 he was a postdoctoral fellow at the International Computer Science Institute (ICSI), Berkeley, and he was then an EPSRC postdoctoral fellow in Information Engineering at the University of Cambridge (1992-94). From 1994 to 2003 he was a lecturer, then reader, in Computer Science at the University of Sheffield, moving to Edinburgh in 2003. He is an associate editor of ACM Transactions on Speech and Language Processing and IEEE Signal Processing Letters, and a former member of the IEEE Machine Learning for Signal Processing Technical Committee.