SPEAKER ODYSSEY 2014

Invited Speakers

Samy Bengio
Google Research

Large Scale Learning of a Joint Embedding Space

Abstract: Rich document annotation is the task of attaching textual semantics to documents such as images, videos, and music by ranking a large set of possible annotations according to how well they correspond to a given document. In the large-scale setting, there could be millions of such rich documents to process and hundreds of thousands of potential distinct annotations. To accomplish this task, we propose to build a so-called "embedding space" into which both documents and annotations can be automatically projected. In such a space, one can then find the nearest annotations to a given image, video, or piece of music, or the annotations most similar to a given annotation. One can even build a semantic tree from these annotations that reflects how similar concepts (annotations) are to each other with respect to their rich document characteristics. We propose a new, efficient learning-to-rank approach that can scale to such datasets and show some annotation results for image and music databases.
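
The idea of a joint embedding space can be illustrated with a minimal sketch. This is not the speaker's method: the linear projection, the dot-product score, the pairwise margin loss, and every name below (`doc_proj`, `ann_emb`, `rank_annotations`, `hinge_rank_step`) are assumptions chosen only to show how documents and annotations might share one space and how a ranking objective could adjust it.

```python
import numpy as np

def rank_annotations(doc_features, doc_proj, ann_emb, k=3):
    """Return the indices of the k highest-scoring annotations for one document.

    doc_features: (d,) feature vector of the document (hypothetical input)
    doc_proj:     (m, d) linear map from feature space into the embedding space
    ann_emb:      (n, m) one embedding row per annotation
    """
    z = doc_proj @ doc_features      # project the document into the shared space
    scores = ann_emb @ z             # dot-product similarity to every annotation
    return np.argsort(-scores)[:k]   # best-scoring annotations first

def hinge_rank_step(x, pos, neg, doc_proj, ann_emb, lr=0.1, margin=1.0):
    """One SGD step on a pairwise margin ranking loss (illustrative only).

    Pushes annotation `pos` to score at least `margin` above annotation `neg`
    for document x. Updates doc_proj and ann_emb in place; returns the loss.
    """
    z = doc_proj @ x
    loss = margin - ann_emb[pos] @ z + ann_emb[neg] @ z
    if loss > 0:
        diff = ann_emb[pos] - ann_emb[neg]   # computed before the rows change
        ann_emb[pos] += lr * z
        ann_emb[neg] -= lr * z
        doc_proj += lr * np.outer(diff, x)
    return max(loss, 0.0)

# Toy setup: 2-dimensional features, 2-dimensional embedding, 3 annotations.
x = np.array([1.0, 0.0])
doc_proj = np.eye(2)
ann_emb = np.zeros((3, 2))
first_loss = hinge_rank_step(x, pos=0, neg=1, doc_proj=doc_proj, ann_emb=ann_emb)
top = rank_annotations(x, doc_proj, ann_emb, k=1)
```

After the single update in the toy setup, annotation 0 scores above annotations 1 and 2 for the document `x`. A system at the scale the abstract describes would need far more than this, for example a way to sample violating annotations efficiently.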

Professor Martin Cooke
University of the Basque Country

Speaking in adverse conditions: from behavioural observations to intelligibility-enhancing speech modifications

Abstract: Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised -- at least for some listeners -- by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of speaker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this talk will describe some of the extensive set of behavioural findings related to human speech modification, identify those factors which appear to be beneficial, and go on to examine recent computational attempts to apply speaker-inspired modifications to improve intelligibility in the face of both stationary and non-stationary maskers.

Joseph P. Campbell
MIT Lincoln Lab

Speaker Recognition for Forensic Applications

Abstract: In forensic speaker comparison, speech utterances are compared by humans and/or machines for use in investigation. It is a high-stakes application that can affect people's lives and therefore demands the highest scientific standards. Unfortunately, methods used in practice vary widely, and not always for the better. Methods and practices grounded in science are critical for the proper application (and nonapplication) of speaker comparison in a variety of international investigative and forensic settings. This invited keynote, by Dr. Joseph P. Campbell of MIT Lincoln Laboratory, provides a critical analysis of the techniques currently employed and the lessons learned. It is crucial to improve communication between automatic speaker recognition researchers, legal scholars, and forensic practitioners internationally. This involves addressing central legal, policy, and societal questions, such as whether to allow speaker comparisons in court, what to require of expert witnesses, and what specific automatic or human-based methods must satisfy to be considered scientific. This keynote is intended as a roadmap in that direction.