Invited Speakers

Samy Bengio
Google Research
Large Scale Learning of a Joint Embedding Space
Abstract: Rich document annotation is the task of providing textual
semantics to documents such as images, videos, and music, by ranking a
large set of possible annotations according to how well they correspond
to a given document. In the large-scale setting, there may be millions
of such rich documents to process and hundreds of thousands of distinct
potential annotations. To achieve this, we propose to build a so-called
"embedding space" into which both documents and annotations can be
automatically projected. In such a space, one can then find the
annotations nearest to a given image/video/music track, or annotations
similar to a given annotation. One can even build a semantic tree from
these annotations that reflects how concepts (annotations) relate to
one another with respect to their rich-document characteristics. We
propose a new, efficient learning-to-rank approach that scales to such
datasets and show annotation results for image and music databases.
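The annotation-lookup step described above can be sketched minimally as follows. This is an illustrative sketch only, not the speaker's actual model or training procedure: the dimensions are arbitrary, and the learned projection matrices (`V` for documents, `W` for annotation embeddings) are replaced here by random stand-ins so the ranking step itself is runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: raw document feature dimension and shared embedding size.
n_annotations, feat_dim, embed_dim = 5, 10, 4

# Stand-ins for learned parameters: V projects document features into the
# joint space; W holds one embedding vector per annotation.
V = rng.normal(size=(embed_dim, feat_dim))
W = rng.normal(size=(n_annotations, embed_dim))

def rank_annotations(x):
    """Project document features x into the joint embedding space and
    rank all annotations by inner-product similarity, best first."""
    phi = V @ x                 # document -> point in the embedding space
    scores = W @ phi            # similarity of every annotation to the document
    return np.argsort(-scores)  # annotation indices, highest score first

x = rng.normal(size=feat_dim)   # features of one document (e.g. an image)
print(rank_annotations(x))
```

Because documents and annotations live in the same space, the same inner-product machinery also ranks annotations against a query annotation's embedding, which is what makes annotation-to-annotation similarity (and the semantic tree mentioned above) possible.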

Professor Martin Cooke
University of the Basque Country
Speaking in adverse conditions: from behavioural observations to intelligibility-enhancing speech modifications
Abstract: Speech output technology is finding widespread application, including in
scenarios where intelligibility might be compromised -- at least for some
listeners -- by adverse conditions. Unlike most current algorithms, talkers
continually adapt their speech patterns in response to the immediate
context of spoken communication, where the type of interlocutor and the
environment are the dominant situational factors influencing speech
production. Observations of speaker behaviour can motivate the design of
more robust speech output algorithms. Starting with a listener-oriented
categorisation of possible goals for speech modification, this talk will
describe some of the extensive set of behavioural findings related to human
speech modification, identify those factors which appear to be beneficial,
and go on to examine recent computational attempts to apply speaker-inspired
modifications to improve intelligibility in the face of both stationary and
non-stationary maskers.

Joseph P. Campbell
MIT Lincoln Laboratory
Speaker Recognition for Forensic Applications
Abstract: In forensic speaker comparison, speech utterances are
compared by humans and/or machines for use in investigation. It is a
high-stakes application that can affect people's lives and therefore
demands the highest scientific standards. Unfortunately, methods
used in practice vary widely --- and not always for the better.
Methods and practices grounded in science are critical for proper
application (and nonapplication) of speaker comparison to a variety of
international investigative and forensic applications. This invited
keynote, by Dr. Joseph P. Campbell of MIT Lincoln Laboratory, provides
a critical analysis of current techniques employed and lessons
learned. It is crucial to improve communication between automatic
speaker recognition researchers, legal scholars and forensic
practitioners internationally. This involves addressing, for instance,
central legal, policy, and societal questions such as whether to allow
speaker comparisons in court, the requirements for expert witnesses, and
requirements for specific automatic or human-based methods to be
considered scientific. This keynote is intended as a roadmap in that
direction.