This website presents a collection of interactive demonstrations showcasing various speech technology applications. The demonstrations highlight the capabilities of modern speech models and provide examples of their practical use.
Speech input can be provided either through your microphone or by uploading an audio file. To use the microphone option, please grant your browser permission to access your microphone.
Once an audio sample has been provided, you can analyze either the entire recording or selected portions of it. Simply drag across the waveform to choose a specific segment for analysis.
The website is currently under active development, and you may occasionally encounter bugs or temporary service interruptions.
We invite you to explore the demonstrations and learn more about current speech technology capabilities.
This demo lets you discover celebrities and public figures whose voices are most similar to yours using three separate datasets: VoxBlink2 (containing natural, in-the-wild speech recordings from 38,000 distinct individuals curated from YouTube), VoxCeleb (recordings from 5,994 distinct celebrities collected from YouTube), and the Finnish Parliament dataset (recordings from 198 members of parliament between 2009 and 2023).
To evaluate the speaker identification system, you can upload or record speech from individuals included in any of these datasets. You can also experiment with shorter speech segments to explore how much audio is needed for reliable speaker recognition.
In this demo, you can enroll and recognize the voices of up to 8 speakers.
The analysis window size can be adjusted between 500 ms and 3000 ms, which determines the length of the speech segment used to compute each data point in the result plot. Larger analysis windows generally produce more stable recognition scores but increase system latency. Smaller windows provide faster updates but may result in greater score variability.
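The trade-off above can be made concrete with a minimal sketch. It assumes 16 kHz mono audio and a scorer that only sees the most recent window of samples; the function names and the sample rate are illustrative assumptions, not the demo's actual implementation.

```python
MIN_WINDOW_MS = 500   # lower bound stated for the demo
MAX_WINDOW_MS = 3000  # upper bound stated for the demo


def window_samples(window_ms, sample_rate=16000):
    """Clamp the window to the allowed range and convert it to a sample count.

    The 16 kHz default sample rate is an assumption made for illustration.
    """
    clamped = max(MIN_WINDOW_MS, min(MAX_WINDOW_MS, window_ms))
    return clamped * sample_rate // 1000


def latest_window(samples, window_ms, sample_rate=16000):
    """Return the most recent samples that one data point would be computed from.

    If fewer samples than a full window are available, all of them are returned.
    """
    n = window_samples(window_ms, sample_rate)
    return samples[-n:]
```

Under these assumptions, a 3000 ms window covers 48,000 samples and a 500 ms window covers 8,000. The larger window has more evidence per score, but each score also reflects up to three seconds of past audio.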
This demo features an interactive 3D speaker map that visualizes similarity relationships between different voices. You can populate the map with speakers from the Finnish Parliament dataset, record or upload your own custom voice, and observe its placement in the 3D space. The graph dynamically draws colored spring lines connecting your voice to the closest parliamentary matches based on speaker similarity embeddings.
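As a rough illustration of how closest matches can be found from speaker embeddings, the sketch below ranks enrolled speakers by cosine similarity to a query embedding. Cosine similarity is an assumption here, since the demo does not state which similarity measure it uses, and the function names and the toy three-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_speakers(query, enrolled, k=3):
    """Return the k (name, score) pairs with the highest similarity to the query.

    `enrolled` maps a speaker name to that speaker's embedding vector.
    """
    scored = [(name, cosine_similarity(query, emb)) for name, emb in enrolled.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy example with made-up embeddings; real speaker embeddings have far more dimensions.
enrolled = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [1.0, 1.0, 0.0]}
matches = closest_speakers([1.0, 0.1, 0.0], enrolled, k=2)
```

In this toy example the query lies closest to speaker A, followed by speaker C, so a map built this way would draw lines to those two speakers.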
This demo generates speech from text. You can synthesize speech in three modes:
This tool analyzes speech to estimate whether a recording is genuine or synthetic. It provides an overall artificiality score, along with time-resolved scores computed from two-second analysis windows with a one-second overlap.
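The windowing described above, two-second windows with a one-second overlap, implies that a new window starts every second. A minimal sketch of the resulting window boundaries follows. The function name is hypothetical, and dropping a trailing partial window is an assumption, since the demo does not say how the end of a recording is handled.

```python
def window_bounds(duration_s, window_s=2.0, overlap_s=1.0):
    """List (start, end) times in seconds for overlapping analysis windows.

    Only full windows are returned; a trailing partial window is dropped
    (an assumption made for this sketch).
    """
    hop_s = window_s - overlap_s
    if hop_s <= 0:
        raise ValueError("overlap must be smaller than the window length")
    bounds = []
    index = 0
    # Compute each start from the index to avoid accumulating float error.
    while index * hop_s + window_s <= duration_s:
        start = index * hop_s
        bounds.append((start, start + window_s))
        index += 1
    return bounds
```

For a five-second recording this yields four windows: 0 to 2 s, 1 to 3 s, 2 to 4 s, and 3 to 5 s, each of which would receive its own time-resolved score.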
This page allows you to clone a reference voice and immediately verify the quality of the generated audio. For both the original reference audio and the generated synthetic clone, you can see how the speech is rated by integrated speaker verification and anti-spoofing systems, providing automated feedback on both speaker similarity and voice authenticity.
We prioritize your privacy and take appropriate measures to protect your data. When you record or upload audio files, they are securely transmitted to our server for processing. Once processing is complete, the audio files are immediately and permanently deleted. We do not store personal audio data or associated information. Text inputs provided to the speech synthesizer may be temporarily stored on the server.
The celebrity voice search and real-time speaker identification demos employ the RedimNet2 B2 model.
Implementation: https://github.com/PalabraAI/redimnet2
Publication: https://arxiv.org/abs/2603.11841
The text-to-speech demo employs the VoxCPM2 model.
Implementation: https://github.com/openbmb/VoxCPM
Publication: https://arxiv.org/abs/2509.24650
The spoofing detector uses the DF-Arena 500M V1 anti-spoofing model from Speech-Arena.
Model: https://huggingface.co/Speech-Arena-2025/DF_Arena_500M_V_1
Publication: https://arxiv.org/abs/2509.02859
The VoxBlink2 dataset is a large-scale, audio-visual speaker recognition corpus containing millions of natural speech segments collected from YouTube videos. In its original form, it features over 110,000 distinct speakers and realistic acoustic environments.
Website: https://voxblink2.github.io/
Publication: https://arxiv.org/abs/2407.11510
The VoxCeleb datasets are large-scale speaker identification datasets containing over a million utterances from celebrity voices extracted from YouTube videos.
Website: https://www.robots.ox.ac.uk/~vgg/data/voxceleb/
Publications:
https://www.robots.ox.ac.uk/~vgg/publications/2017/Nagrani17/nagrani17.pdf
https://www.robots.ox.ac.uk/~vgg/publications/2018/Chung18a/chung18a.pdf
https://www.robots.ox.ac.uk/~vgg/publications/2019/Nagrani19/nagrani19.pdf
The Finnish Parliament dataset is a self-collected dataset from the Finnish parliament plenary session broadcasts, featuring recordings from over 500 members of parliament.
Website: https://verkkolahetys.eduskunta.fi/fi/taysistunnot
Member Photos: https://www2.eduskunta.fi/FI/kansanedustajat/Sivut/edustajakuvat.aspx
Ville Vestman, PhD
Website: https://cs.uef.fi/~vvestman/
Google Scholar: https://scholar.google.com/citations?user=aPZBcWgAAAAJ
Professor Tomi H. Kinnunen, PhD
Website: http://cs.joensuu.fi/pages/tkinnu/webpage/
Google Scholar: https://scholar.google.fi/citations?user=e3SPjpoAAAAJ
Generalized Voice Anti-Spoofing and Voice Biometrics (SPEECHFAKES), Academy of Finland (09/2022 – 08/2026).
Ensure that your microphone is not muted and that your browser has permission to access it. The log at the bottom of the page may contain information that helps you troubleshoot the issue.
Sometimes refreshing the page fixes the problem.
The frequency spectrum visualizer becomes active when the microphone is functioning properly.