Spoken audio analysis tools

Open Source:

WaveSurfer (https://sourceforge.net/projects/wavesurfer/) is an open source tool for sound visualization and manipulation. Typical applications are speech/sound analysis and sound annotation/transcription. WaveSurfer can be extended with plug-ins and embedded in other applications. Project home page: http://www.speech.kth.se/wavesurfer/

Praat: doing phonetics by computer (http://www.fon.hum.uva.nl/praat/)


Gentle (http://lowerquality.com/gentle/) is a forced aligner. Forced aligners are computer programs that take media files and their transcripts and return precise timing information for each word (and phoneme) in the media.

Drift (http://drift-demo.lowerquality.com/upload) outputs pitch and timing. It samples what human listeners perceive as vocal pitch.
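
To illustrate the word-level timing that a forced aligner such as Gentle returns, the sketch below reads a small hand-written JSON sample and lists the start and end time of each aligned word. The field names (`words`, `word`, `start`, `end`, `case`) are an assumption about the shape of the aligner's JSON output and should be checked against the output of your own installation. The sample text and timings are invented for illustration.

```python
import json

# Hypothetical sample, shaped like a forced aligner's JSON output.
# The transcript and timings are invented for illustration.
SAMPLE = """
{
  "transcript": "hello out there",
  "words": [
    {"word": "hello", "case": "success", "start": 0.42, "end": 0.81},
    {"word": "out", "case": "not-found-in-audio"},
    {"word": "there", "case": "success", "start": 1.10, "end": 1.46}
  ]
}
"""


def aligned_words(alignment):
    """Return (word, start, end) tuples for words that were aligned."""
    rows = []
    for entry in alignment.get("words", []):
        # Words the aligner could not place carry no timing, so skip them.
        if entry.get("case") != "success":
            continue
        rows.append((entry["word"], entry["start"], entry["end"]))
    return rows


def word_durations(rows):
    """Map each aligned word to its duration in seconds."""
    return {word: round(end - start, 3) for word, start, end in rows}


rows = aligned_words(json.loads(SAMPLE))
for word, start, end in rows:
    print(f"{start:6.2f} {end:6.2f}  {word}")
```

Skipping unaligned words keeps the timing table free of missing values. A transcription workflow might instead want to flag those words for manual review.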

Kaldi (http://kaldi-asr.org/) is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. It is intended for use by speech recognition researchers.

Sonic Visualiser (http://www.sonicvisualiser.org/) is an application for viewing and analysing the contents of music audio files.

Audacity (http://www.audacityteam.org/) is a free, easy-to-use, multi-track audio editor and recorder for Windows, Mac OS X, GNU/Linux and other operating systems.

SIDA (https://github.com/hipstas/sida): Speaker Identification for Archives. Includes a notebook that walks through training and running a classifier: it takes speaker labels and audio, extracts features (including vowels), trains a model, and runs it.
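
The label, feature, train, and run steps can be sketched in a few lines. This is not SIDA's code or its model: it is a toy nearest-centroid classifier, the function names `train_centroids` and `predict` are hypothetical, and the two-dimensional feature vectors are invented numbers standing in for features extracted from audio clips.

```python
from collections import defaultdict
from math import dist


def train_centroids(features, labels):
    """Average the feature vectors belonging to each speaker label."""
    grouped = defaultdict(list)
    for vector, label in zip(features, labels):
        grouped[label].append(vector)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid is closest to the vector."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


# Toy features (invented numbers), one row per labelled audio clip.
train_features = [(1.0, 2.0), (1.2, 1.8), (4.0, 5.0), (4.2, 5.2)]
train_labels = ["speaker_a", "speaker_a", "speaker_b", "speaker_b"]

model = train_centroids(train_features, train_labels)
print(predict(model, (1.1, 2.1)))
print(predict(model, (3.9, 4.8)))
```

A real pipeline would replace the toy vectors with features computed from the audio and would usually use a stronger model, but the overall flow of labelled examples in and predicted labels out stays the same.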

Audio Labeler (https://github.com/hipstas/audio-labeler) is an in-browser app for labeling audio clips at random, using Docker and Flask.

ARLO (https://sites.google.com/site/nehhipstas/documentation) was developed for classifying bird calls and for using visualizations to help scholars classify pollen grains. ARLO can extract basic prosodic features such as pitch, rhythm and timbre for discovery (clustering) and automated classification (prediction or supervised learning), and can produce visualizations. The current implementation of ARLO for modeling runs in parallel on systems at the National Center for Supercomputing Applications (NCSA). ARLO's source code is open source and will be made available for research purposes for this and subsequent projects on SourceForge at http://sourceforge.net/projects/arlo/.

Not open source, but available for academic use:

STRAIGHT (http://www.wakayama-u.ac.jp/~kawahara/STRAIGHTadv/index_e.html) is a tool for flexibly manipulating voice quality, timbre, pitch, speed and other attributes. It is a continually evolving system that aims for sound quality close to the original natural speech by introducing advanced signal processing algorithms and findings from computational aspects of auditory processing.

STRAIGHT decomposes sounds into source information and resonator (filter) information. This conceptually simple decomposition makes it easy to conduct experiments on speech perception using STRAIGHT, which was the initial design objective of this tool, and to interpret experimental results in terms of the huge body of classical studies.
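
The idea of separating a source from a resonator (filter) can be shown with a textbook source-filter synthesis. The sketch below is not STRAIGHT's algorithm: it drives a single two-pole resonator with an impulse train, and the sample rate, pitch values, and formant settings are all assumed for illustration.

```python
import math


def impulse_train(f0_hz, sample_rate, n_samples):
    """Source: one unit impulse per pitch period."""
    period = round(sample_rate / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(signal, centre_hz, bandwidth_hz, sample_rate):
    """Filter: a two-pole resonator standing in for one formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * centre_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        # y[n] = x[n] + a1 * y[n-1] + a2 * y[n-2]
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


SAMPLE_RATE = 8000  # samples per second (assumed)
N = 800             # 0.1 seconds of signal

# Same filter, two different sources: only the pitch differs.
low = resonator(impulse_train(100, SAMPLE_RATE, N), 700, 100, SAMPLE_RATE)
high = resonator(impulse_train(200, SAMPLE_RATE, N), 700, 100, SAMPLE_RATE)
```

Changing `f0_hz` alters only the source (the pitch), while changing `centre_hz` or `bandwidth_hz` alters only the filter (the timbre). Being able to vary one without the other is what makes a decomposition of this kind convenient for perception experiments.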

Online Services:


Pop Up Archive (https://www.popuparchive.com/) is a platform of tools for organizing and searching digital spoken word. It processes sound for a wide range of customers, from large archives and universities to media companies, radio stations, and podcast networks. Users can drag and drop any audio file (or have an RSS, SoundCloud, or iTunes feed ingested) and within minutes receive automatically generated transcripts and tags.