Spoken audio analysis tools

Open Source

WaveSurfer (https://sourceforge.net/projects/wavesurfer/) is an open source tool for sound visualization and manipulation. Typical applications are speech/sound analysis and sound annotation/transcription. WaveSurfer may be extended by plug-ins as well as embedded in other applications. See also http://www.speech.kth.se/wavesurfer/

Praat: doing phonetics by computer (http://www.fon.hum.uva.nl/praat/)


Gentle (http://lowerquality.com/gentle/) is a forced aligner. Forced aligners are computer programs that take media files and their transcripts and return extremely precise timing information for each word (and phoneme) in the media. Drift (http://drift-demo.lowerquality.com/upload) outputs pitch and timing; it samples what human listeners perceive as vocal pitch.
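To make the idea of word-level timing information concrete, here is a minimal Python sketch that reads aligner-style output and derives word durations and pauses. The JSON layout (a `words` list with `word`, `start`, and `end` in seconds), the function names, and the sample values are all assumptions made up for illustration; they are not the documented output of Gentle or any other aligner.

```python
import json

# Hypothetical aligner output: field names and timings are illustrative only.
SAMPLE_ALIGNMENT = """
{
  "words": [
    {"word": "spoken", "start": 0.12, "end": 0.58},
    {"word": "audio",  "start": 0.61, "end": 1.02},
    {"word": "tools",  "start": 1.40, "end": 1.95}
  ]
}
"""

def word_durations(alignment):
    """Return (word, duration in seconds) for each aligned word."""
    return [(w["word"], round(w["end"] - w["start"], 3))
            for w in alignment["words"]]

def pauses(alignment, min_gap=0.25):
    """Return (previous word, next word, gap) where the silence
    between consecutive words is at least min_gap seconds."""
    words = alignment["words"]
    found = []
    for prev, nxt in zip(words, words[1:]):
        gap = round(nxt["start"] - prev["end"], 3)
        if gap >= min_gap:
            found.append((prev["word"], nxt["word"], gap))
    return found

alignment = json.loads(SAMPLE_ALIGNMENT)
durations = word_durations(alignment)
long_pauses = pauses(alignment)
print(durations)
print(long_pauses)
```

With real aligner output, only the parsing step would need to change to match that tool's actual field names.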

Kaldi (http://kaldi-asr.org/) is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers.

Sonic Visualiser (http://www.sonicvisualiser.org/) is an application for viewing and analysing the contents of music audio files.

Audacity (http://www.audacityteam.org/) is a free, easy-to-use, multi-track audio editor and recorder for Windows, Mac OS X, GNU/Linux and other operating systems.

SIDA (https://github.com/hipstas/sida): Speaker Identification for Archives. Includes a notebook that walks through training and running a classifier: it takes speaker labels and the audio, extracts features (including vowels), trains a model, and runs it.
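The train-then-run workflow described above can be sketched in a few lines of Python. This is not SIDA's code or its model: a nearest-centroid classifier is swapped in as a simple stand-in, and the two-dimensional feature vectors, speaker labels, and function names are made up for illustration (real features would be extracted from the audio).

```python
from collections import defaultdict
import math

def train_centroids(features, labels):
    """Average the feature vectors of each speaker label into one centroid."""
    grouped = defaultdict(list)
    for vec, label in zip(features, labels):
        grouped[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }

def predict(centroids, vec):
    """Return the label whose centroid is closest (Euclidean) to vec."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))

# Made-up feature vectors standing in for features extracted from audio clips.
train_features = [[1.0, 2.0], [1.2, 1.8], [4.0, 5.0], [4.2, 5.2]]
train_labels = ["speaker_a", "speaker_a", "speaker_b", "speaker_b"]

model = train_centroids(train_features, train_labels)
prediction = predict(model, [3.9, 4.8])
print(prediction)
```

The structure is the point here: labeled examples go in, a model comes out, and the model is then run on an unlabeled clip.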

Audio Labeler (https://github.com/hipstas/audio-labeler): an in-browser app for labeling audio clips at random, using Docker and Flask.

ARLO (https://sites.google.com/site/nehhipstas/documentation) was developed for classifying bird calls and for using visualizations to help scholars classify pollen grains. ARLO can extract basic prosodic features such as pitch, rhythm and timbre for discovery (clustering) and automated classification (prediction or supervised learning), and can produce visualizations. The current implementation of ARLO for modeling runs in parallel on systems at the National Center for Supercomputing Applications (NCSA). ARLO is open source, and the source code will be made available for research purposes for this and subsequent projects on SourceForge at http://sourceforge.net/projects/arlo/.

Not open source, but available for academic use:

STRAIGHT (http://www.wakayama-u.ac.jp/~kawahara/STRAIGHTadv/index_e.html) is a tool for flexibly manipulating voice quality, timbre, pitch, speed and other attributes. It is a continually evolving system that aims for sound quality close to the original natural speech by introducing advanced signal processing algorithms and findings from computational aspects of auditory processing.

STRAIGHT decomposes sounds into source information and resonator (filter) information. This conceptually simple decomposition makes it easy to conduct experiments on speech perception using STRAIGHT, which was the initial design objective of the tool, and to interpret experimental results in terms of the huge body of classical studies.
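The source/filter split can be illustrated with a toy Python sketch. This is not STRAIGHT's algorithm; it shows only the textbook idea that a signal can be modeled as an excitation (source) passed through a resonator (filter), so that pitch can be changed while the filter stays the same. The function names and the three-sample impulse response are made up for illustration.

```python
def pulse_train(n_samples, period):
    """Source: one unit impulse every `period` samples (a crude excitation)."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def convolve(source, impulse_response):
    """Filter: apply the resonator by direct convolution,
    truncated to the length of the source."""
    out = [0.0] * len(source)
    for i, s in enumerate(source):
        if s == 0.0:
            continue
        for j, h in enumerate(impulse_response):
            if i + j < len(out):
                out[i + j] += s * h
    return out

# Illustrative resonator: a short decaying impulse response (made-up values).
resonator = [1.0, 0.5, 0.25]

# Same filter, two different sources: only the pulse spacing (pitch) changes.
low_pitch = convolve(pulse_train(12, period=6), resonator)
high_pitch = convolve(pulse_train(12, period=4), resonator)
print(low_pitch)
print(high_pitch)
```

In both outputs the same decaying shape repeats; only the spacing between repetitions differs, which is what makes this kind of decomposition convenient for perception experiments that vary one attribute at a time.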

Online Services:


Pop Up Archive (https://www.popuparchive.com/) is a platform of tools for organizing and searching digital spoken word. It processes sound for a wide range of customers, from large archives and universities to media companies, radio stations, and podcast networks. Users drag and drop any audio file (or have the service ingest an RSS, SoundCloud, or iTunes feed) and within minutes receive automatically generated transcripts and tags.

Science/tech/university podcasts

edX: Founded by Harvard University and MIT in 2012, edX is an online learning destination and MOOC provider, offering high-quality courses from the world’s best universities and institutions to learners everywhere.

The University of Bath in the UK publishes podcasts of its Public Lecture Series.


Nature has a podcast archive.

Stanford on iTunes

http://itunes.stanford.edu/ – Faculty lectures, interviews, music and sports.

University of Washington’s Cryptography Course

CSE P 590TU – Practical Aspects of Modern Cryptography – Plus related lecture slides and video archives.

Computer Science video lectures

Structure and Interpretation of Computer Programs
Introduction to Algorithms + Course website

University of Washington:
Programming Languages + Course website
Applications of Artificial Intelligence + Course website
Computer Architecture + Course website

ArsDigita University (curriculum):
Web Applications
Structure and Interpretation of Computer Programs
Object Oriented Program Design
Theory of Computation
Artificial Intelligence


Scientific American

Scientific American (podcasts) from a popular science magazine,

the oldest continuously published magazine in the U.S., […] bringing its readers unique insights about developments in science and technology for more than 150 years.

source: sciam

More university podcasts

MIT OpenCourseWare

Princeton University: WebMedia – Lectures

Tufts OpenCourseWare

Rice University: Live Webcasts & Archives

University of British Columbia Podcasts

University of Warwick podcasts

Utah State University OpenCourseWare


Openculture Master List of 1150 Free Courses

1150 free online courses from the world’s leading universities — Stanford, Yale, MIT, Harvard, Berkeley, Oxford and more. Over 30,000 hours of free audio and video.


Findlectures: faceted index to thousands of hours of free online lectures

A curated database of free lectures, over 20,000 hours of audio, http://findlectures.com/

[Edited from Photomedia Forum post by T. Neugebauer from 2006-2016]