Digital Humanities Software Tools

Nancy K. Herther, a librarian for American studies, anthropology, and sociology, recently published an article in Computers in Libraries that includes a good list of digital humanities tools. Here are some of these tools:

DH Press: “DH Press is a plugin for WordPress that enables scholars to visualize their humanities-oriented data and allow users to access that data from the visualizations themselves.”

Omeka: provides open-source web publishing platforms for sharing digital collections and creating media-rich online exhibits.

Scalar 2: media-rich, scholarly electronic publishing.

Chronos Timeline: Chronos is a flexible jQuery plugin developed by HyperStudio Digital Humanities at MIT.

TimelineJS: an open-source tool that enables anyone to build visually rich, interactive timelines.

Historypin: a community archiving platform.

QGIS: a free and open source geographic information system.

Concordle: “Concordle has one point common with Wordle: it makes word clouds. But these are only text, and in a browser in general the choice of fonts is limited, so the clouds are not so very pretty. But it is much more clever: All the words in the cloud are clickable, i.e. they have links to concordancer function.”

Netlytic: “a community-supported text and social networks analyzer that can automatically summarize and discover social networks from online conversations on social media sites”

Palladio: Stanford University’s online visualization tool that takes CSV files and SPARQL endpoints (beta) as input.

Prism: a tool for “crowdsourcing interpretation.” Users are invited to provide an interpretation of a text by highlighting words according to different categories, or “facets.”

Tableau: a well-known data visualization tool, especially popular in business.

Umigon: semantic analysis on Twitter.

Voyant Tools: one of the DH text analysis tools listed in a previous post.

IIIF Open Source Developments

IIIF (International Image Interoperability Framework) is a community of research libraries and image repositories working on interoperable technology and a community framework for image delivery. Its goals are uniform and rich access to image-based resources hosted online, and common APIs for image repositories that enable a great user experience while viewing, comparing, manipulating and annotating images.

The foundation of IIIF development has been its Image API, which allows for the retrieval of pixels through a REST web service, and its Presentation API, which drives viewing interfaces. In addition, there is a Search API and an Authentication API. The APIs use JSON-LD throughout.
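As a rough illustration of what "retrieval of pixels through a REST web service" means in practice, here is a minimal Python sketch that builds Image API request URLs following the pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The server address and the identifier are hypothetical placeholders, and the default size keyword `max` assumes Image API 2.1 or later (version 2.0 uses `full` for the size).

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an Image API request URL of the form
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Percent-encode the identifier so that slashes or spaces inside it
    # do not break the path segments.
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


def iiif_info_url(base_url, identifier):
    """Build the URL of the JSON-LD image information document."""
    return f"{base_url.rstrip('/')}/{quote(identifier, safe='')}/info.json"


# Hypothetical server and identifier, for illustration only.
BASE = "https://example.org/iiif"
print(iiif_image_url(BASE, "page 1"))
print(iiif_image_url(BASE, "page 1", region="0,0,500,500", size="250,"))
print(iiif_info_url(BASE, "page 1"))
```

The second call requests a 500 by 500 pixel region from the top-left corner, scaled to a width of 250 pixels; a viewer typically reads `info.json` first to learn which sizes and qualities the server supports.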

IIIF Image Servers:

IIIF Image API Viewers:

IIIF Presentation API Viewers :

The full list of viewers is available here:

Demonstration IIIF sites:





Digital humanities text analysis tools

Distant Reading & Text Analysis

The Versioning Machine: a framework and an interface for displaying multiple versions of text encoded according to the Text Encoding Initiative (TEI) Guidelines.

Voyant Tools: a web-based reading and analysis environment for digital texts.

Twine: an open-source tool for telling interactive, nonlinear stories. You don’t need to write any code to create a simple story with Twine, but you can extend your stories with variables, conditional logic, images, CSS, and JavaScript when you’re ready.

Spoken audio analysis tools

Open Source:

WaveSurfer: an open source tool for sound visualization and manipulation. Typical applications are speech/sound analysis and sound annotation/transcription. WaveSurfer may be extended by plug-ins as well as embedded in other applications.

Praat: doing phonetics by computer.

Gentle: a forced aligner. Aligners are computer programs that take media files and their transcripts and return extremely precise timing information for each word (and phoneme) in the media. Drift outputs pitch and timing; it samples what human listeners perceive as vocal pitch.

Kaldi: a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers.

Sonic Visualiser: an application for viewing and analysing the contents of music audio files.

Audacity: a free, easy-to-use, multi-track audio editor and recorder for Windows, Mac OS X, GNU/Linux and other operating systems.

SIDA: Speaker Identification for Archives. Includes a notebook that walks through the steps of training and running a classifier: it takes speaker labels and the audio, extracts features (including vowels), trains a model, and runs it.

Audio Labeler: an in-browser app for labeling audio clips at random, using Docker and Flask.

ARLO: developed for classifying bird calls and using visualizations to help scholars classify pollen grains. ARLO has the ability to extract basic prosodic features such as pitch, rhythm and timbre for discovery (clustering) and automated classification (prediction or supervised learning), as well as visualizations. The current implementation of ARLO for modeling runs in parallel on systems at the National Center for Supercomputing Applications (NCSA). The source code for ARLO is open source and will be made available on SourceForge for research purposes for this and subsequent projects.

Not open source, but available for academic use:

STRAIGHT: a tool for flexibly manipulating voice quality, timbre, pitch, speed and other attributes. It is a continually evolving system for attaining better sound quality, close to the original natural speech, by introducing advanced signal processing algorithms and findings in computational aspects of auditory processing.

STRAIGHT decomposes sounds into source information and resonator (filter) information. This conceptually simple decomposition makes it easy to conduct experiments on speech perception using STRAIGHT, the initial design objective of this tool, and to interpret experimental results in terms of the huge body of classical studies.

Online Services:

Pop Up Archive: a platform of tools for organizing and searching digital spoken word. It processes sound for a wide range of customers, from large archives and universities to media companies, radio stations, and podcast networks. Drag and drop any audio file (or let the service ingest your RSS, SoundCloud, or iTunes feed), and within minutes receive automatically generated transcripts and tags.

Library of Congress Call Number Sort in PHP

I needed a function that compares Library of Congress call numbers. I found a function in Perl to do this on Joshua McGee’s site, but I needed it in PHP, and our call numbers use a space instead of a period for some of the separators, for example: HA 1107 K49 2003.

Here is the function in PHP:


//is call number $a larger than call number $b?
function locsort($a, $b) {
    $pattern = '/^([A-Z]+)\s?(\d+(?:\.\d+)?)\s?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)( (?:\d{4})?)?(.*)?/';

    $i = preg_match($pattern, $a, $regsA);
    $j = preg_match($pattern, $b, $regsB);

    //if either call number cannot be parsed, compare the raw strings
    if (($i == 0) || ($j == 0)) {
        return ($a > $b);
    }

    //compare part by part: the first part that differs decides the result
    if ($regsA[1] != $regsB[1]) {
        return ($regsA[1] > $regsB[1]);
    }
    if ($regsA[2] != $regsB[2]) {
        return ($regsA[2] > $regsB[2]);
    }
    if ($regsA[3] != $regsB[3]) {
        return ($regsA[3] > $regsB[3]);
    }
    //the digits after the first Cutter letter are compared as a decimal fraction
    if ($regsA[4] != $regsB[4]) {
        return (("0." . $regsA[4]) > ("0." . $regsB[4]));
    }
    if ($regsA[5] != $regsB[5]) {
        return ($regsA[5] > $regsB[5]);
    }
    if ($regsA[6] != $regsB[6]) {
        return ($regsA[6] > $regsB[6]);
    }
    if ($regsA[7] != $regsB[7]) {
        return ($regsA[7] > $regsB[7]);
    }
    return ($regsA[8] > $regsB[8]);
}
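
For comparison, the same ordering can be sketched in Python as a sort key. This is only an illustration under stated assumptions, not a port of the PHP above: it reuses the same regular expression, but it treats the digits of both Cutter numbers as decimal fractions, treats a missing year as 0, and places call numbers that do not match the pattern after all the others. The function name `loc_sort_key` and the sample call numbers (other than HA 1107 K49 2003) are made up for the example.

```python
import re

# Same pattern as the PHP function: class letters, class number,
# two optional Cutter numbers, an optional year, and any remainder.
PATTERN = re.compile(
    r"^([A-Z]+)\s?(\d+(?:\.\d+)?)\s?([A-Z]*)(\d*)\.?([A-Z]*)(\d*)( (?:\d{4})?)?(.*)"
)


def loc_sort_key(call_number):
    """Return a tuple that sorts call numbers in shelf order."""
    match = PATTERN.match(call_number)
    if match is None:
        # Assumption: unparseable call numbers sort last, as plain strings.
        return (1, call_number)
    letters, number, cut1, cut1_digits, cut2, cut2_digits, year, rest = match.groups()
    year = (year or "").strip()
    return (
        0,
        letters,                       # class letters, e.g. "HA"
        float(number),                 # class number, compared numerically
        cut1,                          # first Cutter letter
        float("0." + cut1_digits),     # Cutter digits as a decimal: K49 < K5
        cut2,
        float("0." + cut2_digits),
        int(year) if year else 0,      # assumption: missing year counts as 0
        rest,
    )


shelf = ["HA 1107 K49 2003", "HA 200 A1", "HA 1107 K5 2001", "B 10 Z9"]
print(sorted(shelf, key=loc_sort_key))
```

A plain string sort would place HA 1107 before HA 200, which is the problem the call number comparison is meant to avoid.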

[Photomedia Forum post by T.Neugebauer from Jan 17, 2012]

From Open Source to RiP! A Remix Manifesto

Brett Gaylor’s RiP! A Remix Manifesto touches on many themes: corporate control of cultural heritage and media, copyright law, artistic creativity, and remixing music. I attended the screening and discussion at RVCQ, and if I had to pick one conclusion from the many that can be drawn, it would be that society is forever re-creating and re-interpreting its culture, media and information.

The film’s central protagonist is Girl Talk (aka Gregg Gillis), a biomedical engineer by day and a remix musician by night. There are some parallel themes to open access publishing explored in the film through Gregg Gillis’ day job. Among the many interesting people in the film are Lawrence Lessig (founder of Creative Commons) and Gilberto Gil (Brazil’s Minister of Culture). There are compelling stories too, like that of the Mouse Liberation Front, or Jammie Thomas and other unfortunate civilian targets of lawsuits by the copyright industry.

The film begins with a discussion of the birth and growth of the Internet, the technology underlying many of the creative dilemmas that are presented: it is this enabling technology that makes digital remix culture possible. The ideas of openness and copyright are certainly not as new as the Internet; Phillip Davis’ recent article (How the Media Frames “Open Access”) points out that the general meaning of “unrestricted admission or access” is documented in the Oxford English Dictionary as far back as 1602. However, the Internet is the enabling technology that dramatically increases access to all kinds of cultural objects, from scientific publications to music and cinema.

Considering the importance of the Internet to the theme, open source software seems like a relevant starting point. I would have liked to see more thorough coverage of this in the film. Open source software, with its collaborative development model and its General Public License that aims to protect the work from copyright restrictions, seems deserving of special mention. The fact that the open source Apache web server, for example, has been the most popular web server on the Internet since 1996 is a significant proof of concept of the power of this collaborative model. Some of the media corporations that are mentioned in the film are likely using Apache web servers and other open source applications such as Berkeley Internet Name Domain (BIND). However, since Brett takes this collaborative concept seriously by hosting public contributions to RiP! A Remix Manifesto, I can always submit a remix that includes what I think is missing from the film!

Overall, I think this is an excellent film, and I also appreciate Brett’s ability to explain his point of view in person. He questions the need for, and the implications of, having to ask for explicit permission from copyright holders to re-use or remix. The act of pleading for permission implies that the public only has permission to be consumers of media by default. Do publishers ask for permission from the public to post their media all over the city, in the newspapers, on television and the Internet? We are in search of a balance, a path that would allow artists to benefit commercially from their work while at the same time keeping culture and media accessible to more than just consumption, keeping it open to creative reuse.

The idea of finding a global and permanent “middle road” solution is seductive, but making anything permanent and compulsory in art could be self-defeating. The discussions over artistic creativity and the objects of culture will, in my opinion, continue indefinitely. In the meantime, Brett Gaylor has contributed a remix of a thought-provoking manifesto:

1. Culture always builds on the past
2. The past always tries to control the future
3. Our future is becoming less free
4. To build free societies we need to limit the control of the past

[Photomedia Forum post by T.Neugebauer from Mar 05, 2009]

Art and Architecture Thesaurus now available as Linked Open Data

It was informally announced during the 2013 LODLAM Summit in Montreal last year, and the official announcement was made today by Jim Cuno, the President and CEO of the Getty:

One of the Getty Vocabularies, the Art and Architecture Thesaurus (AAT), is now available as Linked Open Data. The dataset is available under an Open Data Commons Attribution License (ODC BY 1.0).

The SPARQL endpoint and the documentation are found here:

Over the next 18 months, the Research Institute’s other three Getty Vocabularies, The Getty Thesaurus of Geographic Names (TGN)®, The Union List of Artist Names®, and The Cultural Objects Name Authority (CONA)®, will all become available as Linked Open Data.

For general information about our Linked Open Data project see

The open availability of these valuable data sets is great news for developers working with cultural data.
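
To give a sense of how a developer might use a SPARQL endpoint like the one mentioned above, here is a minimal Python sketch that prepares a label search following the SPARQL 1.1 Protocol (a GET request with a `query` parameter). The endpoint address is a hypothetical placeholder, the function name is made up, and the query assumes that concepts carry `skos:prefLabel` labels; check the vocabulary documentation for the actual data model. The sketch only builds the request and does not send it.

```python
from urllib.parse import urlencode


def build_label_search(endpoint, term, limit=10):
    """Return (url, headers) for a SPARQL query that finds concepts
    whose preferred label contains the given term."""
    # Escape backslashes and double quotes so the term stays inside
    # the string literal of the query.
    safe_term = term.lower().replace("\\", "\\\\").replace('"', '\\"')
    query = (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept ?label WHERE {\n"
        "  ?concept skos:prefLabel ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), "{safe_term}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the standard SPARQL JSON results format.
    headers = {"Accept": "application/sparql-results+json"}
    return url, headers


# Hypothetical endpoint address, for illustration only.
url, headers = build_label_search("https://example.org/sparql", "photograph", limit=5)
print(url)
print(headers)
```

The returned URL and headers can be passed to any HTTP client; the response would be a JSON document with one binding per matching concept.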

[Photomedia Forum post by T.Neugebauer from Feb 23, 2014]

Timeline Visualization: Photography Exhibition Catalogues in e-Artexte

I’ve been working with Artexte on the development of e-Artexte, a unique open access digital repository for documents in the visual arts in Canada. It is a new online service that caters to the needs of museums, galleries, artist-run centres and other publishers and authors in the visual arts community who are looking for ways to make their publications more widely accessible via the Internet.

The open source EPrints platform that powers e-Artexte is highly interoperable. Choosing open source technology that can export its contents using semantic web standards meets a necessary condition for innovation around that content. E-Artexte enables researchers to leverage the open metadata exporting capabilities of the EPrints software to create customized visualizations.

As an example of such a visualization, I ran an advanced search for all exhibition catalogues with the keyword “photography” or “photographie” (for those items catalogued in French only). I then exported this result from e-Artexte using the JSON export and customized the Timeline libraries so that they could display this data.

This is the result:
Timeline Visualization: Photography Exhibition Catalogues in Artexte Collection (1960-)

The interface allows for browsing the metadata of hundreds of photography-related exhibition catalogues, organized by time. You can move the timeline by using one of two bands: year and decade. Clicking on an individual title brings up a more detailed view with an abstract, and clicking on the title in that bubble opens a new window/tab with the relevant e-Artexte record. The visualization refreshes its photography exhibition data from e-Artexte every 30 days.
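
The transformation step between the JSON export and the timeline can be sketched roughly as follows. This is not the code behind the visualization: the input field names (`title`, `date`, `abstract`, `uri`) are assumptions about what an exported record contains, the output shape (an `events` list with `start`, `title`, `description` and `link`) is a generic event format chosen for illustration, and the sample records are invented.

```python
import json


def eprints_to_events(records):
    """Convert exported repository records into a list of timeline events.

    Records without a title or a date are skipped, because they cannot
    be placed on a timeline.
    """
    events = []
    for record in records:
        title = record.get("title")
        date = record.get("date")
        if not title or not date:
            continue
        events.append({
            "start": str(date),                        # year or ISO-style date
            "title": title,
            "description": record.get("abstract", ""),
            "link": record.get("uri", ""),             # link back to the record
        })
    # Sort chronologically; this works for year and ISO-style date strings.
    events.sort(key=lambda event: event["start"])
    return {"events": events}


# Invented sample records, for illustration only.
sample = [
    {"title": "Sample catalogue B", "date": 2001, "uri": "https://example.org/2"},
    {"title": "Sample catalogue A", "date": "1985-03", "abstract": "An abstract."},
    {"title": "Record without a date"},
]
print(json.dumps(eprints_to_events(sample), indent=2))
```

Keeping the record URI in each event is what allows a click on a title to open the corresponding repository record.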


Source Code:

[Photomedia Forum post by T.Neugebauer from Jan 18, 2013]