Digital Humanities Software Tools

Nancy K. Herther, a librarian for American studies, anthropology, and sociology, recently published an article in Computers in Libraries that includes a good list of digital humanities tools. Here are some of them:

DH Press: “DH Press is a plugin for WordPress that enables scholars to visualize their humanities-oriented data and allow users to access that data from the visualizations themselves.”

Omeka: provides open-source web publishing platforms for sharing digital collections and creating media-rich online exhibits.

Scalar 2: media-rich, scholarly electronic publishing.

Chronos Timeline: a flexible jQuery plugin developed by HyperStudio Digital Humanities at MIT.

TimelineJS: an open-source tool that enables anyone to build visually rich, interactive timelines.

Historypin: a community archiving platform.

QGIS: a free and open source geographic information system.

Concordle: “Concordle has one point common with Wordle: it makes word clouds. But these are only text, and in a browser in general the choice of fonts is limited, so the clouds are not so very pretty. But it is much more clever: All the words in the cloud are clickable, i.e. they have links to concordancer function.”

Netlytic: “a community-supported text and social networks analyzer that can automatically summarize and discover social networks from online conversations on social media sites.”

Palladio: Stanford University’s online visualization tool that takes CSV files and SPARQL endpoints (beta) as input.

Prism: a tool for “crowdsourcing interpretation.” Users are invited to provide an interpretation of a text by highlighting words according to different categories, or “facets.”

Tableau: a well-known data visualization tool, especially popular in business.

Umigon: semantic analysis on Twitter.

Voyant Tools: one of the DH text analysis tools listed in a previous post.

IIIF Open Source Developments

IIIF (International Image Interoperability Framework) is a community of research libraries and image repositories working on interoperable technology and a community framework for image delivery. Its goals are uniform and rich access to image-based resources hosted online, and common APIs for image repositories that enable a great user experience while viewing, comparing, manipulating and annotating images.

The framework for IIIF development has been its Image API, which allows for the retrieval of pixels through a REST web service, and its Presentation API, which drives viewing interfaces. In addition, there is a Search API and an Authentication API. The APIs use JSON-LD throughout.
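To make the "retrieval of pixels through a REST web service" concrete, here is a minimal sketch of how an Image API request URL is assembled from its path segments (identifier, region, size, rotation, quality, format). The base URL and identifier are hypothetical placeholders, and the keyword defaults (`full`, `max`, `default`) assume version 2.1 or later of the Image API; older servers may expect different keywords.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Assemble {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}.

    `base` is the server's IIIF prefix (a placeholder in this sketch).
    The identifier is percent-encoded so that slashes or spaces inside it
    do not get mistaken for path separators.
    """
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """URL of the JSON-LD document describing the image (sizes, tiles, etc.)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


# Hypothetical server and identifier, for illustration only.
BASE = "https://images.example.org/iiif"
full_image = iiif_image_url(BASE, "page 1")
# A 512x512 pixel region from the top-left corner, scaled to 256 pixels wide.
detail = iiif_image_url(BASE, "page 1", region="0,0,512,512", size="256,")
```

A viewer typically fetches `info.json` first, then requests only the regions and sizes it needs for the current zoom level.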

IIIF Image Servers:

IIIF Image API Viewers:

IIIF Presentation API Viewers:

The full list of viewers is available here:

Demonstration IIIF sites:





Science/tech/university podcasts

edX: Founded by Harvard University and MIT in 2012, edX is an online learning destination and MOOC provider, offering high-quality courses from the world’s best universities and institutions to learners everywhere.

The University of Bath in the UK offers podcasts of its Public Lecture Series.


Nature has its podcast archive.

Stanford on iTunes – Faculty lectures, interviews, music and sports.

University of Washington’s Cryptography Course

CSE P 590TU – Practical Aspects of Modern Cryptography – Plus related lecture slides and video archives.

Computer Science video lectures

University of Washington:
Programming Languages + Course website
Applications of Artificial Intelligence + Course website
Computer Architecture + Course website

ArsDigita University (curriculum):
Web Applications
Structure and Interpretation of Computer Programs
Object Oriented Program Design
Theory of Computation
Artificial Intelligence


 Scientific American

Scientific American (podcasts) from a popular science magazine,

“the oldest continuously published magazine in the U.S., […] bringing its readers unique insights about developments in science and technology for more than 150 years.”

source: sciam

more University podcasts

MIT OpenCourseWare

Princeton University: WebMedia – Lectures

Tufts OpenCourseWare

Rice University: Live Webcasts & Archives

University of British Columbia Podcasts

University of Warwick podcasts

Utah State University OpenCourseWare


Openculture Master List of 1150 Free Courses

1150 free online courses from the world’s leading universities — Stanford, Yale, MIT, Harvard, Berkeley, Oxford and more. Over 30,000 hours of free audio and video.

Findlectures: faceted index to thousands of hours of free online lectures

A curated database of free lectures, over 20,000 hours of audio,

[Edited from Photomedia Forum post by T.Neugebauer from 2006-2016]

information retrieval: relevance, pertinence, precision and recall

The relevance of information in relation to some question was defined in the late 1950s when the Cranfield test was developed at the Cranfield College of Aeronautics. The two measures that were developed are precision and recall.


relevance: The extent to which information retrieved in a search of a library collection or other resource, such as an online catalog or bibliographic database, is judged by the user to be applicable to (“about”) the subject of the query. Relevance depends on the searcher’s subjective perception of the degree to which the document fulfills the information need, which may or may not have been expressed fully or with precision in the search statement. Measures of the effectiveness of information retrieval, such as precision and recall, depend on the relevance of search results.

Compare with pertinence.

pertinence: In information retrieval, the extent to which a document retrieved in response to a query actually satisfies the information need, depending on the user’s current state of knowledge, a narrower concept than relevance. Although a document may be relevant to the subject of the inquiry, it may already be known to the searcher, written in a language the user does not read, available in a format the researcher is unable or unwilling to use, or unacceptable for some other reason.

precision: In information retrieval, a measure of search effectiveness, expressed as the ratio of relevant records or documents retrieved from a database to the total number retrieved in response to the query.

Compare with recall.

recall: In information retrieval, a measure of the effectiveness of a search, expressed as the ratio of the number of relevant records or documents retrieved in response to the query to the total number of relevant records or documents in the database. One of the main difficulties in using recall as a measure of search effectiveness is that it can be nearly impossible to determine the total number of relevant records in all but very small databases.

source: ODLIS: Online Dictionary for Library and Information Science
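The two ratios defined above are easy to compute once you know which documents were retrieved and which are relevant. The following is a small sketch with made-up document identifiers; in practice the set of relevant documents is the hard part, as the recall definition notes.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    Both are reported as 0.0 when their denominator is empty.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data: the search returned 4 documents, 2 of which are
# among the 5 documents judged relevant to the query.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d5", "d6", "d7"]
precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here precision is 2/4 and recall is 2/5: half of what was retrieved is relevant, but more than half of the relevant material was missed.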


Robert A. Fairthorne, in “The Symmetries of Ignorance”, distinguishes between two kinds of aboutness, extensional and intentional:

Robert Fairthorne writes: “The problem of helping those who are ignorant, in detail, of what people have said about things, is therefore solved by defining ‘aboutness’ in extension. That is by listing the things that are mentioned in a document. . . .” […]
(1) extensional “aboutness” takes into account the environment of the use and the production of a document (thus it is a relation, not an attribute);
and (2) intentional “aboutness,” which clearly cannot be determined from the study of the text alone: “It entails knowledge of how it is going to be used by what class of readers.”

source: “The Role of Classification in Subject Retrieval in the Future” by Paule Rolland-Thomas

[Photomedia Forum post by T.Neugebauer from Jan 13, 2007]

Access 2006 Conference in Ottawa

I recently participated in the Access 2006 Conference in Ottawa.

Common touchstones at the conference include:
* customized web applications and search interfaces
* open source software
* national and provincial consortia initiatives
* information policy
* digital media
* library catalogue innovations
* end user searching behaviours
* metadata

source: what is Access?

Some notes from the CARL Preconference on Institutional Repositories

Benefits of institutional repositories include: impact, visibility, and reputation.

CARL Harvester, CARLCore is unqualified Dublin Core.
OAI-PMH metadata protocol = Deposit -> metadata generation -> aggregations -> end user
URI, OpenURL vs. DOI, Crossref (proprietary)
International Federation of Library Associations and Institutions (IFLA) – search for institutional repositories
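As a rough illustration of the harvesting step in the OAI-PMH note above, the sketch below builds a `ListRecords` request for unqualified Dublin Core (`oai_dc`) and pulls titles out of a response. The repository URL is a hypothetical placeholder and the sample response is hand-written and heavily trimmed; a real harvester would also need to handle resumption tokens and error responses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of unqualified Dublin Core elements inside oai_dc records.
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def dc_titles(xml_text):
    """Extract every dc:title from an OAI-PMH response document."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(DC + "title")]


# Hand-written, trimmed response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Sample thesis</dc:title>
    </oai_dc:dc>
  </metadata></record></ListRecords>
</OAI-PMH>"""

# Hypothetical repository endpoint.
request_url = list_records_url("https://repository.example.org/oai")
titles = dc_titles(SAMPLE)
```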

University of Toronto:
University of Toronto’s Knowledge Media Design Institute – virtual institute
Project Open Source | Open Access

Open Access Examples:
Public Knowledge Project
Data Liberation Initiative

An online portal to full text anthropological resources, AnthroSource offers AAA members access to 40,661 articles in AAA journals, newsletters, bulletins, and monographs; a linked, searchable database containing past, present, and future AAA periodicals; centralized access to a wealth of other key anthropological resources, including text, sound, and video; and interactive services to foster communities of interest and practice throughout the discipline.


Bioline International – a not-for-profit electronic publishing service committed to providing open access to quality research journals published in developing countries
Journal of Medical Internet Research
National e-Science Centre (UK)

Examples from Europe and Australia:
University of Glasgow ePrints Service
Queensland University of Technology
CERN Document Server

Institutional Repository Software:
Fedora – Fez (web interface to Fedora) and other tools

notes from Access 2006

Canada and Ontario:
Canadian Initiative on Digital Libraries
Ontario Scholars Portal

Library Enterprise System vs. ILS
– refinement and knowledge discovery
Canadian Research Knowledge Network (CRKN)

Increase in cost, paying for publishing and access > Biomed Central

>PrestoSpace – AV Materials
>Building resources for Integrated Cultural Knowledge Services (BRICKS)
>TEL (European Library)

Lucene – indexing search engine, archives, multiple metadata (EAD, DC, Fulltext), good at merging indexes. Solr – open source search server based on Lucene. Example: National Adult Literacy Database.

Endeca – NCSU catalog (Endeca for faceted browse, relevance ranking).

Search comforts: spell (did you mean?), stemming, sort options
Search + browse: layered facets, filter across multiple dimensions, facet deselection, relevance, speed, locally managed, persistent parameters
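The "layered facets" idea in the notes above can be sketched in a few lines: filter records across several dimensions at once, show counts for each remaining facet value, and deselect a facet by dropping it from the selection. The records and field names below are invented for illustration and do not reflect how Endeca or Solr implement faceting internally.

```python
from collections import Counter

# Invented catalogue records; "format" and "language" act as facets.
RECORDS = [
    {"title": "A", "format": "book", "language": "en"},
    {"title": "B", "format": "book", "language": "fr"},
    {"title": "C", "format": "map", "language": "en"},
    {"title": "D", "format": "book", "language": "en"},
]


def apply_facets(records, selected):
    """Keep only records that match every selected facet value."""
    return [r for r in records
            if all(r.get(facet) == value for facet, value in selected.items())]


def facet_counts(records, facet):
    """Count how many records fall under each value of one facet."""
    return Counter(r[facet] for r in records)


# Filter across two dimensions at once.
selected = {"format": "book", "language": "en"}
narrow = apply_facets(RECORDS, selected)

# Facet deselection: drop one dimension and re-run the filter.
selected.pop("language")
wider = apply_facets(RECORDS, selected)

# Counts shown next to each facet value in the browse interface.
format_counts = facet_counts(RECORDS, "format")
```

Keeping the selection as a plain dictionary is also one simple way to get the "persistent parameters" mentioned above, since it can be serialized into the query string.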

Cocoon – XML publishing framework
Ruby on Rails – agile web development
LizardTech – for MrSID and JPEG 2000 images

Collex –

a set of tools designed to aid students and scholars working in networked archives and federated repositories of humanities materials: a sophisticated COLLections and EXhibits mechanism for the semantic web


University of Victoria -> backup catalog using PHP – Yaz

a tiny HTTP API for the few basic operations necessary to copy discrete, identified content from any kind of web application


XML Databases – alternative: SQL + Lucene

XML Catalogues / Library 2.0 – “an architecture of participation”
>eXtensible Catalog (XC) – an open-source online system that will unify access to traditional and digital library resources
>Talis: Keystone, resources, talk.talis.com, directory.talis.com, development network, Project Cenote
>Library Thing (beta)
>Library 2.0 Wiki

>LibX Firefox

Web services
xISBN service
Amazon API
Amazon Elastic Compute Cloud (Amazon EC2) – Limited Beta
Google API

> Service Oriented Architecture (SOA) – more enduring and flexible, reusable (sustainable?)
– BPEL (OASIS standard) for expression of complex processes. Active BPEL (Open Source), Active BPEL Designer (visual designer).
– services invoked with SOAP
– orchestration exposed with WSDL

> UK -> Structured Vocabularies for IR (thesauri, ontologies, etc.) > British Standard – BS8723
Controlled Vocabularies: LCSH, Rameau (Fr), SWD (de)
e-Framework for Education and Research – an initiative by the UK’s Joint Information Systems Committee (JISC) and Australia’s Department of Education, Science and Training (DEST)
Digital Library Federation – DLF Service Framework for Digital Libraries
NISO MetaSearch Initiative
NISO RP-2006 – Best Practices for Designing Web Services in the Library Context (PDF)
SOA in higher education
DELOS – Network of Excellence on Digital Libraries

Discussion lists:

>Bibliothek Hamburg

Book reference:
“Putting Content Online: a practical guide for libraries” by Mark Jordan

[Photomedia Forum post by T.Neugebauer from Oct 18, 2006]

Nondeterministic Turing Machines

In theoretical computer science, there is a theorem which states that every nondeterministic Turing machine (NTM) has an equivalent deterministic Turing machine (DTM). NTMs differ from DTMs in that the former allow for the possibility of more than one next state from a given configuration.

If there is more than one next move, we do not specify which next move the machine makes, only that it chooses one such move.

source: Computability and Complexity Theory, Steven Homer & Alan Selman, p.31

The proof of the theorem that NTMs have equivalent DTMs is by construction: the DTM builds the NTM’s computation tree and then performs a breadth-first search on this tree. I was never convinced by this proof. If you take time into consideration, and the fact that the NTM’s computation tree can grow exponentially with its depth because of the set of options from which it ‘chooses’ at each step, you get a search that can take the DTM forever (or almost forever) to complete (which, to borrow Douglas Bridges’ expression, “does not extend to an assurance that you will find the desired term before the end of the universe”). I remain unconvinced that NTMs are equivalent to DTMs.
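For readers who want to see the construction concretely, here is a minimal sketch of a deterministic breadth-first search over an NTM’s configurations. The machine encoding (a dictionary from (state, symbol) to a list of possible moves), the state names and the exploration budget are all assumptions of this sketch, not part of the textbook proof. Each configuration has only finitely many successors, so an accepting branch at depth d is reached after finitely many steps, although the number of configurations visited can grow exponentially in d.

```python
from collections import deque

BLANK = "_"


def ntm_accepts(delta, start, accept, tape_input, max_configs=10_000):
    """Deterministically simulate an NTM by breadth-first search.

    delta maps (state, symbol) to a list of (new_state, write, move)
    choices, with move being "L" or "R". Returns True if some branch
    reaches `accept`, False if every branch halts without accepting,
    and None if the exploration budget runs out first.
    """
    start_cfg = (start, tuple(tape_input) or (BLANK,), 0)
    queue = deque([start_cfg])
    seen = {start_cfg}  # skip configurations already queued
    explored = 0
    while queue:
        state, tape, head = queue.popleft()
        if state == accept:
            return True
        explored += 1
        if explored > max_configs:
            return None
        # Enqueue every configuration the NTM could 'choose' next.
        for new_state, write, move in delta.get((state, tape[head]), []):
            cells = list(tape)
            cells[head] = write
            pos = head + (1 if move == "R" else -1)
            if pos < 0:                 # grow the tape to the left
                cells.insert(0, BLANK)
                pos = 0
            elif pos == len(cells):     # grow the tape to the right
                cells.append(BLANK)
            cfg = (new_state, tuple(cells), pos)
            if cfg not in seen:
                seen.add(cfg)
                queue.append(cfg)
    return False


# Example machine: nondeterministically guess where the substring "11" starts.
DELTA = {
    ("q0", "0"): [("q0", "0", "R")],
    ("q0", "1"): [("q0", "1", "R"), ("q1", "1", "R")],
    ("q1", "1"): [("acc", "1", "R")],
}
has_11 = ntm_accepts(DELTA, "q0", "acc", "0110")
no_11 = ntm_accepts(DELTA, "q0", "acc", "0101")
```

The budget parameter is there only so the sketch always terminates; a machine with a non-halting branch and no accepting branch would otherwise keep the search running indefinitely.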

[Photomedia Forum post by T.Neugebauer from Jun 20, 2006]

Europeana – ideas, inspiration, culture is a collaboration between universities, research institutes and content providers. It was launched this year as a beta, and is scheduled to be available as a release in 2010.

The site includes a link to a prototype of the Europeana semantic search, as well as a functional beta Timeline navigator, communities and more.

Europeana links you to 4 million digital items including images, texts, sounds and videos from museums and galleries, archives, libraries and audio-visual collections. The list of organizations contributing content includes the Rijksmuseum in Amsterdam, the British Library in London and the Louvre in Paris.

[Edited from Photomedia Forum post by T.Neugebauer from Aug 07, 2009]

Art and Architecture Thesaurus now available as Linked Open Data

It was informally announced during the 2013 LODLAM Summit in Montreal last year, and the official announcement was made today by Jim Cuno, the President and CEO of the Getty:

One of the Getty Vocabularies, the Art and Architecture Thesaurus (AAT), is now available as Linked Open Data. The dataset is available under an Open Data Commons Attribution License (ODC BY 1.0).

The SPARQL endpoint and the documentation are found here:

Over the next 18 months, The Research Institute’s other three Getty Vocabularies – The Getty Thesaurus of Geographic Names (TGN)®, The Union List of Artist Names®, and The Cultural Objects Name Authority (CONA)® will all become available as Linked Open Data.

For general information about our Linked Open Data project see

The open availability of these valuable data sets is great news for developers working with cultural data.
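As a hedged illustration of what working with such a SPARQL endpoint might look like, the sketch below builds a label-search query and the corresponding GET request URL. The endpoint URL is a placeholder, and the query assumes that concepts carry `skos:prefLabel` values, as SKOS vocabularies generally do; the actual data model should be checked against the Getty documentation before relying on it.

```python
from urllib.parse import urlencode

# Placeholder endpoint; substitute the one given in the documentation.
ENDPOINT = "https://sparql.example.org/sparql"


def label_search_query(term, lang="en", limit=10):
    """Build a SPARQL query for concepts whose preferred label contains `term`."""
    # Escape characters that would break out of the SPARQL string literal.
    safe = term.lower().replace("\\", "\\\\").replace('"', '\\"')
    return f'''PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {{
  ?concept skos:prefLabel ?label .
  FILTER(LANG(?label) = "{lang}" && CONTAINS(LCASE(STR(?label)), "{safe}"))
}} LIMIT {int(limit)}'''


def sparql_get_url(endpoint, query):
    """Encode the query as the `query` parameter of a SPARQL protocol GET request."""
    return f"{endpoint}?{urlencode({'query': query})}"


query = label_search_query("Gothic")
url = sparql_get_url(ENDPOINT, query)
```

Sending the request with an `Accept: application/sparql-results+json` header is the usual way to ask for JSON results, though the formats on offer depend on the server.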

[Photomedia Forum post by T.Neugebauer from Feb 23, 2014]