Digital humanities text analysis tools

Distant Reading & Text Analysis

The Versioning Machine (http://v-machine.org/) is a framework and an interface for displaying multiple versions of text encoded according to the Text Encoding Initiative (TEI) Guidelines.

Voyant Tools (https://voyant-tools.org/) is a web-based reading and analysis environment for digital texts.

Twine (http://twinery.org/) is an open-source tool for telling interactive, nonlinear stories. You don’t need to write any code to create a simple story with Twine, but you can extend your stories with variables, conditional logic, images, CSS, and JavaScript when you’re ready.

Spoken audio analysis tools

Open Source

WaveSurfer (https://sourceforge.net/projects/wavesurfer/) is an open source tool for sound visualization and manipulation. Typical applications are speech/sound analysis and sound annotation/transcription. WaveSurfer may be extended by plug-ins as well as embedded in other applications. Project home page: http://www.speech.kth.se/wavesurfer/

Praat: doing phonetics by computer (http://www.fon.hum.uva.nl/praat/)

ELAN

Gentle (http://lowerquality.com/gentle/) is a forced aligner. Forced aligners are computer programs that take media files and their transcripts and return extremely precise timing information for each word (and phoneme) in the media. Drift (http://drift-demo.lowerquality.com/upload) outputs pitch and timing; it samples what human listeners perceive as vocal pitch.

Kaldi (http://kaldi-asr.org/) is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers.

Sonic Visualiser (http://www.sonicvisualiser.org/) is an application for viewing and analysing the contents of music audio files.

Audacity (http://www.audacityteam.org/) is a free, easy-to-use, multi-track audio editor and recorder for Windows, Mac OS X, GNU/Linux and other operating systems.

SIDA (https://github.com/hipstas/sida) Speaker Identification for Archives. Includes a notebook that walks through the steps of training and running a classifier that takes speaker labels and the audio, extracts features (including vowels), and trains a model and runs it.

Audio Labeler (https://github.com/hipstas/audio-labeler) is an in-browser app for labeling audio clips at random, using Docker and Flask.

ARLO (https://sites.google.com/site/nehhipstas/documentation) was developed for classifying bird calls and using visualizations to help scholars classify pollen grains. ARLO has the ability to extract basic prosodic features such as pitch, rhythm and timbre for discovery (clustering) and automated classification (prediction or supervised learning), as well as visualizations. The current implementation of ARLO for modeling runs in parallel on systems at the National Center for Supercomputing Applications (NCSA). The source code for ARLO is open-source and will be made available for research purposes for this and subsequent projects on sourceforge at http://sourceforge.net/projects/arlo/.

Not open source, but available for academic use:

STRAIGHT (http://www.wakayama-u.ac.jp/~kawahara/STRAIGHTadv/index_e.html) a tool for manipulating voice quality, timbre, pitch, speed and other attributes flexibly. It is an always evolving system for attaining better sound quality, that is close to the original natural speech, by introducing advanced signal processing algorithms and findings in computational aspects of auditory processing.

STRAIGHT decomposes sounds into source information and resonator (filter) information. This conceptually simple decomposition makes it easy to conduct experiments on speech perception using STRAIGHT, the initial design objective of this tool, and to interpret experimental results in terms of huge body of classical studies.

Online Services:

Pop Up Archive (https://www.popuparchive.com/) is a platform of tools for organizing and searching digital spoken word. It processes sound for a wide range of customers, from large archives and universities to media companies, radio stations, and podcast networks. Users drag and drop any audio file (or have an RSS, SoundCloud, or iTunes feed ingested), and within minutes receive automatically generated transcripts and tags.

Welcome to the Photomedia BLOG

The FORUM on this site has been shut down, but my posts to the forum have been migrated/archived to this blog. These posts are organized under the “Forum Archive” category, further classified by year of the post, and the original category, like “Film” or “Design”. At the end of each post that was migrated over, there is a statement with the original date of the post, for example:

[Photomedia Forum post by T.Neugebauer from Feb 22, 2012 ]

The forum was initially created by customizing an early version of the open source phpBB software, around 2005. More than 12 years ago, web publishing/content management systems were still relatively new and social media platforms like Facebook and Twitter were just beginning to come into existence. The motivation for a Forum rather than a blog at that time was the possibility of managing discussion threads. However, I ended up using the Forum as a blog more than a discussion Forum. I’m making the transition official now: welcome to the Photomedia BLOG!

Science/tech/university podcasts

edX: Founded by Harvard University and MIT in 2012, edX is an online learning destination and MOOC provider, offering high-quality courses from the world’s best universities and institutions to learners everywhere.

The University of Bath in the UK has podcasts of its Public Lecture Series.

Nature

Nature has a podcast archive.

Stanford on iTunes

http://itunes.stanford.edu/ – Faculty lectures, interviews, music and sports.

University of Washington’s Cryptography Course

CSE P 590TU – Practical Aspects of Modern Cryptography – Plus related lecture slides and video archives.

Computer Science video lectures

University of Washington:
Programming Languages + Course website
Applications of Artificial Intelligence + Course website
Computer Architecture + Course website

ArsDigita University (curriculum):
Web Applications
Structure and Interpretation of Computer Programs
Object Oriented Program Design
Algorithms
Theory of Computation
Artificial Intelligence

 

Scientific American

Scientific American (podcasts) from a popular science magazine,

the oldest continuously published magazine in the U.S., […] bringing its readers unique insights about developments in science and technology for more than 150 years.

source: sciam

More university podcasts

MIT OpenCourseWare

Princeton University: WebMedia – Lectures

Tufts OpenCourseWare

Rice University: Live Webcasts & Archives

University of British Columbia Podcasts

University of Warwick podcasts

Utah State University OpenCourseWare

 

Openculture Master List of 1150 Free Courses

1150 free online courses from the world’s leading universities — Stanford, Yale, MIT, Harvard, Berkeley, Oxford and more. Over 30,000 hours of free audio and video.

http://www.openculture.com/freeonlinecourses

Findlectures: faceted index to thousands of hours of free online lectures

A curated database of free lectures with over 20,000 hours of audio: http://findlectures.com/

[Edited from Photomedia Forum post by T.Neugebauer from 2006-2016 ]

Nobel Prize in Physics for Inventors of CCD

Willard Boyle and George Smith received the Nobel Prize for their part in the invention of the charge-coupled device (CCD), the light detector used in digital cameras. The invention goes back to 1969. They used a metal oxide semiconductor to convert photons into a flow of electrons.

Smith and Boyle had already received the C&C Prize for their invention in 1999. A press release from Bell Labs describes the CCD:

The device they invented stores information, represented by discrete packets of electric charge, in columns of closely spaced semiconductor capacitors. With multiple columns side by side, a CCD chip can record images. Reading out the information – for processing, display, or more permanent storage – is accomplished by shifting stored charges down the columns, one position at a time. The CCD’s sensitivity to light, coupled to this method of storing and reading out information, makes it a versatile and robust optical detector.

By 1970, the Bell Labs researchers had built the CCD into the world’s first solid-state video camera. In 1975, they demonstrated the first CCD camera with image quality sharp enough for broadcast television.

source: Bell Labs, September, 1999
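The readout that the press release describes, shifting stored charges down the columns one position at a time, can be sketched in a few lines of Python. This is only a toy model of the idea, not of a real sensor: the function names and the plain lists of numbers standing in for charge packets are my own assumptions.

```python
def read_out_column(column):
    """Shift charge packets down one column, one position at a time.

    `column` lists the stored charges from top to bottom; the last
    element sits next to the output. The return value lists the
    charges in the order in which they reach the output.
    """
    wells = list(column)
    output = []
    for _ in range(len(wells)):
        # The packet in the bottom well leaves the column...
        output.append(wells[-1])
        # ...and every remaining packet moves down by one well,
        # leaving the top well empty.
        wells = [0] + wells[:-1]
    return output


def read_out_image(image):
    """Read out a small image, given as a list of rows, column by column."""
    n_rows, n_cols = len(image), len(image[0])
    columns = [[image[r][c] for r in range(n_rows)] for c in range(n_cols)]
    return [read_out_column(col) for col in columns]
```

In this sketch the bottom row of the image is the first to reach the output, which is why each column comes back in reverse order.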

The Nobel Lectures in Physics will be held on Tuesday, 8 December 2009, at the Aula Magna, Stockholm University and the lectures will be published at http://nobelprize.org/

Unfortunately, I could not find the original 1970 paper online:

W.S. Boyle and G.E. Smith. “Charge Coupled Semiconductor Devices” Bell Sys. Tech. J. 49 (April, 1970). p.587-593

[Photomedia Forum post by T.Neugebauer from Oct 06, 2009  ]

Missing lunar camera footage from 1969 Moon walk

The lunar camera footage recorded in July of 1969 of the first Moon walk had to be converted for the live television broadcast, which degraded the images. It was first reported in 2006 by NPR that the original higher quality footage preserved by engineers on tapes was missing, triggering a three-year search by NASA. The result of the search showed that the tapes are permanently gone, likely overwritten:


And the agency was experiencing a critical shortage of magnetic tapes. So NASA started erasing old ones and reusing them.

That’s probably what happened to the original footage from the moon that the astronauts captured with their lunar camera, says Lebar. It was stored on telemetry tapes, and old tapes with telemetry data were being recycled.

“So I don’t believe that the tapes exist today at all,” says Lebar. “It was a hard thing to accept. But there was just an overwhelming amount of evidence that led us to believe that they just don’t exist anymore. And you have to accept reality.”

NPR – Houston, We Erased The Apollo 11 Tapes

To see the results of NASA’s restoration efforts based on material that they were able to find, go to http://www.nasa.gov/multimedia/hd/apollo11.html

[Photomedia Forum post by T.Neugebauer from Jul 27, 2009  ]

 

Science 2.0?

The CTWatch article The Coming Revolution in Scholarly Communications & Cyberinfrastructure predicts dramatic changes to peer-review as a result of the web:


For all but a very small number of widely read titles, the day of the print journal seems to be almost over. Yet to see this development as the major impact of the web on science would be extremely narrow-minded – equivalent to viewing the web primarily as an efficient PDF distribution network. Though it will take longer to have its full effect, the web’s major impact will be on the way that science itself is practiced.

The list of references of the above article contains links to many of the new scientific communications applications.

Are social web applications capable of transforming the way in which peer-review is carried out?

The following are references to recent articles about social web applications in science:

Science 2.0: Great New Tool, or Great Risk? Wikis, blogs and other collaborative web technologies could usher in a new era of science. Or not. By M. Mitchell Waldrop

“Science happens not just because of people doing experiments, but because they’re discussing those experiments,” explains Christopher Surridge, editor of the Web-based journal, Public Library of Science On-Line Edition (PLoS ONE). Critiquing, suggesting, sharing ideas and data–communication is the heart of science, the most powerful tool ever invented for correcting mistakes, building on colleagues’ work and creating new knowledge. And not just communication in peer-reviewed papers; as important as those papers are, says Surridge, who publishes a lot of them, “they’re effectively just snapshots of what the authors have done and thought at this moment in time. They are not collaborative beyond that, except for rudimentary mechanisms such as citations and letters to the editor.”


Scholarship 2.0: An Idea Whose Time Has Come

[Photomedia Forum post by T.Neugebauer from Mar 25, 2008 ]

Information theory

Fairthorne’s theory of notification is an elegant example of a theory in information science:

Fairthorne’s theory of notification clarifies the foundations of information science. He defined ‘notification’ as ‘mention and delivery of recorded messages to users’, listing as the main elements of library operations: (1) Source (e.g., authors), (2) Code (e.g., language of a book), (3) Message (the signal), (4) Channel (e.g., microfilms), (5) Destination (e.g., reader) and (6) Designation (subject description).

Nitecki, Joseph Z. 1995. Philosophical Aspects of Library Information Science in Retrospect.

 


Variables

The scope of our activities and studies lie inside Discourse but outside Signaling, i.e., outside the scope of Shannon’s Information Theory. The variables involved are, in general terms, Source, Destination, Designation, and Message, Channel, Code. In the present context a Code is a symbol system used to indicate choices made from a set of Messages, and represented by patterns of physical events (signals or inscriptions) consistent with the physical mode and conditions of communication, the Channel, in the given social and physical environment.
Formally the Message set is adequately defined as an agreed finite set of distinct identifiable entities, from which choices are made by Sources. Here we regard it also as drawn from what can be told in a given recorded language. The Sources are those within the given environment who tell it, in the sense of being agreed and identifiable publishers, distributors, organizations, or accepted authors. The latter need not be actual authors. From the present point of view the works of Shakespeare, or of anonymous authors, are those records that the local retrieval tools attribute to “Shakespeare,” or to “anon.” The Destinations are those within the given environment who are to be told, or wish to be told. They must be identifiable, but otherwise may be organizations, functionaries, groups, or individuals. A set of Designations is assigned to Messages, Sources, or Destinations to characterize them according to what is told, or is to be told. They are aspects of what the particular discourse is “about,” in some operational sense. For example, Subject Indexing assigns topics to the messages; author indexes may be classified by subject matter; Selective Dissemination of Information designates executives according to what they should be told about. Clearly the same set of Designations can be assigned differently according to circumstances. A reader (Destination) may well differ with the author (Source) as to the main interest (Designation) of an article (Message).

source: Morphology of “Information Flow”. Robert A. Fairthorne. Journal of the ACM. Volume 14, Issue 4 (October 1967)
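To keep Fairthorne’s six variables apart, it may help to write them down as a simple record. The Python sketch below is only my own illustration: the class name, the field types and the example values are assumptions, not anything defined in the paper. It shows the last point of the quotation, that a reader and an author can assign different Designations to the same Message.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Notification:
    source: str       # who tells it (an accepted author, publisher, ...)
    destination: str  # who is to be told, or wishes to be told
    message: str      # the choice made from the agreed message set
    channel: str      # physical mode and conditions of communication
    code: str         # symbol system, e.g. the language of the record
    designation: str  # what the discourse is "about", as assigned here


def redesignate(notification, designation):
    """Return a copy of the notification with a different Designation."""
    return replace(notification, designation=designation)


# Hypothetical example values, chosen only for illustration.
author_view = Notification(
    source="an author",
    destination="a reader",
    message="an article",
    channel="printed journal",
    code="English",
    designation="information theory",
)
reader_view = redesignate(author_view, "library operations")
```

Here `author_view` and `reader_view` describe the same Message from the same Source, and differ only in the Designation assigned to it.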

[Photomedia Forum post by T.Neugebauer from Oct 12, 2007  ]

Solovyov’s Meaning of Love

While looking for the online text of Beauty in Nature and The Meaning of Love by Solovyov, I found the various online resources maintained by Michael Lee, a professor in the Department of Psychology of the University of Manitoba. Lee mentions Solovyov on a page called “Required Reading for Revolters”. This is what he said about Solovyov’s The Meaning of Love:

 

Solovyov lived from 1853 to 1900. I find him the most profound and prescient Christian theologian and visionary. He believed that romantic love was potentially the instrument for effecting the kind of spiritual transformation that would enable us to attain physical immortality and to realize the Kingdom of God on earth.

[Edited from Photomedia Forum post by T.Neugebauer from Apr 11, 2007  ]

Gaia theory – Earth as a living system

Gaia theory, named after the Greek Earth goddess, is a combination of hypotheses about the planet as a self-regulating living system. James Lovelock is credited with the publication of the first modern scientific article on the Gaia hypothesis in the New Scientist. He also recently published an article in The Independent where he speaks of his latest book The Revenge of Gaia and warns of the danger of global warming. The philosophical predecessors to Gaia theory include Lewis Thomas, Teilhard de Chardin, and Buckminster Fuller. The most common criticism of Gaia theory is the charge that it is teleological.

Although Gaia theories may seem controversial, the study of the inter-relationships between various life forms and their environment (ecology), homeostasis, and emergent properties are established and accepted.

The attempt to grasp the inter-relationships that exist among all of the components of the biosphere, oceans, geosphere and the atmosphere as an Earth ecosystem is a challenge for the human mind, but the need for scientific hypotheses is evident in an age when we attempt to take responsibility for our effect on the planet.

[Photomedia Forum post by T.Neugebauer from Mar 30, 2006 ]