Photography Media Journal
ISSN 1918-8153

Image Indexing

by: Tomasz Neugebauer

March 2005


“It is my intention to present—through the medium of photography—intuitive observations of the natural world which may have meaning to the spectators.”[*]
Ansel Adams (1902-1984)


The theoretical difficulties in indexing images include: 1) images do not satisfy the requirements of a language, whereas textual materials do; 2) images contain layers of meaning that can be converted into textual language only through human indexing; and 3) images are multidisciplinary in nature, yet the terms assigned to them are their only access points. The theoretical foundations for image indexing consist of distinctions between classes of terms, including ‘of’ and ‘about’, syntactic and semantic, and specific and generic, together with answers to the questions ‘who?’, ‘what?’, ‘where?’, and ‘when?’. Content-based indexing can be used to generate terms for the color, texture and basic spatial attributes of images. Image searchers, however, use textual descriptors as search terms, and these require human, description-based indexing of the semantic attributes of images.

"To the complaint, 'There are no people in these photographs,' I respond, 'There are always two people: the photographer and the viewer.'"
Ansel Adams


Jacobs argues that the importance of providing improved image access has increased due to the democratization of images as legitimate sources of information and education. Managed use of images “is no longer the fairly elite domain of art historians or specialized archives” (Jacobs 119). The popularization of the image-rich World Wide Web as a communication and education medium certainly confirms this. Efficient image search tools from Google, MSN and Yahoo! that index millions of freely available images online are clear evidence that image indexing and searching are common and widespread in today’s visual culture.


The main difference, and the main difficulty, in indexing and classifying images as opposed to textual materials is that images do not satisfy the requirements of a language. Nelson Goodman defines a language as satisfying at least “the syntactic requirements of disjointness and differentiation” (Goodman 226). Disjointness is the requirement that “no mark may belong to more than one character” (Goodman 133), whereas differentiation is defined as follows: “For every two characters K and K’ and every mark m that does not actually belong to both, determination either that m does not belong to K or that m does not belong to K’ is theoretically possible” (Goodman 136). It is especially this requirement of differentiation that images, unlike text, fail to satisfy: “Nonlinguistic systems differ from languages, depiction from description, the representational from the verbal, paintings from poems, primarily through the lack of differentiation – indeed through density (and consequent total absence of articulation) – of the symbol system” (Goodman 226). The disjointness and differentiation of textual language systems allow for simple decomposition into their component symbols (letters, words, paragraphs), and this represents at least a syntactic ease in indexing textual material. The semantic ambiguities inherent in indexing the meaning of a text ensure that indexing remains an art form without absolute answers, but at least the syntax lends itself to analysis. This is not the case with the syntax of images. Thus, as Jacobs observes, images “are usually absorbed holistically by the viewer” (Jacobs 120).
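The syntactic ease of indexing text can be illustrated with a minimal Python sketch. It is an illustration only, not a method described by any of the authors cited here: the function name `build_word_index` and the sample sentence are assumptions made for the example. Because text breaks cleanly into discrete, repeatable words, a few lines are enough to produce a word-to-position index; no comparably simple decomposition exists for the dense surface of an image.

```python
import re
from collections import defaultdict


def build_word_index(text):
    """Map each lowercase word to the list of positions where it occurs."""
    index = defaultdict(list)
    # Words are disjoint, differentiated symbols, so a simple pattern
    # is enough to split the text into its components.
    words = re.findall(r"[a-z']+", text.lower())
    for position, word in enumerate(words):
        index[word].append(position)
    return dict(index)


# Hypothetical sample sentence, chosen only for illustration.
sample = "The photographer and the viewer"
word_index = build_word_index(sample)
print(word_index["the"])
```

Every occurrence of a word is recognized as the same symbol, which is exactly what the requirement of differentiation guarantees for text and what images lack.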

Furthermore, as Besser points out, textual material is usually written with a clearly defined purpose that is explicitly stated, summarized and abstracted by the publishers in introductions, prefaces and book covers, whereas images usually come with no such statement of purpose (Besser 788). Images are “decidedly multidisciplinary in nature: they contain a variety of features, each of which may be of potential interest to researchers from somewhat diverse fields of study” (Baxter & Anderson). For example:

[Photograph not reproduced.]

A photograph such as the one above may be of interest to a student of sculpture, but it may also be “useful to historians wanting a snapshot of the times, to architects looking at buildings, to urban planners looking at traffic patterns or building shadows, to cultural historians looking at changes in fashion, to medical researchers looking at female smoking habits, to sociologists looking at class distinctions, or to students looking at the use of certain photographic processes or techniques” (Besser 788). Serving all of these users requires a large number of access points for each image, depending on the audience, the purpose of the database, and the characteristics of its users. The cultural historian will not search on attributes such as composition, lighting and perspective, whereas a student of art history might. These difficult requirements are compounded by the symbolic and allegorical meaning of images, which is highly subjective and interpretive and leads to low levels of interindexer consistency (Baxter & Anderson).
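Interindexer consistency can be made concrete with a small calculation. The sources cited here do not specify a measure, so the sketch below assumes one common choice: the number of terms two indexers share, divided by the number of distinct terms either of them assigned (the Jaccard coefficient). The function name and both term lists are hypothetical.

```python
def indexer_consistency(terms_a, terms_b):
    """Return shared terms divided by all distinct terms (Jaccard overlap)."""
    a = {term.strip().lower() for term in terms_a}
    b = {term.strip().lower() for term in terms_b}
    if not a and not b:
        # Two empty term sets are treated as fully consistent by convention.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical terms assigned to the same photograph by two indexers.
indexer_1 = ["sculpture", "street", "crowd", "smoking"]
indexer_2 = ["statue", "street", "fashion", "crowd", "shadows"]

score = indexer_consistency(indexer_1, indexer_2)
print(f"{score:.2f}")
```

Here the two indexers agree on only two of seven distinct terms, and even their near-synonyms ("sculpture" and "statue") count as disagreement, which suggests how quickly interpretive freedom erodes consistency.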

Presumably, when indexing images for a particular database, we have a notion of who the end-users are, so that we can try to anticipate the most appropriate access points for each image. However, image collections are especially multidisciplinary, and targeting the indexing at a subset of the end-users effectively excludes the others. The image is not textually expressive in itself; the assignment of terms is a purely interpretive exercise by the indexer, yet those terms end up being the most important access point for retrieval by the user. In the case of textual information, the user can always switch to natural-language full-text searching if frustrated by exclusive descriptors, whereas this option is simply not available in image searching.
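The dependence of image retrieval on assigned terms can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any real system: the record identifiers, the descriptors, and the function names `build_descriptor_index` and `search` are all hypothetical, and the facets simply reuse the ‘who?’, ‘what?’, ‘where?’ and ‘when?’ questions mentioned earlier.

```python
from collections import defaultdict

# Hypothetical image records; each facet holds indexer-assigned descriptors.
records = {
    "img-001": {
        "who": ["pedestrians"],
        "what": ["sculpture", "street scene"],
        "where": ["city square"],
        "when": ["1950s"],
    },
    "img-002": {
        "who": [],
        "what": ["mountain", "clouds"],
        "where": ["national park"],
        "when": ["1940s"],
    },
}


def build_descriptor_index(image_records):
    """Map every assigned descriptor to the set of images that carry it."""
    index = defaultdict(set)
    for image_id, facets in image_records.items():
        for terms in facets.values():
            for term in terms:
                index[term.lower()].add(image_id)
    return index


def search(index, term):
    """Return the images indexed under a term; there is no full-text fallback."""
    return sorted(index.get(term.lower(), set()))


descriptor_index = build_descriptor_index(records)
print(search(descriptor_index, "sculpture"))  # a term the indexer anticipated
print(search(descriptor_index, "smoking"))    # a term the indexer did not assign
```

The second query returns nothing even if the photograph does show people smoking, because the only access points are the terms the indexer chose to assign.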
