ISSN 1918-8153
Theoretical Foundations for Content Analysis of Images
The theoretical difficulties in creating metadata schemes for images, as opposed to textual documents, stem from the properties of the visual medium. Goodman argues that the visual medium does not possess the properties of a language, such as “disjointness and differentiation” (Goodman 226). A symbol system is disjoint if an expression (i.e., a mark) can validly belong to only a single character (Goodman 133), and differentiated if it is theoretically possible to determine the character to which a mark belongs (Goodman 136). Text has an advantage over the visual medium in that its syntax lends itself to analysis: the component words, sentences and paragraphs can be extracted from the content and mapped to a topology or ontology. The components and properties of images, on the other hand, cannot be extracted with the same ease; for one, they are rarely explicitly labeled. It is meaningless to consider only the top 10 percent of an image (at least for a human), whereas it makes perfect sense to read only the introduction of a paper. The non-linguistic ambiguity of the visual medium makes it difficult to understand what the classes, components and properties (e.g., relation to physical objects) of an image are.
Erwin Panofsky distinguishes between three categories of information in works of art in general: pre-iconography, iconography, and iconology. Pre-iconography functions at the descriptive level of basic generic objects and primary subject matter; iconography is an analytical level that requires knowledge of culture and convention; and iconology is a synthetic level that requires knowledge of art and criticism (Choi & Rasmussen 2003: 499; Shatford 1986: 43). The pre-iconographic level is divided into the factual (e.g., window, flower, star) and the expressional (e.g., anger, sadness, greed). All three levels contain Of and About facets for answers to the questions who?, what?, when?, and where? (Shatford 1986: 43-53). Layne distinguishes between four types of attributes that play a role in image retrieval: biographical, subject, exemplified and relationship (Layne 1994). Metadata schemes ought to be able to capture the whole spectrum of subject attributes and content-related classes of terms, as well as Layne’s biographical (creation and travel history), exemplified (e.g., photograph or poster) and relationship (e.g., preliminary drawing and finished painting) attributes.
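To make these requirements concrete, the following is a minimal sketch of how a record in such a scheme might be structured, assuming Python dataclasses. All class, field and identifier names (ImageRecord, SubjectTerm, "img-001" and so on) are hypothetical and chosen only for illustration; they do not come from Panofsky, Shatford or Layne, and the sample terms are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    """Panofsky's three categories of information."""
    PRE_ICONOGRAPHY = "pre-iconography"  # generic objects, primary subject matter
    ICONOGRAPHY = "iconography"          # requires knowledge of culture and convention
    ICONOLOGY = "iconology"              # requires knowledge of art and criticism


class Facet(Enum):
    """Shatford's Of/About distinction, present at every level."""
    OF = "of"
    ABOUT = "about"


class Question(Enum):
    WHO = "who"
    WHAT = "what"
    WHEN = "when"
    WHERE = "where"


@dataclass
class SubjectTerm:
    """One subject attribute: a term placed by level, facet and question."""
    level: Level
    question: Question
    facet: Facet
    term: str


@dataclass
class ImageRecord:
    """Hypothetical record holding Layne's four types of attributes."""
    identifier: str
    subject: list[SubjectTerm] = field(default_factory=list)
    biographical: dict[str, str] = field(default_factory=dict)        # creation and travel history
    exemplified: list[str] = field(default_factory=list)              # e.g., photograph or poster
    relationships: list[tuple[str, str]] = field(default_factory=list)  # (relation, other identifier)

    def terms(self, level: Level, facet: Facet) -> list[str]:
        """Return the subject terms recorded at a given level and facet."""
        return [t.term for t in self.subject
                if t.level is level and t.facet is facet]


# Illustrative usage with invented values.
record = ImageRecord(identifier="img-001", exemplified=["photograph"])
record.subject.append(
    SubjectTerm(Level.PRE_ICONOGRAPHY, Question.WHAT, Facet.OF, "flower"))
record.subject.append(
    SubjectTerm(Level.PRE_ICONOGRAPHY, Question.WHAT, Facet.ABOUT, "sadness"))
record.relationships.append(("preliminary drawing of", "img-002"))

print(record.terms(Level.PRE_ICONOGRAPHY, Facet.OF))
```

Keeping the subject terms separate from the biographical, exemplified and relationship attributes mirrors the distinction drawn above; an actual scheme could equally well flatten or nest these differently.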