
Subject Classification and Indexing

 

The use of text is essential to the cataloging process. In particular, every image and video on the Web has a unique Web address and possibly other HTML tags, which provide valuable context for interpreting the visual information. We process the Web addresses, or URLs, and HTML tags in the following ways to index the images and videos:

Text Processing

Images and videos are published on the Web in two forms, inlined and referenced, and the HTML syntax differs in the two cases. To inline, or embed, an image or video in a Web document, the following code is included in the document: <img src=URL alt=[alt text]>, where URL gives the relative or absolute address of the image or video. The optional alt attribute specifies the text that may appear in place of the image or video while the browser is loading it, or when the browser has trouble finding or displaying the visual data. Alternatively, images and videos may be referenced from parent Web pages using the following code: <a href=URL>[hyperlink text]</a>, where the optional [hyperlink text] provides the highlighted text that describes the object pointed to by the hyperlink, in this case an image or video.
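As a minimal sketch of how the two forms might be told apart, the following Python code scans an HTML document for inlined (img) and referenced (a href) images and videos and keeps the alt text or hyperlink text found with each. The class name MediaLinkParser, the function extract_media and the list of file extensions are assumptions made for illustration; they are not taken from the system described here.

```python
from html.parser import HTMLParser

# Hypothetical list of extensions treated as image/video files (an assumption).
MEDIA_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".mpg", ".mpeg", ".mov", ".avi")


class MediaLinkParser(HTMLParser):
    """Collects (url, text, form) triples for inlined and referenced media."""

    def __init__(self):
        super().__init__()
        self.found = []      # (url, text, "inlined" or "referenced")
        self._href = None    # URL of the <a> element currently open, if any
        self._text = []      # hyperlink text collected inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            # Inlined form: the alt attribute is optional.
            self.found.append((attrs["src"], attrs.get("alt") or "", "inlined"))
        elif tag == "a" and attrs.get("href"):
            self._href, self._text = attrs["href"], []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Referenced form: keep only links that point at media files.
            if self._href.lower().endswith(MEDIA_EXTENSIONS):
                text = " ".join("".join(self._text).split())
                self.found.append((self._href, text, "referenced"))
            self._href = None


def extract_media(html):
    """Return all inlined and referenced media found in an HTML string."""
    parser = MediaLinkParser()
    parser.feed(html)
    parser.close()
    return parser.found
```

Deciding by file extension whether a hyperlink points at an image or video is only one possible heuristic; a crawler could instead inspect the content type returned by the server.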

Term Extraction

The terms are extracted from the image and video URLs, alt text and hyperlink text by chopping the text at non-alpha characters. The URL of an image or video has the following form

displaymath1232

where [...] denotes an optional argument. For example, several typical URLs are

displaymath1233

Terms are extracted from the directory and file strings using tex2html_wrap_inline1240 and tex2html_wrap_inline1242 where

displaymath1234

where tex2html_wrap_inline1244 . For example,

displaymath1235
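The chopping step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the system's actual functions: the names chop and url_terms are hypothetical, and lowercasing the terms and dropping the file extension are choices made here for the example.

```python
import re
from urllib.parse import urlparse


def chop(text):
    """Split text at non-alphabetic characters and lowercase the pieces."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]


def url_terms(url):
    """Terms from the directory and file parts of a URL.

    Assumption: the host name and the file extension are not used as terms.
    """
    path = urlparse(url).path
    directory, _, filename = path.rpartition("/")
    stem = filename.rsplit(".", 1)[0]   # file name without its extension
    return chop(directory) + chop(stem)
```

The same chop function can be applied unchanged to alt text and hyperlink text, since both are plain strings.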

First, the terms allow text-based searching via string matching. After extracting the terms, the system indexes the images and videos directly using inverted files. The process of file inversion is illustrated in Table 2. For example, if the user enters the query term "animal", the images and videos with IMID = 259503 and 106441 are retrieved. In addition, certain terms, called key-terms, are used to map the images and videos to subject classes, as we explain shortly.

 

IMID      Terms
121216    nasa, clipart
259503    animal, dog
151285    astronomy, nasa
106441    animal, clipart

Terms        IMID
animal       259503, 106441
astronomy    151285
clipart      121216, 106441
dog          259503
nasa         121216, 151285

Table 2:   File inversion of terms for images/videos.
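The inversion shown in Table 2 can be reproduced with a few lines of Python. The sketch below uses the four records from the table; the function name invert and the in-memory dictionaries are assumptions for illustration, and a production index would be stored on disk.

```python
from collections import defaultdict

# Forward table from Table 2: IMID -> extracted terms.
forward = {
    121216: ["nasa", "clipart"],
    259503: ["animal", "dog"],
    151285: ["astronomy", "nasa"],
    106441: ["animal", "clipart"],
}


def invert(forward_index):
    """Build the inverted file: term -> list of IMIDs containing that term."""
    inverted = defaultdict(list)
    for imid, terms in forward_index.items():
        for term in terms:
            inverted[term].append(imid)
    return dict(inverted)


inverted = invert(forward)

# A single-term query is then a dictionary lookup.
matches = inverted.get("animal", [])
```

With the inverted file in place, answering a term query no longer requires scanning every record.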

Directory Name Extraction

A directory name is a phrase extracted from the URLs that groups images and videos by location on the Web. The directory name consists of the directory portion of the URL, namely, tex2html_wrap_inline1246 . For example, tex2html_wrap_inline1248 . The directory names are also used by the system to map images and videos to subject classes.
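A rough Python sketch of directory name extraction is given below. It assumes that the directory name is the URL path with the file name removed and the host name excluded; the function names directory_name and group_by_directory are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def directory_name(url):
    """Directory portion of a URL path (assumption: the host is excluded)."""
    path = urlparse(url).path
    directory, _, _ = path.rpartition("/")
    return directory.strip("/")


def group_by_directory(urls):
    """Group image and video URLs by their directory name."""
    groups = defaultdict(list)
    for url in urls:
        groups[directory_name(url)].append(url)
    return dict(groups)
```

Grouping the URLs this way makes it possible to inspect a whole directory of images and videos at once.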

Key-term Dictionary and Directory Name to Subject Mappings

A key-term is a manually identified term that corresponds to one or more subject classes. The key-term dictionary contains the set of key-terms and their corresponding mappings to subject classes. We build the key-term dictionary in a semi-automated process. In the first stage, the term histogram for the image and video archive is computed. Then the terms are ranked by frequency and presented for manual assessment, so that the most frequent terms are inspected first. The goal of the manual assessment is to determine whether a term can be assigned to the key-term dictionary. To make the decision, we consider the descriptive ability of the term and its possible correspondence to one or more subject classes. Terms with multiple meanings make poor key-terms. For example, the term "rock" is not a good key-term because it may refer to stone, to rock music, or to several other things. Once a term and its mappings are added to the key-term dictionary, they apply to all existing and new images and videos.
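The first, automated stage of this process amounts to counting and sorting. A minimal Python sketch, with hypothetical function names and the Table 2 records as sample input, is:

```python
from collections import Counter


def term_histogram(term_lists):
    """Count how often each term occurs across all images and videos."""
    counts = Counter()
    for terms in term_lists:
        counts.update(terms)
    return counts


def rank_terms(counts):
    """Terms ordered from most to least frequent, for manual assessment."""
    return counts.most_common()


# Sample input: the term lists of the four records in Table 2.
sample = [
    ["nasa", "clipart"],
    ["animal", "dog"],
    ["astronomy", "nasa"],
    ["animal", "clipart"],
]
counts = term_histogram(sample)
ranked = rank_terms(counts)
```

The ranked list is only the input to the manual stage; deciding whether a term is descriptive enough to become a key-term remains a human judgment.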

 

(a) Non-descriptive terms

term       count
image      86380
gif        28580
icon       14798
pic        14035
img        14011
graphic    10320
picture    10026
small      9442
art        8577
gallery    6989
thumb      6669

(b) Descriptive key-terms and mappings

key-term     count    mapping to subject
planet       1175     astronomy/planets
music        922      entertainment/music
texture      831      graphics/textures
aircraft     458      transportation/aircraft
travel       344      travel
astronomy    320      astronomy
gorilla      273      animals/gorillas
starwars     204      entertainment/movies/films/starwars
soccer       195      sports/soccer
dinosaur     180      animals/dinosaurs
porsche      139      transportation/automobiles/porsches

Table 3:   Sample (a) term counts and (b) key-terms, counts and subject mappings for 500,000 images and videos.

Table 3 lists a sample of the terms extracted in the initial experiments of cataloging 500,000 images and videos. Notice in Table 3(a) that some of the most common terms, e.g., "image" and "picture", are not sufficiently descriptive of the visual information. However, the terms in Table 3(b), e.g., "aircraft", "gorilla" and "porsche", clearly indicate the subject of the images and videos. These key-terms are extremely useful for classifying the images and videos into subject classes. We therefore added the key-terms and corresponding subject mappings illustrated in Table 3(b) to the key-term dictionary.
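Once the dictionary exists, mapping an image or video to subject classes is a lookup over its extracted terms. The sketch below encodes the mappings of Table 3(b) as a Python dictionary; the function name classify and the use of exact string matching, with no stemming, are assumptions made for illustration.

```python
# Key-term dictionary built from Table 3(b): key-term -> subject classes.
KEY_TERMS = {
    "planet": ["astronomy/planets"],
    "music": ["entertainment/music"],
    "texture": ["graphics/textures"],
    "aircraft": ["transportation/aircraft"],
    "travel": ["travel"],
    "astronomy": ["astronomy"],
    "gorilla": ["animals/gorillas"],
    "starwars": ["entertainment/movies/films/starwars"],
    "soccer": ["sports/soccer"],
    "dinosaur": ["animals/dinosaurs"],
    "porsche": ["transportation/automobiles/porsches"],
}


def classify(terms, key_terms=KEY_TERMS):
    """Subject classes for an image or video, given its extracted terms."""
    subjects = []
    for term in terms:
        for subject in key_terms.get(term, []):
            if subject not in subjects:   # keep each subject only once
                subjects.append(subject)
    return subjects
```

Terms that are absent from the dictionary, such as the non-descriptive terms of Table 3(a), contribute nothing to the classification.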

In a similar process, the directory names are inspected and manually mapped to subject classes. Very often an entire directory of images/videos corresponds to a particular topic and can be mapped to one or more subject classes. Similar to the process for key-term identification, the system computes the histogram of directory names and presents it for manual inspection. A directory that sufficiently groups images and videos related to a particular topic is then mapped to the appropriate subject classes.

In Section 6.1, we demonstrate that these methods of key-term and directory name identification and subject mapping provide excellent performance in classifying the images and videos by subject. We also hope that by incorporating some results of natural language processing [9], in addition to using visual features, we can further improve and automate the subject classification process.

Image and Video Subject Taxonomy

A subject class or subject is an ontological concept that represents the semantic content of an image or video, e.g., "basketball". A subject taxonomy is an arrangement of subject classes into an is-a hierarchy. We are developing a new subject taxonomy for image and video subject matter, a portion of which is illustrated in Figure 4. The taxonomy grows in the process of inspecting the terms for key-term mappings, as described above. For example, when a new and descriptive term such as "basketball" is detected and added to the key-term dictionary, we add a corresponding subject class to the taxonomy, e.g., "sports/basketball", if it does not already exist.

 



Figure 4:   Portion of the image and video subject taxonomy.
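One simple way to hold such an is-a hierarchy in memory is a nested dictionary keyed by class name, with slash-separated paths such as "sports/basketball" naming the classes. The following Python sketch is an assumption about a possible representation, not a description of the system's data structure; the function names are hypothetical.

```python
def add_subject(taxonomy, subject_path):
    """Add a subject class, creating missing parent classes as needed."""
    node = taxonomy
    for name in subject_path.split("/"):
        node = node.setdefault(name, {})   # existing classes are reused
    return taxonomy


def ancestors(subject_path):
    """All more general classes of a subject, from the root downward."""
    parts = subject_path.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts))]


taxonomy = {}
for path in ["sports/basketball", "sports/soccer", "animals/gorillas"]:
    add_subject(taxonomy, path)
```

Because add_subject reuses classes that already exist, adding "sports/basketball" a second time leaves the taxonomy unchanged.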

Catalog Database

  

As described above, each retrieved image and video is processed and the following information tables are populated:

displaymath1256

where special (non-alphanumeric) data types are given as follows:

displaymath1257

The automated assignment of TYPE to the images and videos using visual features is explained in Section 5.2. Queries on the database tables IMAGES, TYPES, SUBJECTS and TEXT are performed using standard relational algebra. For example, the query: give me all records with TYPE = "video", SUBJECT = "news" and TERM = "basketball", can be carried out in SQL as follows:

SELECT IMID
FROM TYPES, SUBJECTS, TEXT
WHERE TYPE = 'video' AND SUBJECT = 'news' AND TERM = 'basketball';

However, content-based queries, which involve table FV, require special processing, which is discussed in more detail in Sections 4.2 and 5.
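The relational query above can be tried out with Python's built-in sqlite3 module. The sketch below is illustrative only: it assumes that each of the TYPES, SUBJECTS and TEXT tables carries an IMID column, as the SELECT IMID clause suggests, and it adds explicit join conditions on IMID so that the three tables are matched record by record. The column layouts, the sample rows and the function name find_imids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed minimal schemas: one (IMID, value) pair per row.
conn.executescript("""
    CREATE TABLE TYPES    (IMID INTEGER, TYPE TEXT);
    CREATE TABLE SUBJECTS (IMID INTEGER, SUBJECT TEXT);
    CREATE TABLE TEXT     (IMID INTEGER, TERM TEXT);
""")

# Hypothetical sample records.
conn.executemany("INSERT INTO TYPES VALUES (?, ?)",
                 [(1, "video"), (2, "video"), (3, "photo")])
conn.executemany("INSERT INTO SUBJECTS VALUES (?, ?)",
                 [(1, "news"), (2, "sports"), (3, "news")])
conn.executemany("INSERT INTO TEXT VALUES (?, ?)",
                 [(1, "basketball"), (2, "basketball"), (3, "basketball")])


def find_imids(conn, type_, subject, term):
    """IMIDs whose TYPE, SUBJECT and TERM all match the given values."""
    rows = conn.execute(
        """
        SELECT TYPES.IMID
        FROM TYPES, SUBJECTS, TEXT
        WHERE TYPES.IMID = SUBJECTS.IMID
          AND TYPES.IMID = TEXT.IMID
          AND TYPE = ? AND SUBJECT = ? AND TERM = ?
        """,
        (type_, subject, term),
    )
    return [row[0] for row in rows]


result = find_imids(conn, "video", "news", "basketball")
```

Passing the query values as parameters, rather than pasting them into the SQL string, keeps user-entered terms from being interpreted as SQL.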



John Smith
Fri Aug 16 11:09:46 EDT 1996