In the initial trials, the system catalogued 513,323 images and videos from 46,551 directories on 16,773 distinct Web sites. The cataloguing process took several months and ran concurrently with the development of the user application. Statistics on the catalog process are summarized in Table 4. In all, the system has catalogued over 129 Gigabytes of visual information. The local storage of information, which includes coarse versions of the data and color histogram feature vectors, requires approximately 2 Gigabytes.
As indicated in Table 4, the catalog process assigned 68.23% of the images and videos to subject classes, using automated mapping for key-terms and semi-automated mapping for directory names. We assessed the subject classification rates for several classes; the results are summarized in Table 5(a). The overall performance is excellent in terms of classification precision. For this assessment, as illustrated in Table 5(a), we chose the classes at random from the subject taxonomy of 941 classes. We established the ground truth by manually verifying the subject of each image and video in the test sets.
We observed that errors in classification arise from several causes: (1) key-terms being used out of context by the publishers of the images or videos, (2) the system's reliance on some key-terms that have multiple meanings and contexts, e.g., ``madonna,'' and (3) the system's reliance on key-terms extracted from directory names. For example, in Table 5(a), the precision of subject class ``animals/possums'' is low because five out of the nine items are not images or videos of possums. These items were classified incorrectly because the key-term ``possum'' appeared in the directory name. While some of the images in that directory depict possums, others depict only the forests to which possums are indigenous. When viewed outside of the context of the ``possum'' web site, the images of forests should not be assigned to the class ``animals/possums.''
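The precision figure for the ``animals/possums'' example can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming precision is defined as the fraction of items assigned to a class that truly belong to it; the function name `precision` and the variable names are illustrative and do not come from the system itself.

```python
def precision(num_correct: int, num_assigned: int) -> float:
    """Fraction of items assigned to a class that truly belong to it."""
    if num_assigned <= 0:
        raise ValueError("num_assigned must be positive")
    return num_correct / num_assigned


# Counts taken from the text: nine items were assigned to
# "animals/possums", and five of them do not depict possums.
possum_assigned = 9
possum_incorrect = 5

possum_precision = precision(possum_assigned - possum_incorrect, possum_assigned)
print(f"animals/possums precision: {possum_precision:.3f}")  # 0.444
```

With only nine items in the class, each misclassified item moves the precision by about 0.11, so a single noisy directory name can dominate the result for a small class.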
We assessed the precision of the automated type classification system; the results are summarized in Table 5(b). For this evaluation, both the training and test samples consisted of 200 images from each type class. We found that the automated type assessment for these five simple classes achieves a quite satisfactory overall rate of successful classification. In future work, we will try to extend this system to a larger number of classes, including new type classes, such as Fractal images, Cartoons, Faces and Art paintings, as well as subject classes.
| Type | Rate |
| --- | --- |
| Color photo | 0.914 |
| Color graphic | 0.923 |
| Gray image | 0.967 |
| B/w image | 1.000 |
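As a quick check on the rates listed above, their unweighted mean can be computed directly. This is a sketch only: it covers just the four type classes that appear in the table as given here, and it assumes equal weighting, which is consistent with the equal-sized test samples described in the text. The dictionary name `type_rates` is illustrative.

```python
from statistics import mean

# Per-class rates of successful type classification, copied from the table.
type_rates = {
    "Color photo": 0.914,
    "Color graphic": 0.923,
    "Gray image": 0.967,
    "B/w image": 1.000,
}

# Unweighted mean over the listed classes.
mean_rate = mean(type_rates.values())
print(f"mean rate over {len(type_rates)} listed classes: {mean_rate:.3f}")  # 0.951
```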
Another important factor in the image and video search system is the speed at which user operations and queries are performed. In particular, as the archive grows, it is imperative that queries do not take so long that they inhibit the user from using the system effectively. In the initial system, the overall efficiency of the various database manipulation operations is excellent, even on the large catalog (see Table 6; server platform = SGI Onyx). In particular, the good performance of the content-based visual query tools results from the strategies for indexing the 166-bin color histograms described in Section 5.1. For example, the system identifies the most similar visual scenes in the catalog of 513,323 images and videos to a selected query scene in only 1.83 seconds.
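To make the query operation concrete, the sketch below ranks catalog items by the distance between 166-bin color histograms. It is a brute-force baseline under stated assumptions, not the indexing strategy of Section 5.1: the L1 distance, the function names (`normalize`, `l1_distance`, `most_similar`), the item identifiers, and the randomly generated catalog are all illustrative choices that do not come from the system.

```python
import heapq
import random

NUM_BINS = 166  # histogram size stated in the text


def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram is empty")
    return [value / total for value in hist]


def l1_distance(a, b):
    """Sum of absolute bin differences (an assumed metric for this sketch)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def most_similar(query, catalog, k):
    """Return the k closest items as (distance, item_id), nearest first.

    This scans every histogram, so the cost grows linearly with catalog size.
    """
    scored = ((l1_distance(query, hist), item_id) for item_id, hist in catalog.items())
    return heapq.nsmallest(k, scored)


# Hypothetical catalog of random histograms, for illustration only.
rng = random.Random(0)
catalog = {
    f"img_{i:04d}": normalize([rng.random() for _ in range(NUM_BINS)])
    for i in range(1000)
}

query_id = "img_0042"
results = most_similar(catalog[query_id], catalog, k=5)
for distance, item_id in results:
    print(f"{item_id}  distance={distance:.4f}")
```

A linear scan like this is the simplest correct baseline; an index over the histograms is what allows query time to stay low as the catalog grows to hundreds of thousands of items.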