The image and video collection process is conducted by several autonomous Web agents, or spiders. The agents traverse the Web by following the hyperlinks between documents. They detect images and videos, download and process them, and add the new information to the catalog. The overall collection process, illustrated in Figure 1, is carried out by three distinct spiders: (1) Spider 1 assembles lists of candidate Web pages that may include images, videos or hyperlinks to them, (2) Spider 2 extracts the URLs of the images and videos, and (3) Spider 3 retrieves and analyzes the images and videos.
Figure 1: Image and video gathering process via three spiders.
The first phase of the process consists of the two spiders that traverse the Web looking for images and videos, as illustrated in Figure 2. Starting from seed URLs, Spider 1 performs a breadth-first search across the Web. It downloads pages via the Hypertext Transfer Protocol (HTTP) and passes the Hypertext Markup Language (HTML) code to Spider 2. In turn, Spider 2 detects new URLs, encoded as HTML hyperlinks, and adds them back to the queue of Web pages to be downloaded by Spider 1. In this sense, Spider 1 is similar to many of the conventional spiders or robots that follow hyperlinks in some fashion across the Web [7].
Figure 2: Spider 1 and Spider 2 traverse the Web and assemble lists of URLs of images and videos.
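The interplay between Spider 1 and Spider 2 can be sketched as a queue-driven breadth-first traversal. The following is a minimal illustration, not the authors' implementation: the names `crawl`, `LinkExtractor`, `fetch` and `max_pages` are hypothetical, the download step is passed in as a function so that the sketch runs without network access, and only `<a href>` hyperlinks are followed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags (a stand-in for Spider 2)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first traversal from the seed URLs (a stand-in for Spider 1).

    `fetch(url)` is an assumed callable that returns the HTML text of a
    page, or None if the page cannot be downloaded.
    """
    queue = deque(seeds)   # FIFO queue gives breadth-first order
    seen = set(seeds)      # URLs already queued, to avoid revisiting
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            # Resolve relative links and drop any "#fragment" part.
            absolute, _fragment = urldefrag(urljoin(url, link))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

In a real deployment `fetch` would issue an HTTP request; here any mapping from URL to HTML text, such as `dict.get`, is enough to exercise the traversal order.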
Spider 2 detects all hyperlinks in the Web documents and converts the relative URLs to absolute addresses. By examining the types of the hyperlinks and the filename extensions of the URLs, Spider 2 assigns each URL to one of several categories: image, video or HTML. The mapping between filename extensions and Web object type is given by the Multipurpose Internet Mail Extensions (MIME) content type labels, as illustrated in Table 1.
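The URL resolution and categorization step can be illustrated with a short sketch. It assumes that Python's standard `mimetypes` table is an acceptable stand-in for the extension-to-MIME mapping of Table 1; the function name `classify` and the fallback category `"unknown"` are hypothetical and not part of the original system.

```python
import mimetypes
from urllib.parse import urljoin, urlparse


def classify(base_url, link):
    """Resolve `link` against `base_url` and assign it a category.

    Returns (absolute_url, category), where category is "image",
    "video", "html" or "unknown" (the last one is an assumption made
    for URLs whose extension maps to no relevant MIME type).
    """
    absolute = urljoin(base_url, link)
    # Guess from the path only, so query strings do not hide the extension.
    path = urlparse(absolute).path
    mime, _encoding = mimetypes.guess_type(path)
    if mime is None:
        return absolute, "unknown"
    if mime.startswith("image/"):
        return absolute, "image"
    if mime.startswith("video/"):
        return absolute, "video"
    if mime == "text/html":
        return absolute, "html"
    return absolute, "unknown"
```

For example, a relative link `../img/photo.gif` found on `http://a.test/dir/page.html` resolves to `http://a.test/img/photo.gif` and falls into the image category.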
In the second phase, the list of image and video URLs from Spider 2 is passed to Spider 3. Spider 3 retrieves the images and videos, processes them and adds them to the catalog. Three important functions of Spider 3 are to
Figure 3: Spider 3 processes each image/video.
For images, the coarse versions are obtained by simply subsampling and compressing the originals, where the compression format, either JPEG or GIF, is chosen to match the original image format. For video, the coarse versions are generated by subsampling the original video both spatially and temporally. The temporal subsampling is achieved in a two-step process. First, one frame is kept for every second of video. Next, scene change detection is performed on the retained frames to detect the key frames of the sequence [8]. This allows for the elimination of duplicate scenes in the coarse version. Finally, the video is re-animated from the key frames and packaged as an animated GIF file. When returned in response to a query, the coarse videos appear to the user as animated samples of the original video.
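The two-step temporal subsampling can be sketched as follows. The scene change detector of [8] is not described here, so this sketch substitutes a simple gray-level histogram difference as a stand-in. Frames are assumed to be non-empty flat lists of 8-bit intensities of equal length, and the names `sample_one_per_second`, `key_frames`, `threshold` and `bins` are hypothetical. Spatial subsampling and GIF packaging are omitted.

```python
def histogram(frame, bins=16):
    """Gray-level histogram of a frame given as 8-bit intensities."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // 256] += 1
    return counts


def sample_one_per_second(frames, fps):
    """Step 1: keep one frame for every second of video (integer fps)."""
    return frames[::fps]


def key_frames(frames, threshold=0.5, bins=16):
    """Step 2: keep a frame only when it differs from the last key frame.

    The difference is the L1 distance between histograms, normalized
    to [0, 1]; this is an assumed stand-in, not the method of [8].
    """
    keys = []
    last_hist = None
    for frame in frames:
        hist = histogram(frame, bins)
        if last_hist is None:
            changed = True  # the first frame always starts a scene
        else:
            l1 = sum(abs(a - b) for a, b in zip(hist, last_hist))
            changed = l1 / (2 * len(frame)) > threshold
        if changed:
            keys.append(frame)
            last_hist = hist
    return keys
```

Because each frame is compared only against the most recent key frame, this sketch removes consecutive near-duplicates; a scene that reappears later in the video is kept again.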