spooky's blog: Efficient Visual Search of Videos Cast as Text Retrieval

This paper introduces a method that recasts the image retrieval problem as a text retrieval problem.

Text retrieval has a long research history and is a relatively well-studied problem, with many effective techniques available. These techniques have been applied in many applications, such as search engines, and have proven very successful. Image retrieval, on the other hand, is a relatively new problem and is not as mature as text retrieval. It would therefore be very helpful if text retrieval techniques could be reused for image retrieval, which is the aim of this paper.

The key idea of the paper is to transform low-level visual features into "Visual Words". Low-level visual features are first obtained by detecting interesting regions and representing each region with a SIFT feature descriptor. A visual vocabulary is then built by clustering these descriptors, with the center of each cluster serving as one "Visual Word", and each extracted feature point is assigned to a particular "Visual Word". These visual words are analogous to regular words in text, so text retrieval techniques can then be applied.
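To make the vocabulary step concrete, here is a minimal sketch in plain Python. It assumes k-means as the clustering algorithm and nearest-center assignment, and it uses toy 2-D vectors in place of real 128-D SIFT descriptors. The function names (build_vocabulary, quantize) are made up for illustration and are not from the paper.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_center(descriptor, centers):
    """Index of the closest cluster center, used as the visual word ID."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(descriptor, centers[i]))

def build_vocabulary(descriptors, k, iterations=10, seed=0):
    """Plain k-means (an assumed clustering choice); returns k centers."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_center(d, centers)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def quantize(descriptors, centers):
    """Map each descriptor of one image or frame to its visual word ID."""
    return [nearest_center(d, centers) for d in descriptors]

# Toy 2-D "descriptors" standing in for real SIFT vectors.
toy_descriptors = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.3, 5.1]]
vocabulary = build_vocabulary(toy_descriptors, k=2)
words = quantize(toy_descriptors, vocabulary)
print(words)  # one visual word ID per descriptor
```

After this step an image is just a list of integer word IDs, which is what lets the text retrieval machinery take over.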

Once the visual words are obtained, they are processed as if they were words in a text document, and the structure of the retrieval system follows that of a regular text retrieval system:

Building Vocabulary:
- Term weighting using tf-idf
- Stop word removal
Indexing:
- Inverted file
Retrieval Model:
- Vector space model
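The components listed above can be sketched together in a few lines of Python. This is only an illustration under some assumptions: it uses the textbook tf-idf form (term frequency times log of N over document frequency), a stop list that drops just the most frequent words, and cosine similarity for the vector space model. The names (build_index, search, stop_fraction) and the toy frames are hypothetical, not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def tfidf_vector(words, idf):
    """Sparse tf-idf vector for one list of visual word IDs."""
    kept = [w for w in words if w in idf]  # stop words have no idf entry
    counts = Counter(kept)
    return {w: (c / len(kept)) * idf[w] for w, c in counts.items()}

def build_index(documents, stop_fraction=0.25):
    """documents: dict mapping frame ID -> list of visual word IDs."""
    n_docs = len(documents)
    # Document frequency: the number of frames each word occurs in.
    doc_freq = Counter(w for words in documents.values() for w in set(words))
    # Stop list: drop the most common words (stop_fraction is an assumed knob).
    n_stop = int(len(doc_freq) * stop_fraction)
    stop_words = {w for w, _ in doc_freq.most_common(n_stop)}
    idf = {w: math.log(n_docs / df)
           for w, df in doc_freq.items() if w not in stop_words}
    inverted = defaultdict(dict)  # word -> {frame ID: tf-idf weight}
    norms = {}
    for doc_id, words in documents.items():
        vec = tfidf_vector(words, idf)
        norms[doc_id] = math.sqrt(sum(v * v for v in vec.values()))
        for w, weight in vec.items():
            inverted[w][doc_id] = weight
    return inverted, idf, norms

def search(query_words, inverted, idf, norms):
    """Rank frames by cosine similarity, touching only matching postings."""
    q_vec = tfidf_vector(query_words, idf)
    q_norm = math.sqrt(sum(v * v for v in q_vec.values()))
    scores = defaultdict(float)
    for w, q_weight in q_vec.items():
        for doc_id, d_weight in inverted.get(w, {}).items():
            scores[doc_id] += q_weight * d_weight
    ranked = [(doc_id, s / (q_norm * norms[doc_id]))
              for doc_id, s in scores.items() if s > 0]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Toy frames: word 7 appears everywhere, so it lands on the stop list.
frames = {
    "frame_a": [0, 0, 1, 7],
    "frame_b": [2, 3, 7],
    "frame_c": [0, 1, 1, 7],
}
inverted, idf, norms = build_index(frames)
results = search([0, 0, 1], inverted, idf, norms)
print(results)  # frame_a ranks first, frame_b is never visited
```

The point of the inverted file shows up in search: only frames that share at least one visual word with the query are scored, so frame_b is skipped without any comparison.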

The main benefit of the proposed method is its speed. With very simple techniques, it greatly reduces retrieval time while returning results comparable to those of pure visual matching methods.

Wednesday, March 2, 2011