spooky's blog: Aggregating local descriptors into a compact image representation

This paper has two main contributions:

VLAD - A new image descriptor based on local features -
Built on SIFT local descriptors, VLAD (vector of locally aggregated descriptors) aggregates the local descriptors of an image into a single fixed-dimension representation.
The idea grows out of the BOF (bag-of-features) representation, but it retains more of the structure of the SIFT descriptors. Unlike BOF, which reduces the descriptors assigned to each visual word to a single number (a count, weighted in a tf-idf manner), VLAD describes each "visual word" by accumulating the differences between the local descriptors assigned to it and its centroid. The experimental results show that VLAD outperforms BOF in image retrieval when the two representations are comparable in size.
VLAD can also be viewed as a simplification of the Fisher kernel, which uses a GMM (Gaussian mixture model) to describe the "visual words".
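
To make the aggregation step concrete, here is a minimal NumPy sketch of how a VLAD vector could be computed. It is only an illustration, not the paper's code: the function name vlad, the brute-force nearest-centroid search, and the toy 2-D data are my own assumptions, and the centroids are assumed to come from a k-means run done beforehand. The final L2 normalization is the usual convention for VLAD.

```python
import numpy as np

def vlad(descriptors, centroids):
    """Aggregate local descriptors (n x d) into one k*d vector.

    Hypothetical helper for illustration; `centroids` (k x d) is assumed
    to be a codebook learned beforehand, e.g. with k-means.
    """
    k, d = centroids.shape
    # Squared distance from every descriptor to every centroid (n x k).
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)  # index of the visual word per descriptor

    v = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            # Accumulate the differences between descriptors and their centroid.
            v[i] = (members - centroids[i]).sum(axis=0)

    v = v.ravel()  # fixed size k*d, whatever the number of descriptors
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Toy data: 2 visual words, 3 two-dimensional "descriptors".
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]])
v = vlad(descriptors, centroids)
print(v.shape)  # (4,): k * d = 2 * 2
```

Note how each visual word keeps a full d-dimensional residual instead of a single count, which is where the extra information compared to BOF comes from.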

PCA-ADC approach to "compress" the descriptor -
Vector quantization is a widely used approach to reduce the size of descriptors.
The ADC (asymmetric distance computation) approach reduces the complexity of vector quantization by dividing a vector into multiple smaller subvectors and quantizing each of them separately. It is combined with PCA, a widely used dimensionality reduction technique, to further reduce the data size.
The paper gives a formal analysis of the error introduced by these descriptor compression methods and concludes that the performance loss is acceptable. The analysis is supported by its experiments.
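
The compression pipeline described above can be sketched roughly as follows: project with PCA, split the result into subvectors, quantize each subvector with its own small codebook, and compare an unquantized query against the stored codes. This is a simplified sketch under my own assumptions, not the paper's implementation: the helper names (pca_fit, train_codebooks, encode, adc_distances), the tiny k-means loop, and all sizes are hypothetical, and the vector dimension is assumed to be divisible by the number of subvectors m.

```python
import numpy as np

def pca_fit(X, d_out):
    """Return the data mean and the top d_out principal directions."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:d_out]

def nearest(S, C):
    """Index of the closest centroid in C for every row of S."""
    return ((S[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def train_codebooks(X, m, ks, iters=10, seed=0):
    """One small k-means codebook (ks centroids) per subvector."""
    rng = np.random.default_rng(seed)
    codebooks = []
    for S in np.split(X, m, axis=1):
        C = S[rng.choice(len(S), ks, replace=False)].copy()
        for _ in range(iters):
            assign = nearest(S, C)
            for j in range(ks):
                if np.any(assign == j):
                    C[j] = S[assign == j].mean(axis=0)
        codebooks.append(C)
    return codebooks

def encode(X, codebooks):
    """Replace each subvector by the index of its nearest centroid (n x m codes)."""
    subs = np.split(X, len(codebooks), axis=1)
    return np.stack([nearest(S, C) for S, C in zip(subs, codebooks)], axis=1)

def adc_distances(query, codes, codebooks):
    """Approximate squared distances: the query is NOT quantized, the database is."""
    q_subs = np.split(query, len(codebooks))
    # tables[j][c] = squared distance from query subvector j to centroid c
    tables = [((C - q) ** 2).sum(axis=1) for q, C in zip(q_subs, codebooks)]
    return sum(tables[j][codes[:, j]] for j in range(len(codebooks)))

# Toy run on random data standing in for image descriptors.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 16))
mean, P = pca_fit(X, 8)
Z = (X - mean) @ P.T                      # PCA-reduced vectors
codebooks = train_codebooks(Z, m=4, ks=8)
codes = encode(Z, codebooks)              # 4 small integers per vector
dists = adc_distances(Z[0], codes, codebooks)
print(codes.shape, dists.shape)
```

The point of the per-query lookup tables is that the distance to every database vector costs only m table lookups and additions, while each stored vector shrinks to m small integers.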
