2011年6月1日 星期三

Fast Concurrent Object Localization and Recognition

Goal - Given a set of candidate objects and an image, one want to simultaneously locate and recognize the object in reasonable time.

Object recognition and object localization are two important problem in computer vision. Existing method, although effective and efficient for a particular problem, can't solve them concurrently effectively as in human vision.

For object localization, the problem for applying existing method is that they handle only one single object at a time. Therefore, linear search over all possible objects is necessary, making it infeasible when the number of candidate object is large, which is very likely when we want to apply it to real application.
For object recognition, existing method improve the recognition speed with an implicit assumption that localization is either given or irrelevant, making them not suitable to apply to object localization.

In this work, local feature of image is used. The key for speed up concurrent localization and recognition (CLR) lies in the following observation: while multiple local features are used for localizing and recognizing an object, these local feature points can be processed independently and then aggregate together using a linear aggregating function. This means the local feature points can be processed in an incremental or decremental manner.

Definition -
Subspace: Defined by a five candidate set {X+,X-,Y+,Y-,C}.
Candidate Set: X+ and Y+ are the possible feature point for the upper bound of bounding box, and X-, Y- are for the lower bound. C is the set of possible label.
Upper Bound of Subspace: The highest prediction score of a subspace. The score is that of a subset of local feature on a possible label of the subspace.

The algorithm works as follow -
For each iteration, the subspace with highest upper bound is processed. The subspace is split into two, where the largest candidate set is split. That is, suppose Sc is the target subspace and Sc = {Xc+,Xc-,Yc+,Yc-,Cc}, the largest candidate set Ac will be pick up and split into Ac+ and Ac-. For Xc and Yc, they are split horizontally or vertically in the coordinate. For Cc, it is split according to the prediction score of each label, where those with higher score are in one set and those with lower score in the other. Sc is then split into two (denote Sc+ and Sc-) according to the split of candidate set, where all points in Ac+ is removed from all candidate set in Sc- and vice versa. The upper bound of Sc+ and Sc- are then computed, and go to the next iteration. The algorithm stop when the subspace with highest score can't be divided.

沒有留言:

張貼留言