Assumption - The rate of photo acquisition is fast compared to the rate at which people move, which implies that the same crowd of people can be found at the same location in multiple images.
Challenges - A large number of people in one image, low resolution, and changes in the pose of the target person.
The problem is solved by considering both:
- Visual appearance
- Contextual cues, which take the co-appearance of people and the time-stamps of images into account
Visual appearance -
The visual appearance is captured by a part-based approach that uses pixel color as the visual feature.
As input, the user specifies a person using two vertical points, along with the body parts of that person. The "body part model" is the color model of a body part: a classifier that decides whether a pixel belongs to that body part.
Candidate locations of the person in other images are found by projecting the two points into the target image. For every candidate location, a score is calculated using the learned body part model, where the score is the number of positively classified pixels. For a given target image, the highest score among its candidate locations is taken as the score of the image.
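The scoring step can be sketched roughly as below. This is only an illustration of "count the positively classified pixels, then take the best candidate", not the paper's actual classifier. The color model here (a pixel is positive if it lies within a fixed radius of the mean labeled color) is an assumption, as are the names `learn_color_model`, `candidate_score`, `image_score` and the `radius` parameter. Projecting the two points into the target image is not shown; the candidate regions are assumed to be given.

```python
from typing import Callable, Dict, Sequence, Tuple

Pixel = Tuple[int, int, int]                 # (R, G, B)
PartRegions = Dict[str, Sequence[Pixel]]     # body part name -> pixels in that part's region
PartModels = Dict[str, Callable[[Pixel], bool]]


def learn_color_model(samples: Sequence[Pixel], radius: float = 40.0) -> Callable[[Pixel], bool]:
    """Hypothetical body part model: a pixel is positive if its color lies
    within `radius` (Euclidean distance in RGB) of the mean sample color."""
    n = len(samples)
    mean = tuple(sum(p[c] for p in samples) / n for c in range(3))

    def is_body_part(pixel: Pixel) -> bool:
        dist_sq = sum((pixel[c] - mean[c]) ** 2 for c in range(3))
        return dist_sq <= radius ** 2

    return is_body_part


def candidate_score(candidate: PartRegions, models: PartModels) -> int:
    """Score of one candidate location: number of positively classified pixels,
    summed over the body parts that have a learned model."""
    return sum(
        1
        for part, pixels in candidate.items()
        if part in models
        for pixel in pixels
        if models[part](pixel)
    )


def image_score(candidates: Sequence[PartRegions], models: PartModels) -> int:
    """Score of a target image: the highest score among its candidate locations."""
    return max((candidate_score(c, models) for c in candidates), default=0)


# Usage: learn a "torso" model from two user-labeled pixels, then score two candidates.
models = {"torso": learn_color_model([(200, 10, 10), (210, 20, 20)])}
candidate_a = {"torso": [(205, 15, 15), (0, 0, 255)]}
candidate_b = {"torso": [(200, 10, 10), (210, 20, 20), (205, 15, 15)]}
best = image_score([candidate_a, candidate_b], models)
```

Taking the maximum over candidates means one well-matching location is enough for an image to rank highly, which matches the description above.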
Contextual Cues -
Contextual cues are based on two observations:
- If a group of people appears together in several images, they are likely to appear together in other images
- Images taken from the same place in a short period of time tend to contain the same group of people
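The two observations could be turned into simple statistics along the following lines. The notes do not say how the cues are actually computed or combined with the appearance score, so everything here is an assumption made for illustration: the function names `co_appearance_counts` and `group_by_time`, the pair-counting scheme, and the `max_gap` threshold.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def co_appearance_counts(images: Sequence[Sequence[str]]) -> Dict[Tuple[str, str], int]:
    """Count, for every pair of people, how many images they appear in together.
    Each image is given as a list of person identifiers (hypothetical labels)."""
    counts: Counter = Counter()
    for people in images:
        # Sort and de-duplicate so each pair is counted once per image.
        for pair in combinations(sorted(set(people)), 2):
            counts[pair] += 1
    return dict(counts)


def group_by_time(timestamps: Sequence[float], max_gap: float) -> List[List[float]]:
    """Group time-stamps so that consecutive images in a group are at most
    `max_gap` apart (an assumed notion of "a short period of time")."""
    groups: List[List[float]] = []
    for t in sorted(timestamps):
        if groups and t - groups[-1][-1] <= max_gap:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


# Usage: pairs seen together often are candidates for appearing together again,
# and images in the same time group are likely to show the same people.
pairs = co_appearance_counts([["a", "b", "c"], ["a", "b"], ["c"]])
bursts = group_by_time([0, 5, 100, 103, 300], max_gap=10)
```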