Wednesday, May 4, 2011

Where's Waldo: Matching People in Images of Crowds

Problem - Given a set of images taken at a social event, each showing a crowd of people, the goal is to find a particular person in every image after the user has marked that person in one of them.

Assumption - Photos are acquired quickly compared to how fast people move, which implies that the same crowd of people can be found at the same location in multiple images.

Challenges - A large number of people in each image, low resolution, and changes in the target person's pose.

The problem is solved by considering both
  1. Visual appearance
  2. Contextual cues, which take the co-occurrence of people and the time stamps of the images into account

Visual appearance -
The visual appearance is captured by a part-based approach that uses pixel color as the visual feature.
As input, the user specifies a person with two vertically aligned points, along with the person's body parts. The "body part model" is a color model of a body part: a classifier that decides whether a pixel belongs to that body part.
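
A body part color model of this kind can be sketched as a pixel classifier. The notes do not say which classifier is used, so this is a minimal sketch under an assumption: each part is represented by a quantized RGB histogram learned from the user-marked pixels and compared against a background histogram, and a pixel is labeled positive when the part histogram gives it the higher probability. `BodyPartColorModel`, `BINS` and the histogram approach are illustrative choices, not the paper's.

```python
import numpy as np

# Hypothetical sketch: a body part color model as a pixel classifier.
# Assumption (not from the source): each part is a quantized RGB histogram
# learned from user-marked pixels, compared against a background histogram.

BINS = 8  # bins per RGB channel (assumed quantization)


def quantize(pixels: np.ndarray) -> np.ndarray:
    """Map uint8 RGB pixels of shape (..., 3) to a single histogram bin index."""
    q = (pixels.astype(np.int64) * BINS) // 256  # each channel -> 0..BINS-1
    return (q[..., 0] * BINS + q[..., 1]) * BINS + q[..., 2]


def color_histogram(pixels: np.ndarray) -> np.ndarray:
    """Normalized color histogram with add-one smoothing."""
    counts = np.bincount(quantize(pixels).ravel(), minlength=BINS ** 3)
    counts = counts.astype(np.float64) + 1.0
    return counts / counts.sum()


class BodyPartColorModel:
    """Label a pixel as part of the body part if its color is more likely
    under the part histogram than under the background histogram."""

    def __init__(self, part_pixels: np.ndarray, background_pixels: np.ndarray):
        self.part_hist = color_histogram(part_pixels)
        self.bg_hist = color_histogram(background_pixels)

    def classify(self, image: np.ndarray) -> np.ndarray:
        """Return a boolean mask: True where the pixel is classified positive."""
        bins = quantize(image)
        return self.part_hist[bins] > self.bg_hist[bins]


if __name__ == "__main__":
    # Toy example: a red body part (e.g. a shirt) against a blue background.
    red = np.full((1, 10, 3), (200, 30, 30), dtype=np.uint8)
    blue = np.full((1, 10, 3), (30, 30, 200), dtype=np.uint8)
    model = BodyPartColorModel(red, blue)
    pixels = np.array([[[210, 20, 25], [20, 25, 210]]], dtype=np.uint8)
    print(model.classify(pixels))  # [[ True False]]
```

Histograms keep the sketch short; any per-pixel classifier that returns a boolean mask could take its place.
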
Candidate locations of the person in another image are obtained by projecting the two points into that target image. For every candidate location, a score is computed with the learned body part models: the score is the number of pixels classified as positive. Within a target image, the highest candidate score is taken as the score of the image.
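
The scoring step can be sketched as follows. This assumes the projection has already produced the candidate locations, each described by one half-open box (row0, row1, col0, col1) per body part, and that each part's color classifier has produced a boolean mask over the target image. The box layout and the names `candidate_score` and `image_score` are hypothetical.

```python
from typing import Dict, List, Tuple

import numpy as np

# Hypothetical sketch of candidate scoring. Assumptions not stated in the
# source: each candidate (already projected into the target image) has one
# half-open box per body part, and each part has a boolean classifier mask.

PartBox = Tuple[int, int, int, int]  # (row0, row1, col0, col1)


def candidate_score(part_masks: Dict[str, np.ndarray], candidate: Dict[str, PartBox]) -> int:
    """Count positively classified pixels inside the candidate's body part boxes."""
    total = 0
    for part, (r0, r1, c0, c1) in candidate.items():
        total += int(part_masks[part][r0:r1, c0:c1].sum())
    return total


def image_score(part_masks: Dict[str, np.ndarray],
                candidates: List[Dict[str, PartBox]]) -> Tuple[int, int]:
    """The image's score is that of its best candidate; also return that candidate's index."""
    if not candidates:
        raise ValueError("no candidate locations in this image")
    scores = [candidate_score(part_masks, c) for c in candidates]
    best = int(np.argmax(scores))
    return scores[best], best


if __name__ == "__main__":
    shirt = np.zeros((6, 6), dtype=bool)
    shirt[2:4, 3:5] = True  # 4 pixels classified as "shirt"
    masks = {"shirt": shirt}
    candidates = [{"shirt": (0, 2, 0, 2)}, {"shirt": (2, 4, 2, 5)}]
    print(image_score(masks, candidates))  # (4, 1)
```

Returning the index of the best candidate as well as its score makes it possible to report where the person was found, not only how well the image matched.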

Contextual Cues -
Contextual cues are based on two observations:
  1. If a group of people appears together in several images, they are likely to appear together in other images
  2. Images taken from the same place in a short period of time tend to contain the same group of people
Based on these two observations, an "affinity" between people (defined by their co-occurrence) and an "affinity" between images (defined by their separation in time) are taken into account, together with the visual appearance score, to determine whether a person appears in an image.
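
The two affinities can be illustrated with a small sketch. Everything in it is an assumption for illustration rather than the paper's formulation: people affinity is taken as the Jaccard overlap of the sets of images two people are known to appear in, image affinity as an exponential decay exp(-|Δt|/σ) in the time gap, and the final score as a hypothetical linear combination. The names `combined_score`, `alpha`, `beta` and `sigma` are invented here.

```python
import math
from typing import Dict, Set, Tuple

# Hypothetical sketch of the two contextual cues. The Jaccard overlap, the
# exponential time decay and the linear combination are all assumptions made
# for illustration, not the formulation used in the paper.


def people_affinity(images_a: Set[str], images_b: Set[str]) -> float:
    """Co-occurrence affinity: fraction of shared images (Jaccard overlap)."""
    union = images_a | images_b
    return len(images_a & images_b) / len(union) if union else 0.0


def image_affinity(t_a: float, t_b: float, sigma: float = 60.0) -> float:
    """Time affinity: 1.0 for simultaneous photos, decaying with the gap in seconds."""
    return math.exp(-abs(t_a - t_b) / sigma)


def combined_score(
    person: str,
    image: str,
    visual: Dict[Tuple[str, str], float],  # (person, image) -> visual score
    appearances: Dict[str, Set[str]],      # person -> images they are known to be in
    timestamps: Dict[str, float],          # image -> capture time in seconds
    alpha: float = 1.0,
    beta: float = 1.0,
    sigma: float = 60.0,
) -> float:
    """Visual score plus the two contextual terms (hypothetical weighting)."""
    own = appearances.get(person, set())
    # Observation 1: people who tend to appear with `person` are present in `image`.
    people_term = sum(
        people_affinity(own, imgs)
        for other, imgs in appearances.items()
        if other != person and image in imgs
    )
    # Observation 2: `person` looks visually present in photos taken close in time.
    image_term = sum(
        image_affinity(timestamps[image], timestamps[j], sigma) * visual.get((person, j), 0.0)
        for j in timestamps
        if j != image
    )
    return visual.get((person, image), 0.0) + alpha * people_term + beta * image_term


if __name__ == "__main__":
    appearances = {"waldo": {"img1"}, "friend": {"img1", "img3"}, "stranger": {"img2"}}
    timestamps = {"img1": 0.0, "img2": 3600.0, "img3": 30.0}
    visual = {("waldo", "img1"): 5.0, ("waldo", "img2"): 2.0, ("waldo", "img3"): 2.0}
    s2 = combined_score("waldo", "img2", visual, appearances, timestamps)
    s3 = combined_score("waldo", "img3", visual, appearances, timestamps)
    # Same visual score, but img3 contains a companion and was taken 30 s after img1.
    print(s3 > s2)  # True
```

In the toy example img2 and img3 have the same visual score, yet img3 ranks higher because a frequent companion is present and it was taken shortly after an image where the person was already found.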
