2011年4月30日 星期六

Learning to Rank: From Pairwise Approach to Listwise Approach

Previous works on "Learning to Rank" usually use pairwise-based loss functional, which has several disadvantage:
  1. The objective function does not guarantee the optimization of some important measure in retrieval such as AP. These measure better capture the user experience than the pairwise ordering.
  2. Document pairs are not generated independently, which violate the assumption of classification methods that is applied.
  3. The number of document pairs, which grows in n square order, from query to query is highly unbalanced. This will lead to a biased model in training.

A listwise approach is proposed as follow:
  1. For each query, a list of score over all documents is given as ground truth.
  2. A ranking function f, which is what we want to learn, takes both query and document as input, and gives a score for each document-query pair.
  3. For each query, a list of score is generated using the ranking function
  4. The loss function is defined over the "ground truth list" and the "ranking list" of each query as input.

The ranking of the documents given a query can be viewed as a permutation, which is relevant to the list of score.
Now the loss function is defined as followed:
  1. Defined the probability of each permutation based on the ordering and the score of each documents.
  2. Given two lists of score, the probability of the permutation can be calculated, and the cross entropy of the ground truth and the score generated by ranking function is used as the loss function.

To reduce the computational cost, only the top k ranking documents are considered in the learning to rank process.

2011年4月20日 星期三

Support Vector Learning for Ordinal Regression

Problem definition -
Given a set of samples S = {(xi,yi)}, where xi being the feature vector of sample and yi being the label, most machine learning problem can be classified into one of the three categories, depending on the property of label:
  1. Classification: yi belongs to a finite set of discrete value, where the value is unordered.
  2. Regression: yi is in a continues metric space, where the value of yi contains relevancy information about the sample.
  3. Ordinal regression: yi belongs to a finite set of discrete value like classification, but there exist ordering relationship in the value as the regression. It can be interpreted as a ranking problem.
This paper gives a formal formulation and solution of the ordinal regression problem.

Problem formulation -
The formulation of ordinal regression problem is consist of two theorem:
  1. Theorem 1: Given a mapping function h: X->Y, we can transform the ranking problem into a classification problem. The classification problem is to find a mapping p: X*X->{-1,0,1}, where the input of the mapping is arbitrary two samples from X, and the output is relationship of h(x). The label of the classification problem is determined by the relationship between the actual rank of the two input samples. The empirical risk of the ranking problem is proportional to the classification problem, so the ranking problem is equivalent to the classification problem.
  2. Theorem 2: Given the mapping function h, The empirical risk is upper bounded. Suppose there exist a mapping U: X->R where samples of different label are divided in R, and the distribution of samples in R is ordered according to the order of original label. Then for every pair of consecutive labels, there exist a margin of samples in R. Suppose gamma equals the smallest margin, than the larger the margin is, the smaller the upper bound of the empirical risk is. So the optimal mapping h should be the one with largest gamma value, or say, with largest margin in R.

Problem solution -
According to the above theorems, the ordinal regression problem can be transformed into a classification problem with largest margin criteria. This is the same criteria as SVM, and a formulation of the problem which is identical to SVM is provided. So the problem can be solved by first solve the classification problem using SVM solver, and the mapping U and boundary of different labels in R can be found accordingly. Combining mapping U and the boundary, we can obtain the desired mapping h.

2011年4月6日 星期三

Nonlinear Dimensionality Reduction by Locally Linear Embedding

Locally Linear Embedding (LLE) is an unsupervised algorithm that maps data points from the original high dimensional space into a low dimensional space.

Dimension reduction is beneficial in several aspect, for example
  1. Compact data representation
  2. Data visualization
  3. Relationship between data points are better described in some particular low dimensional space

The key of this paper is that: given a nonlinear manifold, a local patch of the manifold can be approximated by a hyperplane. The local structure of the patch on the hyperplane is invariant to the projection from the original space to the manifold.

The main assumption, or requirement of the algorithm, is that data points are dense enough that each data point and its neighbors lies in a small patch that can be well approximated by a hyperplane.

So the algorithm can be summarized as followed:
  1. Extract local structure information.
    Because local patch can be approximated by hyperplane, each point can be reconstructed by linear combination of its neighbors. Such reconstruction contains local geometry information that is invariant to the projection form the original space to the low dimensional manifold.
  2. Projection to the manifold.
    A mapping that preserves the local structure information is found. Finding such mapping can be transformed to a optimization problem with close form solution. It can be solved using linear algebra method.