Views
The aim of zero-shot learning is to classify instances belonging to classes that have no labeled training instances. In zero-shot learning, the classes covered by the labeled training instances are referred to as the seen classes, and the classes that appear only among the unlabeled testing instances are referred to as the unseen classes.
Zero-shot learning is a special kind of transfer learning. Based on whether the feature spaces and label spaces of the source and target domains/tasks are the same, transfer learning can be classified into homogeneous transfer learning and heterogeneous transfer learning. In zero-shot learning, the source feature space is the feature space of the training instances, and the target feature space is the feature space of the testing instances; they are the same. However, the source label space consists of the seen classes, while the target label space consists of the unseen classes; they are different. So zero-shot learning belongs to heterogeneous transfer learning.
Learning Settings
Learning Settings | Involved in Model Learning | Characteristic |
---|---|---|
CIII (Class-Inductive Instance-Inductive) | $D_{tr}$, $T_s$ | severe domain shift, but more general |
CTII (Class-Transductive Instance-Inductive) | $D_{tr}$, $T_s$, $T_u$ | soft domain shift |
CTIT (Class-Transductive Instance-Transductive) | $D_{tr}$, $T_s$, $T_u$, $X_{te}$ | soft domain shift, but less general |
- $D_{tr}$ denotes the labeled training instances, $T_s$ the seen class prototypes, $X_{te}$ the unlabeled testing instances, and $T_u$ the unseen class prototypes.
- In machine learning, when the distributions of the training and the testing instances are different, the performance of a model learned from the training instances decreases when it is applied to the testing instances. This phenomenon is more severe in zero-shot learning, and it is usually referred to as domain shift.
- Under the CIII setting, as the model is not optimized for specific unseen classes or testing instances, when new unseen classes or testing instances need to be classified, models learned under this setting usually generalize better than those learned under the other settings.
Semantic Spaces
Semantic spaces contain semantic information about classes and are an important part of zero-shot learning. According to how a semantic space is constructed, we categorize semantic spaces into engineered semantic spaces and learned semantic spaces.
Semantic Space Type | Advantages | Disadvantages |
---|---|---|
Engineered Semantic Spaces | easy to encode human domain knowledge | heavily rely on human labor |
Learned Semantic Spaces | less labor intensive, contain more information | dimensions lack explicit meaning and are hard to interpret |
Engineered Semantic Spaces
In engineered semantic spaces, each dimension of the semantic space is designed by humans.
Attribute Spaces
Attribute spaces are semantic spaces constructed from a set of attributes that describe the classes. For example, in the problem of animal recognition in images, there may be three attributes: “having stripes”, “living on land” and “plant eating”. For the class “tiger”, the prototype is $(1, 1, 0)$, and for the class “horse”, it is $(0, 1, 1)$. The attribute space above is a binary attribute space; there are also other kinds, such as continuous attribute spaces.
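A minimal sketch of such a binary attribute space in Python; the class names and attribute values simply follow the example above:

```python
import numpy as np

# Attribute order: "having stripes", "living on land", "plant eating".
attributes = ["having stripes", "living on land", "plant eating"]

# Each class prototype is a binary vector over these attributes.
prototypes = {
    "tiger": np.array([1, 1, 0]),  # stripes, land-dwelling, not plant eating
    "horse": np.array([0, 1, 1]),  # no stripes, land-dwelling, plant eating
}

print(prototypes["tiger"])  # [1 1 0]
```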
Lexical Spaces
Lexical spaces are based on the labels of the classes and on datasets that provide semantic information. Such a dataset can be a structured lexical database, such as WordNet. For example, the hierarchical relationships in WordNet can be used to construct the semantic space by setting the $j$-th dimension of the prototype of label $i$ to 1 if labels $i$ and $j$ have a relationship, and to 0 otherwise.
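A hedged sketch of such a lexical prototype with NLTK's WordNet interface; the synset choices (e.g. `tiger.n.02`) are assumptions for illustration, and "relationship" is narrowed here to the hypernym (is-a) hierarchy:

```python
import numpy as np
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

# Illustrative label-to-synset mapping; the synset names are assumptions.
labels = ["tiger", "horse", "animal"]
synsets = {
    "tiger": wn.synset("tiger.n.02"),
    "horse": wn.synset("horse.n.01"),
    "animal": wn.synset("animal.n.01"),
}

def related(si, sj):
    """1 if sj is the same synset as si or one of its hypernym ancestors."""
    if si == sj:
        return 1
    return int(sj in si.closure(lambda s: s.hypernyms()))

# Each class prototype is a binary vector over all labels.
prototypes = {
    li: np.array([related(synsets[li], synsets[lj]) for lj in labels])
    for li in labels
}
print(prototypes["tiger"])  # relates to itself and to "animal", not "horse"
```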
Text-keyword Spaces
Text-keyword spaces are semantic spaces constructed from a set of keywords extracted from the text descriptions of each class. The most common source of the text descriptions is websites such as Wikipedia.
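As an illustration, a text-keyword prototype can be sketched with scikit-learn's `TfidfVectorizer`; the one-sentence descriptions below are placeholders for real Wikipedia articles:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder descriptions; real ones would be full encyclopedia articles.
descriptions = {
    "tiger": "The tiger is a large striped carnivorous cat living on land.",
    "horse": "The horse is a plant eating hoofed mammal living on land.",
}

# Each class prototype is the TF-IDF vector of its text description.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(descriptions.values())

prototypes = dict(zip(descriptions, matrix.toarray()))
print(vectorizer.get_feature_names_out())
print(prototypes["tiger"])
```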
Learned Semantic Spaces
In learned semantic spaces, the prototypes of each class are obtained as the output of some machine learning model, so the individual dimensions do not have explicit semantic meanings.
Label-embedding Spaces
This kind of space is motivated by the development and wide use of word embedding techniques in natural language processing. Its prototypes are obtained through the embeddings of the class labels.
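A minimal sketch, assuming gensim and a small pretrained GloVe model fetched via `gensim.downloader`; any word-embedding model (word2vec, GloVe, ...) would serve the same role:

```python
import gensim.downloader as api

# Class prototypes are simply the word embeddings of the labels.
wv = api.load("glove-wiki-gigaword-100")  # downloads on first use

labels = ["tiger", "horse"]
prototypes = {label: wv[label] for label in labels}
print(prototypes["tiger"].shape)  # (100,)
```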
Text-embedding Spaces
The prototypes of text-embedding spaces are obtained by embedding the text descriptions for each class.
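One simple, deliberately crude way to sketch this is to average the word embeddings of each description; real methods typically use stronger text encoders, but the overall scheme is the same:

```python
import numpy as np
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")

def embed_text(text):
    """Average the word vectors of the description (words missing from
    the vocabulary are skipped)."""
    words = [w for w in text.lower().split() if w in wv]
    return np.mean([wv[w] for w in words], axis=0)

# Placeholder descriptions standing in for real class texts.
prototypes = {
    "tiger": embed_text("a large striped carnivorous cat living on land"),
    "horse": embed_text("a plant eating hoofed mammal living on land"),
}
print(prototypes["horse"].shape)  # (100,)
```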
Image-embedding Spaces
The prototypes of image-embedding spaces are obtained from images belonging to each class.
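A hedged sketch using a pretrained torchvision backbone: the prototype of a class is taken as the mean CNN feature of its images. The image paths are hypothetical, and any feature extractor could be substituted for ResNet-18:

```python
import torch
from torchvision import models
from PIL import Image

# Pretrained backbone with the classification head removed.
weights = models.ResNet18_Weights.DEFAULT
backbone = models.resnet18(weights=weights)
backbone.fc = torch.nn.Identity()
backbone.eval()
preprocess = weights.transforms()

@torch.no_grad()
def image_feature(path):
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return backbone(img).squeeze(0)  # 512-d feature vector

# Hypothetical per-class image lists.
class_images = {"tiger": ["tiger1.jpg", "tiger2.jpg"]}
prototypes = {
    c: torch.stack([image_feature(p) for p in paths]).mean(dim=0)
    for c, paths in class_images.items()
}
```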
Methods
Classifier-Based Methods
Existing classifier-based methods usually take a one-versus-rest solution to learn the multiclass zero-shot classifier. Therefore, the eventual zero-shot classifier for the unseen classes consists of $N_u$ binary one-versus-rest classifiers $f_u^i(\cdot)$, one for each unseen class $c_u^i$.
Correspondence Methods
Its insight is to construct the classifier for each unseen class via the correspondence between the binary one-versus-rest classifier for a class and that class's prototype.
For each class, there is just one corresponding prototype and one corresponding binary one-versus-rest classifier. Correspondence methods aim to learn a correspondence function between these two.
First, with the available data, the correspondence function $\varphi(\cdot)$ is learned, which takes the prototype $t_i$ of a class $c_i$ as input and outputs the parameters $w_i$ of the binary one-versus-rest classifier $f_i(\cdot)$. Then, for each unseen class $c_u^i$, with its prototype $t_u^i$ and the learned correspondence function $\varphi(\cdot)$, the classifier $f_u^i(\cdot)$ can be constructed.
In short: learn a mapping from a class (prototype) to its classifier.
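A minimal sketch of this pipeline on toy random data with scikit-learn: logistic regression gives the seen one-versus-rest classifiers, and ridge regression plays the role of the correspondence function $\varphi(\cdot)$ (real methods use more elaborate correspondence models):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 10))      # training instances
y_tr = rng.integers(0, 3, size=200)    # seen-class labels (3 seen classes)
T_s = rng.normal(size=(3, 5))          # seen class prototypes (rows)
T_u = rng.normal(size=(2, 5))          # unseen class prototypes (rows)

# Step 1: one-versus-rest classifier weights for each seen class.
W_s = np.stack([
    LogisticRegression().fit(X_tr, (y_tr == c).astype(int)).coef_[0]
    for c in range(3)
])

# Step 2: correspondence function phi: prototype -> classifier weights.
phi = Ridge().fit(T_s, W_s)

# Step 3: construct unseen-class classifiers from their prototypes.
W_u = phi.predict(T_u)
X_te = rng.normal(size=(5, 10))
scores = X_te @ W_u.T          # one-vs-rest scores for the unseen classes
pred = scores.argmax(axis=1)
```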
Relationship Methods
Its insight is to construct the classifier for each unseen class via the relationships among the seen and unseen classes, together with the binary one-versus-rest classifiers of the seen classes.
First, with the available data, the binary one-versus-rest classifiers of the seen classes are learned. Then, the relationships among the seen and the unseen classes are calculated via the corresponding class prototypes, or obtained through other approaches.
In short: learn the relationships between the unseen and seen labels, and learn classifiers for the seen labels.
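A minimal sketch on toy data, using cosine similarity between prototypes as the relationship and a similarity-weighted combination of the seen classifiers' scores (one common instantiation of this idea):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 10))
y_tr = rng.integers(0, 3, size=200)
T_s = rng.normal(size=(3, 5))
T_u = rng.normal(size=(2, 5))

# Step 1: one-versus-rest classifiers for the seen classes.
clfs = [LogisticRegression().fit(X_tr, (y_tr == c).astype(int))
        for c in range(3)]

# Step 2: relationships between unseen and seen class prototypes.
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

R = np.array([[cosine(tu, ts) for ts in T_s] for tu in T_u])  # (2, 3)

# Step 3: an unseen class's score is the relationship-weighted sum of
# the seen classifiers' scores.
X_te = rng.normal(size=(5, 10))
seen_scores = np.column_stack([clf.predict_proba(X_te)[:, 1] for clf in clfs])
unseen_scores = seen_scores @ R.T
pred = unseen_scores.argmax(axis=1)
```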
Combination Methods
Its insight is to construct the classifier for each unseen class via the combination of classifiers for the basic elements (such as attributes) that are used to constitute the classes.
First, with the available data, the binary one-versus-rest classifiers for the attributes can be learned. Then the classifiers for the unseen classes can be constructed from them.
For example, given the attribute classifiers for “having stripes”, “living on land” and “plant eating”, the classifier for an unseen class such as “zebra” can be assembled from its attribute signature $(1, 1, 1)$, even though no zebra instances were seen during training.
In short: learn classifiers for the attributes.
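A minimal sketch on toy data, in the spirit of direct attribute prediction: attribute probabilities are combined according to each unseen class's signature (real methods differ in how the combination is done):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 10))
# Binary attribute labels per training instance (3 attributes); in a real
# dataset these come from the seen classes' attribute signatures.
A_tr = rng.integers(0, 2, size=(200, 3))

# Step 1: one binary classifier per attribute.
attr_clfs = [LogisticRegression().fit(X_tr, A_tr[:, a]) for a in range(3)]

# Step 2: unseen classes described by their attribute signatures, e.g.
# "zebra" = stripes, land-dwelling, plant eating = (1, 1, 1).
signatures = {"zebra": np.array([1, 1, 1]), "whale": np.array([0, 0, 0])}

# Step 3: score an instance for an unseen class by how well its predicted
# attributes match the class signature (product of attribute likelihoods).
X_te = rng.normal(size=(5, 10))
P = np.column_stack([clf.predict_proba(X_te)[:, 1] for clf in attr_clfs])
scores = {c: np.prod(np.where(sig == 1, P, 1 - P), axis=1)
          for c, sig in signatures.items()}
```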
Instance-Based Methods
These methods aim to first obtain labeled instances for the unseen classes, and then use these instances to learn the zero-shot classifier.
Projection Methods
Its insight is to obtain labeled instances for the unseen classes by projecting both the feature space instances and the semantic space prototypes into a common space.
First, the instances in the feature space and the prototypes in the semantic space are projected into the projection space $\mathcal{P}$. Since each unseen class has no labeled instances in the feature space, only its single projected prototype, it is difficult to learn classifiers such as SVM or logistic regression from so few labeled points. As a result, in existing projection methods, classification is usually performed by nearest neighbor classification or some variant of it.
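A minimal sketch on toy data, taking the semantic space itself as the projection space $\mathcal{P}$ (one common choice): ridge regression learns the projection, and classification is 1-nearest-neighbor against the unseen prototypes:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 10))
y_tr = rng.integers(0, 3, size=200)
T_s = rng.normal(size=(3, 5))
T_u = rng.normal(size=(2, 5))

# Step 1: learn a projection from the feature space into the semantic
# space by regressing each training instance onto its class prototype.
proj = Ridge().fit(X_tr, T_s[y_tr])

# Step 2: project the testing instances and assign each one to the
# nearest unseen-class prototype.
X_te = rng.normal(size=(5, 10))
S_te = proj.predict(X_te)
dists = np.linalg.norm(S_te[:, None, :] - T_u[None, :, :], axis=2)
pred = dists.argmin(axis=1)
```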
Instance-Borrowing Methods
Its insight is to obtain labeled instances for the unseen classes by borrowing from the training instances.
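A minimal sketch of one naive borrowing rule on toy data: each unseen class borrows the instances of its most similar seen class, measured by prototype cosine similarity (actual methods select instances more carefully):

```python
import numpy as np

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 10))
y_tr = rng.integers(0, 3, size=200)
T_s = rng.normal(size=(3, 5))
T_u = rng.normal(size=(2, 5))

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Borrow the instances of the most similar seen class and relabel them.
borrowed = {}
for u, tu in enumerate(T_u):
    nearest = int(np.argmax([cosine(tu, ts) for ts in T_s]))
    borrowed[u] = X_tr[y_tr == nearest]  # treated as labeled data for class u
```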
Synthesizing Methods
Its insight is to obtain labeled instances for the unseen classes by synthesizing some pseudo instances.
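A minimal sketch on toy data: regress each class's feature-space mean from its prototype, then sample Gaussian pseudo instances around the predicted unseen-class means. Real synthesizing methods often use generative models such as GANs or VAEs instead of this Gaussian shortcut:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 10))
y_tr = rng.integers(0, 3, size=200)
T_s = rng.normal(size=(3, 5))
T_u = rng.normal(size=(2, 5))

# Learn prototype -> feature-space mean from the seen classes.
means_s = np.stack([X_tr[y_tr == c].mean(axis=0) for c in range(3)])
mean_reg = Ridge().fit(T_s, means_s)

# Sample 50 pseudo instances per unseen class around the predicted mean.
pseudo = {
    u: rng.normal(loc=mu, scale=1.0, size=(50, 10))
    for u, mu in enumerate(mean_reg.predict(T_u))
}
```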