Views
The aim of zero-shot learning is to classify instances belonging to classes that have no labeled training instances. In zero-shot learning, the classes covered by the labeled training instances are referred to as the seen classes, and the classes that appear only among the unlabeled testing instances are referred to as the unseen classes.
Zero-shot learning is a special kind of transfer learning. Based on whether the feature spaces and label spaces of the source and target domains/tasks are the same, transfer learning can be classified into homogeneous transfer learning and heterogeneous transfer learning. In zero-shot learning, the source feature space is the feature space of the training instances, and the target feature space is the feature space of the testing instances; they are the same. However, the source label space consists of the seen classes, while the target label space consists of the unseen classes; they are different. So zero-shot learning belongs to heterogeneous transfer learning.
Learning Settings
Learning Settings | Involved in Model Learning | Characteristic |
---|---|---|
CIII (Class-Inductive Instance-Inductive) | $D_{tr}$, $T_s$ | severe domain shift, but more general |
CTII (Class-Transductive Instance-Inductive) | $D_{tr}$, $T_s$, $T_u$ | soft domain shift |
CTIT (Class-Transductive Instance-Transductive) | $D_{tr}$, $T_s$, $T_u$, $X_{te}$ | soft domain shift, but less general |
- $D_{tr}$ denotes the labeled training instances, $T_s$ the seen class prototypes, $X_{te}$ the unlabeled testing instances, and $T_u$ the unseen class prototypes.
- In machine learning, when the distributions of the training and the testing instances are different, the performance of a model learned from the training instances decreases when it is applied to the testing instances. This phenomenon is more severe in zero-shot learning, and it is usually referred to as domain shift.
- Under the CIII setting, as the model is not optimized for specific unseen classes or testing instances, when new unseen classes or testing instances need to be classified, models learned under this setting usually generalize better than those learned under the other settings.
Semantic Spaces
Semantic spaces contain semantic information about classes and are an important part of zero-shot learning. According to how a semantic space is constructed, we categorize semantic spaces into engineered semantic spaces and learned semantic spaces.
Semantic Space Type | Advantages | Disadvantages |
---|---|---|
Engineered Semantic Spaces | easy to encode human domain knowledge | heavily rely on human labor |
Learned Semantic Spaces | less labor intensive, contain more information | dimensions lack explicit meaning and are hard to interpret |
Engineered Semantic Spaces
In engineered semantic spaces, each dimension of the semantic space is designed by humans.
Attribute Spaces
Attribute spaces are semantic spaces constructed from a set of attributes that describe the classes. For example, in the problem of animal recognition in images, there may be three attributes: “having stripes”, “living on land” and “plant eating”. For the class “tiger”, the prototype is $(1, 1, 0)$, and for the class “horse”, it is $(0, 1, 1)$. The attribute space above is a binary attribute space; there are also other kinds, such as continuous attribute spaces.
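A minimal sketch of such a binary attribute space in Python; the class names and attribute values simply follow the example above:

```python
import numpy as np

# Attribute order: "having stripes", "living on land", "plant eating".
attributes = ["having stripes", "living on land", "plant eating"]

# Each class prototype is a binary vector over these attributes.
prototypes = {
    "tiger": np.array([1, 1, 0]),  # stripes, land-dwelling, not plant eating
    "horse": np.array([0, 1, 1]),  # no stripes, land-dwelling, plant eating
}

print(prototypes["tiger"])  # [1 1 0]
```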
Lexical Spaces
Lexical spaces are based on the labels of the classes and on datasets that provide semantic information. Such a dataset can be a structured lexical database, such as WordNet. For example, the hierarchical relationships in WordNet can be used to construct the semantic space by setting the $j$-th dimension of the prototype of label $i$ to 1 if labels $i$ and $j$ have a relationship, and to 0 otherwise.
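A hedged sketch of such a lexical prototype with NLTK's WordNet interface; the synset choices (e.g. `tiger.n.02`) are assumptions for illustration, and "relationship" is narrowed here to the hypernym (is-a) hierarchy:

```python
import numpy as np
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

# Illustrative label-to-synset mapping; the synset names are assumptions.
labels = ["tiger", "horse", "animal"]
synsets = {
    "tiger": wn.synset("tiger.n.02"),
    "horse": wn.synset("horse.n.01"),
    "animal": wn.synset("animal.n.01"),
}

def related(si, sj):
    """1 if sj is the same synset as si or one of its hypernym ancestors."""
    if si == sj:
        return 1
    return int(sj in si.closure(lambda s: s.hypernyms()))

# Each class prototype is a binary vector over all labels.
prototypes = {
    li: np.array([related(synsets[li], synsets[lj]) for lj in labels])
    for li in labels
}
print(prototypes["tiger"])  # relates to itself and to "animal", not "horse"
```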
Text-keyword Spaces
Text-keyword spaces are semantic spaces constructed from a set of keywords extracted from the text descriptions of each class. The most common source of the text descriptions is websites such as Wikipedia.
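As an illustration, a text-keyword prototype can be sketched with scikit-learn's `TfidfVectorizer`; the one-sentence descriptions below are placeholders for real Wikipedia articles:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder descriptions; real ones would be full encyclopedia articles.
descriptions = {
    "tiger": "The tiger is a large striped carnivorous cat living on land.",
    "horse": "The horse is a plant eating hoofed mammal living on land.",
}

# Each class prototype is the TF-IDF vector of its text description.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(descriptions.values())

prototypes = dict(zip(descriptions, matrix.toarray()))
print(vectorizer.get_feature_names_out())
print(prototypes["tiger"])
```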
Learned Semantic Spaces
In learned semantic spaces, the prototypes of each class are obtained as the output of some machine learning model, so the individual dimensions do not have explicit semantic meanings.
Label-embedding Spaces
This kind of space is motivated by the development and wide use of word embedding techniques in natural language processing. Its prototypes are obtained through the embeddings of the class labels.
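A minimal sketch, assuming gensim and a small pretrained GloVe model fetched via `gensim.downloader`; any word-embedding model (word2vec, GloVe, ...) would serve the same role:

```python
import gensim.downloader as api

# Class prototypes are simply the word embeddings of the labels.
wv = api.load("glove-wiki-gigaword-100")  # downloads on first use

labels = ["tiger", "horse"]
prototypes = {label: wv[label] for label in labels}
print(prototypes["tiger"].shape)  # (100,)
```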
Text-embedding Spaces
The prototypes of text-embedding spaces are obtained by embedding the text descriptions for each class.
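One simple, deliberately crude way to sketch this is to average the word embeddings of each description; real methods typically use stronger text encoders, but the overall scheme is the same:

```python
import numpy as np
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")

def embed_text(text):
    """Average the word vectors of the description (words missing from
    the vocabulary are skipped)."""
    words = [w for w in text.lower().split() if w in wv]
    return np.mean([wv[w] for w in words], axis=0)

# Placeholder descriptions standing in for real class texts.
prototypes = {
    "tiger": embed_text("a large striped carnivorous cat living on land"),
    "horse": embed_text("a plant eating hoofed mammal living on land"),
}
print(prototypes["horse"].shape)  # (100,)
```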
Image-embedding Spaces
The prototypes of image-embedding spaces are obtained from images belonging to each class.
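A hedged sketch using a pretrained torchvision backbone: the prototype of a class is taken as the mean CNN feature of its images. The image paths are hypothetical, and any feature extractor could be substituted for ResNet-18:

```python
import torch
from torchvision import models
from PIL import Image

# Pretrained backbone with the classification head removed.
weights = models.ResNet18_Weights.DEFAULT
backbone = models.resnet18(weights=weights)
backbone.fc = torch.nn.Identity()
backbone.eval()
preprocess = weights.transforms()

@torch.no_grad()
def image_feature(path):
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return backbone(img).squeeze(0)  # 512-d feature vector

# Hypothetical per-class image lists.
class_images = {"tiger": ["tiger1.jpg", "tiger2.jpg"]}
prototypes = {
    c: torch.stack([image_feature(p) for p in paths]).mean(dim=0)
    for c, paths in class_images.items()
}
```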
Methods
Classifier-Based Methods
Existing classifier-based methods usually take a one-versus-rest solution to learn the multiclass zero-shot classifier. Therefore, the eventual zero-shot classifier for the unseen classes consists of $N_u$ binary one-versus-rest classifiers $f_u^i(\cdot)$, one for each unseen class $c_u^i$.
Correspondence Methods
Its insight is to construct the classifier for each unseen class via the correspondence between the binary one-versus-rest classifier for a class and that class's prototype.
For each class, there is just one corresponding prototype and one corresponding binary one-versus-rest classifier. Correspondence methods aim to learn a correspondence function between these two.
First, with the available data, the correspondence function $\varphi(\cdot)$ is learned, which takes the prototype $t_i$ of a class $c_i$ as input and outputs the parameters $w_i$ of the binary one-versus-rest classifier $f_i(\cdot)$. Then, for each unseen class $c_u^i$, with its prototype $t_u^i$ and the learned correspondence function $\varphi(\cdot)$, the classifier $f_u^i(\cdot)$ can be constructed.
In short: learn a mapping from a class (prototype) to its classifier.
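A minimal sketch of this pipeline on toy random data with scikit-learn: logistic regression gives the seen one-versus-rest classifiers, and ridge regression plays the role of the correspondence function $\varphi(\cdot)$ (real methods use more elaborate correspondence models):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 10))      # training instances
y_tr = rng.integers(0, 3, size=200)    # seen-class labels (3 seen classes)
T_s = rng.normal(size=(3, 5))          # seen class prototypes (rows)
T_u = rng.normal(size=(2, 5))          # unseen class prototypes (rows)

# Step 1: one-versus-rest classifier weights for each seen class.
W_s = np.stack([
    LogisticRegression().fit(X_tr, (y_tr == c).astype(int)).coef_[0]
    for c in range(3)
])

# Step 2: correspondence function phi: prototype -> classifier weights.
phi = Ridge().fit(T_s, W_s)

# Step 3: construct unseen-class classifiers from their prototypes.
W_u = phi.predict(T_u)
X_te = rng.normal(size=(5, 10))
scores = X_te @ W_u.T          # one-vs-rest scores for the unseen classes
pred = scores.argmax(axis=1)
```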
Relationship Methods
Its insight is to construct the classifier for each unseen class via the relationships among the seen and unseen classes, together with the binary one-versus-rest classifiers of the seen classes.
First, with the available data, the binary one-versus-rest classifiers of the seen classes are learned. Then, the relationships among the seen and the unseen classes are calculated via the corresponding class prototypes, or obtained through other approaches.
In short: learn the relationships between the unseen and seen labels, and learn classifiers for the seen labels.
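A minimal sketch on toy data, using cosine similarity between prototypes as the relationship and a similarity-weighted combination of the seen classifiers' scores (one common instantiation of this idea):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 10))
y_tr = rng.integers(0, 3, size=200)
T_s = rng.normal(size=(3, 5))
T_u = rng.normal(size=(2, 5))

# Step 1: one-versus-rest classifiers for the seen classes.
clfs = [LogisticRegression().fit(X_tr, (y_tr == c).astype(int))
        for c in range(3)]

# Step 2: relationships between unseen and seen class prototypes.
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

R = np.array([[cosine(tu, ts) for ts in T_s] for tu in T_u])  # (2, 3)

# Step 3: an unseen class's score is the relationship-weighted sum of
# the seen classifiers' scores.
X_te = rng.normal(size=(5, 10))
seen_scores = np.column_stack([clf.predict_proba(X_te)[:, 1] for clf in clfs])
unseen_scores = seen_scores @ R.T
pred = unseen_scores.argmax(axis=1)
```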
Combination Methods
Its insight is to construct the classifier for each unseen class via the combination of classifiers for the basic elements (such as attributes) that are used to constitute the classes.
First, with the available data, the binary one-versus-rest classifiers for the attributes can be learned. Then the classifiers for the unseen classes can be constructed from them.
For example, given the attribute classifiers for “having stripes”, “living on land” and “plant eating”, the classifier for an unseen class such as “zebra” can be assembled from its attribute signature $(1, 1, 1)$, even though no zebra instances were seen during training.
In short: learn classifiers for the attributes.
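A minimal sketch on toy data, in the spirit of direct attribute prediction: attribute probabilities are combined according to each unseen class's signature (real methods differ in how the combination is done):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 10))
# Binary attribute labels per training instance (3 attributes); in a real
# dataset these come from the seen classes' attribute signatures.
A_tr = rng.integers(0, 2, size=(200, 3))

# Step 1: one binary classifier per attribute.
attr_clfs = [LogisticRegression().fit(X_tr, A_tr[:, a]) for a in range(3)]

# Step 2: unseen classes described by their attribute signatures, e.g.
# "zebra" = stripes, land-dwelling, plant eating = (1, 1, 1).
signatures = {"zebra": np.array([1, 1, 1]), "whale": np.array([0, 0, 0])}

# Step 3: score an instance for an unseen class by how well its predicted
# attributes match the class signature (product of attribute likelihoods).
X_te = rng.normal(size=(5, 10))
P = np.column_stack([clf.predict_proba(X_te)[:, 1] for clf in attr_clfs])
scores = {c: np.prod(np.where(sig == 1, P, 1 - P), axis=1)
          for c, sig in signatures.items()}
```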
Instance-Based Methods
These methods aim to first obtain labeled instances for the unseen classes, and then use these instances to learn the zero-shot classifier.
Projection Methods
Its insight is to obtain labeled instances for the unseen classes by projecting both the feature space instances and the semantic space prototypes into a common space.
First, the instances in the feature space and the prototypes in the semantic space are projected into the projection space $\mathcal{P}$. Since each unseen class has no labeled instances in the feature space, only its single projected prototype, it is difficult to learn classifiers such as SVM or logistic regression from so few labeled points. As a result, in existing projection methods, classification is usually performed by nearest neighbor classification or some variant of it.
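A minimal sketch on toy data, taking the semantic space itself as the projection space $\mathcal{P}$ (one common choice): ridge regression learns the projection, and classification is 1-nearest-neighbor against the unseen prototypes:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 10))
y_tr = rng.integers(0, 3, size=200)
T_s = rng.normal(size=(3, 5))
T_u = rng.normal(size=(2, 5))

# Step 1: learn a projection from the feature space into the semantic
# space by regressing each training instance onto its class prototype.
proj = Ridge().fit(X_tr, T_s[y_tr])

# Step 2: project the testing instances and assign each one to the
# nearest unseen-class prototype.
X_te = rng.normal(size=(5, 10))
S_te = proj.predict(X_te)
dists = np.linalg.norm(S_te[:, None, :] - T_u[None, :, :], axis=2)
pred = dists.argmin(axis=1)
```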
Instance-Borrowing Methods
Its insight is to obtain labeled instances for the unseen classes by borrowing from the training instances.
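A minimal sketch of one naive borrowing rule on toy data: each unseen class borrows the instances of its most similar seen class, measured by prototype cosine similarity (actual methods select instances more carefully):

```python
import numpy as np

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 10))
y_tr = rng.integers(0, 3, size=200)
T_s = rng.normal(size=(3, 5))
T_u = rng.normal(size=(2, 5))

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Borrow the instances of the most similar seen class and relabel them.
borrowed = {}
for u, tu in enumerate(T_u):
    nearest = int(np.argmax([cosine(tu, ts) for ts in T_s]))
    borrowed[u] = X_tr[y_tr == nearest]  # treated as labeled data for class u
```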
Synthesizing Methods
Its insight is to obtain labeled instances for the unseen classes by synthesizing some pseudo instances.
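A minimal sketch on toy data: regress each class's feature-space mean from its prototype, then sample Gaussian pseudo instances around the predicted unseen-class means. Real synthesizing methods often use generative models such as GANs or VAEs instead of this Gaussian shortcut:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 10))
y_tr = rng.integers(0, 3, size=200)
T_s = rng.normal(size=(3, 5))
T_u = rng.normal(size=(2, 5))

# Learn prototype -> feature-space mean from the seen classes.
means_s = np.stack([X_tr[y_tr == c].mean(axis=0) for c in range(3)])
mean_reg = Ridge().fit(T_s, means_s)

# Sample 50 pseudo instances per unseen class around the predicted mean.
pseudo = {
    u: rng.normal(loc=mu, scale=1.0, size=(50, 10))
    for u, mu in enumerate(mean_reg.predict(T_u))
}
```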