Understanding and analyzing the effectiveness of uncertainty sampling
Active learning techniques attempt to reduce the amount of data required to learn a classifier by leveraging adaptivity. In particular, an algorithm iteratively selects and labels points from an unlabeled pool of data points. Over the history of active learning, many algorithms have been developed, though one heuristic algorithm, uncertainty sampling, stands out by its popularity, effectiveness, simplicity, and intuitiveness. Despite this, uncertainty sampling has known failure modes and lacks the theoretical underpinnings of some other algorithms such as those based on disagreement. Here, we present a few analyses of uncertainty sampling. First, we find that uncertainty sampling iterations implicitly optimizes the (generally non-convex) zero-one loss, explaining how uncertainty sampling can achieve lower error than labeling the entire unlabeled pool and highlighting the importance of a good initialization. Second, for logistic regression, we show that the extent to which uncertainty sampling outperforms random sampling is inversely proportional to the asymptotic error, both theoretically and empirically. Finally, we use the previous insights to show uncertainty sampling works very well on a particular NLP task due to extreme label imbalance. Taken together, these results provide a sturdier foundation for understanding and using uncertainty sampling.