Strong Learners vs. Weak Learners in Ensemble Learning



It is common to describe ensemble learning techniques in terms of weak and strong learners.

For example, we may desire to construct a strong learner from the predictions of many weak learners. In fact, this is the explicit goal of the boosting class of ensemble learning algorithms.

Although we may describe models as weak or strong generally, the terms have a specific formal definition and are used as the basis for an important finding from the field of computational learning theory.

In this tutorial, you will discover weak and strong learners and their relationship with ensemble learning.

After completing this tutorial, you will know:

  • Weak learners are models that perform slightly better than random guessing.
  • Strong learners are models that have arbitrarily good accuracy.
  • Weak and strong learners are tools from computational learning theory and provide the basis for the development of the boosting class of ensemble methods.

Kick-start your project with my new book Ensemble Learning Algorithms With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Strong Learners vs. Weak Learners for Ensemble Learning
Photo by G. Lamar, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Weak Learners
  2. Strong Learners
  3. Weak vs. Strong Learners and Boosting

Weak Learners

A weak classifier is a model for binary classification that performs slightly better than random guessing.

A weak learner produces a classifier which is only slightly more accurate than random classification.

— Page 21, Pattern Classification Using Ensemble Methods, 2010.

This means that the model makes predictions known to have some skill, making the capabilities of the model weak, although not so weak that the model has no skill, i.e. it still performs better than random.

  • Weak Classifier: Formally, a classifier that achieves slightly better than 50 percent accuracy.

A weak classifier is sometimes called a “weak learner” or “base learner” and the concept can be generalized beyond binary classification.

Although the concept of a weak learner is well understood in the context of binary classification, it can be taken colloquially to mean any model that performs slightly better than a naive prediction method. In this sense, it is a useful tool for thinking about the capability of classifiers and the composition of ensembles.

  • Weak Learner: Colloquially, a model that performs slightly better than a naive model.

More formally, the notion has been generalized to multi-class classification, where it has a different meaning beyond simply better than 50 percent accuracy.

For binary classification, it is well known that the exact requirement for weak learners is to be better than random guess. […] Notice that requiring base learners to be better than random guess is too weak for multi-class problems, yet requiring better than 50% accuracy is too stringent.

— Page 46, Ensemble Methods, 2012.

The idea is based on formal computational learning theory, which proposes a class of learning methods that possess weak learnability, meaning that they perform better than random guessing. Weak learnability is proposed as a simplification of the more desirable strong learnability, where a learnable class achieves arbitrarily good classification accuracy.

A weaker model of learnability, called weak learnability, drops the requirement that the learner be able to achieve arbitrarily high accuracy; a weak learning algorithm needs only output a hypothesis that performs slightly better (by an inverse polynomial) than random guessing.

— The Strength of Weak Learnability, 1990.

It is a useful concept as it is often used to describe the capabilities of contributing members of ensemble learning algorithms. For example, sometimes members of a bootstrap aggregation (bagging) ensemble are referred to as weak learners as opposed to strong, at least in the colloquial meaning of the term.

More specifically, weak learners are the basis for the boosting class of ensemble learning algorithms.

The term boosting refers to a family of algorithms that are able to convert weak learners to strong learners.

— Page 23, Ensemble Methods, 2012.

The most commonly used type of weak learning model is the decision tree. This is because the weakness of the tree can be controlled by the depth of the tree during construction.

The weakest decision tree consists of a single node that makes a decision on one input variable and outputs a binary prediction, for a binary classification task. This is often referred to as a “decision stump.”

Here the weak classifier is just a “stump”: a two terminal-node classification tree.

— Page 339, The Elements of Statistical Learning, 2016.

It is used as a weak learner so often that decision stump and weak learner are practically synonyms.

  • Decision Stump: A decision tree with a single node operating on one input variable, the output of which makes a prediction directly.
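As a concrete sketch of this idea, the snippet below fits a decision stump (a depth-1 tree) and confirms it scores only slightly better than the 50 percent baseline. The use of scikit-learn and a synthetic dataset are illustrative assumptions, not part of the formal definition.

```python
# Sketch: a decision stump as a weak learner (scikit-learn is an
# illustrative choice; any depth-1 tree implementation would do).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification task
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# A stump: a tree limited to a single split on one input variable
stump = DecisionTreeClassifier(max_depth=1)
scores = cross_val_score(stump, X, y, scoring="accuracy", cv=5)

print("Stump mean accuracy: %.3f" % scores.mean())
```

On a task like this the stump typically lands well above 50 percent but far below what a full tree achieves, which is exactly the "weak but not useless" regime.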

Nevertheless, other models can also be configured to be weak learners.

Because boosting requires a weak learner, almost any technique with tuning parameters can be made into a weak learner. Trees, as it turns out, make an excellent base learner for boosting …

— Page 205, Applied Predictive Modeling, 2013.

Although not formally known as weak learners, we can consider the following as candidate weak learning models:

  • k-Nearest Neighbors, with k=1 operating on one or a subset of input variables.
  • Multi-Layer Perceptron, with a single node operating on one or a subset of input variables.
  • Naive Bayes, operating on a single input variable.

Now that we are familiar with weak learners, let’s take a closer look at strong learners.


Strong Learners

A strong classifier is a model for binary classification that performs arbitrarily well, much better than random guessing.

A class of concepts is learnable (or strongly learnable) if there exists a polynomial-time algorithm that achieves low error with high confidence for all concepts in the class.

— The Strength of Weak Learnability, 1990.

This is sometimes interpreted to mean perfect skill on a training or holdout dataset, although it more likely refers to a “good” or “usefully skillful” model.

  • Strong Classifier: Formally, a classifier that achieves arbitrarily good accuracy.

We seek strong classifiers for predictive modeling problems. It is the goal of the modeling project to develop a strong classifier that makes mostly correct predictions with high confidence.

Again, although the concept of a strong classifier is well understood for binary classification, it can be generalized to other problem types and we can interpret the concept less formally as a well-performing model, perhaps near-optimal.

  • Strong Learner: Colloquially, a model that performs very well compared to a naive model.

We are attempting to develop a strong model when we fit a machine learning model directly on a dataset. For example, we might consider the following algorithms as techniques for fitting a strong model in the colloquial sense, where the hyperparameters of each method are tuned for the target problem:

  • Logistic Regression.
  • Support Vector Machine.
  • k-Nearest Neighbors.

And many more methods listed in the previous section, or others with which you may be familiar.
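To make the colloquial contrast concrete, the sketch below compares one of these candidate strong models against a naive baseline. The choice of scikit-learn, logistic regression, and a majority-class dummy baseline are illustrative assumptions.

```python
# Sketch: contrasting a "strong" classifier with a naive baseline
# (model and dataset choices are illustrative, not prescriptive).
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# Naive model: always predicts the most frequent class
naive = DummyClassifier(strategy="most_frequent")
# Candidate strong model
strong = LogisticRegression(max_iter=1000)

naive_acc = cross_val_score(naive, X, y, cv=5).mean()
strong_acc = cross_val_score(strong, X, y, cv=5).mean()
print("Naive: %.3f, Logistic Regression: %.3f" % (naive_acc, strong_acc))
```

The gap between the two scores is what we mean by a strong learner "performing very well compared to a naive model."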

Strong learners are what we seek, and we can contrast their capability with that of weak learners, although we can also construct strong learners from weak learners.

Weak vs. Strong Learners and Boosting

We have established that weak learners perform slightly better than random, and that strong learners are good or even near-optimal; it is the latter that we seek for a predictive modeling project.

In computational learning theory, specifically PAC learning, the formal classes of weak and strong learnability were specified, leaving open the question of whether the two were equivalent.

The proof presented here is constructive; an explicit method is described for directly converting a weak learning algorithm into one that achieves arbitrary accuracy. The construction uses filtering to modify the distribution of examples in such a way as to force the weak learning algorithm to focus on the harder-to-learn parts of the distribution.

— The Strength of Weak Learnability, 1990.

Later, it was discovered that they are indeed equivalent. Moreover, a strong learner can be constructed from many formally defined weak learners. This provided the basis for the boosting class of ensemble learning methods.

The main result is a proof of the perhaps surprising equivalence of strong and weak learnability.

— The Strength of Weak Learnability, 1990.

Although this theoretical finding was made, it still took years before the first viable boosting methods implementing the procedure were developed.

Most notably Adaptive Boosting, referred to as AdaBoost, was the first successful boosting method, later leading to a large number of methods, culminating today in highly successful techniques such as gradient boosting and implementations such as Extreme Gradient Boosting (XGBoost).

Ensembles of weak learners was mostly studied in the machine learning community. In this thread, researchers often work on weak learners and try to design powerful algorithms to boost the performance from weak to strong. This thread of work has led to the birth of famous ensemble methods such as AdaBoost, Bagging, etc., and theoretical understanding on why and how weak learners can be boosted to strong ones.

— Page 16, Ensemble Methods, 2012.

Generally, the goal of boosting ensembles is to develop a large number of weak learners for a predictive learning problem, then best combine them in order to achieve a strong learner. This is a good goal, as weak learners are easy to prepare but not desirable, whereas strong learners are hard to prepare and highly desirable.

Since strong learners are desirable yet difficult to get, while weak learners are easy to obtain in real practice, this result opens a promising direction of generating strong learners by ensemble methods.

— Pages 16-17, Ensemble Methods, 2012.

  • Weak Learner: Easy to prepare, but not desirable due to their low skill.
  • Strong Learner: Hard to prepare, but desirable because of their high skill.

The procedure that was found to achieve this is to sequentially develop weak learners and add them to the ensemble, where each weak learner is trained to pay more attention to the parts of the problem domain that prior models got wrong. Although all boosting techniques follow this general procedure with specific differences and optimizations, the notion of weak and strong learners is useful more generally in machine learning and ensemble learning.

For example, we have already seen how the goal of a predictive model can be described as developing a strong model. It is common practice to evaluate the performance of a model against a baseline or naive model, such as random predictions for binary classification. A weak learner is very much like the naive model, although slightly skillful and using a minimum of information from the problem domain, as opposed to being completely naive.

Consider that although we do not technically construct weak learners in bootstrap aggregation (bagging), meaning the members are not decision stumps, we do aim to create weaker decision trees to compose the ensemble. This is often achieved by fitting the trees on sampled subsets of the data and not pruning the trees, permitting them to overfit the training data slightly.

For classification we can understand the bagging effect in terms of a consensus of independent weak learners

— Page 286, The Elements of Statistical Learning, 2016.

Both changes are made to seek less correlated trees, but have the effect of training weaker (though perhaps not weak) models to compose the ensemble.

  • Bagging: explicitly trains weaker (but not weak) learners.
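A minimal sketch of this, assuming scikit-learn: `BaggingClassifier` fits each member as an unpruned decision tree on a bootstrap sample of the training data, matching the "weaker but not weak" description above.

```python
# Sketch: bagging unpruned decision trees on bootstrap samples
# (scikit-learn and the synthetic dataset are illustrative assumptions).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# Default base estimator is an unpruned decision tree; each member is
# fit on a bootstrap sample, so it overfits its sample slightly, and
# the consensus of members is stronger than any single tree.
bagging = BaggingClassifier(n_estimators=50, random_state=1)
acc = cross_val_score(bagging, X, y, cv=5).mean()
print("Bagging mean accuracy: %.3f" % acc)
```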

Consider stacked generalization (stacking), which trains a model to best combine the predictions from multiple different models fit on the same training dataset. Each contributing level-0 model is in effect a strong learner, and the level-1 meta-model seeks to make a stronger model by combining the predictions from the strong models.

  • Stacking: explicitly combines the predictions from strong learners.
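The stacking arrangement can be sketched as follows; the specific level-0 models (an SVM and k-nearest neighbors) and the scikit-learn `StackingClassifier` are illustrative assumptions.

```python
# Sketch: stacking strong level-0 learners under a level-1 meta-model.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=1)

# Level-0: strong learners; Level-1: logistic regression combines
# their out-of-fold predictions into a hopefully stronger model.
level0 = [("svm", SVC()), ("knn", KNeighborsClassifier())]
stack = StackingClassifier(estimators=level0,
                           final_estimator=LogisticRegression())
acc = cross_val_score(stack, X, y, cv=5).mean()
print("Stacking mean accuracy: %.3f" % acc)
```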

Mixture of experts (MoE) operates in a similar way, training multiple strong models (the experts) that are combined into a hopefully stronger model via a meta-model: the gating network and a combining method.

Mixture-of-experts can also be seen as a classifier selection algorithm, where individual classifiers are trained to become experts in some portion of the feature space. In this setting, individual classifiers are indeed trained to become experts, and hence are usually not weak classifiers

— Page 16, Ensemble Machine Learning, 2012.

This highlights that although weak and strong learnability and learners are an important theoretical finding and the basis for boosting, the more generalized ideas of these classifiers are useful tools for designing and selecting ensemble methods.


Summary

In this tutorial, you discovered weak and strong learners and their relationship with ensemble learning.

Specifically, you learned:

  • Weak learners are models that perform slightly better than random guessing.
  • Strong learners are models that have arbitrarily good accuracy.
  • Weak and strong learners are tools from computational learning theory and provide the basis for the development of the boosting class of ensemble methods.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.




Author: Shantun Parmar
