Last Updated on April 27, 2021

Ensemble learning is a general meta approach to machine learning that seeks better predictive performance by combining the predictions from multiple models.

Although there are a seemingly unlimited number of ensembles that you can develop for your predictive modeling problem, there are three methods that dominate the field of ensemble learning. So much so, that rather than algorithms per se, each is a field of study that has spawned many increasingly specialized methods.

The three main classes of ensemble learning methods are bagging, stacking, and boosting, and it is important to both have a detailed understanding of each method and to consider them on your predictive modeling project.

But before that, you need a gentle introduction to these approaches and the key ideas behind each method prior to layering on math and code.

In this tutorial, you will discover the three standard ensemble learning techniques for machine learning.

After completing this tutorial, you will know:

  • Bagging involves fitting many decision trees on different samples of the same dataset and averaging the predictions.
  • Stacking involves fitting many different model types on the same data and using another model to learn how to best combine the predictions.
  • Boosting involves adding ensemble members sequentially that correct the predictions made by prior models and outputs a weighted average of the predictions.

Kick-start your project with my new book Ensemble Learning Algorithms With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

A Gentle Introduction to Ensemble Learning Algorithms
Photo by Rajiv Bhuttan, some rights reserved.

Tutorial Overview

This tutorial is divided into four parts; they are:

  1. Standard Ensemble Learning Strategies
  2. Bagging Ensemble Learning
  3. Stacking Ensemble Learning
  4. Boosting Ensemble Learning

Standard Ensemble Learning Strategies

Ensemble learning refers to algorithms that combine the predictions from two or more models.

Although there is nearly an unlimited number of ways that this can be achieved, there are perhaps three classes of ensemble learning techniques that are most commonly discussed and used in practice. Their popularity is due in large part to their ease of implementation and success on a wide range of predictive modeling problems.

A rich collection of ensemble-based classifiers have been developed over the last several years. However, many of these are some variation of the select few well-established algorithms whose capabilities have also been extensively tested and widely reported.

— Page 11, Ensemble Machine Learning, 2012.

Given their wide use, we can refer to them as “standard” ensemble learning strategies; they are:

  1. Bagging.
  2. Stacking.
  3. Boosting.

There is an algorithm that describes each approach, although more importantly, the success of each approach has spawned a myriad of extensions and related techniques. As such, it is more useful to describe each as a class of techniques or standard approaches to ensemble learning.

Rather than dive into the specifics of each method, it is useful to step through, summarize, and contrast each approach. It is also important to remember that although discussion and use of these methods are pervasive, these three methods alone do not define the extent of ensemble learning.

Next, let’s take a closer look at bagging.

Bagging Ensemble Learning

Bootstrap aggregation, or bagging for short, is an ensemble learning method that seeks a diverse group of ensemble members by varying the training data.

The name Bagging came from the abbreviation of Bootstrap AGGregatING. As the name implies, the two key ingredients of Bagging are bootstrap and aggregation.

— Page 48, Ensemble Methods, 2012.

This typically involves using a single machine learning algorithm, almost always an unpruned decision tree, and training each model on a different sample of the same training dataset. The predictions made by the ensemble members are then combined using simple statistics, such as voting or averaging.

The diversity in the ensemble is ensured by the variations within the bootstrapped replicas on which each classifier is trained, as well as by using a relatively weak classifier whose decision boundaries measurably vary with respect to relatively small perturbations in the training data.

— Page 11, Ensemble Machine Learning, 2012.

Key to the method is the manner in which each sample of the dataset is prepared to train ensemble members. Each model gets its own unique sample of the dataset.

Examples (rows) are drawn from the dataset at random, although with replacement.

Bagging adopts the bootstrap distribution for generating different base learners. In other words, it applies bootstrap sampling to obtain the data subsets for training the base learners.

— Page 48, Ensemble Methods, 2012.

Replacement means that if a row is selected, it is returned to the training dataset for potential re-selection in the same training dataset. This means that a row of data may be selected zero, one, or multiple times for a given training dataset.

This is called a bootstrap sample. It is a technique often used in statistics with small datasets to estimate the statistical value of a data sample. By preparing multiple different bootstrap samples, estimating a statistical quantity from each, and calculating the mean of the estimates, a better overall estimate of the desired quantity can be achieved than by simply estimating from the dataset directly.

In the same manner, multiple different training datasets can be prepared, used to estimate a predictive model, and used to make predictions. Averaging the predictions across the models typically results in better predictions than a single model fit on the training dataset directly.
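To make sampling with replacement concrete, here is a minimal sketch (assuming NumPy is installed) that draws one bootstrap sample from a tiny stand-in dataset of 10 rows. The seed and dataset are illustrative assumptions, not part of the original tutorial.

from numpy.random import default_rng

rng = default_rng(seed=1)
dataset = list(range(10))  # stand-in for 10 rows of training data
# draw 10 row indices at random, with replacement
indices = rng.integers(low=0, high=len(dataset), size=len(dataset))
bootstrap_sample = [dataset[i] for i in indices]
print(bootstrap_sample)  # some rows repeat, others are left out entirely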

We can summarize the key elements of bagging as follows:

  • Bootstrap samples of the training dataset.
  • Unpruned decision trees fit on each sample.
  • Simple voting or averaging of predictions.

In summary, the contribution of bagging is in the varying of the training data used to fit each ensemble member, which, in turn, results in skillful but different models.

Bagging Ensemble

It is a general approach and easily extended. For example, more changes to the training dataset can be introduced, the algorithm fit on the training data can be replaced, and the mechanism used to combine predictions can be modified.

Many popular ensemble algorithms are based on this approach, including:

  • Bagged Decision Trees (canonical bagging)
  • Random Forest
  • Extra Trees
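The following is a minimal sketch of canonical bagging, assuming scikit-learn is installed; by default, BaggingClassifier fits decision trees on bootstrap samples of the training data. The synthetic dataset and parameter values are illustrative assumptions, not tuned settings.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
# synthetic classification dataset used purely for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
# bagging: decision trees (the default base estimator) fit on bootstrap samples
model = BaggingClassifier(n_estimators=100, random_state=1)
scores = cross_val_score(model, X, y, cv=10, scoring='accuracy')
print('Mean accuracy: %.3f' % scores.mean())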

Next, let’s take a closer look at stacking.

Want to Get Started With Ensemble Learning?

Take my free 7-day email crash course now (with sample code).

Click to sign-up and also get a free PDF Ebook version of the course.

Download Your FREE Mini-Course

Stacking Ensemble Learning

Stacked Generalization, or stacking for short, is an ensemble method that seeks a diverse group of members by varying the model types fit on the training data and using a model to combine predictions.

Stacking is a general procedure where a learner is trained to combine the individual learners. Here, the individual learners are called the first-level learners, while the combiner is called the second-level learner, or meta-learner.

— Page 83, Ensemble Methods, 2012.

Stacking has its own nomenclature where ensemble members are referred to as level-0 models and the model that is used to combine the predictions is referred to as a level-1 model.

The two-level hierarchy of models is the most common approach, although more layers of models can be used. For example, instead of a single level-1 model, we might have 3 or 5 level-1 models and a single level-2 model that combines the predictions of the level-1 models in order to make a prediction.

Stacking is probably the most-popular meta-learning technique. By using a meta-learner, this method tries to induce which classifiers are reliable and which are not.

— Page 82, Pattern Classification Using Ensemble Methods, 2010.

Any machine learning model can be used to aggregate the predictions, although it is common to use a linear model, such as linear regression for regression and logistic regression for binary classification. This encourages the complexity of the model to reside at the lower-level ensemble member models and simple models to learn how to harness the variety of predictions made.

Using trainable combiners, it is possible to determine which classifiers are likely to be successful in which part of the feature space and combine them accordingly.

— Page 15, Ensemble Machine Learning, 2012.

We can summarize the key elements of stacking as follows:

  • Unchanged training dataset.
  • Different machine learning algorithms for each ensemble member.
  • Machine learning model to learn how to best combine predictions.

Diversity comes from the different machine learning models used as ensemble members.

As such, it is desirable to use a suite of models that are learned or constructed in very different ways, ensuring that they make different assumptions and, in turn, have less correlated prediction errors.

Stacking Ensemble

Many popular ensemble algorithms are based on this approach, including:

  • Stacked Models (canonical stacking)
  • Blending
  • Super Ensemble
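As a rough sketch of stacking with scikit-learn (assumed installed): different model types serve as the level-0 members and a logistic regression acts as the level-1 meta-learner, trained on out-of-fold predictions. The particular level-0 models and dataset here are arbitrary choices for illustration only.

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
# level-0 models: deliberately different algorithm types
level0 = [
    ('knn', KNeighborsClassifier()),
    ('cart', DecisionTreeClassifier()),
    ('svm', SVC()),
]
# level-1 meta-learner: a simple linear model combines the predictions
model = StackingClassifier(estimators=level0, final_estimator=LogisticRegression(), cv=5)
scores = cross_val_score(model, X, y, cv=10, scoring='accuracy')
print('Mean accuracy: %.3f' % scores.mean())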

Next, let’s take a closer look at boosting.

Boosting Ensemble Learning

Boosting is an ensemble method that seeks to change the training data to focus attention on examples that previously fit models on the training dataset have gotten wrong.

In boosting, […] the training dataset for each subsequent classifier increasingly focuses on instances misclassified by previously generated classifiers.

— Page 13, Ensemble Machine Learning, 2012.

The key property of boosting ensembles is the idea of correcting prediction errors. The models are fit and widow to the ensemble sequentially such that the second model attempts to correct the predictions of the first model, the third corrects the second model, and so on.

This typically involves the use of very simple decision trees that only make a single or a few decisions, referred to in boosting as weak learners. The predictions of the weak learners are combined using simple voting or averaging, although the contributions are weighted proportionally to their performance or capability. The objective is to develop a so-called “strong-learner” from many purpose-built “weak-learners.”

… an iterative approach for generating a strong classifier, one that is capable of achieving arbitrarily low training error, from an ensemble of weak classifiers, each of which can barely do better than random guessing.

— Page 13, Ensemble Machine Learning, 2012.

Typically, the training dataset is left unchanged and instead, the learning algorithm is modified to pay more or less attention to specific examples (rows of data) based on whether they have been predicted correctly or incorrectly by previously added ensemble members. For example, the rows of data can be weighted to indicate the amount of focus a learning algorithm must give while learning the model.

We can summarize the key elements of boosting as follows:

  • Bias training data toward those examples that are hard to predict.
  • Iteratively add ensemble members to correct the predictions of prior models.
  • Combine predictions using a weighted average of models.

The idea of combining many weak learners into strong learners was first proposed theoretically and many algorithms were proposed with little success. It was not until the Adaptive Boosting (AdaBoost) algorithm was developed that boosting was demonstrated as an effective ensemble method.

The term boosting refers to a family of algorithms that are able to convert weak learners to strong learners.

— Page 23, Ensemble Methods, 2012.

Since AdaBoost, many boosting methods have been developed and some, like stochastic gradient boosting, may be among the most effective techniques for classification and regression on tabular (structured) data.
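For example, here is a minimal sketch of stochastic gradient boosting using scikit-learn's GradientBoostingClassifier (assumed installed): setting subsample below 1.0 fits each tree on a random fraction of the rows, which is what makes the method stochastic. The values shown are illustrative assumptions, not tuned settings.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
# subsample < 1.0: each tree sees a random subset of rows (stochastic gradient boosting)
model = GradientBoostingClassifier(n_estimators=100, subsample=0.7, random_state=1)
scores = cross_val_score(model, X, y, cv=10, scoring='accuracy')
print('Mean accuracy: %.3f' % scores.mean())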

Boosting Ensemble

To summarize, many popular ensemble algorithms are based on this approach, including:

  • AdaBoost (canonical boosting)
  • Gradient Boosting Machines
  • Stochastic Gradient Boosting (XGBoost and similar)
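As a small illustration of canonical boosting, here is a sketch using scikit-learn's AdaBoostClassifier (assumed installed); by default it fits a sequence of decision stumps (depth-1 trees) as weak learners and combines their predictions with a weighted vote. Again, the dataset and number of estimators are arbitrary for illustration.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
# AdaBoost: weak learners (decision stumps by default) are added sequentially,
# each focusing more on the examples the previous ones got wrong
model = AdaBoostClassifier(n_estimators=50, random_state=1)
scores = cross_val_score(model, X, y, cv=10, scoring='accuracy')
print('Mean accuracy: %.3f' % scores.mean())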

This completes our tour of the standard ensemble learning techniques.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Books

  • Pattern Classification Using Ensemble Methods, 2010.
  • Ensemble Methods: Foundations and Algorithms, 2012.
  • Ensemble Machine Learning: Methods and Applications, 2012.

Summary

In this tutorial, you discovered the three standard ensemble learning techniques for machine learning.

Specifically, you learned:

  • Bagging involves fitting many decision trees on different samples of the same dataset and averaging the predictions.
  • Stacking involves fitting many different model types on the same data and using another model to learn how to best combine the predictions.
  • Boosting involves adding ensemble members sequentially that correct the predictions made by prior models and outputs a weighted average of the predictions.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

Get a Handle on Modern Ensemble Learning!

Ensemble Learning Algorithms With Python

Improve Your Predictions in Minutes

...with just a few lines of python code

Discover how in my new Ebook:
Ensemble Learning Algorithms With Python

It provides self-study tutorials with full working code on:
Stacking, Voting, Boosting, Bagging, Blending, Super Learner, and much more...

Bring Modern Ensemble Learning Techniques to
Your Machine Learning Projects


See What's Inside