Bootstrap aggregation, or bagging, is a popular ensemble method that fits a decision tree on different bootstrap samples of the training dataset.

It is simple to implement and effective on a wide range of problems, and importantly, modest extensions to the technique result in some of the most powerful ensemble methods, like random forest, which perform well on a wide range of predictive modeling problems.

As such, we can generalize the bagging method to a framework for ensemble learning and compare and contrast a suite of common ensemble methods that belong to the “bagging family” of methods. We can also use this framework to explore further extensions and how the method can be further tailored to a project dataset or chosen predictive model.

In this tutorial, you will discover the essence of the bootstrap aggregation approach to machine learning ensembles.

After completing this tutorial, you will know:

  • The bagging ensemble method for machine learning using bootstrap samples and decision trees.
  • How to distill the essential elements from the bagging method and how popular extensions like random forest are directly related to bagging.
  • How to devise new extensions to bagging by selecting new procedures for the essential elements of the method.

Kick-start your project with my new book Ensemble Learning Algorithms With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Essence of Bootstrap Aggregation Ensembles

Photo by GPA Photo Archive, some rights reserved.

Tutorial Overview

This tutorial is divided into four parts; they are:

  1. Bootstrap Aggregation
  2. Essence of Bagging Ensembles
  3. Bagging Ensemble Family
    1. Random Subspace Ensemble
    2. Random Forest Ensemble
    3. Extra Trees Ensemble
  4. Customized Bagging Ensembles

Bootstrap Aggregation

Bootstrap Aggregation, or bagging for short, is an ensemble machine learning algorithm.

The technique involves creating a bootstrap sample of the training dataset for each ensemble member and training a decision tree model on each sample, then combining the predictions directly using a statistic like the average of the predictions.

Breiman’s bagging (short for Bootstrap Aggregation) algorithm is one of the earliest and simplest, yet effective, ensemble-based algorithms.

— Page 12, Ensemble Machine Learning, 2012.

The sample of the training dataset is created using the bootstrap method, which involves selecting examples randomly with replacement.

Replacement means that the same example is effectively returned to the pool of candidate rows and may be selected again, once or many times, in any single sample of the training dataset. It is also possible that some examples in the training dataset are not selected at all for some bootstrap samples.

Some original examples appear more than once, while some original examples are not present in the sample.

— Page 48, Ensemble Methods, 2012.

The bootstrap method has the desired effect of making each sample of the dataset quite different, or usefully different, for creating an ensemble.
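To make the sampling step concrete, the short sketch below (assuming scikit-learn's resample utility is available) draws a single bootstrap sample from a toy set of row indices; some rows appear more than once and others are left out entirely.

```python
# A minimal sketch of drawing one bootstrap sample (assumes scikit-learn is installed).
from sklearn.utils import resample

# a toy "training dataset" represented by ten row indices
rows = list(range(10))
# sample with replacement, same size as the original dataset
sample = resample(rows, replace=True, n_samples=len(rows), random_state=1)
print('Bootstrap sample:', sorted(sample))
# rows never selected for this sample (the "out-of-bag" rows)
print('Unselected rows:', [r for r in rows if r not in sample])
```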

A decision tree is then fit on each sample of data. Each tree will be a little different given the differences in the training dataset. Typically, the decision tree is configured to have perhaps an increased depth or to not use pruning. This can make each tree more specialized to its training sample and, in turn, further increase the differences between the trees.

Differences between trees are desirable as they increase the “diversity” of the ensemble, which means producing ensemble members that have a lower correlation in their predictions or prediction errors. It is often accepted that ensembles composed of members that are both skillful and diverse (skillful in different ways, or making different errors) perform better.

The diversity in the ensemble is ensured by the variations within the bootstrapped replicas on which each classifier is trained, as well as by using a relatively weak classifier whose decision boundaries measurably vary with respect to relatively small perturbations in the training data.

— Page 12, Ensemble Machine Learning, 2012.

A benefit of bagging is that it generally does not overfit the training dataset, and the number of ensemble members can continue to be increased until performance on a holdout dataset stops improving.
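As a rough end-to-end sketch, bagged decision trees might be evaluated as follows; the synthetic dataset, the 100-member ensemble, and the repeated cross-validation settings are illustrative assumptions rather than recommendations (and the example assumes scikit-learn 1.2 or later, where the constructor argument is named estimator).

```python
# Sketch: evaluating a bagged decision tree ensemble (assumes scikit-learn >= 1.2).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# illustrative synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
# bagging: one bootstrap sample of rows and one unpruned decision tree per member
model = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100, random_state=1)
# estimate performance with repeated stratified k-fold cross-validation
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
print('Mean accuracy: %.3f (%.3f)' % (scores.mean(), scores.std()))
```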

This is a high-level summary of the bagging ensemble method, yet we can generalize the approach and extract its essential elements.


Essence of Bagging Ensembles

The essence of bagging is about leveraging independent models.

In this way, it might be the closest realization of the “wisdom of the crowd” metaphor, especially if we consider that performance continues to improve with the addition of independent contributors.

Unfortunately, we cannot develop truly independent models as we only have one training dataset. Instead, the bagging approach approximates independent models using randomness: specifically, randomness in the sampling of the dataset used to train each model, which forces a degree of semi-independence between the models.

Though it is practically impossible to get really independent base learners since they are generated from the same training data set, base learners with less dependence can be obtained by introducing randomness in the learning process, and a good generalization ability can be expected by the ensemble.

— Page 48, Ensemble Methods, 2012.

The structure of the bagging procedure can be divided into three essential elements; they are:

  • Different Training Datasets: Create a different sample of the training dataset for each ensemble model.
  • High-Variance Model: Train the same high-variance model on each sample of the training dataset.
  • Average Predictions: Use statistics to combine predictions.

We can map the canonical bagging method onto these elements as follows:

  • Different Training Datasets: Bootstrap sample.
  • High-Variance Model: Decision tree.
  • Average Predictions: Mean for regression, mode for classification.

This provides a framework where we could consider alternative methods for each essential element of the model.

For example, we could change the algorithm to another high-variance technique that has somewhat unstable learning behavior, perhaps like k-nearest neighbors with a modest value for the k hyperparameter.

Often, bagging produces a combined model that outperforms the model that is built using a single instance of the original data. […] this is true especially for unstable inducers since bagging can eliminate their instability. In this context, an inducer is considered unstable if perturbations in the learning set can produce significant changes in the constructed classifier.

— Page 28, Pattern Classification Using Ensemble Methods, 2010.
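Continuing the k-nearest neighbors example above, a minimal sketch of such a substitution might look as follows; the value k=3 and the 50-member ensemble are illustrative assumptions, and the example again assumes scikit-learn 1.2 or later.

```python
# Sketch: bagging a high-variance k-nearest neighbors model (assumes scikit-learn >= 1.2).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
# a small k makes each member more sensitive to its particular bootstrap sample
model = BaggingClassifier(estimator=KNeighborsClassifier(n_neighbors=3), n_estimators=50, random_state=1)
model.fit(X, y)
print(model.predict(X[:5]))
```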

We might also change the sampling method from the bootstrap to another sampling technique, or more generally, a different method entirely. In fact, this is the basis for many of the extensions of bagging described in the literature, specifically attempts to get ensemble members that are more independent, yet remain skillful.

We know that the combination of independent base learners will lead to a dramatic decrease of errors and therefore, we want to get base learners as independent as possible.

— Page 48, Ensemble Methods, 2012.

Let’s take a closer look at other ensemble methods that may be considered a part of the bagging family.

Bagging Ensemble Family

Many ensemble machine learning techniques may be considered descendants of bagging.

As such, we can map them onto our framework of essential bagging. This is a helpful exercise as it both highlights the differences between methods and the uniqueness of each technique. Perhaps more importantly, it could also spark ideas for additional variations that you may want to explore on your own predictive modeling project.

Let’s take a closer look at three of the more popular ensemble methods related to bagging.

Random Subspace Ensemble

The random subspace method, or random subspace ensemble, involves selecting random subsets of the features (columns) in the training dataset for each ensemble member.

Each training dataset retains all rows; it is only the columns that are randomly sampled.

  • Different Training Datasets: Randomly sample columns.
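One way to approximate a random subspace ensemble is to configure scikit-learn's BaggingClassifier to keep every row and sample only the columns; the 50 percent column fraction below is an arbitrary, illustrative choice (scikit-learn 1.2 or later assumed).

```python
# Sketch: a random subspace ensemble via BaggingClassifier (assumes scikit-learn >= 1.2).
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

model = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=False,   # keep every row for every member
    max_samples=1.0,
    max_features=0.5,  # each member sees a random 50% subset of the columns
    random_state=1,
)
```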

Random Forest Ensemble

The random forest method is perhaps one of the most successful and widely used ensemble methods, given its ease of implementation and often superior performance on a wide range of predictive modeling problems.

The method often involves selecting a bootstrap sample of the training dataset and a small random subset of columns to consider when choosing each split point in each ensemble member.

In this way, it is like a combination of bagging with the random subspace method, although the random subsets of columns are used within the construction of each decision tree, when choosing each split point.

  • Different Training Datasets: Bootstrap sample.
  • High-Variance Model: Decision tree with split points on random subsets of columns.
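A minimal configuration sketch, assuming scikit-learn's RandomForestClassifier; the number of trees and the square-root feature subset size are common defaults rather than tuned values.

```python
# Sketch: a random forest ensemble (assumes scikit-learn).
from sklearn.ensemble import RandomForestClassifier

# bootstrap rows for each tree; consider a random subset of features at each split point
model = RandomForestClassifier(n_estimators=100, max_features='sqrt', random_state=1)
```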

Extra Trees Ensemble

The extra trees ensemble uses the entire training dataset, although it configures the decision tree algorithm to select split points at random.

  • Different Training Datasets: Whole dataset.
  • High-Variance Model: Decision tree with random split points.
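And the corresponding sketch for extra trees, again assuming scikit-learn; bootstrap=False (the default) means each tree sees the whole training dataset, while split points are chosen at random.

```python
# Sketch: an extra trees ensemble (assumes scikit-learn).
from sklearn.ensemble import ExtraTreesClassifier

# whole training dataset for each tree; split points selected at random
model = ExtraTreesClassifier(n_estimators=100, bootstrap=False, random_state=1)
```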

Customized Bagging Ensembles

We have briefly looked at the canonical random subspace, random forest, and extra trees methods, although there is no reason that the methods could not share more implementation details.

In fact, modern implementations of algorithms like bagging and random forest provide sufficient configuration options to combine many of these features.

Rather than exhausting the literature, we can devise our own extensions that map onto the bagging framework. This may inspire you to explore a less common method or devise your own bagging approach targeted at your dataset or choice of model.

There are perhaps tens or hundreds of extensions of bagging with minor modifications to the manner in which the training dataset for each ensemble member is prepared or the specifics of how the model is constructed from the training dataset.

The changes are built around the three main elements of the essential bagging method and often seek better performance by exploring the balance between ensemble members that are skillful enough, whilst maintaining enough diversity between their predictions or prediction errors.

For example, we could change the sampling of the training dataset to be a random sample without replacement, instead of a bootstrap sample. This is referred to as “pasting.”

  • Different Training Dataset: Random subsample of rows.
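A pasting-style ensemble can be sketched with BaggingClassifier by turning off replacement and subsampling the rows; the 80 percent row fraction is an illustrative assumption.

```python
# Sketch: a "pasting" ensemble via BaggingClassifier (assumes scikit-learn >= 1.2).
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# rows sampled without replacement (bootstrap=False), 80% of the dataset per member
model = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100,
                          bootstrap=False, max_samples=0.8, random_state=1)
```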

We could go further and select a random subsample of rows (like pasting) and a random subsample of columns (like the random subspace method) for each decision tree. This is known as “random patches.”

  • Different Training Dataset: Random subsample of rows and columns.
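Likewise, a random patches ensemble can be sketched by subsampling both rows and columns without replacement; the 80 percent row and 50 percent column fractions are illustrative assumptions.

```python
# Sketch: a "random patches" ensemble via BaggingClassifier (assumes scikit-learn >= 1.2).
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# random subsample of rows AND columns, both without replacement, for each member
model = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100,
                          bootstrap=False, max_samples=0.8,
                          bootstrap_features=False, max_features=0.5,
                          random_state=1)
```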

We can also consider our own simple extensions of the idea.

For example, it is common to use feature selection techniques to choose a subset of input variables in order to reduce the complexity of a prediction problem (fewer columns) and achieve better performance (less noise). We could imagine a bagging ensemble where each model is fit on a different “view” of the training dataset, selected by a different feature selection or feature importance method.

  • Different Training Dataset: Columns chosen by different feature selection methods.
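There is no single off-the-shelf implementation of this idea being referenced here; a rough approximation can be sketched with a voting ensemble whose members are pipelines pairing different feature selection methods with the same model. The member names, the choice of k=10, and the scoring functions are all illustrative assumptions.

```python
# Sketch: members fit on different feature-selection "views" of the data,
# approximated with a voting ensemble (assumes scikit-learn).
from sklearn.ensemble import VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

members = [
    ('anova', Pipeline([('select', SelectKBest(score_func=f_classif, k=10)),
                        ('tree', DecisionTreeClassifier())])),
    ('mutinfo', Pipeline([('select', SelectKBest(score_func=mutual_info_classif, k=10)),
                          ('tree', DecisionTreeClassifier())])),
]
# combine the members' predictions by majority vote
model = VotingClassifier(estimators=members, voting='hard')
```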

It is also common to test a model with many different data transforms as part of a modeling pipeline. This is done because we cannot know beforehand which representation of the training dataset will best expose its unknown underlying structure to the learning algorithms. We could imagine a bagging ensemble where each model is fit on a different transform of the training dataset.

  • Different Training Dataset: Data transforms of the raw training dataset.
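A similar approximation can be sketched for the data-transform idea, again with a voting ensemble; the particular transforms (min-max scaling, standardization, a power transform) and the k-nearest neighbors member model, chosen here because it is sensitive to feature scaling, are assumptions for illustration.

```python
# Sketch: members fit on different transforms of the training dataset,
# approximated with a voting ensemble (assumes scikit-learn).
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PowerTransformer, StandardScaler

# k-nearest neighbors is sensitive to feature scaling, so each transform yields a different member
members = [
    ('minmax', Pipeline([('scale', MinMaxScaler()), ('knn', KNeighborsClassifier())])),
    ('standard', Pipeline([('scale', StandardScaler()), ('knn', KNeighborsClassifier())])),
    ('power', Pipeline([('scale', PowerTransformer()), ('knn', KNeighborsClassifier())])),
]
model = VotingClassifier(estimators=members, voting='hard')
```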

These are a few perhaps obvious examples of how the essence of the bagging method can be explored, hopefully inspiring further ideas. I would encourage you to brainstorm how you might adapt the methods to your own specific project.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Books

  • Ensemble Machine Learning, 2012.
  • Ensemble Methods, 2012.
  • Pattern Classification Using Ensemble Methods, 2010.

Summary

In this tutorial, you discovered the essence of the bootstrap aggregation approach to machine learning ensembles.

Specifically, you learned:

  • The bagging ensemble method for machine learning using bootstrap samples and decision trees.
  • How to distill the essential elements from the bagging method and how popular extensions like random forest are directly related to bagging.
  • How to devise new extensions to bagging by selecting new procedures for the essential elements of the method.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
