Ensemble Learning Algorithms With Python Crash Course.
Get on top of ensemble learning with Python in 7 days.

Ensemble learning refers to machine learning models that combine the predictions from two or more models.

Ensembles are an advanced approach to machine learning, often used when the capability and skill of the predictions are more important than using a simple and understandable model. As such, they are often used by top and winning participants in machine learning competitions like the One Million Dollar Netflix Prize and Kaggle competitions.

Modern machine learning libraries like Python's scikit-learn provide a suite of advanced ensemble learning methods that are easy to configure and use correctly without data leakage, a common pitfall when using ensemble algorithms.

In this crash course, you will discover how you can get started and confidently bring ensemble learning algorithms to your predictive modeling project with Python in seven days.

This is a big and important post. You might want to bookmark it.

Kick-start your project with my new book Ensemble Learning Algorithms With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Ensemble Machine Learning With Python (7-Day Mini-Course)

Photo by anoldent, some rights reserved.

Who Is This Crash-Course For?

Before we get started, let’s make sure you are in the right place.

This course is for developers who may know some applied machine learning. Maybe you know how to work through a predictive modeling problem end to end, or at least most of the main steps, with popular tools.

The lessons in this course do assume a few things about you, such as:

  • You know your way around basic Python for programming.
  • You may know some basic NumPy for array manipulation.
  • You may know some basic scikit-learn for modeling.

You do NOT need to be:

  • A math wiz!
  • A machine learning expert!

This crash course will take you from a developer who knows a little machine learning to a developer who can effectively and confidently apply ensemble learning algorithms on a predictive modeling project.

Note: This crash course assumes you have a working Python 3 SciPy environment with at least NumPy installed. If you need help with your environment, you can follow the step-by-step tutorial here:
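As a quick sanity check, the short snippet below (which also assumes scikit-learn is installed) prints the versions of the core libraries used throughout the lessons:

# check the versions of the core libraries used in this crash course
import sys
import numpy
import sklearn
print('Python: %s' % sys.version)
print('NumPy: %s' % numpy.__version__)
print('scikit-learn: %s' % sklearn.__version__)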

Crash-Course Overview

This crash course is broken down into seven lessons.

You could complete one lesson per day (recommended) or complete all of the lessons in one day (hardcore). It really depends on the time you have available and your level of enthusiasm.

Below is a list of the seven lessons that will get you started and productive with ensemble learning in Python:

  • Lesson 01: What Is Ensemble Learning?
  • Lesson 02: Bagging Ensembles
  • Lesson 03: Random Forest Ensemble
  • Lesson 04: AdaBoost Ensemble
  • Lesson 05: Gradient Boosting Ensemble
  • Lesson 06: Voting Ensemble
  • Lesson 07: Stacking Ensemble

Each lesson could take you 60 seconds or up to 30 minutes. Take your time and complete the lessons at your own pace. Ask questions, and even post results in the comments below.

The lessons might expect you to go off and find out how to do things. I will give you hints, but part of the point of each lesson is to force you to learn where to go to look for help with and about the algorithms and the best-of-breed tools in Python. (Hint: I have all of the answers on this blog; use the search box.)

Post your results in the comments; I’ll cheer you on!

Hang in there; don't give up.

Lesson 01: What Is Ensemble Learning?

In this lesson, you will discover what ensemble learning is and why it is important.

Applied machine learning often involves fitting and evaluating models on a dataset.

Given that we cannot know beforehand which model will perform best on the dataset, this may involve a lot of trial and error until we find a model that performs well or best for our project.

An alternative approach is to prepare multiple different models, then combine their predictions.

This is called an ensemble machine learning model, or simply an ensemble, and the process of finding a well-performing ensemble model is referred to as “ensemble learning.”

Although there is a nearly unlimited number of ways that this can be achieved, there are perhaps three classes of ensemble learning techniques that are most commonly discussed and used in practice.

Their popularity is due in large part to their ease of implementation and success on a wide range of predictive modeling problems.

They are:

  • Bagging, e.g. bagged decision trees and random forest.
  • Boosting, e.g. AdaBoost and gradient boosting.
  • Stacking, e.g. voting and using a meta-model.

There are two main reasons to use an ensemble over a single model, and they are related; they are:

  • Reliability: Ensembles can reduce the variance of the predictions.
  • Skill: Ensembles can achieve better performance than a single model.

These are both important concerns on a machine learning project and sometimes we may prefer one or both properties from a model.

Your Task

For this lesson, you must list three applications of ensemble learning.

These may be famous examples, like a machine learning competition, or examples you have come across in tutorials, books, or research papers.

Post your answer in the comments below. I would love to see what you come up with.

In the next lesson, you will discover how to develop and evaluate a bagging ensemble.

Lesson 02: Bagging Ensembles

In this lesson, you will discover the bootstrap aggregation, or bagging, ensemble.

Bagging works by creating samples of the training dataset and fitting a decision tree on each sample.

The differences in the training datasets result in differences in the fit visualization trees, and in turn, differences in predictions made by those trees. The predictions made by the ensemble members are then combined using simple statistics, such as voting or averaging.

Key to the method is the manner in which each sample of the dataset is prepared to train ensemble members. Examples (rows) are drawn from the dataset at random, with replacement. Replacement means that if a row is selected, it is returned to the training dataset for potential re-selection in the same training dataset.

This is called a bootstrap sample, giving the technique its name.
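For intuition, the tiny sketch below (an illustration only, not part of any scikit-learn ensemble) draws one bootstrap sample of row indices with NumPy; because indices are sampled with replacement, some rows appear more than once and others are left out:

# draw a single bootstrap sample of row indices with NumPy
from numpy.random import default_rng
rng = default_rng(1)
n_rows = 10  # stand-in for the number of rows in a training dataset
indices = rng.integers(0, n_rows, size=n_rows)  # sampled with replacement
print(indices)  # some indices repeat, others never appear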

Bagging is available in scikit-learn via the BaggingClassifier and BaggingRegressor classes, which use a decision tree as the base-model by default. You can specify the number of trees to create via the “n_estimators” argument.

The complete example of evaluating a bagging ensemble for classification is listed below.
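The listing is a minimal sketch: it assumes a synthetic binary classification dataset created with make_classification and evaluates the ensemble with repeated stratified 10-fold cross-validation; the dataset parameters and the number of trees are illustrative choices, not tuned settings.

# evaluate a bagging ensemble for classification on a synthetic dataset
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.ensemble import BaggingClassifier
# create a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=1)
# define the bagging ensemble with 50 decision trees
model = BaggingClassifier(n_estimators=50)
# evaluate the model with repeated stratified k-fold cross-validation
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
# report the mean and standard deviation of the accuracy scores
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))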

Your Task

For this lesson, you must run the example and review the results of the evaluated model.

For bonus points, evaluate the effect of using more decision trees in the ensemble or even changing the base learner that is used.

Post your answer in the comments below. I would love to see what you come up with.

In the next lesson, you will discover how to develop and evaluate a random forest ensemble.

Lesson 03: Random Forest Ensemble

In this lesson, you will discover the random forest ensemble.

Random forest is an extension of the bagging ensemble.

Like bagging, the random forest ensemble fits a decision tree on different bootstrap samples of the training dataset.

Unlike bagging, random forest will also sample the features (columns) of each dataset.

Specifically, split points are chosen in the data while constructing each decision tree. Rather than considering all features when choosing a split point, random forest limits the features to a random subset of features, such as 3 if there were 10 features.

The random forest ensemble is available in scikit-learn via the RandomForestClassifier and RandomForestRegressor classes. You can specify the number of trees to create via the “n_estimators” argument and the number of randomly selected features to consider at each split point via the “max_features” argument, which is set to the square root of the number of features in your dataset by default.

The complete example of evaluating a random forest ensemble for classification is listed below.
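As before, this is a minimal sketch on a synthetic make_classification dataset with illustrative rather than tuned settings.

# evaluate a random forest ensemble for classification on a synthetic dataset
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.ensemble import RandomForestClassifier
# create a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=2)
# define the random forest ensemble with 100 trees and the default max_features
model = RandomForestClassifier(n_estimators=100)
# evaluate the model with repeated stratified k-fold cross-validation
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))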

Your Task

For this lesson, you must run the example and review the results of the evaluated model.

For bonus points, evaluate the effect of using more decision trees in the ensemble or tuning the number of features to consider at each split point.

Post your answer in the comments below. I would love to see what you come up with.

In the next lesson, you will discover how to develop and evaluate an AdaBoost ensemble.

Lesson 04: AdaBoost Ensemble

In this lesson, you will discover the adaptive boosting or AdaBoost ensemble.

Boosting involves adding models sequentially to the ensemble, where new models attempt to correct the errors made by prior models already added to the ensemble. As such, the more ensemble members that are added, the fewer errors the ensemble is expected to make, at least to a limit supported by the data and before overfitting the training dataset.

The idea of boosting was first developed as a theoretical idea, and the AdaBoost algorithm was the first successful approach to realizing a boosting-based ensemble algorithm.

AdaBoost works by fitting decision trees on versions of the training dataset weighted so that the tree pays more attention to examples (rows) that prior members got wrong, and less attention to those that prior models got correct.

Rather than full decision trees, AdaBoost uses very simple trees that make a single decision on one input variable before making a prediction. These short trees are referred to as decision stumps.

AdaBoost is available in scikit-learn via the AdaBoostClassifier and AdaBoostRegressor classes, which use a decision tree (decision stump) as the base-model by default. You can specify the number of trees to create via the “n_estimators” argument.

The complete example of evaluating an AdaBoost ensemble for classification is listed below.
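Again, a minimal sketch on a synthetic make_classification dataset with illustrative settings.

# evaluate an AdaBoost ensemble for classification on a synthetic dataset
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.ensemble import AdaBoostClassifier
# create a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=3)
# define the AdaBoost ensemble with 50 decision stumps
model = AdaBoostClassifier(n_estimators=50)
# evaluate the model with repeated stratified k-fold cross-validation
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))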

Your Task

For this lesson, you must run the example and review the results of the evaluated model.

For bonus points, evaluate the effect of using more decision trees in the ensemble or even changing the base learner that is used (note: it must support weighted training data).

Post your answer in the comments below. I would love to see what you come up with.

In the next lesson, you will discover how to develop and evaluate a gradient boosting ensemble.

Lesson 05: Gradient Boosting Ensemble

In this lesson, you will discover the gradient boosting ensemble.

Gradient boosting is a framework for boosting ensemble algorithms and an extension to AdaBoost.

It re-frames boosting as an additive model under a statistical framework and allows for the use of arbitrary loss functions, to make it more flexible, and of loss penalties (shrinkage), to reduce overfitting.

Gradient boosting also introduces ideas from bagging to the ensemble members, such as sampling of the training dataset rows and columns, referred to as stochastic gradient boosting.

It is a very successful ensemble technique for structured or tabular data, although it can be slow to fit a model given that models are added sequentially. More efficient implementations have been developed, such as the popular extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM).

Gradient boosting is available in scikit-learn via the GradientBoostingClassifier and GradientBoostingRegressor classes, which use a decision tree as the base-model by default. You can specify the number of trees to create via the “n_estimators” argument and the learning rate that controls the contribution from each tree via the “learning_rate” argument, which defaults to 0.1.

The complete example of evaluating a gradient boosting ensemble for classification is listed below.
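The sketch below follows the same pattern, with an illustrative number of trees and the default learning rate of 0.1.

# evaluate a gradient boosting ensemble for classification on a synthetic dataset
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.ensemble import GradientBoostingClassifier
# create a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=4)
# define the gradient boosting ensemble with 100 trees and a learning rate of 0.1
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
# evaluate the model with repeated stratified k-fold cross-validation
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))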

Your Task

For this lesson, you must run the example and review the results of the evaluated model.

For bonus points, evaluate the effect of using more decision trees in the ensemble or try different learning rate values.

Post your answer in the comments below. I would love to see what you come up with.

In the next lesson, you will discover how to develop and evaluate a voting ensemble.

Lesson 06: Voting Ensemble

In this lesson, you will discover the voting ensemble.

Voting ensembles use simple statistics to combine the predictions from multiple models.

Typically, this involves fitting multiple different model types on the same training dataset, then calculating the average prediction in the case of regression, or the class label with the most votes for classification, called hard voting.

Voting can also be used when predicting the probability of class labels on classification problems by summing predicted probabilities and selecting the label with the largest summed probability. This is called soft voting and is preferred when the base-models used in the ensemble natively support predicting class probabilities, as it can result in better performance.

Voting ensembles are available in scikit-learn via the VotingClassifier and VotingRegressor classes. A list of base-models can be provided as an argument to the model, and each model in the list must be a tuple with a name and the model, e.g. (‘lr’, LogisticRegression()). The type of voting used for classification can be specified via the “voting” argument and set to either ‘soft‘ or ‘hard‘.

The complete example of evaluating a voting ensemble for classification is listed below.
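The sketch below combines three illustrative base-models (logistic regression, k-nearest neighbors, and a decision tree) with soft voting on a synthetic make_classification dataset; the choice of models is an example, not a recommendation.

# evaluate a soft voting ensemble for classification on a synthetic dataset
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
# create a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=5)
# define the base-models as (name, model) tuples
models = [('lr', LogisticRegression(max_iter=1000)), ('knn', KNeighborsClassifier()), ('tree', DecisionTreeClassifier())]
# define the voting ensemble using soft voting (all base-models support predict_proba)
model = VotingClassifier(estimators=models, voting='soft')
# evaluate the model with repeated stratified k-fold cross-validation
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))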

Your Task

For this lesson, you must run the example and review the results of the evaluated model.

For bonus points, evaluate the effect of trying different types of models in the ensemble or even changing the type of voting from soft voting to hard voting.

Post your answer in the comments below. I would love to see what you come up with.

In the next lesson, you will discover how to develop and evaluate a stacking ensemble.

Lesson 07: Stacking Ensemble

In this lesson, you will discover the stacked generalization or stacking ensemble.

Stacking involves combining the predictions of multiple different types of base-models, much like voting.

The important difference from voting is that another machine learning model is used to learn how to best combine the predictions of the base-models. This is often a linear model, such as a linear regression for regression problems or logistic regression for classification, but it can be any machine learning model you like.

The meta-model is trained on the predictions made by base-models on out-of-sample data.

This involves using k-fold cross-validation for each base-model and storing all of the out-of-fold predictions. The base-models are then trained on the entire training dataset, and the meta-model is trained on the out-of-fold predictions and learns which models to trust, the degree to trust them, and under which circumstances.

Although stacking internally uses k-fold cross-validation to train the meta-model, you can evaluate stacking models any way you like, such as via a train-test split or k-fold cross-validation. The evaluation of the model is separate from this internal resampling-for-training process.

Stacking ensembles are available in scikit-learn via the StackingClassifier and StackingRegressor classes. A list of base-models can be provided as an argument to the model, and each model in the list must be a tuple with a name and the model, e.g. (‘lr’, LogisticRegression()). The meta-learner can be specified via the “final_estimator” argument, and the resampling strategy can be specified via the “cv” argument, which can simply be set to an integer indicating the number of cross-validation folds.

The complete example of evaluating a stacking ensemble for classification is listed below.
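The sketch below stacks three illustrative base-models (k-nearest neighbors, a decision tree, and a support vector machine) with a logistic regression meta-model, again on a synthetic make_classification dataset; the model choices and cv=5 are examples rather than recommendations.

# evaluate a stacking ensemble for classification on a synthetic dataset
from numpy import mean, std
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier
# create a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=6)
# define the base-models as (name, model) tuples
models = [('knn', KNeighborsClassifier()), ('tree', DecisionTreeClassifier()), ('svm', SVC())]
# define the stacking ensemble with a logistic regression meta-model and 5-fold internal CV
model = StackingClassifier(estimators=models, final_estimator=LogisticRegression(), cv=5)
# evaluate the model with repeated stratified k-fold cross-validation
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
print('Mean Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))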

Your Task

For this lesson, you must run the example and review the results of the evaluated model.

For bonus points, evaluate the effect of trying different types of models in the ensemble and different meta-models to combine the predictions.

Post your answer in the comments below. I would love to see what you come up with.

This was the final lesson.

The End!
(Look How Far You Have Come)

You made it. Well done!

Take a moment and look back at how far you have come.

You discovered:

  • What ensemble learning is and why you would use it on a predictive modeling project.
  • How to use a bootstrap aggregation, or bagging, ensemble.
  • How to use a random forest ensemble as an extension to bagging.
  • How to use an adaptive boosting, or AdaBoost, ensemble.
  • How to use a gradient boosting ensemble.
  • How to combine the predictions of models using a voting ensemble.
  • How to combine the predictions of models using a stacking ensemble with a meta-model.

Summary

How did you do with the mini-course?
Did you enjoy this crash course?

Do you have any questions? Were there any sticking points?
Let me know. Leave a comment below.

Get a Handle on Modern Ensemble Learning!

Ensemble Learning Algorithms With Python

Improve Your Predictions in Minutes

...with just a few lines of Python code

Discover how in my new Ebook:
Ensemble Learning Algorithms With Python

It provides self-study tutorials with full working code on:
Stacking, Voting, Boosting, Bagging, Blending, Super Learner, and much more...

Bring Modern Ensemble Learning Techniques to
Your Machine Learning Projects


See What's Inside