Last Updated on April 27, 2021

Ensemble methods involve combining the predictions from multiple models.

The **combination of the predictions** is a central part of the ensemble method and depends heavily on the types of models that contribute to the ensemble and the type of prediction problem that is being modeled, such as classification or regression.

Nevertheless, there are common or standard techniques for combining predictions that are easy to implement and often result in good or best predictive performance.

In this post, you will discover common techniques for combining predictions for ensemble learning.

After reading this post, you will know:

- Combining predictions from contributing models is a key property of an ensemble model.
- Voting techniques are most commonly used when combining predictions for classification.
- Statistical techniques are most commonly used when combining predictions for regression.

**Kick-start your project** with my new book Ensemble Learning Algorithms With Python, including *step-by-step tutorials* and the *Python source code* files for all examples.

Let’s get started.

## Tutorial Overview

This tutorial is divided into three parts; they are:

- Combining Predictions for Ensemble Learning
- Combining Classification Predictions
    - Combining Predicted Class Labels
    - Combining Predicted Class Probabilities
- Combining Regression Predictions

## Combining Predictions for Ensemble Learning

A key part of an ensemble learning method involves combining the predictions from multiple models.

It is through the combination of the predictions that the benefit of the ensemble learning method is achieved, namely better predictive performance. As such, there are many ways that predictions can be combined, so much so that it is an entire field of study.

After generating a set of base learners, rather than trying to find the best single learner, ensemble methods resort to combination to achieve a strong generalization ability, where the combination method plays a crucial role.

— Page 67, Ensemble Methods, 2012.

Standard ensemble machine learning algorithms do prescribe how to combine predictions; nevertheless, it is important to consider the topic in isolation for a number of reasons, such as:

- Interpreting the predictions made by standard ensemble algorithms.
- Manually specifying a custom prediction combination method for an algorithm.
- Developing your own ensemble methods.

Ensemble learning methods are typically not very complex, and developing your own ensemble method or specifying the manner in which predictions are combined is relatively easy and common practice.

The way that predictions are combined depends on the models that are making predictions and the type of prediction problem.

The strategy used in this step depends, in part, on the type of classifiers used as ensemble members. For example, some classifiers, such as support vector machines, provide only discrete-valued label outputs.

— Page 6, Ensemble Machine Learning, 2012.

For example, the form of the predictions made by the models will match the type of prediction problem, such as regression for predicting numbers and classification for predicting class labels. Additionally, some model types may only be able to predict a class label or a class probability distribution, whereas others may be able to support both for a classification task.

We will use this division of prediction type based on problem type as the basis for exploring the common techniques used to combine predictions from contributing models in an ensemble.

In the next section, we will take a look at how to combine predictions for classification predictive modeling tasks.

## Combining Classification Predictions

Classification refers to predictive modeling problems that involve predicting a class label given an input.

The prediction made by a model may be a crisp class label directly or may be a probability that an example belongs to each class, referred to as the probability of class membership.

The performance of a classification problem is often measured using accuracy or a related count or ratio of correct predictions. In the case of evaluating predicted probabilities, they may be converted to crisp class labels by selecting a cut-off threshold, or evaluated using specialized metrics such as cross-entropy.

We will review combining predictions for classification separately for both class labels and probabilities.

### Combining Predicted Class Labels

A predicted class label is often mapped to something meaningful to the problem domain.

For example, a model may predict a color such as “*red*” or “*green*”. Internally though, the model predicts a numerical representation for the class label such as 0 for “*red*”, 1 for “*green*”, and 2 for “*blue*” for our color classification example.

Methods for combining class labels are perhaps easier to consider if we work with the integer encoded class labels directly.

Perhaps the simplest, most common, and often most effective approach is to combine the predictions by voting.

Voting is the most popular and fundamental combination method for nominal outputs.

— Page 71, Ensemble Methods, 2012.

Voting often involves each model that makes a prediction assigning a vote for the class that was predicted. The votes are tallied and an outcome is then chosen using the votes or tallies in some way.

There are many types of voting, so let’s look at the four most common:

- Plurality Voting.
- Majority Voting.
- Unanimous Voting.
- Weighted Voting.

Simple voting, called **plurality voting**, selects the class label with the most votes.

If two or more classes have the same number of votes, then the tie is broken arbitrarily, although in a consistent manner, such as sorting the class labels that have a tie and selecting the first, instead of selecting one randomly. This is important so that the same model with the same data always makes the same prediction.

Given ties, it is common to have an odd number of ensemble members in an attempt to automatically break ties, as opposed to an even number of ensemble members where ties may be more likely.

From a statistical perspective, this is called the mode or the most common value from the collection of predictions.

For example, consider the predictions made by three models for a three-class color prediction problem:

- Model 1 predicts “*green*” or 1.
- Model 2 predicts “*green*” or 1.
- Model 3 predicts “*red*” or 0.

The votes are, therefore:

- Red Votes: 1
- Green Votes: 2
- Blue Votes: 0

The prediction would be “*green*” given it has the most votes.
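The tally above can be reproduced in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function name `plurality_vote` is our own, and it assumes integer encoded labels and breaks ties by choosing the smallest tied label so that the result is repeatable.

```python
from collections import Counter


def plurality_vote(labels):
    """Return the label with the most votes (hypothetical helper).

    Ties are broken by taking the smallest tied label, so the same
    inputs always give the same prediction.
    """
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


# Integer encoded predictions from the three models: green, green, red.
predictions = [1, 1, 0]
print(plurality_vote(predictions))  # prints 1, which maps to "green"
```

Sorting the tied labels and taking the first is only one possible tie-breaking rule; any rule works as long as it is deterministic.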

**Majority voting** selects the class label that has more than half the votes. If no class has more than half the votes, then a “*no prediction*” is made. Interestingly, majority voting can be proven to be an optimal method for combining classifiers, if they are independent.

If the classifier outputs are independent, then it can be shown that majority voting is the optimal combination rule.

— Page 1, Ensemble Machine Learning, 2012.

**Unanimous voting** is related to majority voting in that instead of requiring more than half the votes, the method requires all models to predict the same value; otherwise, no prediction is made.
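Both rules differ from plurality voting mainly in that they can abstain. The sketch below uses hypothetical helper names and represents “*no prediction*” as `None`, which is our own convention. It assumes a non-empty list of integer encoded labels.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label with more than half the votes, else None."""
    label, n = Counter(labels).most_common(1)[0]
    return label if n > len(labels) / 2 else None


def unanimous_vote(labels):
    """Return the label only if every model predicted it, else None."""
    return labels[0] if len(set(labels)) == 1 else None


print(majority_vote([1, 1, 0]))   # prints 1: two of three votes is a majority
print(majority_vote([0, 1, 2]))   # prints None: no class has more than half
print(unanimous_vote([1, 1, 0]))  # prints None: the models disagree
```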

**Weighted voting** weighs the prediction made by each model in some way. One example would be to weigh predictions based on the average performance of the model, such as classification accuracy.

The weight of each classifier can be set proportional to its accuracy performance on a validation set.

— Page 67, Pattern Classification Using Ensemble Methods, 2010.

Assigning weights to classifiers can become a project in and of itself and could involve using an optimization algorithm and a holdout dataset, a linear model, or even another machine learning model entirely.

So, how do we assign the weights? If we knew, a priori, which classifiers would work better, we would only use those classifiers. In the absence of such information, a plausible and commonly used strategy is to use the performance of a classifier on a separate validation (or even training) dataset, as an estimate of that classifier’s generalization performance.

— Page 8, Ensemble Machine Learning, 2012.

The idea of weighted voting is that some classifiers are more likely to be accurate than others and we should reward them by giving them a larger share of the votes.

If we have reason to believe that some of the classifiers are more likely to be correct than others, weighting the decisions of those classifiers more heavily can further improve the overall performance compared to that of plurality voting.

— Page 7, Ensemble Machine Learning, 2012.
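As a rough illustration, weighted voting can be sketched by adding each model's weight to the tally of the class it predicted. The function name and the validation accuracies below are invented for the example; in practice the weights would come from a holdout dataset or some other estimate of model skill.

```python
from collections import defaultdict


def weighted_vote(labels, weights):
    """Return the label with the largest total weight (hypothetical helper).

    Ties are broken by taking the smallest tied label.
    """
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    top = max(totals.values())
    return min(label for label, total in totals.items() if total == top)


# Four models: two predict green (1) and two predict red (0).
labels = [1, 1, 0, 0]
# Assumed validation accuracy of each model, used directly as its weight.
val_accuracy = [0.70, 0.75, 0.90, 0.80]
print(weighted_vote(labels, val_accuracy))  # prints 0, which maps to "red"
```

With equal weights this example would be a two-against-two tie; the weights settle it in favor of the class backed by the more accurate models.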

### Combining Predicted Class Probabilities

Probabilities summarize the likelihood of an event as a numerical value between 0.0 and 1.0.

When predicted for class membership, it involves a probability assigned to each class, together summing to the value 1.0; for example, a model may predict:

- Red: 0.75
- Green: 0.10
- Blue: 0.15

We can see that class “*red*” has the highest probability, or is the most likely outcome predicted by the model, and that the probabilities across the classes (0.75 + 0.10 + 0.15) sum to 1.0.

The way that the probabilities are combined depends on the outcome that is required.

For example, if probabilities are required, then the independent predicted probabilities can be combined directly.

Perhaps the simplest approach for combining probabilities is to sum the probabilities for each class and pass the predicted values through a softmax function. This ensures that the scores are appropriately normalized, meaning the probabilities across the class labels sum to 1.0.

… such outputs – upon proper normalization (such as softmax normalization […]) – can be interpreted as the degree of support given to that class

— Page 8, Ensemble Machine Learning, 2012.
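The sum-then-softmax idea can be sketched with the standard library alone. The function names and the probabilities of the second and third models are made up for illustration; only the first row matches the example above.

```python
import math


def softmax(scores):
    """Normalize scores so they are positive and sum to 1.0."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_probabilities(model_probs):
    """Sum the per-class probabilities across models, then apply softmax."""
    summed = [sum(per_class) for per_class in zip(*model_probs)]
    return softmax(summed)


# Rows are models, columns are the classes red, green, blue.
model_probs = [
    [0.75, 0.10, 0.15],
    [0.40, 0.50, 0.10],  # hypothetical second model
    [0.30, 0.60, 0.10],  # hypothetical third model
]
combined = combine_probabilities(model_probs)
print(combined)
```

Note that softmax rescales the summed scores nonlinearly, so the result is not the same as a plain average of the probabilities, although the ranking of the classes is preserved.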

More commonly we wish to predict a class label from predicted probabilities.

The most common approach is to use voting, where the predicted probabilities represent the vote made by each model for each class. Votes are then summed and a voting method from the previous section can be used, such as selecting the label with the largest summed probabilities or the largest mean probability.

- Vote Using Mean Probabilities
- Vote Using Sum Probabilities
- Vote Using Weighted Sum Probabilities

Generally, this approach to treating probabilities as votes for choosing a class label is referred to as soft voting.

If all the individual classifiers are treated equally, the simple soft voting method generates the combined output by simply averaging all the individual outputs …

— Page 76, Ensemble Methods, 2012.
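A small sketch of soft voting follows. The name `soft_vote`, the optional `weights` argument, and the example probabilities are all assumptions made for illustration. With no weights it averages the probabilities, and with weights it computes a weighted mean before picking the class with the largest value.

```python
def soft_vote(model_probs, weights=None):
    """Return (label, mean_probs) from per-model class probabilities.

    model_probs: one list of class probabilities per model.
    weights: optional per-model weights; equal weights if omitted.
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total_weight = sum(weights)
    n_classes = len(model_probs[0])
    mean_probs = [
        sum(w * probs[c] for w, probs in zip(weights, model_probs)) / total_weight
        for c in range(n_classes)
    ]
    # Pick the class with the largest mean; ties go to the lowest index.
    label = max(range(n_classes), key=lambda c: mean_probs[c])
    return label, mean_probs


# Rows are models, columns are the classes red, green, blue (made-up values).
model_probs = [
    [0.75, 0.10, 0.15],
    [0.40, 0.50, 0.10],
    [0.30, 0.60, 0.10],
]
label, mean_probs = soft_vote(model_probs)
print(label)  # prints 0, "red": its summed probability is the largest

weighted_label, _ = soft_vote(model_probs, weights=[1, 1, 3])
print(weighted_label)  # prints 1, "green": the third model now dominates
```

Voting on the sum and voting on the mean select the same label, because dividing every class total by the same constant does not change which one is largest.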

## Combining Regression Predictions

Regression refers to predictive modeling problems that involve predicting a numeric value given an input.

The performance for a regression problem is often measured using average error, such as mean absolute error or root mean squared error.

Combining numerical predictions often involves using simple statistical methods; for example:

- Mean Predicted Value
- Median Predicted Value

Both give the central tendency of the distribution of predictions.

Averaging is the most popular and fundamental combination method for numeric outputs.

— Page 68, Ensemble Methods, 2012.

The mean, also called the average, is the normalized sum of the predictions. The Mean Predicted Value is more appropriate when the distribution of predictions is Gaussian or nearly Gaussian.

For example, the mean is calculated as the sum of predicted values divided by the total number of predictions. If three models predicted the following prices:

- Model 1: 99.00
- Model 2: 101.00
- Model 3: 98.00

The mean prediction would be calculated as:

- Mean Prediction = (99.00 + 101.00 + 98.00) / 3
- Mean Prediction = 298.00 / 3
- Mean Prediction = 99.33
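The same calculation can be checked with Python's built-in `statistics` module; the variable names are our own.

```python
from statistics import mean

# Predicted prices from the three models.
predictions = [99.00, 101.00, 98.00]

mean_prediction = mean(predictions)
print(round(mean_prediction, 2))  # prints 99.33
```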

Owing to its simplicity and effectiveness, simple averaging is among the most popularly used methods and represents the first choice in many real applications.

— Page 69, Ensemble Methods, 2012.

The median is the middle value if all predictions were ordered and is also referred to as the 50th percentile. The Median Predicted Value is more appropriate to use when the distribution of predictions is not known or does not follow a Gaussian probability distribution.

Depending on the nature of the prediction problem, a conservative prediction may be desired, such as the maximum or the minimum. Additionally, the distribution can be summarized to give a measure of uncertainty, such as reporting three values for each prediction:

- Minimum Predicted Value
- Median Predicted Value
- Maximum Predicted Value
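A three-value summary like this is straightforward to compute. The sketch below reuses the three example prices; the helper name `summarize` is hypothetical.

```python
from statistics import median


def summarize(predictions):
    """Return the (minimum, median, maximum) of a set of predictions."""
    return min(predictions), median(predictions), max(predictions)


predictions = [99.00, 101.00, 98.00]
low, middle, high = summarize(predictions)
print(low, middle, high)  # prints 98.0 99.0 101.0
```

The gap between the minimum and maximum gives a crude sense of how much the ensemble members disagree on a given example.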

As with classification, the predictions made by each model can be weighted by expected model performance or some other value, and the weighted mean of the predictions can be reported.
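A weighted mean can be sketched as follows. The weights are invented for the example and would normally reflect something like each model's skill on a validation set; the function name is also our own.

```python
def weighted_mean(predictions, weights):
    """Return the weighted mean of the predictions (hypothetical helper).

    Each prediction is scaled by its weight, and the total is divided
    by the sum of the weights so the weights need not sum to 1.0.
    """
    total_weight = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total_weight


predictions = [99.00, 101.00, 98.00]
weights = [0.2, 0.5, 0.3]  # assumed weights favoring the second model

weighted_prediction = weighted_mean(predictions, weights)
print(round(weighted_prediction, 2))
```

With equal weights, this reduces to the ordinary mean of the predictions.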

## Summary

In this post, you discovered common techniques for combining predictions for ensemble learning.

Specifically, you learned:

- Combining predictions from contributing models is a key property of an ensemble model.
- Voting techniques are most commonly used when combining predictions for classification.
- Statistical techniques are most commonly used when combining predictions for regression.

**Do you have any questions?**

Ask your questions in the comments below and I will do my best to answer.