Last Updated on May 8, 2021

Weighted average ensembles assume that some models in the ensemble have more skill than others and give them more contribution when making predictions.

The weighted average or weighted sum ensemble is an extension over voting ensembles that assume all models are equally skillful and make the same proportional contribution to predictions made by the ensemble.

Each model is assigned a fixed weight that is multiplied by the prediction made by the model and used in the sum or average prediction calculation. The challenge of this type of ensemble is how to calculate, assign, or search for model weights that result in performance that is better than any contributing model and better than an ensemble that uses equal model weights.

In this tutorial, you will discover how to develop Weighted Average Ensembles for classification and regression.

After completing this tutorial, you will know:

• Weighted Average Ensembles are an extension to voting ensembles where model votes are proportional to model performance.
• How to develop weighted average ensembles using the voting ensemble from scikit-learn.
• How to evaluate the Weighted Average Ensembles for classification and regression and confirm the models are skillful.

Kick-start your project with my new book Ensemble Learning Algorithms With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

• Updated May/2021: Fixed definition of weighted average.

How to Develop a Weighted Average Ensemble With Python
Photo by Alaina McDavid, some rights reserved.

## Tutorial Overview

This tutorial is divided into four parts; they are:

1. Weighted Average Ensemble
2. Develop a Weighted Average Ensemble
3. Weighted Average Ensemble for Classification
4. Weighted Average Ensemble for Regression

## Weighted Average Ensemble

Weighted average or weighted sum ensemble is an ensemble machine learning approach that combines the predictions from multiple models, where the contribution of each model is weighted proportionally to its capability or skill.

The weighted average ensemble is related to the voting ensemble.

Voting ensembles are composed of multiple machine learning models where the predictions from each model are averaged directly. For regression, this involves calculating the arithmetic mean of the predictions made by ensemble members. For classification, this may involve calculating the statistical mode (most common class label) or a similar voting scheme, or summing the probabilities predicted for each class and selecting the class with the largest summed probability.

A limitation of the voting ensemble technique is that it assumes that all models in the ensemble are equally effective. This may not be the case as some models may be better than others, especially if different machine learning algorithms are used to train each ensemble member.

An alternative to voting is to assume that ensemble members are not all equally capable and instead some models are better than others and should be given more votes or more of a say when making a prediction. This provides the motivation for the weighted sum or weighted average ensemble method.

In regression, an average prediction is calculated using the arithmetic mean, that is, the sum of the predictions divided by the number of predictions made. For example, if an ensemble had three ensemble members, the predictions may be:

• Model 1: 97.2
• Model 2: 100.0
• Model 3: 95.8

The mean prediction would be calculated as follows:

• yhat = (97.2 + 100.0 + 95.8) / 3
• yhat = 293 / 3
• yhat = 97.667

A weighted average prediction involves first assigning a fixed weight coefficient to each ensemble member. This could be a floating-point value between 0 and 1, representing a percentage of the weight. It could also be an integer starting at 1, representing the number of votes to give each model.

For example, we may have the fixed weights of 0.84, 0.87, 0.75 for the ensemble members. These weights can be used to calculate the weighted average by multiplying each prediction by the model’s weight to give a weighted sum, then dividing the value by the sum of the weights. For example:

• yhat = ((97.2 * 0.84) + (100.0 * 0.87) + (95.8 * 0.75)) / (0.84 + 0.87 + 0.75)
• yhat = (81.648 + 87 + 71.85) / (0.84 + 0.87 + 0.75)
• yhat = 240.498 / 2.46
• yhat = 97.763

We can see that as long as the scores have the same scale, and the weights have the same scale and are maximizing (meaning that larger weights are better), the weighted sum results in a sensible value, and in turn, the weighted average is also sensible, meaning the scale of the outcome matches the scale of the scores.
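
The arithmetic above can be checked with a few lines of plain Python. This is a minimal sketch that reuses the three predictions and three weights from the example; the variable names are illustrative only.

```python
# predictions from the three ensemble members in the example above
predictions = [97.2, 100.0, 95.8]
# fixed weight assigned to each ensemble member
weights = [0.84, 0.87, 0.75]

# unweighted arithmetic mean of the predictions
mean_yhat = sum(predictions) / len(predictions)

# weighted average: the weighted sum divided by the sum of the weights
weighted_sum = sum(p * w for p, w in zip(predictions, weights))
weighted_yhat = weighted_sum / sum(weights)

print('Mean: %.3f' % mean_yhat)
print('Weighted average: %.3f' % weighted_yhat)
```

Rounded to three decimal places, this reports a mean of 97.667 and a weighted average of 97.763.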

This same approach can be used to calculate the weighted sum of votes for each crisp class label or the weighted sum of probabilities for each class label on a classification problem.
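
As a rough illustration of the classification case, the sketch below applies the same weights to a single example. The crisp labels and class probabilities are made-up values chosen for illustration and do not come from any fitted model.

```python
import numpy as np

# fixed weights for three hypothetical ensemble members
weights = np.array([0.84, 0.87, 0.75])

# weighted sum of votes: each member predicts a crisp class label (0 or 1)
labels = np.array([1, 0, 1])
# add up the weights of the members that voted for each class
votes = np.bincount(labels, weights=weights, minlength=2)
hard_label = int(np.argmax(votes))

# weighted sum of probabilities: each member predicts a probability per class
probs = np.array([
    [0.30, 0.70],
    [0.60, 0.40],
    [0.45, 0.55],
])
# weighted average of the predicted probabilities for each class
avg_probs = np.average(probs, axis=0, weights=weights)
soft_label = int(np.argmax(avg_probs))

print(votes, hard_label)
print(avg_probs, soft_label)
```

In both cases the predicted class is the one with the largest weighted total, so a member with a larger weight has more say in the outcome.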

The challenging aspect of using a weighted average ensemble is how to choose the relative weighting for each ensemble member.

There are many approaches that can be used. For example, the weights may be chosen based on the skill of each model, such as the classification accuracy or negative error, where large weights mean a better-performing model. Performance may be calculated on the dataset used for training or a holdout dataset, the latter of which may be more relevant.

The scores of each model can be used directly or converted into a different value, such as the relative ranking for each model. Another approach might be to use a search algorithm to test different combinations of weights.

Now that we are familiar with the weighted average ensemble method, let’s look at how to develop and evaluate them.

## Develop a Weighted Average Ensemble

In this section, we will develop, evaluate, and use weighted average or weighted sum ensemble models.

We can implement weighted average ensembles manually, although this is not required as we can use the voting ensemble in the scikit-learn library to achieve the desired effect. Specifically, the VotingRegressor and VotingClassifier classes can be used for regression and classification respectively, and both provide a “weights” argument that specifies the relative contribution of each ensemble member when making a prediction.

A list of base-models is provided via the “estimators” argument. This is a Python list where each element in the list is a tuple with the name of the model and the configured model instance. Each model in the list must have a unique name.

For example, we can define a weighted average ensemble for classification with two ensemble members as follows:
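
A minimal sketch of such a definition is shown here. The choice of logistic regression and naive Bayes as the two members, the names 'lr' and 'bayes', and the weights 0.7 and 0.9 are assumptions made for illustration.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# define the models in the ensemble as (unique name, model instance) tuples
models = [('lr', LogisticRegression()), ('bayes', GaussianNB())]
# define the weight of each model in the ensemble (illustrative values)
weights = [0.7, 0.9]
# create a weighted sum ensemble
ensemble = VotingClassifier(estimators=models, weights=weights)
```

The weights are matched to the models by position, so the first weight applies to the first tuple in the list.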

Additionally, the voting ensemble for classification provides the “voting” argument that supports both hard voting ('hard') for combining crisp class labels and soft voting ('soft') for combining class probabilities when calculating the weighted sum for prediction; for example:
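
The sketch below sets the voting type explicitly. As before, the two member models and the weight values are illustrative assumptions; both members were chosen because they support predict_proba(), which soft voting relies on.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# define the models in the ensemble; both can predict class probabilities
models = [('lr', LogisticRegression()), ('bayes', GaussianNB())]
# define the weight of each model in the ensemble (illustrative values)
weights = [0.7, 0.9]
# create a weighted sum ensemble that combines predicted probabilities
ensemble = VotingClassifier(estimators=models, voting='soft', weights=weights)
```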

Soft voting is often preferred if the contributing models support predicting class probabilities, as it often results in better performance. The same holds for the weighted sum of predicted probabilities.

Now that we are familiar with how to use the voting ensemble API to develop weighted average ensembles, let’s look at some worked examples.

## Weighted Average Ensemble for Classification

In this section, we will look at using a weighted average ensemble for a classification problem.

First, we can use the make_classification() function to create a synthetic binary classification problem with 10,000 examples and 20 input features.

The complete example is listed below.
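
The listing here is a sketch of one way to create such a dataset. The number of examples and features follows the text; the n_informative, n_redundant, and random_state values are assumptions.

```python
# test classification dataset
from sklearn.datasets import make_classification

# define dataset (n_informative, n_redundant and random_state are assumed values)
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=7)
# summarize the shape of the input and output components
print(X.shape, y.shape)
```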

Running the example creates the dataset and summarizes the shape of the input and output components.

Next, we can evaluate a weighted average ensemble algorithm on this dataset.

First, we will split the dataset into train and test sets with a 50-50 split. We will then split the full training set into a subset for training the models and a subset for validation.

Next, we will define a function to create a list of models to use in the ensemble. In this case, we will use a diverse collection of classification models, including logistic regression, a decision tree, and naive Bayes.

Next, we need to weigh each ensemble member.

In this case, we will use the performance of each ensemble model on a held-out portion of the training dataset as the relative weighting of the model when making predictions. Performance will be calculated using classification accuracy as the fraction of correct predictions between 0 and 1, with larger values meaning a better model, and in turn, more contribution to the prediction.

Each ensemble model will first be fit on the training subset, then evaluated on the validation subset. The accuracy on the validation subset will be used as the model weighting.

The evaluate_models() function below implements this, returning the performance of each model.
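
The sketch below puts the steps so far together: the two splits, a get_models() helper, and evaluate_models(). The helper name get_models(), the 33 percent validation split, the random_state values, and max_iter=1000 for logistic regression are assumptions and are not stated in the text.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# get a list of base models as (name, model) tuples
def get_models():
    models = list()
    # max_iter is raised as an assumption, to give the solver room to converge
    models.append(('lr', LogisticRegression(max_iter=1000)))
    models.append(('cart', DecisionTreeClassifier()))
    models.append(('bayes', GaussianNB()))
    return models

# fit each model on the training subset and score it on the validation subset
def evaluate_models(models, X_train, X_val, y_train, y_val):
    scores = list()
    for name, model in models:
        model.fit(X_train, y_train)
        yhat = model.predict(X_val)
        # accuracy is a fraction between 0 and 1
        scores.append(accuracy_score(y_val, yhat))
    return scores

# define dataset (dataset parameters beyond size are assumed values)
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=7)
# split dataset into train and test sets with a 50-50 split
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.50, random_state=1)
# split the full train set into train and validation subsets
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.33, random_state=1)

# one validation accuracy per model, in the same order as the model list
scores = evaluate_models(get_models(), X_train, X_val, y_train, y_val)
print(scores)
```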

We can then call this function to get the scores and use them as a weighting for the ensemble.

We can then fit the ensemble on the full training dataset and evaluate it on the holdout test set.

Tying this together, the complete example is listed below.
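
The following is a sketch of a complete example under the same assumptions as before (dataset parameters, validation split size, random_state values, and max_iter are assumed; soft voting is used because all three members can predict probabilities).

```python
# evaluate a weighted average ensemble for classification
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import VotingClassifier

# get a list of base models
def get_models():
    models = list()
    models.append(('lr', LogisticRegression(max_iter=1000)))
    models.append(('cart', DecisionTreeClassifier()))
    models.append(('bayes', GaussianNB()))
    return models

# evaluate each base model on the validation subset
def evaluate_models(models, X_train, X_val, y_train, y_val):
    scores = list()
    for name, model in models:
        model.fit(X_train, y_train)
        yhat = model.predict(X_val)
        scores.append(accuracy_score(y_val, yhat))
    return scores

# define dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=7)
# split dataset into train and test sets
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.50, random_state=1)
# split the full train set into train and validation subsets
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.33, random_state=1)
# create the base models
models = get_models()
# fit and evaluate each model to get the weights
scores = evaluate_models(models, X_train, X_val, y_train, y_val)
print(scores)
# create the ensemble using validation accuracy as the weights
ensemble = VotingClassifier(estimators=models, voting='soft', weights=scores)
# fit the ensemble on the full training dataset
ensemble.fit(X_train_full, y_train_full)
# make predictions on the test set
yhat = ensemble.predict(X_test)
# evaluate predictions
score = accuracy_score(y_test, yhat)
print('Weighted Avg Accuracy: %.3f' % (score * 100))
```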

Running the example first evaluates each standalone model and reports the accuracy scores that will be used as model weights. Finally, the weighted average ensemble is fit and evaluated on the test set, reporting the performance.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the weighted average ensemble achieved a classification accuracy of about 90.960 percent.

Our expectation is that the ensemble will perform better than any of the contributing ensemble members. The problem is that the accuracy scores for the models used as weightings cannot be directly compared to the performance of the ensemble because the members were evaluated on a subset of the training data and the ensemble was evaluated on the test dataset.

We can update the example and add an evaluation of each standalone model for comparison.

We also expect the weighted average ensemble to perform better than an equally weighted voting ensemble.

This can also be checked by explicitly evaluating the voting ensemble.

Tying this together, the complete example is listed below.
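
A sketch of the extended example follows. It keeps the earlier assumptions (dataset parameters, split sizes, random_state values, max_iter) and uses illustrative variable names for the three sets of results.

```python
# compare a weighted average ensemble to its members and to equal-weight voting
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import VotingClassifier

# get a list of base models
def get_models():
    models = list()
    models.append(('lr', LogisticRegression(max_iter=1000)))
    models.append(('cart', DecisionTreeClassifier()))
    models.append(('bayes', GaussianNB()))
    return models

# fit each model on one dataset and report accuracy on another
def evaluate_models(models, X_train, X_val, y_train, y_val):
    scores = list()
    for name, model in models:
        model.fit(X_train, y_train)
        yhat = model.predict(X_val)
        scores.append(accuracy_score(y_val, yhat))
    return scores

# define dataset and create the splits
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=7)
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.50, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.33, random_state=1)
models = get_models()
# validation accuracy of each model is used as its weight
weights = evaluate_models(models, X_train, X_val, y_train, y_val)
print(weights)
# weighted average ensemble, fit on the full training set
ensemble = VotingClassifier(estimators=models, voting='soft', weights=weights)
ensemble.fit(X_train_full, y_train_full)
weighted_score = accuracy_score(y_test, ensemble.predict(X_test))
print('Weighted Avg Accuracy: %.3f' % (weighted_score * 100))
# each standalone model, fit on the full training set and scored on the test set
standalone_scores = evaluate_models(models, X_train_full, X_test, y_train_full, y_test)
for (name, _), s in zip(models, standalone_scores):
    print('>%s: %.3f' % (name, s * 100))
# equally weighted voting ensemble
voting = VotingClassifier(estimators=models, voting='soft')
voting.fit(X_train_full, y_train_full)
voting_score = accuracy_score(y_test, voting.predict(X_test))
print('Voting Accuracy: %.3f' % (voting_score * 100))
```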

Running the example first prepares and evaluates the weighted average ensemble as before, then reports the performance of each contributing model evaluated in isolation, and finally the voting ensemble that uses an equal weighting for the contributing models.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the weighted average ensemble performs better than any contributing ensemble member.

We can also see that an equal weighting ensemble (voting) achieved an accuracy of about 90.620 percent, which is less than the weighted ensemble that achieved the slightly higher 90.760 percent accuracy.

Next, let’s take a look at how to develop and evaluate a weighted average ensemble for regression.

## Weighted Average Ensemble for Regression

In this section, we will look at using a weighted average ensemble for a regression problem.

First, we can use the make_regression() function to create a synthetic regression problem with 1,000 examples and 20 input features.

The complete example is listed below.
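
The listing here is a sketch of one way to create such a dataset. The number of examples and features follows the text; the n_informative, noise, and random_state values are assumptions.

```python
# test regression dataset
from sklearn.datasets import make_regression

# define dataset (n_informative, noise and random_state are assumed values)
X, y = make_regression(n_samples=1000, n_features=20, n_informative=15,
                       noise=0.1, random_state=7)
# summarize the shape of the input and output components
print(X.shape, y.shape)
```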

Running the example creates the dataset and summarizes the shape of the input and output components.

Next, we can evaluate a weighted average ensemble model on this dataset.

First, we can split the dataset into train and test sets, then further split the training set into train and validation sets so that we can estimate the performance of each contributing model.

We can define the list of models to use in the ensemble. In this case, we will use k-nearest neighbors, a decision tree, and support vector regression.

Next, we can update the evaluate_models() function to calculate the mean absolute error (MAE) for each ensemble member on a holdout validation dataset.

We will use the negative MAE scores as a weight, where values closer to zero indicate a better-performing model.

We can then call this function to get the scores and use them to define the weighted average ensemble for regression.

We can then fit the ensemble on the entire training dataset and evaluate the performance on the holdout test dataset.

We expect the ensemble to perform better than any contributing ensemble member, and this can be checked directly by evaluating each member model on the full train and test sets independently.

Finally, we also expect the weighted average ensemble to perform better than the same ensemble with an equal weighting. This too can be confirmed.

Tying this together, the complete example of evaluating a weighted average ensemble for regression is listed below.
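
A sketch of such an example is given here. The dataset parameters beyond size, the 50-50 and 33 percent splits, the random_state values, and the variable names are assumptions; the models use their default configurations.

```python
# evaluate a weighted average ensemble for regression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import VotingRegressor

# get a list of base models
def get_models():
    models = list()
    models.append(('knn', KNeighborsRegressor()))
    models.append(('cart', DecisionTreeRegressor()))
    models.append(('svm', SVR()))
    return models

# fit each model on one dataset and report negative MAE on another
def evaluate_models(models, X_train, X_val, y_train, y_val):
    scores = list()
    for name, model in models:
        model.fit(X_train, y_train)
        yhat = model.predict(X_val)
        # negate the error so that values closer to zero are better
        scores.append(-mean_absolute_error(y_val, yhat))
    return scores

# define dataset and create the splits
X, y = make_regression(n_samples=1000, n_features=20, n_informative=15,
                       noise=0.1, random_state=7)
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.50, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.33, random_state=1)
models = get_models()
# negative MAE of each model on the validation subset is used as its weight
weights = evaluate_models(models, X_train, X_val, y_train, y_val)
print(weights)
# weighted average ensemble, fit on the entire training set
ensemble = VotingRegressor(estimators=models, weights=weights)
ensemble.fit(X_train_full, y_train_full)
weighted_mae = mean_absolute_error(y_test, ensemble.predict(X_test))
print('Weighted Avg MAE: %.3f' % weighted_mae)
# each standalone model, fit on the full training set and scored on the test set
standalone_scores = evaluate_models(models, X_train_full, X_test, y_train_full, y_test)
for (name, _), s in zip(models, standalone_scores):
    print('>%s: %.3f' % (name, s))
# equally weighted voting ensemble
voting = VotingRegressor(estimators=models)
voting.fit(X_train_full, y_train_full)
voting_mae = mean_absolute_error(y_test, voting.predict(X_test))
print('Voting MAE: %.3f' % voting_mae)
```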

Running the example first reports the negative MAE of each ensemble member that will be used as scores, followed by the performance of the weighted average ensemble. Finally, the performance of each independent model is reported along with the performance of an ensemble with equal weight.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the weighted average ensemble achieved a mean absolute error of about 105.158, which is worse (larger error) than the standalone kNN model that achieved an error of about 100.169. We can also see that the voting ensemble that assumes an equal weight for each model also performs better than the weighted average ensemble, with an error of about 102.706.

The worse-than-expected performance for the weighted average ensemble might be related to the choice of how models were weighted. One likely cause is that a weighted average divides by the sum of the weights, so a set of all-negative weights behaves like their absolute values, and the model with the largest error receives the largest share of the prediction.
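
One way to see how negative scores behave as weights is the small sketch below. It assumes the ensemble combines predictions with a weighted mean in the style of numpy.average(), which divides the weighted sum by the sum of the weights; the predictions and scores are made-up values.

```python
import numpy as np

# hypothetical predictions from three ensemble members for one example
predictions = np.array([100.0, 110.0, 130.0])
# hypothetical negative MAE scores; the first model is the best (closest to zero)
neg_mae = np.array([-100.0, -140.0, -170.0])

# weighted mean using the negative scores directly as weights
with_negative = np.average(predictions, weights=neg_mae)
# weighted mean using the absolute values of the same scores
with_absolute = np.average(predictions, weights=np.abs(neg_mae))

# effective share of the prediction given to each member
shares = neg_mae / neg_mae.sum()

print(with_negative, with_absolute)
print(shares)
```

The two weighted means are identical because the signs cancel in the division, and the largest share goes to the third model, which has the largest error. Under this assumption, raw negative errors favor the weakest member rather than the strongest.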

An alternate strategy for weighting is to use a ranking to indicate the number of votes that each ensemble member has in the weighted average.

For example, the worst-performing model has 1 vote, the second-worst 2 votes, and the best model 3 votes, in the case of three ensemble members.

This can be achieved using the argsort() numpy function.

The argsort function returns the indexes of the values in an array in the order they would appear if the array were sorted. So, if we had the array [300, 100, 200], the index of the smallest value is 1, the index of the next largest value is 2, and the index of the next largest value is 0.

Therefore, the argsort of [300, 100, 200] is [1, 2, 0].

We can then argsort the result of the argsort to give a ranking of the data in the original array. To see how, an argsort of [1, 2, 0] would indicate that index 2 is the smallest value, followed by index 0 and ending with index 1.

Therefore, the argsort of [1, 2, 0] is [2, 0, 1]. Put another way, the argsort of the argsort of [300, 100, 200] is [2, 0, 1], which is the relative ranking of each value in the array if values were sorted in ascending order. That is:

• 300: Has rank 2
• 100: Has rank 0
• 200: Has rank 1

We can make this clear with a small example, listed below.
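
A minimal sketch using the same array as the manual calculation; the variable names are illustrative.

```python
# demonstrate argsort and the argsort of the argsort
from numpy import argsort
from numpy import array

# raw data
x = array([300, 100, 200])
print(x)
# indexes that would sort the data in ascending order
order = argsort(x)
print(order)
# rank of each original value in ascending order
ranks = argsort(order)
print(ranks)
```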

Running the example first reports the raw data, then the argsort of the raw data and the argsort of the argsort of the raw data.

The results match our manual calculation.

We can use the argsort of the argsort of the model scores to calculate a relative ranking of each ensemble member. If negative mean absolute errors are sorted in ascending order, then the best model would have the largest negative error (the value closest to zero), and in turn, the highest rank. The worst-performing model would have the smallest negative error, and in turn, the lowest rank.

Again, we can confirm this with a worked example.
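
A minimal sketch with three made-up negative error scores. The first (-10) and second (-100) values match the discussion that follows; the third value (-80) is an assumption added for illustration.

```python
# demonstrate ranking model scores with argsort
from numpy import argsort
from numpy import array

# hypothetical negative error scores for three models
scores = array([-10, -100, -80])
print(scores)
# indexes that would sort the scores in ascending order
order = argsort(scores)
print(order)
# rank of each score, where a higher rank means a better model
ranks = argsort(order)
print(ranks)
```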

Running the example, we can see that the first model has the best score (-10) and the second model has the worst score (-100).

The argsort of the argsort of the scores shows that the best model gets the highest rank (most votes) with a value of 2 and the worst model gets the lowest rank (least votes) with a value of 0.

In practice, we don’t want any model to have zero votes because it would be excluded from the ensemble. Therefore, we can add 1 to all rankings.

After calculating the scores, we can calculate the argsort of the argsort of the model scores to give the rankings, then use the model rankings as the model weights for the weighted average ensemble.

Tying this together, the complete example of a weighted average ensemble for regression with model ranking used as model weights is listed below.
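
A sketch of such an example is given here. As with the earlier regression sketch, the dataset parameters beyond size, the split sizes, the random_state values, and the variable names are assumptions.

```python
# evaluate a weighted average ensemble for regression with rankings as weights
from numpy import argsort
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import VotingRegressor

# get a list of base models
def get_models():
    models = list()
    models.append(('knn', KNeighborsRegressor()))
    models.append(('cart', DecisionTreeRegressor()))
    models.append(('svm', SVR()))
    return models

# fit each model on one dataset and report negative MAE on another
def evaluate_models(models, X_train, X_val, y_train, y_val):
    scores = list()
    for name, model in models:
        model.fit(X_train, y_train)
        yhat = model.predict(X_val)
        scores.append(-mean_absolute_error(y_val, yhat))
    return scores

# define dataset and create the splits
X, y = make_regression(n_samples=1000, n_features=20, n_informative=15,
                       noise=0.1, random_state=7)
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.50, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.33, random_state=1)
models = get_models()
# score each model on the validation subset
scores = evaluate_models(models, X_train, X_val, y_train, y_val)
print(scores)
# convert scores to rankings from 1 (worst) to 3 (best)
ranking = 1 + argsort(argsort(scores))
print(ranking)
# weighted average ensemble using the rankings as weights
ensemble = VotingRegressor(estimators=models, weights=ranking)
ensemble.fit(X_train_full, y_train_full)
weighted_mae = mean_absolute_error(y_test, ensemble.predict(X_test))
print('Weighted Avg MAE: %.3f' % weighted_mae)
# each standalone model, fit on the full training set and scored on the test set
standalone_scores = evaluate_models(models, X_train_full, X_test, y_train_full, y_test)
for (name, _), s in zip(models, standalone_scores):
    print('>%s: %.3f' % (name, s))
# equally weighted voting ensemble
voting = VotingRegressor(estimators=models)
voting.fit(X_train_full, y_train_full)
voting_mae = mean_absolute_error(y_test, voting.predict(X_test))
print('Voting MAE: %.3f' % voting_mae)
```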

Running the example first scores each model, then converts the scores into rankings. The weighted average ensemble using ranking is then evaluated and compared to the performance of each standalone model and the ensemble with equally weighted models.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

In this case, we can see that the ranking was performed as expected, with the best-performing member kNN, with a score of 101, assigned the rank of 3, and the other models ranked accordingly. We can see that the weighted average ensemble achieved an MAE of about 96.692, which is better than any individual model and the unweighted voting ensemble.

This highlights the importance of exploring alternative approaches for selecting model weights in the ensemble.

## Summary

In this tutorial, you discovered how to develop Weighted Average Ensembles for classification and regression.

Specifically, you learned:

• Weighted Average Ensembles are an extension to voting ensembles where model votes are proportional to model performance.
• How to develop weighted average ensembles using the voting ensemble from scikit-learn.
• How to evaluate the Weighted Average Ensembles for classification and regression and confirm the models are skillful.

Do you have any questions?
