# A Gentle Introduction to Multiple-Model Machine Learning An ensemble learning method involves combining the predictions from multiple contributing models.

Nevertheless, not all techniques that make use of multiple machine learning models are ensemble learning algorithms.

It is worldwide to divide a prediction problem into subproblems. For example, some problems naturally subdivide into self-sustaining but related subproblems and a machine learning model can be prepared for each. It is less well-spoken whether these represent examples of ensemble learning, although we might distinguish these methods from ensembles given the inability for a contributing ensemble member to produce a solution (however weakly) to the overall prediction problem.

In this tutorial, you will discover multiple-model techniques for machine learning and their relationship to ensemble learning.

After completing this tutorial, you will know:

• Multiple-model machine learning refers to techniques that use multiple models in some way that closely resembles ensemble learning.
• Use of multiple models for multi-class nomenclature and multi-output regression differ from ensembles in that no contributing member can solve the problem.
• Mixture of experts might be considered a true ensemble method, although hybrid machine learning models are probably not ensemble learning methods.

Kick-start your project with my new typesetting Ensemble Learning Algorithms With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started. A Gentle Introduction to Multiple-Model Machine Learning
Photo by Tobias Begemann, some rights reserved.

## Tutorial Overview

This tutorial is divided into five parts; they are:

1. Multiple-Model Techniques
2. Multiple Models for Multi-Class Classification
3. Multiple Models for Multi-Output Regression
4. Multiple Expert Models
5. Hybrids Constructed From Multiple Models

## Multiple-Model Techniques

Ensemble learning is concerned with approaches that combine predictions from two or increasingly models.

We can typify a model as an ensemble learning technique if it has two properties, such as:

• Comprising two or increasingly models.
• Predictions are combined.

We might moreover suggest that the goal of an ensemble model is to modernize predictions over any contributing member. Although a lesser goal might be to modernize the stability of the model, e.g. reduce the variance in the predictions or prediction errors.

Nevertheless, there are models and model architectures that contain elements of ensemble learning methods, but it is not well-spoken as to whether they may be considered ensemble learning or not.

For example, we might pinpoint an ensemble learning technique as stuff well-balanced of two or increasingly models. The problem is that there may be techniques that have increasingly than two models, yet do not combine their predictions. Alternatively, they may combine their predictions in unexpected ways.

There are some methods which try to make use of multiple learners, yet in a strict sense they can not be recognized as ensemble combination methods.

— Page 89, Ensemble Methods, 2012.

For a lack of a largest name, we will refer to these as “multiple-model techniques” to help differentiate them from ensemble learning methods. Yet, as we will see, the line that separates these two types of machine learning methods is not that clear.

• Multiple-Model Techniques: Machine learning algorithms that are well-balanced of multiple models and combine the techniques but might not be considered ensemble learning.

As such, it is important to review and explore multiple-model techniques that sit on the verge of ensemble learning to both largest understand ensemble learning and to yank upon related ideas that may modernize the ensemble learning models we create.

There are predictive modeling problems where the structure of the problem itself may suggest the use of multiple models.

Typically, these are problems that can be divided naturally into sub-problems. This does not midpoint that dividing the problems into subproblems is the weightier solution for a given example; it only ways that the problem naturally lends itself to decomposition.

Two examples are multi-class nomenclature and multiple-output regression.

### Want to Get Started With Ensemble Learning?

Take my self-ruling 7-day email crash undertow now (with sample code).

Click to sign-up and moreover get a self-ruling PDF Ebook version of the course.

## Multiple Models for Multi-Class Classification

Classification problems involve assigning a matriculation label to input examples.

Binary nomenclature tasks are those that have two classes. One visualization is made for each example, either assigning it to one matriculation or another. If modeled using probability, a each probability of the example stuff to one matriculation is predicted, where the inverse is the probability for the second class, tabbed a binomial probability distribution.

More than two classes can introduce a challenge. The techniques planned for two classes can be extended to multiple classes, and sometimes, this is straightforward.

• Multi-Class Classification: Assign one among increasingly than matriculation labels to a given input example.

Alternatively, the problem can be naturally partitioned into multiple binary nomenclature tasks.

There are many ways this can be achieved.

For example, the classes can be grouped into multiple one-vs-rest prediction problems. A model can then be fit for each subproblem and typically the same algorithm type is used for each model. When a prediction is required for a new example, then the model that responds increasingly strongly than the other models can assign a prediction. This is tabbed a one-vs-rest (OvR) or one-vs-all (OvA) approach.

• OvR: A technique that splits a multi-class nomenclature into one binary nomenclature problem per class.

The multi-class nomenclature problem can be divided into multiple pairs of classes, and a model fit on each. Again, a prediction for a new example can be chosen from the model that responds increasingly strongly. This is referred to as one-vs-one (OvO).

• OvR: A technique that splits a multi-class nomenclature into one binary nomenclature problem per each pair of classes.

For increasingly on one-vs-rest and one-vs-one classification, see the tutorial:

This tideway of partitioning a multi-class nomenclature problem into multiple binary nomenclature problems can be generalized. Each matriculation can be mapped to a unique binary string for the matriculation with summarily length. One classifier can then be fit to predict each bit in the bit string, permitting an wrong-headed number of classifiers to be used.

The bit string can then be mapped to the matriculation label with the closest match. The spare shit act like error-correcting codes, improving the performance of the tideway in some cases over simpler OvR and OvO methods. This tideway is referred to as Error-Correcting Output Codes, ECOC.

• ECOC: A technique that splits a multi-class nomenclature into an wrong-headed number of binary nomenclature problems.

For increasingly on this approach, see the tutorial:

In each of these cases, multiple models are used, just like an ensemble. Predictions are moreover combined, like an ensemble method, although in a winner-take-all method rather than a vote or weighted sum. Technically, this is a combination method, but unlike most typical ensemble learning methods.

Unlike ensemble learning, these techniques are planned to explore the natural decomposition of the prediction problem and leverage binary nomenclature problems that may not hands scale to multiple classes.

Whereas ensemble learning is not concerned with opening up new capabilities and is typically only focused on improved predictive performance over contributing models. With techniques like OvR, OvR, and ECOC, the contributing models cannot be used to write the prediction problem in isolation, by definition.

## Multiple Models for Multi-Output Regression

Regression problems involve predicting a numerical value given an input example.

Typically, a each output value is predicted. Nevertheless, there are regression problems where multiple numeric values must be predicted for each input example. These problems are referred to as multiple-output regression problems.

• Multiple-Output Regression: Predict two or increasingly numeric outputs given an input.

Models can be ripened to predict all target values at once, although a multi-output regression problem is flipside example of a problem that can be naturally divided into subproblems.

Like binary nomenclature in the previous section, most techniques for regression predictive modeling were planned to predict a each value. Predicting multiple values can pose a problem and requires the modification of the technique. Some techniques cannot be reasonably modified for multiple values.

One tideway is to develop a separate regression model to predict each target value in a multi-output regression problem. Typically, the same algorithm type is used for each model. For example, a multi-output regression with three target values would involve fitting three models, one for each target.

When a prediction is required, the same input pattern is provided to each model and the explicit target for each model is predicted and together represent the vector output of the method.

• Multi-Output Regression: A technique where one regression model is used for each target in a multi-output regression problem.

Another related tideway is to create a sequential uniting of regression models. The difference is that the output of the first model predicts the first output target value, but this value is used as part of the input to the second model in the uniting in order to predict the second output target value, and so on.

As such, the uniting introduces a linear dependence between the regression models, permitting the outputs of models later in the uniting to be provisionary on the outputs of prior models in the chain.

• Regression Chain: A technique where a sequential uniting of regression models is used to predict each target in a multi-output regression problem, one model later in the uniting uses values predicted by models older in the chain.

For increasingly on multiple-output regression, see the tutorial:

In each case, multiple regression models are used, just like an ensemble.

A possible difference from ensembles is that the predictions made by each model are not combined directly. We could stretch the definition of “combining predictions” to imbricate this approach, however. For example, the predictions are concatenated in the specimen of multi-output regression models and indirectly via the provisionary tideway in chained regression.

The key difference from ensemble learning methods is that no contributing ensemble member can solve the prediction problem alone. A solution can only be achieved by combining the predictions from all members.

## Multiple Expert Models

So far, we have looked at dividing problems into subtasks based on the structure of what is stuff predicted.

There are moreover problems that can be naturally divided into subproblems based on the input data. This might be as simple as partitions of the input full-length space, or something increasingly elaborate, such as dividing an image into the foreground and preliminaries and developing a model for each.

A increasingly unstipulated tideway for this from the field of neural networks is referred to as a mixture of experts(MoE).

The tideway involves first dividing the learning task into subtasks, developing an expert model for each subtask, using a gating model to decide or learn which expert to use for each example and the pool the outputs of the experts, and gating model together to make a final prediction.

• MoE: A technique that develops an expert model for each subtask and learns how much to trust each expert when making a prediction for explicit examples.

For increasingly on mixture of experts, see the tutorial:

Two aspects of MoE make the method unique. The first is the explicit partitioning of the input full-length space, and the second is the use of a gating network or gating model that learns which expert to trust in each situation, e.g, each input case.

Unlike the previous examples for multi-class nomenclature and multi-output regression that divided the target into subproblems, the contributing members in a mixture of experts model can write the whole problem, at least partially or to some degree. Although an expert may not be tailored to a explicit input, it can still be used to make a prediction on something outside of its zone of expertise.

Also, unlike those previously looked methods, a mixture of experts moreover combines the predictions of all contributing members using a weighted sum, albeit measured by the gating network.

As such, it increasingly closely resembles increasingly familiar ensemble learning techniques, such as stacked generalization, known as stacking.

## Hybrids Constructed From Multiple Models

Another type of machine learning that involves the use of multiple models and is loosely related to ensemble learning is hybrid models.

Hybrid models are those models that combine two or increasingly models explicitly. As such, the definition of what does and does not constitute a hybrid model can be vague.

• Hybrid Model: A technique that combines two or increasingly variegated machine learning models in some way.

For example, an autoencoder neural network that learns how to shrink input patterns to a stickup layer, the output of which is then fed to flipside model, such as a support vector machine, would be considered a hybrid machine learning model.

This example has two machine learning models, a neural network and a support vector machine. It just so happens that the models are linearly stacked one on top of flipside into a pipeline and the last model in the pipeline makes a prediction.

Consider an ensemble learning method that has multiple contributing ensemble members of variegated types (e.g. a logistic regression and a support vector machine) and uses voting to stereotype their predictions. This ensemble too might be considered a hybrid machine learning model, under the broader definition.

Perhaps the key difference between ensemble learning and hybrid machine learning is the need in hybrid models to use models of differing types. Whereas in ensemble learning, contributing members to the ensemble may be of any type.

Further, it is increasingly likely that a hybrid machine learning will graft one or increasingly models onto flipside wiring model, which is quite variegated from fitting separate models and combining their predictions as we do in ensemble learning.

This section provides increasingly resources on the topic if you are looking to go deeper.

## Summary

In this tutorial, you discovered multiple-model techniques for machine learning and their relationship to ensemble learning.

Specifically, you learned:

• Multiple-model machine learning refers to techniques that use multiple models in some way that closely resembles ensemble learning.
• Use of multiple models for multi-class nomenclature and multi-output regression differ from ensembles in that no contributing member can solve the problem.
• Mixture of experts might be considered a true ensemble method, although hybrid machine learning models are probably not ensemble learning methods.

Do you have any questions?

## Get a Handle on Modern Ensemble Learning! #### Improve Your Predictions in Minutes

…with just a few lines of python code

Discover how in my new Ebook:
Ensemble Learning Algorithms With Python

It provides self-study tutorials with full working code on:
Stacking, Voting, Boosting, Bagging, Blending, Super Learner,
and much more…

#### Bring Modern Ensemble Learning Techniques to Your Machine Learning Projects

See What’s Inside 