How to Choose an Optimization Algorithm

 

Optimization is the problem of finding a set of inputs to an objective function that results in a maximum or minimum function evaluation.

It is the challenging problem that underlies many machine learning algorithms, from fitting logistic regression models to training artificial neural networks.

There are perhaps hundreds of popular optimization algorithms, and tens of them to choose from in popular scientific code libraries. This can make it challenging to know which algorithms to consider for a given optimization problem.

In this tutorial, you will discover a guided tour of different optimization algorithms.

After completing this tutorial, you will know:

  • Optimization algorithms may be grouped into those that use derivatives and those that do not.
  • Classical algorithms use the first and sometimes second derivative of the objective function.
  • Direct search and stochastic algorithms are designed for objective functions where function derivatives are unavailable.

Let’s get started.

How to Choose an Optimization Algorithm
Photo by Matthewjs007, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

  1. Optimization Algorithms
  2. Differentiable Objective Function
  3. Non-Differentiable Objective Function

Optimization Algorithms

Optimization refers to a procedure for finding the input parameters or arguments to a function that result in the minimum or maximum output of the function.

The most common type of optimization problem encountered in machine learning is continuous function optimization, where the input arguments to the function are real-valued numeric values, e.g. floating point values. The output from the function is also a real-valued evaluation of the input values.

We refer to problems of this type as continuous function optimization to distinguish them from functions that take discrete variables, which are referred to as combinatorial optimization problems.

There are many different types of optimization algorithms that can be used for continuous function optimization problems, and perhaps just as many ways to group and summarize them.

One approach to grouping optimization algorithms is based on the amount of information available about the target function being optimized that, in turn, can be used and harnessed by the optimization algorithm.

Generally, the more information that is available about the target function, the easier the function is to optimize if the information can effectively be used in the search.

Perhaps the major division among optimization algorithms is whether the objective function can be differentiated at a point or not. That is, whether the first derivative (gradient or slope) of the function can be calculated for a given candidate solution. This partitions algorithms into those that can make use of the calculated gradient information and those that cannot.

  • Differentiable Target Function?
    • Algorithms that use derivative information.
    • Algorithms that do not use derivative information.

We will use this as the major division for grouping optimization algorithms in this tutorial and look at algorithms for differentiable and non-differentiable objective functions.

Note: this is not an exhaustive coverage of algorithms for continuous function optimization, although it does cover the major methods that you are likely to encounter as a regular practitioner.

Differentiable Objective Function

A differentiable function is a function where the derivative can be calculated for any given point in the input space.

The derivative of a function at a value is the rate or amount of change in the function at that point. It is often called the slope.

  • First-Order Derivative: Slope or rate of change of an objective function at a given point.

The derivative of a function with more than one input variable (e.g. multivariate inputs) is commonly referred to as the gradient.

  • Gradient: Derivative of a multivariate continuous objective function.

A derivative for a multivariate objective function is a vector, and each element in the vector is called a partial derivative, or the rate of change for a given variable at the point assuming all other variables are held constant.

  • Partial Derivative: Element of a derivative of a multivariate objective function.

We can calculate the derivative of the derivative of the objective function, that is the rate of change of the rate of change in the objective function. This is called the second derivative.

  • Second-Order Derivative: Rate at which the derivative of the objective function changes.

For a function that takes multiple input variables, this is a matrix and is referred to as the Hessian matrix.

  • Hessian matrix: Second derivative of a function with two or more input variables.

Simple differentiable functions can be optimized analytically using calculus. Typically, the objective functions that we are interested in cannot be solved analytically.
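
To make these terms concrete, the sketch below hand-codes the gradient vector and Hessian matrix for a toy two-variable quadratic objective (the function and values are illustrative assumptions, not taken from any library). These are exactly the quantities that first- and second-order algorithms consume.

```python
# A minimal sketch, assuming a toy objective f(x, y) = x^2 + 2*y^2 chosen for illustration.
import numpy as np

def objective(v):
    x, y = v
    return x ** 2 + 2.0 * y ** 2

def gradient(v):
    # vector of partial derivatives (first-order information)
    x, y = v
    return np.array([2.0 * x, 4.0 * y])

def hessian(v):
    # matrix of second-order partial derivatives (constant for a quadratic)
    return np.array([[2.0, 0.0],
                     [0.0, 4.0]])

point = np.array([1.0, -2.0])
print(objective(point))  # 9.0
print(gradient(point))   # [ 2. -8.]
print(hessian(point))
```

Setting the gradient to zero and solving recovers the analytical minimum at the origin, which is the calculus-based approach mentioned above.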

Optimization is significantly easier if the gradient of the objective function can be calculated, and as such, there has been a lot more research into optimization algorithms that use the derivative than those that do not.

Some groups of algorithms that use gradient information include:

  • Bracketing Algorithms
  • Local Descent Algorithms
  • First-Order Algorithms
  • Second-Order Algorithms

Note: this taxonomy is inspired by the 2019 book “Algorithms for Optimization.”

Let’s take a closer look at each in turn.

Bracketing Algorithms

Bracketing optimization algorithms are intended for optimization problems with one input variable where the optimum is known to exist within a specific range.

Bracketing algorithms are able to efficiently navigate the known range and locate the optimum, although they assume only a single optimum is present (a so-called unimodal objective function).

Some bracketing algorithms can be used without derivative information if it is not available.

Examples of bracketing algorithms include:

  • Fibonacci Search
  • Golden Section Search
  • Bisection Method
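
For example, SciPy's minimize_scalar supports a golden-section search when given a bracket believed to contain the optimum. A minimal sketch, assuming a toy unimodal objective:

```python
# A minimal bracketing sketch, assuming a toy unimodal objective
# whose minimum lies inside the starting bracket.
from scipy.optimize import minimize_scalar

def objective(x):
    return (x - 2.0) ** 2

result = minimize_scalar(objective, bracket=(-5.0, 5.0), method='golden')
print(result.x)  # approximately 2.0
```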

Local Descent Algorithms

Local descent optimization algorithms are intended for optimization problems with more than one input variable and a single global optimum (i.e. a unimodal objective function).

Perhaps the most common example of a local descent algorithm is the line search algorithm.

  • Line Search

There are many variations of the line search (e.g. the Brent-Dekker algorithm), but the procedure generally involves choosing a direction to move in the search space, then performing a bracketing type search in a line or hyperplane in the chosen direction.

This process is repeated until no further improvements can be made.

The limitation is that it is computationally expensive to optimize each directional move in the search space.
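
As an illustration of the idea, SciPy exposes a line search routine that, given a point and a descent direction, searches along that direction for a good step length. A minimal sketch, assuming a toy quadratic objective:

```python
# A minimal line search sketch, assuming a toy objective f(x, y) = x^2 + y^2.
import numpy as np
from scipy.optimize import line_search

def objective(v):
    return v[0] ** 2 + v[1] ** 2

def gradient(v):
    return np.array([2.0 * v[0], 2.0 * v[1]])

point = np.array([3.0, 4.0])      # current position in the search space
direction = -gradient(point)      # chosen (downhill) search direction
step = line_search(objective, gradient, point, direction)[0]
print(point + step * direction)   # new point after moving along the direction
```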

First-Order Algorithms

First-order optimization algorithms explicitly involve using the first derivative (gradient) to choose the direction to move in the search space.

The procedure involves first calculating the gradient of the function, then following the gradient in the opposite direction (e.g. downhill toward the minimum for minimization problems) using a step size (also called the learning rate).

The step size is a hyperparameter that controls how far to move in the search space, unlike “local descent algorithms” that perform a full line search for each directional move.

A step size that is too small results in a search that takes a long time and can get stuck, whereas a step size that is too large will result in zig-zagging or bouncing around the search space, missing the optimum completely.

First-order algorithms are generally referred to as gradient descent, with more specific names referring to minor extensions to the procedure, e.g.:

  • Gradient Descent
  • Momentum
  • Adagrad
  • RMSProp
  • Adam
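
A minimal sketch of the basic procedure, assuming a toy quadratic objective and a hand-picked step size (both are illustrative assumptions):

```python
# A minimal gradient descent sketch, assuming a toy objective f(x, y) = x^2 + 2*y^2.
import numpy as np

def gradient(v):
    x, y = v
    return np.array([2.0 * x, 4.0 * y])

point = np.array([3.0, -2.0])  # starting candidate solution
step_size = 0.1                # hyperparameter controlling how far to move

for _ in range(100):
    point = point - step_size * gradient(point)  # step against the gradient

print(point)  # approaches the minimum at the origin
```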

The gradient descent algorithm also provides the template for the popular stochastic version of the algorithm, named Stochastic Gradient Descent (SGD), which is used to train artificial neural network (deep learning) models.

The important difference is that the gradient is approximated rather than calculated directly, using the prediction error on training data, such as one sample (stochastic), all examples (batch), or a small subset of the training data (mini-batch).

The extensions designed to accelerate the gradient descent algorithm (momentum, etc.) can be and are commonly used with SGD.

  • Stochastic Gradient Descent
  • Batch Gradient Descent
  • Mini-Batch Gradient Descent
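
The sketch below illustrates the mini-batch idea for a simple linear model on synthetic data; the data, batch size, and step size are all illustrative assumptions. On each update, the gradient of the squared error is estimated from a random subset of the training examples rather than the full dataset:

```python
# A minimal mini-batch SGD sketch for a linear model, assuming synthetic data.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w + rng.normal(scale=0.1, size=200)  # noisy targets

w = np.zeros(2)      # coefficients to learn
step_size = 0.05
batch_size = 16

for _ in range(500):
    idx = rng.integers(0, len(X), size=batch_size)  # sample a mini-batch
    error = X[idx] @ w - y[idx]
    grad = 2.0 * X[idx].T @ error / batch_size      # gradient estimate from the batch
    w = w - step_size * grad

print(w)  # approximately [2, -3]
```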

Second-Order Algorithms

Second-order optimization algorithms explicitly involve using the second derivative (Hessian) to choose the direction to move in the search space.

These algorithms are only appropriate for those objective functions where the Hessian matrix can be calculated or approximated.

Examples of second-order optimization algorithms for univariate objective functions include:

  • Newton’s Method
  • Secant Method
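
For a univariate function, Newton's method repeatedly divides the first derivative by the second derivative to jump toward a stationary point. A minimal sketch, assuming a toy objective with hand-coded derivatives:

```python
# A minimal Newton's method sketch, assuming a toy objective f(x) = (x - 3)^2 + 1.
def first_derivative(x):
    return 2.0 * (x - 3.0)

def second_derivative(x):
    return 2.0

x = 10.0  # starting point
for _ in range(5):
    x = x - first_derivative(x) / second_derivative(x)

print(x)  # 3.0, the minimum of the toy objective
```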

Second-order methods for multivariate objective functions are referred to as Quasi-Newton Methods.

  • Quasi-Newton Method

There are many Quasi-Newton Methods, and they are typically named for the developers of the algorithm, such as:

  • Davidon-Fletcher-Powell
  • Broyden-Fletcher-Goldfarb-Shanno (BFGS)
  • Limited-memory BFGS (L-BFGS)
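
These methods are available off the shelf in SciPy; for example, a minimal sketch minimizing a toy objective with BFGS (if no gradient function is supplied, SciPy approximates it numerically):

```python
# A minimal quasi-Newton sketch using SciPy's BFGS method on a toy objective.
from scipy.optimize import minimize

def objective(v):
    x, y = v
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

result = minimize(objective, x0=[0.0, 0.0], method='BFGS')
print(result.x)  # approximately [1, -2]
```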

Now that we are familiar with the so-called classical optimization algorithms, let’s look at algorithms used when the objective function is not differentiable.

Non-Differentiable Objective Function

Optimization algorithms that make use of the derivative of the objective function are fast and efficient.

Nevertheless, there are objective functions where the derivative cannot be calculated, typically because the function is complex for a variety of real-world reasons. In other cases, the derivative can be calculated in some regions of the domain but not all, or it is not a good guide for the search.

Some difficulties that objective functions can pose for the classical algorithms described in the previous section include:

  • No analytical description of the function (e.g. simulation).
  • Multiple global optima (e.g. multimodal).
  • Stochastic function evaluation (e.g. noisy).
  • Discontinuous objective function (e.g. regions with invalid solutions).

As such, there are optimization algorithms that do not expect first- or second-order derivatives to be available.

These algorithms are sometimes referred to as black-box optimization algorithms as they assume little or nothing (relative to the classical methods) about the objective function.

A grouping of these algorithms includes:

  • Direct Algorithms
  • Stochastic Algorithms
  • Population Algorithms

Let’s take a closer look at each in turn.

Direct Algorithms

Direct optimization algorithms are for objective functions for which derivatives cannot be calculated.

The algorithms are deterministic procedures and often assume the objective function has a single global optimum (i.e. is unimodal).

Direct search methods are also typically referred to as "pattern search" methods, as they may navigate the search space using geometric shapes or decisions, e.g. patterns.

Gradient information is estimated directly (hence the name) from the results of the objective function by comparing the relative difference between scores for points in the search space. These direct estimates are then used to choose a direction to move in the search space and triangulate the region of the optimum.

Examples of direct search algorithms include:

  • Cyclic Coordinate Search
  • Powell’s Method
  • Hooke-Jeeves Method
  • Nelder-Mead Simplex Search
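
Pattern searches such as Nelder-Mead are also available in SciPy. A minimal sketch, assuming a toy objective that is not differentiable at its optimum; no derivative information is used, only objective evaluations:

```python
# A minimal direct (pattern) search sketch using SciPy's Nelder-Mead method.
from scipy.optimize import minimize

def objective(v):
    x, y = v
    return abs(x - 1.0) + abs(y + 2.0)  # non-differentiable at the optimum

result = minimize(objective, x0=[5.0, 5.0], method='Nelder-Mead')
print(result.x)  # approximately [1, -2]
```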

Stochastic Algorithms

Stochastic optimization algorithms are algorithms that make use of randomness in the search procedure for objective functions for which derivatives cannot be calculated.

Unlike the deterministic direct search methods, stochastic algorithms typically involve a lot more sampling of the objective function, but are able to handle problems with deceptive local optima.

Stochastic optimization algorithms include:

  • Simulated Annealing
  • Evolution Strategy
  • Cross-Entropy Method
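
SciPy's dual_annealing provides a simulated annealing style global search. A minimal sketch, assuming a toy multimodal objective and box bounds on the inputs:

```python
# A minimal stochastic search sketch using SciPy's dual_annealing on a toy multimodal objective.
import numpy as np
from scipy.optimize import dual_annealing

def objective(v):
    x, y = v
    return x ** 2 + y ** 2 + 10.0 * np.sin(x) * np.sin(y)  # many local optima

result = dual_annealing(objective, bounds=[(-5.0, 5.0), (-5.0, 5.0)], seed=1)
print(result.x, result.fun)
```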

Population Algorithms

Population optimization algorithms are stochastic optimization algorithms that maintain a pool (a population) of candidate solutions that together are used to sample, explore, and home in on an optimum.

Algorithms of this type are intended for more challenging objective problems that may have noisy function evaluations and many global optima (i.e. multimodal), where finding a good or good-enough solution is challenging or infeasible using other methods.

The pool of candidate solutions adds robustness to the search, increasing the likelihood of overcoming local optima.

Examples of population optimization algorithms include:

  • Genetic Algorithm
  • Differential Evolution
  • Particle Swarm Optimization
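
Population methods such as differential evolution are also available in SciPy. A minimal sketch on the same style of toy multimodal objective used above:

```python
# A minimal population-based search sketch using SciPy's differential_evolution.
import numpy as np
from scipy.optimize import differential_evolution

def objective(v):
    x, y = v
    return x ** 2 + y ** 2 + 10.0 * np.sin(x) * np.sin(y)  # many local optima

result = differential_evolution(objective, bounds=[(-5.0, 5.0), (-5.0, 5.0)], seed=1)
print(result.x, result.fun)
```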


Summary

In this tutorial, you discovered a guided tour of different optimization algorithms.

Specifically, you learned:

  • Optimization algorithms may be grouped into those that use derivatives and those that do not.
  • Classical algorithms use the first and sometimes second derivative of the objective function.
  • Direct search and stochastic algorithms are designed for objective functions where function derivatives are unavailable.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

 
