In this article, you’ll learn about the profound impact of the 80/20 principle on your life as a programmer. It’s based on a first draft of a chapter from my upcoming book “From 1 to 0: A Minimalistic Approach to Programming”.
The 80/20 principle goes by many names, but the second most famous is the Pareto principle, named after its discoverer, Vilfredo Pareto. So, how does the principle work, and why should you care?
Pareto Principle Basics
The principle says that the majority of effects come from a minority of causes. For example, the majority of income is earned by a minority of people, the majority of innovations come from a minority of researchers, the majority of books are written by a minority of authors, the majority of sales come from a minority of clients, and the majority of goals are scored by a minority of soccer players.
Most likely, you’ve already heard about the 80/20 principle: it’s everywhere in the personal productivity literature. Its popularity has two reasons. First, the principle allows you to be lazy and productive at the same time, provided you can figure out the things that matter and focus on them relentlessly. Second, the principle is observable everywhere in the real world. It’s difficult even to come up with a phenomenon in which the effects are spread equally across the causes. Go ahead and try to find examples of 50/50 distributions, where 50% of the effects come from 50% of the causes! Sure, the distribution is not always 80/20. The concrete numbers can change to 70/30, 90/10, or even 95/5. However, the distribution is typically heavily skewed, with a minority of the causes producing the majority of the effects.
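To make the 80/20 check concrete, here is a minimal Python sketch that ranks causes by their effect and measures the share of the total that comes from the top 20%. The function name `top_share` and the goal counts are hypothetical, made up purely for illustration.

```python
def top_share(effects, cause_fraction=0.2):
    """Return the fraction of total effects produced by the top `cause_fraction` of causes."""
    ranked = sorted(effects, reverse=True)              # strongest causes first
    k = max(1, round(len(ranked) * cause_fraction))     # number of "vital few" causes
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical goals scored by ten players over a season (made-up numbers).
goals = [25, 15, 3, 2, 2, 1, 1, 1, 0, 0]
print(f"Top 20% of players scored {top_share(goals):.0%} of the goals")  # 80%

# A perfectly even world: every cause contributes the same amount.
even = [10] * 10
print(f"Top 20% in an even world: {top_share(even):.0%}")  # 20%
```

In the even case, the top 20% of causes produce exactly 20% of the effects; any result well above that signals the skew the principle describes.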
Here’s an example of a Pareto distribution:
Figure: Example of a Pareto distribution: the causes are ordered according to the results they produce.
The figure plots the generalized results against the causes, with the causes sorted so that the ones producing the most results are on the left.
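If you want to generate such a curve yourself, here is a small sketch using Python’s standard `random.paretovariate`. It relies on a textbook property of the Pareto distribution with shape parameter alpha > 1: the top fraction p of causes accounts for p^(1 - 1/alpha) of the total, so alpha = log_4(5) ≈ 1.16 yields exactly 80/20. The helper name `top_share_pareto` and the seed are my own choices.

```python
import math
import random


def top_share_pareto(alpha, top_fraction):
    """Share of the total held by the top `top_fraction` under a Pareto(alpha) law (alpha > 1)."""
    return top_fraction ** (1 - 1 / alpha)


# Shape parameter for which the top 20% hold exactly 80%: alpha = log_4(5).
alpha_80_20 = math.log(5, 4)
print(f"alpha = {alpha_80_20:.3f}")                   # alpha = 1.161
print(f"{top_share_pareto(alpha_80_20, 0.2):.3f}")    # 0.800

# Draw ten hypothetical "causes" and order them by the results they produce,
# largest first, which is how the curve in the figure is drawn.
random.seed(42)
results = sorted((random.paretovariate(alpha_80_20) for _ in range(10)), reverse=True)
print([round(r, 2) for r in results])
```

Because alpha ≈ 1.16 produces a very heavy tail, a sample of only ten draws will rarely hit 80/20 exactly; the formula describes the long-run behavior.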
Application Software Optimization
The following figure applies the Pareto distribution to a software project: a minority of the code is responsible for the majority of the runtime. The x-axis shows the code functions sorted by the runtime they incur; the y-axis shows the runtime of each function. The units don’t really matter here, but notice that the shaded area dominates the total area under the curve. Most functions contribute much less to the overall runtime than a few selected ones. Spending a lot of time optimizing the “trivial many” barely improves the overall runtime.
Figure: Example of a Pareto distribution in software engineering: most functions contribute little to the overall runtime but some functions contribute heavily.
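In practice, you’d get per-function runtimes from a profiler, for example by running `python -m cProfile -s cumulative your_script.py`. Once you have the numbers, picking the “vital few” is a greedy selection: take the most expensive functions until they cover, say, 80% of the total. A minimal sketch, where `hot_functions` and all function names and timings are hypothetical:

```python
def hot_functions(runtimes, target=0.8):
    """Return the fewest functions whose combined runtime reaches `target` of the total."""
    total = sum(runtimes.values())
    selected, covered = [], 0.0
    # Greedily take the most expensive functions first.
    for name, seconds in sorted(runtimes.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= target * total:
            break
        selected.append(name)
        covered += seconds
    return selected


# Hypothetical per-function runtimes in seconds, e.g. read off a profiler report.
runtimes = {
    "parse_input": 0.4, "render_page": 6.1, "log_event": 0.2,
    "query_db": 2.3, "validate": 0.3, "format_date": 0.1,
    "compress": 0.3, "send_email": 0.2, "load_config": 0.05, "hash_id": 0.05,
}
print(hot_functions(runtimes))  # ['render_page', 'query_db']
```

Here, two of ten functions (20%) account for 84% of the runtime, so those two are where optimization effort pays off.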
While the principle is easy to understand, most people don’t intuitively grasp its relevance to their own lives. How can you use the principle to get more done in less time?
Few people know that large computing companies such as IBM, Microsoft, and Apple successfully employed the principle to build computers that feel much faster and to create a user experience that had previously been unheard of. How did they do this? They channeled their focus on the “Top 20%” by repeatedly optimizing the 20% of the code that the average user executed most often. Not all code is created equal. A minority of the code has a dominating impact on the user experience, while much of the code has little impact on it. For example, you double-click on icons multiple times per day, so programs should load very fast for a great user experience, but you change the access rights of a file only rarely, if at all. The 80/20 principle tells you where to focus your optimization efforts!
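Amdahl’s law quantifies why focusing on the frequently executed code pays off: if a fraction p of the runtime is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p/s). A short sketch with illustrative numbers (the 80% and 2% shares are assumptions, not measurements):

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the runtime is made `factor` times faster (Amdahl's law)."""
    return 1 / ((1 - fraction) + fraction / factor)


# Make one piece of code 10x faster.
print(f"{amdahl_speedup(0.80, 10):.2f}x")  # hot path, 80% of runtime -> 3.57x
print(f"{amdahl_speedup(0.02, 10):.2f}x")  # rarely used path, 2% of runtime -> 1.02x
```

The same engineering effort yields a 3.57x overall speedup on the hot path and a barely noticeable 1.02x on the rarely executed one.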
In fact, the 80/20 principle is a principle of focus. By focusing on the vital few rather than the trivial many, you can 10x, even 100x your productivity at work. Don’t believe me? Let’s calculate where these numbers come from, assuming an underlying 80/20 distribution.
Figure: The average output of the 20% top performers is 16x the average output of the 80% bottom performers.
The real world tells us that a minority of people produces the majority of results. This fundamental principle is observable in a wide variety of domains. Let’s plug in some numbers to get an intuition for how large the performance difference is. We’ll use the conservative 80/20 parameters: 80% of the results come from 20% of the people. In some fields (like programming), the distribution is probably much more skewed.
The previous figure shows that in a company of 10 people, only two people produce 80% of the results, while the other eight produce the remaining 20%, a direct consequence of the 80/20 principle. If you divide 80% by two, you get an average output of 40% per top-performing person. If you divide the 20% of results generated by the other eight people by eight, you obtain an average of 2.5% per bottom-performing person. The difference in performance is 40% / 2.5% = 16x!
And note that this is not a theoretical difference that could be achieved under some impractical settings—this 16x difference in average performance is already a fact in millions of organizations throughout the world.
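The same arithmetic works for any split. The sketch below generalizes it (the function name is hypothetical) and uses exact fractions to avoid rounding noise. It shows how quickly the gap widens as the distribution becomes more skewed, which supports the remark that fields like programming may be even more extreme:

```python
from fractions import Fraction


def top_to_bottom_ratio(result_share, people_share):
    """Average output of a top performer divided by that of a bottom performer.

    result_share: percent of results produced by the top group (e.g. 80)
    people_share: percent of people in the top group (e.g. 20)
    """
    top_avg = Fraction(result_share, people_share)
    bottom_avg = Fraction(100 - result_share, 100 - people_share)
    return top_avg / bottom_avg


for split in [(70, 30), (80, 20), (90, 10), (95, 5)]:
    ratio = top_to_bottom_ratio(*split)
    print(f"{split[0]}/{split[1]}: {float(ratio):.1f}x")
# 70/30: 5.4x, 80/20: 16.0x, 90/10: 81.0x, 95/5: 361.0x
```

When the two numbers add up to 100, the ratio simplifies to (x / (100 - x))², which is why 90/10 already gives 81x.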
The performance difference exists: in a ten-person organization, two people create more than 10x the output of a “normal” employee. The question is: how can you become one of those two? Or, to put it more generally: how can you “move to the left” on the Pareto distribution curve in your organization (see figure)?
Figure: To create more output, you need to constantly move to the left of the curve.
On the y-axis in our 80/20 world, I used the label “Output” to keep it general. You may want to optimize for income (20% of the people earn 80% of the incomes). You may want to optimize for happiness (20% of the people enjoy 80% of the happiness at work). You may want to optimize for monthly active users (20% of the websites have 80% of the monthly active users). You may want to optimize for book sales (20% of the books receive 80% of the sales). Or you may want to optimize for citations (20% of the researchers receive 80% of the citations).
This shows a critical takeaway from the 80/20 principle: be clear about what you want to optimize.
Let’s say we want to optimize for income as a proxy for happiness. How can you move to the left on the Pareto curve?
Now you’re leaving exact science, because you need to find the reasons why some people succeed: which of their skills generate most of the success? You need to find a plausible, simplifying argument for your specific industry and develop actionable success metrics that you can control. If you push these metrics up, you’ll become more successful; if you let them slide, you’ll become less successful. The tricky thing is that the success metrics differ from field to field. In fact, the 80/20 principle also applies to success metrics: some have a dominating impact on your performance in a field, while others barely matter at all.
For example, when I worked as a doctoral researcher, I soon realized that it’s all about citations. The more citations you have as a researcher, the more credibility, visibility, and opportunities you’ll experience in science. But how can you influence citations (“today, I’ll increase the number of citations” is hardly an actionable success metric)? Citations come from high-class papers: if you publish more high-class papers, you’ll receive more citations. So writing high-class papers is the most important activity for most scientists. While many researchers get distracted by secondary activities such as preparing presentations, organizing, teaching, and drinking coffee, the most successful researchers focus heavily on producing as many high-quality papers as possible. For researchers, the Pareto distribution of this success metric may look like this:
Figure: Success metric in research: number of words written towards high-class papers.
By replacing the generalized “Output” with the new success metric “Number of words written towards high-class papers”, you gain crystal-clear insight into what you must do every day to push towards the left in research. If you write more words today, you’ll publish your next high-class paper sooner, receive more citations faster, grow your scientific footprint, and become a more successful scientist as a result. The beauty of this 80/20 approach is that it allows you to find your focus. Everything else doesn’t matter. You can relax, lean back, and focus on the things that are very important. You can spend less time on all the other tasks. You don’t have to die the death of a thousand cuts. You can be lazy with all activities but one: writing papers. You can tune out most things: ignore emails, skip meetings that don’t push you towards more papers, and be lazy in all other activities.

Say you work 8h per day and split your day into eight one-hour activities. After completing the success metric exercise, you realize that you can skip two of these activities entirely and do four of them in half an hour instead of an hour (by being less perfectionistic). You have saved 4h per day, but you still accomplish, say, 80% of your results. Now you invest 2h per day into writing more words towards high-class papers. Within a few months, you’ll have submitted an extra paper, and over time, you’ll submit many more papers than any of your colleagues. You work only 6h per day and produce imperfect quality in most of your work tasks. But you shine where it matters: you submit more research papers than anyone else in your environment. As a result, you’ll soon be one of the top 20% of researchers. You generate more with less.
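The time budget in this example is simple arithmetic, spelled out here as a short Python sketch so you can plug in your own activities (the uniform one-hour activities are the example’s assumption):

```python
# The time budget from the example above, in hours per day.
activities = [1.0] * 8                            # eight one-hour activities

remaining = activities[2:]                        # skip two activities entirely
remaining[:4] = [h / 2 for h in remaining[:4]]    # do four of them in half the time

lean_day = sum(remaining)                         # 4.0 hours
saved = sum(activities) - lean_day                # 4.0 hours saved
new_day = lean_day + 2.0                          # reinvest 2 hours in writing papers

print(f"saved {saved}h, new workday {new_day}h")  # saved 4.0h, new workday 6.0h
```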
This is the power of 80/20 thinking: you invest resources where they have the highest leverage. You create more results by investing less time, effort, and money. You become lazy in most things in life, but you invest some of the saved time, energy, and money in the few things that are wildly important. Instead of becoming a “Jack of all trades, master of none”, you become a one-trick pony. You focus heavily on the vital few and ignore the trivial many. You lead a less stressful life, but you enjoy more fruits from your invested labor, effort, time, and money.
Pareto Implications for Coders
Let’s consider another example: if you’re reading this, you’re probably a programmer. In programming, the results are much more heavily skewed towards the top than in most other fields. Instead of 80/20, the distribution often looks more like 90/10. Bill Gates said that a “great lathe operator commands several times the wage of an average lathe operator, but a great writer of software code is worth 10,000 times the price of an average software writer”. Bill Gates has overseen an enormous number of programmers and software developers, so his statement deserves to be taken seriously. Interestingly, the difference is not the 16x you’ve seen previously. The difference between a great and an average software writer is 10,000x! How can this be? Here are a number of reasons why this extreme Pareto distribution holds especially in the software world:
- A great programmer can solve some problems that the average programmer simply cannot solve. In some instances, this makes them infinitely more productive.
- A great programmer can write code that is 10,000 times faster than the code of an average programmer. This can make or break the viability of a whole product line of a billion-dollar company.
- A great programmer will write code that has fewer bugs. Think about the effect of a single security bug on Microsoft’s reputation and brand!
- A great programmer will write code that is easier to extend, which may improve the productivity of thousands of developers who work on that code at a later stage of the software development process.
- A great programmer will think outside the box and find creative solutions that circumvent costly development efforts and help the team focus on the most important things.
Each of the previously stated arguments shows why a great software developer can be 10,000 times more productive. In practice, several of these factors combine, so the difference can be even higher.
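To see how a large factor can arise from a single design decision, compare two ways of checking a list for duplicates. This sketch counts operations rather than measuring time, so the numbers are deterministic; the function names are hypothetical:

```python
def has_duplicates_quadratic(items):
    """Compare every pair of items: about n*n/2 comparisons."""
    comparisons = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            comparisons += 1
            if items[i] == items[j]:
                return True, comparisons
    return False, comparisons


def has_duplicates_linear(items):
    """Remember what we've seen in a set: about n hash lookups."""
    seen, lookups = set(), 0
    for item in items:
        lookups += 1
        if item in seen:
            return True, lookups
        seen.add(item)
    return False, lookups


data = list(range(2_000))              # no duplicates: the worst case for both
print(has_duplicates_quadratic(data))  # (False, 1999000)
print(has_duplicates_linear(data))     # (False, 2000)
```

Without duplicates, the quadratic version needs (n - 1) / 2 times as many operations as the set-based one, so at n = 20,001 the gap reaches 10,000x. Wall-clock speedups also depend on constant factors and hardware, so treat this as an illustration of scale rather than a benchmark.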
The key question is: how do you become a great programmer? Because if you can become a great programmer, you’ll always have much more work than you can handle, and the most successful companies in the world—Google, Facebook, Amazon, Apple, and Microsoft—will be happy to pay you big premiums for your services.
A Success Metric for Programmers
Unfortunately, “become a great programmer” is not a success metric you can directly optimize: it’s a multi-dimensional problem. Being a great programmer can mean a lot of things. A great programmer understands code quickly, knows algorithms and data structures, knows different technologies and their strengths and weaknesses, can collaborate with other people, is communicative and creative, keeps learning, knows ways to organize the software development process, and possesses hundreds of soft and hard skills. But you cannot master all of those! If you don’t focus on the vital few, you’ll become enslaved by the trivial many. To become a great programmer, you must focus on the vital few. One of those vital few activities, which will make you a better coder over time, is captured by the success metric “write more lines of code”. If you write more lines of code than your peers, you’ll become a better coder than most of them. It’s a simplification of the multi-dimensional problem, but we simplified towards the vital few: by optimizing the proxy metric “write more lines of code”, we increase our odds of succeeding at the target metric “become a great writer of software code” (see figure).
Figure: Success metric in programming: number of lines of code written.
By writing more code, you create a self-reinforcing feedback loop. You begin to understand code better. You talk and behave more like an expert coder. You attract better coders and more challenging programming tasks, so you write more code and become even better. You get paid more and more per line of code you write, so it makes economic sense to write more code instead of doing housework or tedious non-programming tasks at work. You or your company outsource everything else. The more you code, the more successful you’ll become. Here you have the 80/20 activity you can follow every day: track the number of lines of code you write every day and optimize it. Make it a game to at least match your average every day. If you code more, you’ll ultimately join the top 10% of coders, with income levels far above six figures.
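Tracking this metric can be automated. The following sketch, assuming your code lives in a Git repository, sums the “added” column of `git log --numstat` and checks whether today’s count matches your running average. The helper names are hypothetical; by default the count includes all authors unless you pass `author`.

```python
import subprocess
from statistics import mean


def added_lines(numstat_output):
    """Sum the 'added' column of `git log --numstat` output.

    Each file line looks like "added<TAB>deleted<TAB>path"; binary files
    report "-" instead of numbers and are skipped.
    """
    total = 0
    for line in numstat_output.splitlines():
        parts = line.split("\t", 2)
        if len(parts) == 3 and parts[0].isdigit():
            total += int(parts[0])
    return total


def added_lines_since(since="1 day ago", author=None, repo="."):
    """Count lines added in `repo` since `since` (requires git on the PATH)."""
    cmd = ["git", "log", f"--since={since}", "--numstat", "--pretty=tformat:"]
    if author:
        cmd.append(f"--author={author}")  # otherwise all authors are counted
    out = subprocess.run(cmd, cwd=repo, capture_output=True, text=True, check=True).stdout
    return added_lines(out)


def beat_your_average(history, today):
    """The daily game: did today's count at least match your average so far?"""
    return today >= mean(history)


sample = "12\t3\tapp.py\n-\t-\tlogo.png\n30\t0\ttests/test_app.py\n"
print(added_lines(sample))                        # 42
print(beat_your_average([35, 50, 41], today=42))  # True
```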
Relationship Between Focus and the Pareto Distribution
A closely related topic I want to discuss is focus. The 80/20 principle explains why focus is so powerful. Let’s dive into the argument!
Consider the Pareto distribution in the next figure, which shows the percentage improvement of moving towards the top of the distribution. Alice is the fifth most productive person in the organization. If she overtakes just one person, becoming the fourth most productive person, she increases her output (e.g., salary) by 10%. If she moves one step further, her output increases by an additional 20%. Think about this: even if she could keep increasing her income by 10% per step, this would be great, because repeated 10% gains compound into superlinear growth. But in a Pareto distribution, the growth per rank explodes. This is why even small increases in productivity can result in big increases in income. If you can move into the top 10% of any Pareto distribution, you’ll be a wildly successful person with massive results in your life. It doesn’t matter whether you’re a golfer, poker player, programmer, or machine learning engineer. Increasing your productivity leads to superlinear improvements in your income, happiness, and joy at work. Some call this phenomenon “the winner takes it all”.
Figure: Disproportional benefit of improving your rank in a Pareto distribution.
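The exact percentages depend on the shape of the curve. As one hypothetical model, assume output is inversely proportional to rank (a Zipf-like curve, with rank 1 as the top performer). Overtaking the person directly above you at rank r then yields a gain of 1/(r - 1), which grows sharply near the top. This sketch prints the gain per step; its numbers differ from the figure’s because the curve is assumed:

```python
from fractions import Fraction


def output_at_rank(rank):
    """Assumed Zipf-like model: output is inversely proportional to rank (rank 1 = best)."""
    return Fraction(1, rank)


def gain_from_overtaking(rank):
    """Relative output gain from moving from `rank` to `rank - 1`."""
    return output_at_rank(rank - 1) / output_at_rank(rank) - 1


for rank in range(10, 1, -1):
    print(f"rank {rank} -> {rank - 1}: +{float(gain_from_overtaking(rank)):.0%}")
# first line: rank 10 -> 9: +11%   ...   last line: rank 2 -> 1: +100%
```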
That’s why it doesn’t pay to spread your focus: if you don’t focus, you participate in many Pareto distributions at once. Consider the following graphic of two people, Alice and Bob. Both have three units of learning effort every day. Alice focuses on one thing: programming. She’s neither a good chess player, nor a good golfer, nor good at politics. She just spends all three units of effort on learning to code. Bob spreads his focus across multiple disciplines. He spends one unit of time polishing his chess skills, one unit training his programming skills, and one unit training his politics skills. As a result, he reaches average skill and output in each of the three areas. However, because a Pareto distribution disproportionately rewards the winners, Alice collects more total output (e.g., income or happiness) than Bob through her focusing strategy.
Figure: Non-linearity of rank output – A strategic explanation attempt for the power of focus.
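Alice’s advantage rests on one assumption: rewards grow faster than effort (a convex reward curve), as in a Pareto distribution. A toy sketch, where squaring the effort is an arbitrary stand-in for that convexity:

```python
def output(effort, exponent=2):
    """Toy reward curve: exponent > 1 makes output grow faster than effort (convex)."""
    return effort ** exponent


alice = output(3)                        # all three units on programming
bob = output(1) + output(1) + output(1)  # one unit each: chess, programming, politics
print(alice, bob)                        # 9 3

# With a linear reward curve, focusing gives no advantage.
print(output(3, exponent=1), 3 * output(1, exponent=1))  # 3 3
```

With a linear reward, both strategies tie; the focus advantage appears only when the reward curve is convex.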
Note that this holds true not only across broad, independent areas such as programming, chess, and politics, but also within narrow areas such as programming. For instance, Bob may spend his time reading three general books (let’s call them Introduction to Python, Introduction to C++, and Introduction to Java), while Alice reads three books diving deep into Python (let’s call them Introduction to Python, Introduction to Machine Learning with Python, and Machine Learning for Experts). As a result, Alice becomes a machine learning expert and can demand a higher salary for her specialized skill set.
Another example of a Pareto distribution gone extreme can be seen in contributions to GitHub repositories. There’s scientific evidence that contributions to open-source projects are Pareto distributed. Let’s consider a wildly popular repository for machine learning computations in Python: TensorFlow. Here are the top seven contributors to the GitHub repository:
Figure: TensorFlow Github repository “commit” distribution.
Here’s the table showing the same data numerically:
The user tensorflow-gardener contributed more than 20% of the 93,000 commits to this repository. Given that there are thousands of contributors, the distribution is much more extreme than 80/20. The reason for this extreme skewness is that the contributor tensorflow-gardener represents a team of coders at Google who create and maintain this repository. The interesting observation, however, is that the top contributors are extremely successful programmers with impressive track records at some of the most successful companies in the world. You can check them out on the GitHub page. Whether they became successful before or after they made a large number of commits to the open-source repository is a merely theoretical question. For all practical purposes, you should start your success habit, “write more lines of code every day”, now. There’s nothing stopping you from becoming #2 on the TensorFlow repository by committing valuable code to it 2-3 times per day for the next 2-3 years. If you persist, you can join the ranks of the most successful coders on earth by choosing one powerful habit and sticking to it for three years!
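You can run the same analysis on any repository. The sketch below computes the commit share of the top contributors. `fetch_contribution_counts` calls GitHub’s public “list repository contributors” REST endpoint, which returns contributors sorted by number of contributions; it fetches only the first page, so shares computed from it are relative to those contributors rather than to all commits. The sample counts are made up and are not TensorFlow’s actual numbers:

```python
import json
import urllib.request


def top_k_share(commit_counts, k=1):
    """Fraction of all commits in `commit_counts` made by the k most active contributors."""
    ranked = sorted(commit_counts, reverse=True)
    return sum(ranked[:k]) / sum(ranked)


def fetch_contribution_counts(owner, repo, per_page=100):
    """Commit counts from the first page of GitHub's 'list repository contributors' endpoint.

    Unauthenticated requests are rate-limited; only the top `per_page` contributors are returned.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/contributors?per_page={per_page}"
    with urllib.request.urlopen(url) as response:
        return [c["contributions"] for c in json.load(response)]


# Made-up counts for illustration, NOT TensorFlow's real numbers.
counts = [500, 120, 80, 60, 40, 30, 20, 15, 10, 5, 5, 5, 5, 5]
print(f"top contributor: {top_k_share(counts):.0%}")        # top contributor: 56%
print(f"top 3 contributors: {top_k_share(counts, 3):.0%}")  # top 3 contributors: 78%
```

For a live repository, you could pass `fetch_contribution_counts("tensorflow", "tensorflow")` to `top_k_share`.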
The underlying driver of excellence is to leverage the 80/20 principle on multiple fronts: First, you focus on the minority of activities that are most able to push you to success in your profession. Second, you do more of these activities than 80% of the professionals in your industry so that you belong to the top 20% of the professionals regarding these selected activities. By chaining these two Pareto distributions—select the top 20% of activities and join the top 20% in terms of activity execution quantity—you maximally leverage your resources and you’ll become an unstoppable force in your industry. Are you prepared to take a ride to the moon?
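One way to read this chaining, assuming both splits are exactly 80/20 and independent of each other: the top 20% of activities yield 80% of the results, and the top 20% of practitioners deliver 80% of the output within those activities. Multiplying the two gives:

```latex
\underbrace{0.8 \times 0.8}_{\text{share of results}} = 0.64,
\qquad
\underbrace{0.2 \times 0.2}_{\text{share of activities} \times \text{share of people}} = 0.04,
\qquad
\frac{0.64}{0.04} = 16
```

Under these assumptions, 64% of the results would come from 4% of the activity-person combinations, 16 times their proportional share.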
Programmer Net Worth
Sure enough, the net worth of programmers is also Pareto distributed. For privacy reasons, it’s hard to get data about individuals’ net worth, but one web page shows the self-reported net worth of computer programmers. Although the data may be noisy, it shows the characteristic skewness of real-world Pareto distributions:
Figure: Self-reported net worth of 60 programmers.
In fact, the curve is likely even more skewed in the real world, because there are many billionaire programmers who’ve started software services used by billions of people: Mark Zuckerberg, Bill Gates, Elon Musk, and Steve Wozniak come to mind. Each of these tech geniuses built early prototypes of their services themselves, writing the source code by hand. The number of software millionaires is significant.
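A quick way to see the skew in data like this is to compare the mean with the median: in a long-tailed distribution, a few very large values pull the mean far above the median. A sketch with made-up numbers (not the surveyed data); the helper name is hypothetical:

```python
from statistics import mean, median


def skew_summary(values):
    """Mean-to-median ratio; a value well above 1 suggests a long right tail."""
    return mean(values) / median(values)


# Hypothetical net worths in thousands of dollars (NOT the surveyed data).
net_worth = [20, 35, 50, 60, 80, 90, 120, 150, 400, 2500]
print(f"mean/median = {skew_summary(net_worth):.1f}")  # mean/median = 4.1
```

A single outlier at 2,500 lifts the mean to 350.5 while the median stays at 85, which is the signature of a Pareto-like tail.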