Create a Word Cloud or Tag Cloud in Python

This article was published as a part of the Data Science Blogathon.

Introduction

I have been in love with data visualization since the day I started working on it, and I always enjoy deriving useful insights from data. At first, I only knew the basic charts that are built into Tableau and Power BI, such as bar graphs, scatter plots, and histograms. By working on visualization every day, I came across many new charts, such as radial gauge charts and waffle charts.

Out of curiosity, I recently searched for all the chart types used in data visualization, and the word cloud caught my eye. Until then, I had assumed that word cloud images were just words arranged at random, but I was wrong, and that is where it all started. I first tried making a word cloud from a small dataset in Tableau and Power BI. After those successful attempts, I wanted to build one in code, just as I had done with bar charts, pie charts, and other charts.

What Is a Word Cloud?

Definition: A word cloud is a simple yet powerful visual representation of text data in which the most frequent words appear in bigger, bolder letters and in different colors. The smaller a word appears, the less frequent, and therefore the less important, it is.

[Figure: Sample word cloud]
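
To make the definition concrete, here is a minimal sketch of the idea behind it: count how often each word occurs and give frequent words a larger font size. The function name font_sizes, the size limits, and the linear scaling are assumptions made for illustration; the wordcloud library used later has its own, more elaborate sizing logic.

```python
import re
from collections import Counter

def font_sizes(text, min_size=10, max_size=60):
    """Map each word to a font size that grows with its frequency (simplified)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # Linear scaling: the most frequent word gets max_size,
    # rarer words get proportionally smaller sizes above min_size.
    return {w: min_size + (max_size - min_size) * c / top for w, c in counts.items()}

sizes = font_sizes("data science uses data and more data")
print(sizes["data"], sizes["science"])
```

In this toy sentence, "data" occurs three times and every other word once, so "data" receives the largest size.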

Uses of Tag Cloud

1) Top Hashtags on Social Media (Instagram, Twitter): Social media is where people around the world follow the latest updates, so a word cloud can show the hashtags that people use most in their posts.

2) Hot Topics in Media: By analyzing news articles, we can find the keywords in the headlines and extract the top n trending media topics.

3) Search Terms in E-commerce: On an e-commerce shopping website, the owner can build a word cloud of the items that have been searched for the most, and so get an idea of which items are in high demand during a specific period.
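
As a rough illustration of the first use case, the sketch below counts hashtags in a handful of posts; these counts are the kind of input a tag cloud would be built from. The posts list and the top_hashtags helper are made up for this example and are not part of any library.

```python
import re
from collections import Counter

def top_hashtags(posts, n=3):
    """Return the n most common hashtags (case-insensitive) with their counts."""
    tags = []
    for post in posts:
        tags.extend(tag.lower() for tag in re.findall(r"#\w+", post))
    return Counter(tags).most_common(n)

# Hypothetical sample posts, only for demonstration.
posts = [
    "Loving #Python and #DataScience",
    "#python tips",
    "New #dataviz post #Python",
]
print(top_hashtags(posts, n=1))
```

The same counting approach would apply to headline keywords or search terms in the other two use cases.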

Let’s Start Coding in Python to Create a Word Cloud

First of all, we need to install the required libraries for the Jupyter notebook.

wordcloud is a third-party Python library, not part of the standard library, so we have to install it. In the Anaconda Prompt, run the following command:

pip install wordcloud

If you prefer conda, the package is available from the conda-forge channel:

conda install -c conda-forge wordcloud

This can also be done directly in the notebook itself by adding ‘!’ at the beginning of the command, like:

!pip install wordcloud

Here I will generate the word cloud from the Wikipedia text of a topic. For that I need the wikipedia library to access the Wikipedia API, which can be installed from the Anaconda Prompt as follows:

pip install wikipedia

We will also need some other libraries: numpy, matplotlib, and pandas.
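
Before going further, it can help to confirm that everything is installed. The following is a small optional check, not something the libraries require; the REQUIRED list and the missing_packages helper are names chosen for this sketch.

```python
from importlib.util import find_spec

# Import names of the packages used in this article.
REQUIRED = ["wordcloud", "wikipedia", "numpy", "matplotlib", "pandas"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages(REQUIRED)
print("Missing packages:", missing or "none")
```

Any name printed here can be installed with pip as shown above.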

We now have all the libraries needed to create the tag cloud, so let's retrieve the text:

import wikipedia

# Fetch the Wikipedia page and keep its full plain-text content.
result = wikipedia.page("MachineLearning")
final_result = result.content
print(final_result)

[Figure: Output of the Machine Learning Wikipedia page]

The image above shows the output we got by retrieving the Machine Learning page from Wikipedia. The output is scrollable, which means the entire page was retrieved.

We can also get just the summary of the page with the summary method, as below:

result = wikipedia.summary("MachineLearning", sentences=5)
print(result)

The sentences parameter lets us retrieve a specific number of sentences.

[Figure: Output of 5 sentences]

Let’s create the word cloud now.

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

def plot_cloud(wordcloud):
    """Display a generated word cloud without axes."""
    plt.figure(figsize=(10, 10))
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()

# random_state fixes the layout so that repeated runs give the same image.
wordcloud = WordCloud(width=500, height=500, background_color='pink', random_state=10).generate(final_result)
plot_cloud(wordcloud)

Stop words are very common words that carry little meaning on their own, like ‘is’, ‘are’, ‘an’, ‘I’, and many more.

The wordcloud library comes with a built-in set of stop words (STOPWORDS), which are removed from the text automatically.

The interesting part is that we can add our own stop words: STOPWORDS is a regular Python set, so we can add words to a copy of it with add() and pass the result to WordCloud through the stopwords parameter.

The WordCloud constructor takes a width and a height; I have set both to 500 and the background color to pink. If you do not set random_state, your word cloud will look different every time you run the code. It can be set to any int value.

Here is the word cloud we get from the above code:

In the figure above, we see that machine learning is the most used term, and other frequently used words include model, task, training, and data. So, roughly speaking, we can conclude that machine learning is about the task of training a model on data.

We can also change the background color with the background_color parameter and the font colors with the colormap parameter. background_color also accepts hex color codes, whereas colormap takes the name of one of Matplotlib's built-in colormaps.

Let’s change the background color to turquoise by using its hex code, and the font colors to the ocean colormap:

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

def plot_cloud(wordcloud):
    """Display a generated word cloud without axes."""
    plt.figure(figsize=(10, 10))
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()

# background_color takes a hex code; colormap takes a Matplotlib colormap name.
wordcloud = WordCloud(width=500, height=500, background_color='#40E0D0', colormap="ocean", random_state=10).generate(final_result)
plot_cloud(wordcloud)

Here I have specified ocean. If I pass an invalid colormap name, Jupyter will show a ValueError that lists the available colormap options.

A word cloud can also be drawn in the shape of any image by loading the image with the PIL library and passing it as a mask.
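
To avoid that error, the colormap name can be checked up front. This is a small sketch that relies on Matplotlib's list of registered colormap names; is_valid_colormap is a helper name invented for this example.

```python
import matplotlib.pyplot as plt

def is_valid_colormap(name):
    """Check whether Matplotlib knows a colormap with this name."""
    return name in plt.colormaps()

print(is_valid_colormap("ocean"))
print(is_valid_colormap("blue"))
```

Note that a plain color name such as "blue" is not a colormap, although Matplotlib does ship a colormap called "Blues".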

End Note

In this article, we discussed the word cloud: its definition, its application areas, and an example in Python using a Jupyter notebook.

Author: admin
