Statistical and Visual Exploratory Data Analysis with One Line of Code

If EDA is not executed correctly, it can cause us to start modeling with “unclean” data. See how to use Pandas Profiling to perform EDA with a single line of code.


By Brenda Hali, Marketing Data Specialist

Exploratory Data Analysis (EDA) is, in my opinion, the most important part of machine learning modeling on a new dataset. If EDA is not executed correctly, we can end up modeling with “unclean” data, and the problem behaves like a snowball rolling downhill: it only gets bigger and worse.

Basic elements of a good Exploratory Data Analysis

 
Exploratory data analysis can be as deep as you want or need it to be, but a basic analysis should cover the elements below:

  • First and last values
  • Dataset shape (#rows and #columns)
  • Data/variable types
  • Missing and Null values
  • Duplicated values
  • Descriptive Statistics (Mean, Minimum, Maximum)
  • Variable distributions
  • Correlations
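
If you want to see what this checklist looks like when done by hand, here is a minimal sketch in plain pandas. The helper name `basic_eda`, the toy columns, and the toy values are all made up for illustration; skewness is used only as a rough stand-in for "distribution", and a histogram would usually tell you more.

```python
import pandas as pd


def basic_eda(df: pd.DataFrame, n: int = 5) -> dict:
    """Collect the basic EDA elements from the checklist into one dictionary."""
    numeric = df.select_dtypes(include="number")
    return {
        "first_rows": df.head(n),                  # first values
        "last_rows": df.tail(n),                   # last values
        "shape": df.shape,                         # (#rows, #columns)
        "dtypes": df.dtypes,                       # data/variable types
        "missing": df.isna().sum(),                # missing/null count per column
        "duplicates": int(df.duplicated().sum()),  # rows that repeat an earlier row
        "describe": df.describe(),                 # count, mean, std, min, quartiles, max
        "skew": numeric.skew(),                    # rough summary of each distribution's shape
        "correlations": numeric.corr(),            # pairwise Pearson correlations
    }


# Toy data, made up for illustration only (not taken from any real dataset).
toy = pd.DataFrame(
    {
        "name": ["a", "b", "b", "d"],
        "mass_g": [10.0, 25.0, 25.0, None],
        "year": [1990, 2001, 2001, 2010],
    }
)

summary = basic_eda(toy)
print(summary["shape"])
print(summary["missing"])
print(summary["duplicates"])
```

Each entry maps to one bullet in the list, so the dictionary doubles as a quick reminder of what still needs a closer look.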

I enjoy performing manual EDA to get to know my data better, but a couple of months ago, Adi Bronshtein introduced me to Pandas Profiling. Because it takes quite some time to process, I use it when I want to explore small datasets quickly, and I hope it speeds up your EDA, too.

Getting started with Pandas Profiling

 
In this demonstration, I will conduct EDA on NASA's Meteorite Landings dataset.
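
The snippet below is a hedged sketch of the steps involved, not a definitive recipe. It assumes the package is installed (for example with `pip install pandas-profiling`; the project was later renamed `ydata-profiling`, so both import names are tried). The helper names `load_profiler` and `profile`, the file path in the comment, and the toy values are assumptions made up for illustration.

```python
import importlib

import pandas as pd


def load_profiler():
    """Return the ProfileReport class, or None if no profiling package is installed.

    The package was published as pandas-profiling and later renamed
    ydata-profiling, so both import names are tried.
    """
    for module_name in ("ydata_profiling", "pandas_profiling"):
        try:
            return importlib.import_module(module_name).ProfileReport
        except ImportError:
            continue
    return None


def profile(df: pd.DataFrame, title: str = "EDA report"):
    """Build a profiling report for df, or return None if the package is missing."""
    report_cls = load_profiler()
    if report_cls is None:
        return None
    return report_cls(df, title=title)  # this single call is the "one line"


# Toy stand-in for the real data (values made up for illustration).
# With the real file you would load it first, for example:
#   df = pd.read_csv("meteorite_landings.csv")  # hypothetical local path
df = pd.DataFrame({"mass_g": [10.0, 25.0, None], "year": [1990, 2001, 2010]})

report = profile(df, title="Meteorite Landings (toy sample)")
if report is not None:
    print(type(report).__name__)
    # report.to_file("report.html") would write the full HTML report to disk.
```

In a notebook, once `pandas_profiling` has been imported, `df.profile_report()` gives the same report in a single line.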

Did you run it already?

Et voilà, easy peasy!

Now the fun starts.

You can learn more about Pandas Profiling in its documentation: https://pandas-profiling.github.io/pandas-profiling/docs/

Did you enjoy this text? You might want to check The Best Free Data Science eBooks.

 
Bio: Brenda Hali is a Marketing Data Specialist based in Washington, D.C. She is passionate about women’s inclusion in technology and data.

Original. Reposted with permission.
