Python Pool: Python Code to Convert a Table to First Normal Form

Python has thousands of modules and libraries to help you with tabular data. These datasets can contain many different ways in which the data can be represented. Before inferring any valuable information for the tables, you need to apply some filters and normalizations to understand. There are thousands of ways to pre-process and clean your data in python. This is where the normalization (first normal form) comes into play. In layman terms, First Normal Form (1NF) refers to certain relations in the dataset. These relations ensure that every domain in the table has a different attribute in different rows. In this post, we will use Python Code to Convert a Table to First Normal Form Form by using the pandas’ library. Before jumping straight to the code, let’s understand more about the first normal form and its rules.

Contents of Tutorial

What exactly is a First Normal Form of a Table?

Normalization is a technique by which you can eliminate data repetition and improve the readability of the dataset. Every dataset comes with garbage values and irrelevant entries. By normalizing, we’re creating a meaningful entry from the garbage entry of the dataset. Accordingly, First Normal Form follows certain rules to eliminate all repetitive data and segregate all comma-separated entries to separate rows.

First Normal Form will help you to ensure that your table is readable by the computer. It also allows you to ignore the data multiplicity in the table to deduce any conclusions. Visualizing a normalized table makes more sense than the default table.

Why do we need First Normal Forms?

Python is widely used for data visualization and creating machine learning models. All these models are based on data input to generate the desired result. In this whole process, there is a section termed “Pre Processing.” Further, Pre Processing ensures that your data is understandable by computer and it has minimum noise.

Normalization is a technique to pre-process your table/dataset into an understandable format. The First Normal Form is used to reduce the redundancy in the dataset. Hence, if the dataset contains multi-valued entries/attributes, the first normal form will reduce it to separate entries.

Rules to follow when creating First Normal Form

Creating a first normal form has certain sets of rules which need to be followed. These rules are –

  1. All the attributes must be single-valued.
  2. Domain names should not change.
  3. All the attributes of a domain are unique. No repetitive attribute should be present.
  4. While breaking the multi-valued attributes, the order does not matter. You can place any attribute anywhere provided it’s unique.

Python Code to convert Table to First Normal Form

Let’s consider the following table –

IDNameSubjects
0DavidEnglish, Math
1GlennMath
2SteveScience, English
Table

The above table contains multi-valued attributes, so it’s not in the first normal form. It’s first normal form will be –

IDNameSubject
0DavidEnglish
0DavidMath
1GlennMath
2SteveScience
2SteveEnglish
1NF form

Example: Python Code to Convert a Table to First Normal Form

Now that we know the expected output, here’s the code which can be used to convert the table into the first normal form –

import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['David', 'Glenn', 'Steve'], 'Subjects': [
                  ['English', 'Math'], ['Math'], ['Science', 'English']]})

lens = list(map(len, df['Subjects'].values))

res = pd.DataFrame({'Name': np.repeat(
    df['Name'], lens), 'Subject': np.concatenate(df['Subjects'].values)})

print(res)

Output

    Name  Subject
0  David  English
0  David     Math
1  Glenn     Math
2  Steve  Science
2  Steve  English

Explanation

We start by importing Pandas and Numpy modules. These libraries have built-in functions to separate multivalued items and join them to create a new dataset. Next, in df we create the pandas Dataframe object of a non-normalized table. As you can observe, David has two commas separated subjects in df (Same goes for Steve). In the next line, we calculate the number of attributes every domain has.

In our example, David has 2, Glenn has 1, and Steve has 2 Attributes. Then we create a new data frame with separate attributes for all domain values. To do this, we’ll use numpy repeat and numpy concatenate. Both these functions create multiple rows and then join (concatenate) them with their attributes. Then we do print(res) to verify the normalized table.

As the output follows all basic rules of the first normal form ie, no multivalued attribute, same domain names, and unique attributes. We can surely verify that the table is in first normal form.

Real Life Applications of converting Table to First Normal Form

Let’s consider the table which we’ve created in the above section for the subjects. Suppose, at some point, David wants to add Science in his subjects, the school has to edit the whole multivalued attribute. But in the first normal form, the school just has to add a nMustew row with his name and subject. It’s way easier to handle data this way.

Another example would be what is David wants to withdraw from his Math subject? The school will have to edit the whole attribute to do it. But if the data is in the first normal form, the school will remove the row where the person’s name is David, and the subject is Math. All these operations are way easier in first normal form, and you can easily get the respective data.

Besides, you can perform a search in the First Normal Form of Table easily than the multi-valued form.

Must Read

Conclusion: Python Code to Convert a Table to First Normal Form

To summarize, handling, and operating datasets/tables have become an important part of our lives. Every software has a set of SQL datasets or similar structures to store the values of data. First Normal Form will help you understand the data easily, and you can easily interpret various results. Additionally, you can add, remove, search data easily in normalized form. All these methods are essential to reduce the computational work and reduce the number of resources we’re using.

Comments are closed.