Power of PandasAI 1.0.3 🚀

Aashi Dutt
2 min read · Aug 18, 2023


Whether you’re delving into machine learning or data science, there’s a good chance you’ve encountered the Pandas library, an essential tool for working with CSV or JSON files. Yet, the true challenge often lies in the meticulous process of data cleaning before any further analysis.

What if AI could do it for you?

Well, look no further: PandasAI 1.0.3 is here. It uses generative AI to clean your data, lets you chat with your data as you would with an assistant, and can even draw plots for you in seconds.

“PandasAI is a Python library that adds Generative AI capabilities to pandas, the popular data analysis and manipulation tool. It is designed to be used in conjunction with pandas, and is not a replacement for it.”

Prerequisites ✅

  • OpenAI API Key
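
Since the key is a secret, you may prefer not to paste it straight into a notebook. Here is a minimal sketch that reads it from an environment variable instead. The helper name load_openai_key and the variable name OPENAI_API_KEY are my own choices for illustration and are not part of PandasAI.

```python
import os


def load_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing."""
    # Hypothetical helper: the environment variable name is an assumption.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

The returned string can then be passed to the OpenAI wrapper in the example below in place of the hardcoded placeholder.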

Quick Example:

This example is taken from the official PandasAI documentation. It’s pretty straightforward but very interesting.

You start by importing the base pandas library along with the PandasAI library. If you do not have PandasAI installed, run:

! pip install pandasai

Next, import the OpenAI wrapper from pandasai.llm; this is the model we’ll be chatting with. Then import SmartDatalake and create a data lake object, which holds several DataFrames and lets you query across them.

import pandas as pd
from pandasai import SmartDatalake
from pandasai.llm import OpenAI

# Two small tables that share the EmployeeID key
employees_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
}

salaries_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Salary': [5000, 6000, 4500, 7000, 5500]
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)

# Replace the placeholder with your own OpenAI API key
llm = OpenAI(api_token='YOUR API KEY')

# The data lake holds both DataFrames so one question can span them
dl = SmartDatalake([employees_df, salaries_df], config={"llm": llm})
dl.chat("Who gets paid the most?")

Output:

Oh, Olivia gets paid the most.
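
Because the answer comes from a language model, it is worth checking it against plain pandas when you can. Here is a small sketch of that check on the same two tables. It is my own cross-check, not something PandasAI runs for you: it joins the tables on EmployeeID and picks the row with the highest salary.

```python
import pandas as pd

employees_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
})
salaries_df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
})

# Join the two tables on their shared key.
merged = employees_df.merge(salaries_df, on="EmployeeID")

# idxmax returns the index label of the largest salary; loc fetches that row.
top_earner = merged.loc[merged["Salary"].idxmax()]
print(top_earner["Name"], top_earner["Salary"])  # Olivia 7000
```

The result matches what the assistant said, which is reassuring for a toy dataset like this one.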

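Here are a few shortcuts for you to try with your data. In these snippets, df is assumed to be a PandasAI SmartDataframe wrapping your data, not a plain pandas DataFrame: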

# Clean data
df.clean_data()

# Impute missing values
df.impute_missing_values()

# Generate features
df.generate_features()

# Plot histogram
df.plot_histogram(column="gdp")
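
The shortcuts above hand the work to the language model, so what they actually do to your data can vary. For comparison, here is a rough hand-written sketch of similar steps in plain pandas. The data is made up, and the specific choices (dropping duplicates, median imputation, a per-capita feature, three bins) are my assumptions, not a description of what PandasAI does internally.

```python
import pandas as pd

# Hypothetical data: the countries and figures are invented for illustration.
raw = pd.DataFrame({
    "country": ["A", "B", "B", "C", "D"],
    "gdp": [100.0, 250.0, 250.0, None, 400.0],
    "population": [10, 25, 25, 5, 20],
})

# Clean data: drop exact duplicate rows.
cleaned = raw.drop_duplicates().reset_index(drop=True)

# Impute missing values: fill gaps in "gdp" with the column median.
cleaned["gdp"] = cleaned["gdp"].fillna(cleaned["gdp"].median())

# Generate features: derive GDP per capita from two existing columns.
cleaned["gdp_per_capita"] = cleaned["gdp"] / cleaned["population"]

# Histogram data: count rows per equal-width "gdp" bin (no plotting needed).
bin_counts = pd.cut(cleaned["gdp"], bins=3).value_counts().sort_index()

print(cleaned)
print(bin_counts)
```

Running something like this next to the shortcut output is a quick way to see whether the model made sensible choices for your dataset.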

Resources for you

Check out the PandasAI project page on PyPI, which links to the official documentation: https://pypi.org/project/pandasai/

Summary

PandasAI is a powerful tool that works alongside the pandas library, letting you ask questions of your data in plain language and reducing the hassle of data cleaning and processing.
