Seaborn library

Naman Mehra
4 min read · Jun 16, 2021

Seaborn is a data visualization library built on top of Matplotlib. It helps us understand data by visualizing it, and it offers a simpler, higher-level interface for the statistical plots commonly needed in machine learning work.
Its default styles tend to look more polished than Matplotlib's defaults, and its plotting functions work directly on whole pandas DataFrames, so a dataset is treated as a single unit rather than as separate arrays.

Installing And Importing Seaborn

pip install seaborn  # run in a terminal, or as %pip install seaborn inside a notebook
import pandas as pd #pandas, numpy and matplotlib libraries are used
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
%reload_ext autoreload
%autoreload 2

Importing Datasets

Let us look at the datasets that ship with the seaborn library by running the following code:

print(sns.get_dataset_names())  # lists the built-in seaborn datasets

Importing the 'car_crashes' dataset:

crash_df = sns.load_dataset('car_crashes')

Distribution Plot

A distribution plot shows the histogram of the attribute we pass in. It can also overlay the kernel density estimate (KDE) line, which we switch on or off with the kde parameter.

sns.histplot(crash_df['alcohol'], kde=False, bins=20)  # kde toggles the kernel density estimate; histplot replaces the deprecated distplot

Joint Plot

A joint plot compares two distributions. It draws a scatter plot by default, and the kind parameter switches to another type, such as the regression plot used here.

sns.jointplot(x='speeding',y='not_distracted', data = crash_df , kind = 'reg')

Pair plot

A pair plot draws the pairwise relationships between all numerical columns of a DataFrame.

sns.pairplot(crash_df)

We can also use hue to color the plot by the values of another column, which is most useful for categorical data.

sns.pairplot(crash_df, hue='not_distracted')

Rug Plot

A rug plot draws a single column of a DataFrame as small sticks along an axis, one per data point. The sticks cluster more densely where values are most common.

sns.rugplot(crash_df['alcohol'],height=0.2)
sns.kdeplot(crash_df['alcohol'])

Styling

sns.set_style('darkgrid')
sns.set_context('talk', font_scale=1.5)  # we can use 'poster' or 'paper' to see different contexts
sns.jointplot(x='speeding', y='not_distracted', data=crash_df, kind='reg')

Categorical Plotting:

Bar Plot

For categorical plots, let's load another dataset, tips, from the seaborn library:

tips_df = sns.load_dataset('tips')
sns.barplot(x = 'time' , y = 'tip' , data = tips_df)
sns.set_context('paper')

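The black line on top of each bar is an error bar. By default it shows a 95% confidence interval around the mean, which seaborn estimates by bootstrapping; it is not the variance.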
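
As a rough sketch of what the bar and the line on top of it represent: by default barplot draws the mean of each category and a bootstrapped 95% confidence interval around that mean. The code below reproduces the idea with plain NumPy and pandas on a small made-up sample; bootstrap_ci is a hypothetical helper written for this illustration, not a seaborn function, and seaborn's own intervals will differ slightly because the resampling is random.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Small made-up sample (not the real tips dataset)
sample = pd.DataFrame({
    "time": ["Lunch"] * 5 + ["Dinner"] * 5,
    "tip": [2.0, 2.5, 3.0, 1.5, 2.0, 3.0, 3.5, 4.0, 2.5, 5.0],
})

def bootstrap_ci(values, n_boot=1000, level=95):
    """Percentile bootstrap confidence interval for the mean (illustrative helper)."""
    values = np.asarray(values)
    # Resample with replacement n_boot times and take the mean of each resample
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    tail = (100 - level) / 2
    return np.percentile(boot_means, [tail, 100 - tail])

# Bar height: the mean tip per category
bar_heights = sample.groupby("time")["tip"].mean()

# Error bar: a confidence interval around each mean
intervals = {name: bootstrap_ci(group["tip"]) for name, group in sample.groupby("time")}
```

A wider interval means the mean is less certain, which usually happens when a category has few or widely spread values.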

Box Plot

A box plot lets us compare the distribution of a variable across categories. It shows the quartiles of the data.

sns.boxplot(x='day',y='total_bill' , data = tips_df , hue = 'sex')#hue is providing category
sns.set_context('poster')

The horizontal black line in the middle of the box is the median. The box extends from the first quartile to the third quartile, so it covers the middle 50% of the data (the interquartile range). The vertical black lines are called whiskers; by default they extend to the farthest data points within 1.5 times the interquartile range of the box, and points beyond them are drawn individually as outliers.
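
The pieces of a box plot can be computed by hand. This is a minimal sketch with NumPy on made-up numbers, assuming the common 1.5 x IQR whisker rule; the variable names are for illustration only.

```python
import numpy as np

# Made-up bill amounts with one unusually large value
bills = np.array([8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 60.0])

# The box spans the first to third quartile; the line inside is the median
q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1

# Whiskers reach the farthest points that still lie within 1.5 * IQR of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = bills[(bills >= lower_fence) & (bills <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything outside the fences is drawn as an individual outlier point
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]
```

Here the box runs from 12 to 20 with the median at 16, the whiskers stop at 8 and 22, and 60 is the only outlier.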

Strip Plot

plt.figure(figsize=(8,5))
sns.set_style('dark')
sns.set_context('talk')
sns.stripplot(x='day',
              y='total_bill',
              data=tips_df,
              hue='sex',
              jitter=True,  # jitter spreads the points so they don't overlap each other
              dodge=True)   # dodge separates the points for male and female

Matrix Plots

Heatmaps

crash_mx = crash_df.select_dtypes('number').corr()  # keep numeric columns only; 'abbrev' holds text
crash_mx

First we create a correlation matrix, and then we draw a heatmap from it.

sns.set_context('paper')
sns.heatmap(crash_mx,annot=True,cmap='Oranges')
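
To see what a correlation matrix holds before it is drawn as a heatmap, here is a small sketch on a hand-made DataFrame. The column names and values are invented for illustration, and the text column is dropped first because correlations are only defined for numeric columns.

```python
import numpy as np
import pandas as pd

# Hand-made example data (not from any seaborn dataset)
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],  # exactly 2 * a
    "c": [5.0, 4.0, 3.0, 2.0, 1.0],   # decreases as a increases
    "label": ["v", "w", "x", "y", "z"],
})

# Keep only numeric columns, then compute pairwise Pearson correlations
corr = df.select_dtypes("number").corr()
```

The result is a square, symmetric table with 1.0 on the diagonal; here a and b correlate at 1.0 and a and c at -1.0, which a heatmap would show as the two extremes of the color scale.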

Heatmaps using pivot table

To build a heatmap from a pivot table, let's use a different dataset.

flights_df = sns.load_dataset('flights')
flights_df

flights = flights_df.pivot_table(index='month', columns = 'year', values='passengers')
flights
sns.heatmap(flights , cmap='hot')
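
The reshaping that pivot_table performs can be seen on a tiny example. This sketch uses a hand-typed DataFrame with invented passenger counts, not the real flights data, to show how long-format rows become the month-by-year grid that heatmap expects.

```python
import pandas as pd

# Long format: one row per (year, month) pair; values are made up
long_df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [100, 110, 120, 130],
})

# Wide format: months become rows, years become columns
wide = long_df.pivot_table(index="month", columns="year", values="passengers")
```

Each cell of the resulting table is one colored square in the heatmap, so wide.loc["Jan", 1950] holds the value for January 1950.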

That is all for this blog on seaborn.

Follow me on LinkedIn.
