Seaborn library
Seaborn is a data visualization library built on top of Matplotlib. It helps us understand data by visualizing it, and it offers a simpler, higher-level interface for the statistical plots commonly needed in machine learning work.
Seaborn's default styles are generally more polished than Matplotlib's. It also works directly with whole pandas DataFrames: we pass the dataset and the column names, rather than individual arrays.
Installing And Importing Seaborn
pip install seaborn
import pandas as pd #pandas, numpy and matplotlib libraries are used
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# The lines starting with % are Jupyter magics; they only work inside a notebook
%matplotlib inline
%reload_ext autoreload
%autoreload 2
Importing Datasets
Let us look at the datasets that ship with the seaborn library by running the following code:
print(sns.get_dataset_names()) #will show the inbuilt seaborn datasets
Importing ‘car_crashes’ dataset
crash_df = sns.load_dataset('car_crashes')
Distribution Plot
The distribution plot gives us a histogram of the column we pass in. It can also overlay a kernel density estimate (KDE) curve, which we switch on or off with the kde argument. Note that distplot has been deprecated since seaborn 0.11; histplot is its replacement for histograms.
sns.histplot(crash_df['alcohol'], kde=False, bins=20) # kde toggles the kernel density estimate curve
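To make the bins argument concrete, here is a minimal sketch of the counting that sits behind a histogram, using NumPy alone. The alcohol values are made up for illustration and are not taken from the car_crashes dataset.

```python
import numpy as np

# Hypothetical stand-in for crash_df['alcohol']; these values are made up.
alcohol = np.array([5.6, 4.5, 5.2, 5.8, 3.4, 3.8, 3.9, 4.9, 1.6, 5.2, 3.0, 7.2])

# Split the value range into 4 equal-width bins and count the points in each.
counts, edges = np.histogram(alcohol, bins=4)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.2f} to {right:.2f}: {count} values")
```

More bins give a finer but noisier picture; fewer bins give a smoother but coarser one.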
Joint Plot
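A joint plot compares two distributions: it draws the two variables against each other and shows each variable's own distribution along the margins. It draws a scatter plot by default, and we can change that with the kind argument (here 'reg' adds a regression line).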
sns.jointplot(x='speeding',y='not_distracted', data = crash_df , kind = 'reg')
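With kind='reg', the joint plot overlays a linear regression fit on the scatter plot. The sketch below shows the idea of such a fit with np.polyfit on toy data; it is an illustration of a least-squares line, not seaborn's internal code.

```python
import numpy as np

# Toy data lying exactly on the line y = 2x + 1 (made-up values).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# A degree-1 polynomial fit is an ordinary least-squares straight line.
slope, intercept = np.polyfit(x, y, deg=1)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```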
Pair plot
A pair plot draws the pairwise relationships between all the numerical columns of the DataFrame.
sns.pairplot(crash_df)
We can also pass hue to color the points by the values of another column. This works best with a categorical column; not_distracted is numeric in this dataset, so expect many distinct colors.
sns.pairplot(crash_df, hue='not_distracted')
Rug Plot
A rug plot draws a single column of data as small ticks along an axis, one tick per data point. The ticks bunch together where values are most common, so the dense regions show where the data is concentrated.
sns.rugplot(crash_df['alcohol'],height=0.2)
sns.kdeplot(crash_df['alcohol'])
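The KDE curve can be thought of as a smoothed version of the rug: each tick is replaced by a small Gaussian bump and the bumps are averaged. The following is a minimal sketch under that assumption, with made-up data and a hand-picked bandwidth (seaborn chooses a bandwidth automatically, so its curve will differ).

```python
import numpy as np

# Made-up sample standing in for one numeric column.
points = np.array([1.6, 3.0, 3.4, 3.8, 3.9, 4.5, 4.9, 5.2, 5.2, 5.6, 5.8, 7.2])

def gaussian_kde(points, grid, bandwidth):
    """Average one Gaussian bump per data point, evaluated on `grid`."""
    z = (grid[:, None] - points[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return bumps.sum(axis=1) / (len(points) * bandwidth)

grid = np.linspace(-5.0, 15.0, 2001)
density = gaussian_kde(points, grid, bandwidth=0.5)  # bandwidth is an assumption

# A density should integrate to roughly 1 (simple Riemann sum here).
area = density.sum() * (grid[1] - grid[0])
print(f"area under the curve: {area:.3f}")
```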
Styling
sns.set_style('darkgrid')
sns.set_context('talk', font_scale=1.5) # we can use 'poster'/'paper' to see different styles
sns.jointplot(x='speeding', y='not_distracted', data=crash_df, kind='reg')
Categorical Plotting:
Bar Plot
To plot categorical data, let's load another dataset, tips, from the seaborn library:
tips_df = sns.load_dataset('tips')
sns.barplot(x = 'time' , y = 'tip' , data = tips_df)
sns.set_context('paper')
The black line on top of each bar is an error bar. By default it shows the 95% confidence interval for the mean (the bar height), not the variance.
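By default a seaborn bar is drawn at the mean of each category, and its error bar is a bootstrapped 95% confidence interval for that mean. The sketch below shows one way such an interval can be computed with NumPy. The tip values, the bootstrap_ci helper, and the number of resamples are all assumptions for illustration, not seaborn's own code.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Made-up tip amounts for a single category (not the real tips dataset).
tips = np.array([1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.5])

def bootstrap_ci(values, n_boot=1000, level=95):
    """Return (low, high) percentile bounds for the mean of `values`."""
    # Resample with replacement and record the mean of each resample.
    samples = rng.choice(values, size=(n_boot, len(values)), replace=True)
    means = samples.mean(axis=1)
    tail = (100 - level) / 2
    return np.percentile(means, [tail, 100 - tail])

bar_height = tips.mean()
low, high = bootstrap_ci(tips)
print(f"mean={bar_height:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

A short error bar means the mean is pinned down tightly; a long one means it is uncertain, often because the category has few observations.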
Box Plot
A box plot lets us compare the distribution of a variable across categories. It shows the quartiles of the data.
sns.boxplot(x='day',y='total_bill' , data = tips_df , hue = 'sex')#hue is providing category
sns.set_context('poster')
The horizontal black line in the middle of each box is the median. The box extends from the first quartile to the third quartile, so it covers the middle 50% of the data (the interquartile range). The vertical black lines are the whiskers; by default they reach to the most extreme points within 1.5 times the interquartile range of the box, and anything beyond them is drawn as an individual outlier point.
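The numbers a box plot draws can be reproduced by hand. Here is a minimal sketch using NumPy on made-up bill amounts, assuming the default whisker rule of 1.5 times the interquartile range.

```python
import numpy as np

# Made-up bill amounts with one unusually large value.
bills = np.array([8.0, 10.0, 12.0, 14.0, 15.0, 16.0, 18.0, 20.0, 22.0, 48.0])

# The box runs from the first quartile to the third; the line inside is the median.
q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1

# Whiskers stop at the most extreme points still inside the 1.5 * IQR fences.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
whisker_low = bills[bills >= low_fence].min()
whisker_high = bills[bills <= high_fence].max()

# Anything outside the fences is drawn as a separate outlier point.
outliers = bills[(bills < low_fence) | (bills > high_fence)]

print(q1, median, q3)             # 12.5 15.5 19.5
print(whisker_low, whisker_high)  # 8.0 22.0
print(outliers)                   # [48.]
```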
Strip Plot
plt.figure(figsize=(8,5))
sns.set_style('dark')
sns.set_context('talk')
sns.stripplot(x='day',
y='total_bill',
data=tips_df ,
hue='sex',
jitter=True, # jitter is used so that our points don't overlap each other
dodge=True)#dodge will help us separate the data between male and female
Matrix Plots
Heatmaps
crash_mx = crash_df.corr(numeric_only=True) # numeric_only (pandas 1.5+) skips the text column 'abbrev'
crash_mx
First we created a correlation matrix. Next we draw a heatmap from that matrix.
sns.set_context('paper')
sns.heatmap(crash_mx,annot=True,cmap='Oranges')
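A correlation matrix only makes sense for numeric columns. The small sketch below, on a made-up DataFrame, shows what corr returns when a text column is present; numeric_only=True assumes pandas 1.5 or newer.

```python
import pandas as pd

# Toy DataFrame (made-up values) with two numeric columns and one text column.
df = pd.DataFrame({
    "speeding": [1.0, 2.0, 3.0, 4.0, 5.0],
    "alcohol": [2.0, 4.0, 5.0, 4.0, 5.0],
    "abbrev": ["AA", "BB", "CC", "DD", "EE"],
})

# numeric_only=True leaves the text column out instead of raising an error.
corr = df.corr(numeric_only=True)
print(corr)
```

The result is a square, symmetric table with 1.0 on the diagonal, which is the shape sns.heatmap expects.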
Heatmaps using pivot table
To build a heatmap from a pivot table, let's use a different dataset.
flights_df = sns.load_dataset('flights')
flights_df
flights = flights_df.pivot_table(index='month', columns = 'year', values='passengers')
flights
sns.heatmap(flights , cmap='hot')
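The pivot step turns long-form data (one row per observation) into a wide grid with one cell per row and column pair, which is the form a heatmap needs. Here is a minimal sketch on a tiny made-up table with the same column names as the flights dataset.

```python
import pandas as pd

# Long form: one row per (year, month) observation. Values are made up.
long_df = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [100, 110, 120, 130],
})

# Wide form: months become rows, years become columns, passengers fill the cells.
wide = long_df.pivot_table(index="month", columns="year", values="passengers")
print(wide)
```

Each cell of the wide table becomes one colored square in the heatmap.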
So this was all for this blog on seaborn.