Pandas Library

Naman Mehra
3 min read · Jun 11, 2021


https://miro.medium.com/max/1838/1*KdxlBR9P3mDp9JZ_URMdYQ.jpeg

Topics Covered:

  • Short Introduction to Pandas
  • Creating Datasets in Pandas
  • Viewing Data
  • Selection

Pandas is one of the most important libraries for day-to-day data analysis in Python.

Pandas has two main data structures: Series, a one-dimensional labeled array, and DataFrame, a two-dimensional labeled table whose columns are Series.

Short Introduction to Pandas

Importing pandas and NumPy (the examples below use both)

import numpy as np
import pandas as pd

Creating datasets in Pandas

Creating a Series by passing a list of values

s = pd.Series([1, 3, 5, np.nan, 6, 8])
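
As a small, self-contained sketch of the line above: because the list contains np.nan, pandas stores the whole Series as floating point, and it assigns a default integer index. The print calls are only there for inspection.

```python
import numpy as np
import pandas as pd

# np.nan is a float, so the integers in the list are stored as float64 too.
s = pd.Series([1, 3, 5, np.nan, 6, 8])

print(s.dtype)         # float64
print(s.index)         # default integer index 0..5
print(s.isna().sum())  # number of missing values in the Series
```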

Creating a DataFrame by passing a NumPy array, with a date index and labeled columns

dates = pd.date_range("20140701", periods=7)  # 7 consecutive days starting 2014-07-01
df = pd.DataFrame(np.random.randn(7, 5), index=dates, columns=list("ABCDE"))

df
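
A minimal, self-contained sketch of this step. It assumes the index is a range of seven consecutive days starting 2014-07-01 (chosen to match the labels used in the Selection section), and it uses a seeded NumPy generator instead of np.random.randn so the values are repeatable.

```python
import numpy as np
import pandas as pd

# Assumed date range: 7 daily timestamps starting 2014-07-01.
dates = pd.date_range("20140701", periods=7)

# Seeded generator so the random values are the same on every run.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((7, 5)),  # 7 rows x 5 columns of normal samples
    index=dates,
    columns=list("ABCDE"),
)

print(df.shape)  # (7, 5)
```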

Creating a DataFrame by passing a dict of objects that can be converted to a series-like structure

df2 = pd.DataFrame({
    "A": 1,
    "B": pd.Timestamp.now(),
    "C": pd.Series(1, index=list(range(4)), dtype="int32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["Female", "Male", "Male", "Female"]),
})

df2
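
To see how each dict value becomes a column, here is a rough sketch that builds the same kind of frame and inspects the per-column dtypes. It uses a fixed timestamp instead of pd.Timestamp.now() purely so the result is repeatable; that substitution is an assumption of this example.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "A": 1,                                    # scalar, repeated for every row
    "B": pd.Timestamp("2021-06-11"),           # fixed timestamp (assumption, for repeatability)
    "C": pd.Series(1, index=list(range(4)), dtype="int32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["Female", "Male", "Male", "Female"]),
})

# Each column keeps its own dtype.
print(df2.dtypes)
```

The scalar values in "A" and "B" are broadcast to the four rows defined by the other columns.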

Viewing Data

df.head() displays the first five rows of the DataFrame by default
df.tail() displays the last five rows of the DataFrame by default

The index and the column labels are available as attributes:

df.index
df.columns
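
The viewing helpers above can be sketched on a small frame as follows. The frame is filled with consecutive integers (an assumption of this example) so it is easy to see which rows come back; head() and tail() also accept an optional row count.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20140701", periods=7)
# Values 0..34 laid out row by row, so row i starts at 5 * i.
df = pd.DataFrame(np.arange(35).reshape(7, 5), index=dates, columns=list("ABCDE"))

first = df.head()      # first 5 rows (the default)
last_two = df.tail(2)  # last 2 rows

print(first)
print(last_two)
print(df.index)    # the row labels (a DatetimeIndex here)
print(df.columns)  # the column labels
```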

describe() shows a quick statistical summary of the data

df.describe()
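
As an illustration, describe() on an all-numeric frame returns one row per statistic and one column per original column. The integer data below is an assumption chosen so the mean is easy to check by hand.

```python
import numpy as np
import pandas as pd

# Column "A" holds 0, 5, 10, ..., 30.
df = pd.DataFrame(np.arange(35).reshape(7, 5), columns=list("ABCDE"))

summary = df.describe()

print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(summary.loc["mean", "A"])  # 15.0
```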

DataFrame.to_numpy() gives a NumPy representation of the underlying data.
Note that this can be an expensive operation when your DataFrame has columns with different data types, which comes down to a fundamental difference between pandas and NumPy: NumPy arrays have one dtype for the entire array, while pandas DataFrames have one dtype per column.
DataFrame.to_numpy() does not include the index or column labels in the output.

df.to_numpy()
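
The dtype point can be seen in a short sketch with two hypothetical frames: when every column is float, the array keeps that dtype, but a frame that mixes floats and strings has to fall back to the generic object dtype.

```python
import numpy as np
import pandas as pd

numeric = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
mixed = pd.DataFrame({"x": [1.0, 2.0], "label": ["a", "b"]})

numeric_arr = numeric.to_numpy()
mixed_arr = mixed.to_numpy()

print(numeric_arr.dtype)  # float64
print(mixed_arr.dtype)    # object
print(numeric_arr.shape)  # (2, 2): values only, no index or column labels
```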

Some other useful pandas operations are

df.T  # transpose the data (rows become columns)
df.sort_index(axis=1, ascending=False)  # sort the columns by label in descending order, i.e. "E", "D", "C", "B", "A"
df.sort_values(by="A")  # sort the rows by the values in column "A"
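
A small sketch of these three operations on a hypothetical 3x3 frame. Each call returns a new object and leaves df unchanged.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [9, 7, 8], "B": [2, 1, 3], "C": [5, 6, 4]},
    index=["r1", "r2", "r3"],
)

transposed = df.T                                   # rows become columns
cols_desc = df.sort_index(axis=1, ascending=False)  # reorder columns by label: C, B, A
by_a = df.sort_values(by="A")                       # reorder rows by the values in "A"

print(cols_desc.columns.tolist())  # ['C', 'B', 'A']
print(by_a.index.tolist())         # ['r2', 'r3', 'r1']
```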

Selection

Selection by label

df.loc[:, ["A", "B"]]

df.loc["20140701":"20140703", "A":"C"]
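
One detail worth showing with a sketch: label slices in .loc include both endpoints, unlike ordinary Python slices. The frame below assumes a daily index starting 2014-07-01 and integer values so the result is easy to verify.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20140701", periods=7)
df = pd.DataFrame(np.arange(35).reshape(7, 5), index=dates, columns=list("ABCDE"))

two_cols = df.loc[:, ["A", "B"]]                  # all rows, columns A and B
block = df.loc["20140701":"20140703", "A":"C"]    # both end labels are included

print(block.shape)  # (3, 3)
```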

Selection by position

df.iloc[3:6,0:3]
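
For contrast, a brief sketch of positional selection: .iloc uses integer positions and excludes the end of each slice, so 3:6 means rows 3, 4 and 5. The integer-filled frame is an assumption of this example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20140701", periods=7)
df = pd.DataFrame(np.arange(35).reshape(7, 5), index=dates, columns=list("ABCDE"))

block = df.iloc[3:6, 0:3]  # rows at positions 3..5, columns at positions 0..2

print(block.shape)             # (3, 3)
print(block.columns.tolist())  # ['A', 'B', 'C']
```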

Boolean Indexing

df[df["C"] > 0]
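
Boolean indexing works in two steps, sketched here on a hypothetical frame: the comparison produces a boolean Series (the mask), and indexing with that mask keeps only the rows where it is True.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "C": [-0.5, 1.2, 0.0, 3.4]})

mask = df["C"] > 0   # boolean Series, one value per row
positive = df[mask]  # keeps rows where the mask is True

print(mask.tolist())            # [False, True, False, True]
print(positive.index.tolist())  # [1, 3]
```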

To learn more about pandas, refer to the official pandas documentation.

References: https://pandas.pydata.org/docs/user_guide/10min.html
