Pandas Library
Topics Covered:
- Short Introduction to Pandas
- Creating Datasets in Pandas
- Viewing Data
- Selection
Pandas is one of the most important libraries for day-to-day data analysis in Python.
Pandas is built around two main data structures, DataFrame and Series.
Short Introduction to Pandas
Importing pandas and NumPy (NumPy is needed for np.nan and the random arrays used below)
import numpy as np
import pandas as pd
Creating datasets in Pandas
Creating a series by passing a list of values
s = pd.Series([1, 3, 5, np.nan, 6, 8])
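As a small sketch of what this Series looks like, the snippet below rebuilds it and inspects its dtype and missing values. The point it tries to show is that a single np.nan makes the whole Series floating point.

```python
import numpy as np
import pandas as pd

# A list containing a missing value (np.nan) becomes a float64 Series
s = pd.Series([1, 3, 5, np.nan, 6, 8])

print(s.dtype)          # float64, because NaN is a floating-point value
print(s.index.tolist()) # default integer index: [0, 1, 2, 3, 4, 5]
print(s.isna().sum())   # number of missing entries: 1
```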
Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns
dates = pd.date_range("20140701", periods=7)  # seven consecutive days starting 2014-07-01
df = pd.DataFrame(np.random.randn(7, 5), index=dates, columns=list("ABCDE"))
df
Creating a DataFrame by passing a dict of objects that can be converted to a series-like structure
df2 = pd.DataFrame({
    "A": 1,
    "B": pd.Timestamp.now(),
    "C": pd.Series(1, index=list(range(4)), dtype="int32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["Female", "Male", "Male", "Female"]),
})
df2
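To see how the dict turns into columns, here is a minimal sketch that builds the same kind of DataFrame and prints its dtypes. A fixed timestamp is used instead of pd.Timestamp.now() purely so the example is reproducible; that substitution is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "A": 1,                                    # scalar, broadcast to every row
    "B": pd.Timestamp("2014-07-01"),           # fixed date (assumption, for reproducibility)
    "C": pd.Series(1, index=list(range(4)), dtype="int32"),
    "D": np.array([3] * 4, dtype="int32"),
    "E": pd.Categorical(["Female", "Male", "Male", "Female"]),
})

# Each column keeps its own dtype
print(df2.dtypes)
print(df2.shape)  # (4, 5)
```

The scalar values in "A" and "B" are repeated to match the length of the other columns.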
Viewing Data
df.head() displays the first five rows of the DataFrame by default
df.tail() displays the last five rows of the DataFrame by default
The index and the columns can be viewed with:
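Both methods also accept a row count. A quick sketch on a hypothetical ten-row frame (the column name x is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical ten-row frame used only for this illustration
demo = pd.DataFrame({"x": np.arange(10)})

first_three = demo.head(3)  # first 3 rows instead of the default 5
last_two = demo.tail(2)     # last 2 rows instead of the default 5

print(first_three["x"].tolist())  # [0, 1, 2]
print(last_two["x"].tolist())     # [8, 9]
```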
df.index
df.columns
describe() shows a quick statistical summary of the data
df.describe()
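To make the summary concrete, here is a small sketch on a hypothetical one-column frame where the statistics are easy to check by hand:

```python
import pandas as pd

# Hypothetical data chosen so the statistics are easy to verify
demo = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0]})
summary = demo.describe()

# The rows of the summary are the statistics, the columns are the original columns
print(summary.index.tolist())
print(summary.loc["count", "A"])  # 4.0
print(summary.loc["mean", "A"])   # 2.5
print(summary.loc["max", "A"])    # 4.0
```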
DataFrame.to_numpy() gives a NumPy representation of the underlying data.
Note that this can be an expensive operation when your DataFrame has columns with different data types, which comes down to a fundamental difference between pandas and NumPy: NumPy arrays have one dtype for the entire array, while pandas DataFrames have one dtype per column. DataFrame.to_numpy() does not include the index or column labels in the output.
df.to_numpy()
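The dtype difference can be sketched with two hypothetical frames, one all-numeric and one mixed. With mixed columns, pandas has to fall back to a general object array, which is where the extra cost comes from.

```python
import pandas as pd

# Hypothetical frames: one with a single dtype, one with mixed dtypes
numeric = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
mixed = pd.DataFrame({"a": [1.0, 2.0], "b": ["x", "y"]})

numeric_arr = numeric.to_numpy()
mixed_arr = mixed.to_numpy()

print(numeric_arr.dtype)  # float64: one dtype fits every column
print(mixed_arr.dtype)    # object: floats and strings share no numeric dtype
print(numeric_arr.shape)  # (2, 2): values only, no index or column labels
```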
Some other useful pandas operations are:
df.T  # transpose the data (rows become columns)
df.sort_index(axis=1, ascending=False)  # sort the columns by label in descending order, i.e. E, D, C, B, A
df.sort_values(by="A")  # sort the rows by the values in column "A"
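A small sketch of these three operations on a hypothetical frame (row labels r1 to r3 and the values are made up), showing that none of them modifies the original in place:

```python
import pandas as pd

# Hypothetical frame with unordered columns and values
demo = pd.DataFrame(
    {"A": [9, 7, 8], "C": [1, 2, 3], "B": [4, 5, 6]},
    index=["r1", "r2", "r3"],
)

by_columns = demo.sort_index(axis=1, ascending=False)  # order columns by label, descending
by_values = demo.sort_values(by="A")                   # order rows by the values in column A
transposed = demo.T                                    # swap rows and columns

print(by_columns.columns.tolist())  # ['C', 'B', 'A']
print(by_values.index.tolist())     # ['r2', 'r3', 'r1']
print(transposed.shape)             # (3, 3)
print(demo.columns.tolist())        # original is unchanged: ['A', 'C', 'B']
```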
Selection
Selection by label
df.loc[:, ["A", "B"]]
df.loc["20140701":"20140703", "A":"C"]
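One detail worth sketching: label slices with loc include both endpoints. The example below assumes a date index starting on 2014-07-01 and uses sequential integers instead of random values so the result is easy to check.

```python
import numpy as np
import pandas as pd

# Assumed setup: 7 daily dates from 2014-07-01, values 0..34 for easy checking
dates = pd.date_range("20140701", periods=7)
demo = pd.DataFrame(np.arange(35).reshape(7, 5), index=dates, columns=list("ABCDE"))

# Both the end date and the end column "C" are included
subset = demo.loc["20140701":"20140703", "A":"C"]

print(subset.shape)             # (3, 3)
print(subset.columns.tolist())  # ['A', 'B', 'C']
print(subset.iloc[-1].tolist()) # row for 2014-07-03: [10, 11, 12]
```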
Selection by position
df.iloc[3:6,0:3]
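Positional slices, in contrast, follow the usual Python convention and exclude the end position. A minimal sketch on a hypothetical frame with a default integer index:

```python
import numpy as np
import pandas as pd

# Hypothetical 7x5 frame with values 0..34
demo = pd.DataFrame(np.arange(35).reshape(7, 5), columns=list("ABCDE"))

# Rows at positions 3, 4, 5 and columns at positions 0, 1, 2
block = demo.iloc[3:6, 0:3]

print(block.index.tolist())    # [3, 4, 5]
print(block.columns.tolist())  # ['A', 'B', 'C']
print(block.iloc[0].tolist())  # [15, 16, 17]
```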
Boolean Indexing
df[df["C"] > 0]
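Boolean indexing works in two steps: the comparison produces a True/False Series, and that mask then selects the matching rows. A sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical data with a mix of positive and negative values in column C
demo = pd.DataFrame({"C": [-1.5, 0.2, 3.0, -0.1], "D": [10, 20, 30, 40]})

mask = demo["C"] > 0   # boolean Series, one entry per row
positive = demo[mask]  # keeps only the rows where the mask is True

print(mask.tolist())           # [False, True, True, False]
print(positive["D"].tolist())  # [20, 30]
print(positive.index.tolist()) # original row labels are kept: [1, 2]
```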
To learn more about the pandas library, refer to the official pandas documentation.
References: https://pandas.pydata.org/docs/user_guide/10min.html