Pandas Series
What is Pandas?
The name Pandas is derived from "panel data". Pandas is a high-level data manipulation tool used for data analysis, created by Wes McKinney in 2008.
Pandas is a Python library used for working with data sets.
It has functions for analyzing, cleaning, exploring, and manipulating data.
Why Use Pandas?
Pandas allows us to analyze large data sets and draw conclusions based on statistical theories.
Pandas can clean messy data sets, and make them readable and relevant.
Relevant data is very important in data science.
What Can Pandas Do?
Pandas gives you answers about the data, such as:
- Is there a correlation between two or more columns?
- What is the average value?
- What is the maximum value?
- What is the minimum value?
Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
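The questions above map onto a handful of pandas calls. The sketch below uses a small made-up data set (the column names hours and score and all the values are hypothetical, chosen only for illustration) to show the average, maximum, minimum, correlation and a simple cleaning step:

```python
import pandas as pd

# Hypothetical data: hours studied and test scores; one score is missing
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52.0, 58.0, None, 71.0, 80.0],
})

# Cleaning: drop the rows that contain empty (NaN) values
cleaned = df.dropna()

avg_score = cleaned["score"].mean()   # average value
max_score = cleaned["score"].max()    # maximum value
min_score = cleaned["score"].min()    # minimum value
corr = cleaned["hours"].corr(cleaned["score"])  # correlation between two columns

print(len(df), len(cleaned))
print(avg_score, max_score, min_score)
print(round(corr, 2))
```

Here dropna() removes the row with the missing score before the statistics are calculated, so the results are based on four rows instead of five.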
Pandas Data Structures
A data structure is a specialized way of storing data so that a specific kind of operation can be applied to it.
Pandas has three types of data structures:
* Series
* DataFrame
* Panel (Out of syllabus; removed from recent versions of pandas)
What is a Series?
A Pandas Series is like a column in a table.
It is a one-dimensional array holding data of any type.
1. Creation of Empty Series
Ex: 1
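# creation of an empty series
import pandas as pd
# dtype is given explicitly because the default dtype of an empty Series has differed between pandas versions
s1 = pd.Series(dtype=object)
print(s1)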
Output
Series([], dtype: object)
2. Creation of Series by using a list
Ex.: 2
# creation of a series by a list
import pandas as pd
s1=pd.Series([1,2,3,4,5])
print (s1)
Output:
0 1
1 2
2 3
3 4
4 5
dtype: int64
The numbers on the left side are called the index. When no index is given, pandas numbers the elements 0, 1, 2, and so on.
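The index and the data of a Series can also be inspected separately. A minimal sketch, reusing the Series from Ex. 2:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])

print(s1.index)         # the default index: RangeIndex(start=0, stop=5, step=1)
print(list(s1.index))   # [0, 1, 2, 3, 4]
print(s1.values)        # the data, returned as a NumPy array
```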
Ex.3
# creation of a series by a list
import pandas as pd
s1=pd.Series(["A","B","C","D","E"])
print (s1)
Output:
0 A
1 B
2 C
3 D
4 E
dtype: object
Ex.4
# creation of a series by a list with a custom index
import pandas as pd
s1=pd.Series(["A","B","C","D","E"], index=(1,3,5,7,9))
print (s1)
Output:
1 A
3 B
5 C
7 D
9 E
dtype: object
We can also use letters or strings as index labels:
Ex. 5
# creation of a series by a list with string indexes
import pandas as pd
s1=pd.Series([31,28,31,30,31], index=("jan", "feb", "March", "April", "May"))
print (s1)
Output:
jan 31
feb 28
March 31
April 30
May 31
dtype: int64
3. Creation of Series by using a 1D NumPy array
What is NumPy?
Ans. NumPy is a Python library used for working with arrays.
A NumPy array is a grid of values, all of the same data type, indexed by a tuple of non-negative integers. The number of dimensions is the rank of the array, and the shape of an array is a tuple of integers giving the size of the array along each dimension.
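The terms rank and shape can be checked directly on an array. A short sketch (the array a2 is a made-up example, not part of the examples that follow):

```python
import numpy as np

a1 = np.array([10, 20, 30, 40, 50])      # 1-D array
a2 = np.array([[1, 2, 3], [4, 5, 6]])    # 2-D array: 2 rows, 3 columns

print(a1.ndim, a1.shape)   # 1 (5,)
print(a2.ndim, a2.shape)   # 2 (2, 3)
print(a2[1, 2])            # indexed by a tuple: row 1, column 2 -> 6
```

Only a 1-D array such as a1 can be passed directly to pd.Series() as its data.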
Ex.6
# creation of a series by a numpy array
import pandas as pd
import numpy as np
a1=np.array([10,20,30,40,50])
s1=pd.Series(a1)
print (s1)
Output:
0 10
1 20
2 30
3 40
4 50
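dtype: int32
Note: the dtype may be displayed as int64 instead of int32, depending on the operating system and the NumPy version.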
Ex.7
# creation of a series by a numpy array with letters or strings as index
import pandas as pd
import numpy as np
a1=np.array([10,20,30,40,50])
s1=pd.Series(a1, index=['a', 'b', 'c', 'd','e'])
print (s1)
Output:
a 10
b 20
c 30
d 40
e 50
dtype: int32
Note: In the above example, the number of index labels is the same as the length of the array. If the two lengths are different, then:
Ex. 8
# here the number of index labels (5) is not the same as the length of the array (4)
import pandas as pd
import numpy as np
a1=np.array([10,20,30,40])
s1=pd.Series(a1, index=['a', 'b', 'c', 'd','e'])
print (s1)
This will produce an error called ValueError:
ValueError: Length of values (4) does not match length of index (5)
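One possible way to handle this error is to catch it and then build the Series with exactly one label per value. This is only a sketch; the variable name labels and the choice to cut the label list short are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

a1 = np.array([10, 20, 30, 40])
labels = ['a', 'b', 'c', 'd', 'e']   # one label too many

try:
    s1 = pd.Series(a1, index=labels)
except ValueError as err:
    s1 = None
    print("Could not create the Series:", err)

# Fix: use exactly as many labels as there are values
s2 = pd.Series(a1, index=labels[:len(a1)])
print(s2)
```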
4. Creation of Series using dictionaries
When a Series is created using a dictionary, the keys of the dictionary become the index of the Series, so there is no need to declare the index as a separate list.
Ex. 9:
# creation of a series by using a dictionary
import pandas as pd
s1=pd.Series({"jan": 31, "feb": 28, "March":31})
print (s1)
Output:
jan 31
feb 28
March 31
dtype: int64
OR
# creation of a series by using a dictionary stored in a variable
import pandas as pd
a1={"jan": 31, "feb": 28, "March":31}
s1=pd.Series(a1)
print (s1)
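If an index is given along with the dictionary, only the values whose keys match the index labels are used. Labels are case-sensitive, so "JAN" does not match "jan", and the missing values are shown as NaN:
# creation of a series by using a dictionary and a separate index
import pandas as pd
a1={"jan": 31, "feb": 28, "March":31}
s1=pd.Series(a1,index=["JAN", "FEB", "MARCH"])
print (s1)
Output: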
JAN NaN
FEB NaN
MARCH NaN
dtype: float64
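When only some of the labels match, the matching values are kept and the rest become NaN. A small sketch using the same dictionary (the label "April" is added here only to show a non-matching key):

```python
import pandas as pd

a1 = {"jan": 31, "feb": 28, "March": 31}

# "feb" and "jan" match dictionary keys; "April" does not
s1 = pd.Series(a1, index=["feb", "jan", "April"])
print(s1)
```

The values follow the order of the index that was passed, not the order of the dictionary, and the dtype becomes float64 because NaN is a floating-point value.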
5. Creation of Series using Mathematical Expression
Ex: 10
# creation of a series by using a mathematical expression
# [a1]*4 repeats the list four times, so every element of the series is the whole list
import pandas as pd
a1=[5,10,15,20]
s1=pd.Series(data=[a1]*4, index=a1)
print (s1)
Output:
5 [5, 10, 15, 20]
10 [5, 10, 15, 20]
15 [5, 10, 15, 20]
20 [5, 10, 15, 20]
dtype: object
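To apply arithmetic to each value instead of repeating a list, the data can be held in a NumPy array. The following is a sketch of that variation, not a replacement for the example above:

```python
import numpy as np
import pandas as pd

a1 = np.array([5, 10, 15, 20])

# a1 * 2 multiplies every element by 2
s1 = pd.Series(data=a1 * 2, index=a1)
print(s1)

# Arithmetic on an existing Series is also applied element by element
s2 = s1 + 1
print(s2)
```

Here s1 holds the values 10, 20, 30 and 40 with the index 5, 10, 15 and 20.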
II How to modify the indexes
For example:
# creation of a series
import pandas as pd
d1=[5,10,15,20]
a1=[1,3,5,7]
s1=pd.Series(d1, index=a1)
print (s1)
Output:
1 5
3 10
5 15
7 20
dtype: int64
Now, to modify the indexes, create the Series again with a new list of index labels:
# re-creation of the series with new indexes
import pandas as pd
d1=[5,10,15,20]
a1=[1,3,5,7]
s1=pd.Series(d1, index=a1)
a1=[2,4,6,8]
s1=pd.Series(d1, index=a1)
print (s1)
Output:
2 5
4 10
6 15
8 20
dtype: int64
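The index of an existing Series can also be changed without creating the Series again. A minimal sketch of two options, assigning to the index attribute and using rename() (the new labels 20 and 80 are arbitrary):

```python
import pandas as pd

s1 = pd.Series([5, 10, 15, 20], index=[1, 3, 5, 7])

# Replace all labels at once; the new list must have the same length
s1.index = [2, 4, 6, 8]
print(s1)

# rename() returns a new Series with only the selected labels changed
s2 = s1.rename({2: 20, 8: 80})
print(s2)
```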
III How to access the elements of the Series
There are two ways to access the elements: indexing and slicing.
1. Indexing: Elements of a Series can be accessed in two ways:
- Accessing an element from a Series by row position
- Accessing an element using the row label (index)
Accessing an Element from a Series by Position
iloc[]: iloc is used for indexing or selection based on position, i.e. we have to specify an integer position for the selection.
For example:
import pandas as pd
s=pd.Series([1,2,3,4,5], index=['a', 'b', 'c', 'd', 'e'])
# positions 1, 2 and 3 are selected; position 4 is excluded
print(s.iloc[1:4])
gives the output:
b 2
c 3
d 4
dtype: int64
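Accessing an Element Using a Label:
loc[]: loc is used for indexing or selection based on label, i.e. we have to specify the row label (index) for the selection.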
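A short sketch of label-based access, using the same Series as in the position example:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(s.loc['b'])          # single element by label -> 2
print(s['b'])              # same result with plain square brackets
print(s.loc[['a', 'e']])   # several labels at once
print(s.loc['b':'d'])      # a label slice includes the end label 'd'
```

Unlike a position slice with iloc, a label slice with loc includes both the start and the end label.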
2. Slicing:
To access multiple elements of a Series, we use the slice operation, which is written with a colon (:).
- To print elements from the beginning up to a position, use [:Index].
- To print the last elements, use [-Index:].
- To print elements from a specific position to the end, use [Index:].
- To print elements within a range, use [Start Index:End Index].
- To print the whole Series, use [:].
- To print the whole Series in reverse order, use [::-1].
For example:
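import pandas as pd
s=pd.Series([1,2,3,4,5], index=['a', 'b', 'c', 'd', 'e'])
# iloc is used because s[0] with a non-integer index is deprecated in recent versions of pandas
print(s.iloc[0])
# first three elements
print(s[:3])
# whole series
print(s[:])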
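The remaining slice forms can be tried in the same way. The sketch below uses iloc so that every slice is clearly based on position (the variable names are chosen only for this illustration):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

first_three = s.iloc[:3]    # positions 0, 1, 2
last_two = s.iloc[-2:]      # the last two elements
from_third = s.iloc[2:]     # position 2 to the end
middle = s.iloc[1:4]        # positions 1, 2, 3 (4 is excluded)
reversed_s = s.iloc[::-1]   # whole Series in reverse order

print(list(first_three), list(last_two), list(from_third))
print(list(middle), list(reversed_s))
```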