Pandas Series

 

What is Pandas?

The name Pandas is derived from "Panel Data". It is a high-level data manipulation tool used for data analysis and was created by Wes McKinney in 2008.

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

Why Use Pandas?

Pandas allows us to analyze big data and make conclusions based on statistical theories.

Pandas can clean messy data sets, and make them readable and relevant.

Relevant data is very important in data science.

What Can Pandas Do?

Pandas gives you answers about the data, such as:

  • Is there a correlation between two or more columns?
  • What is the average value?
  • What is the maximum value?
  • What is the minimum value?

Pandas is also able to delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
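As a rough illustration of the questions above, here is a minimal sketch. The table, its column names ("hours", "marks") and its values are made up only for this example:

```python
import pandas as pd

# Hypothetical data invented for this illustration; None becomes an empty (NaN) value
df = pd.DataFrame({
    "hours": [1.0, 2.0, 3.0, 4.0, None],
    "marks": [10.0, 20.0, 30.0, 40.0, 50.0],
})

cleaned = df.dropna()                                   # cleaning: drop rows with empty values
correlation = cleaned["hours"].corr(cleaned["marks"])   # correlation between two columns
average = cleaned["marks"].mean()                       # average value
highest = cleaned["marks"].max()                        # max value
lowest = cleaned["marks"].min()                         # min value

print(correlation, average, highest, lowest)
```

Here dropna() removes the last row because its "hours" value is empty, so the statistics are computed on the remaining four rows.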

Pandas Data Structures

A data structure refers to a specialized way of storing data so that a specific type of functionality can be applied to it.

Pandas has three types of data structure:

  • Series
  • DataFrame
  • Panel (out of syllabus; removed from recent versions of pandas)

What is a Series?

A Pandas Series is like a column in a table.

It is a one-dimensional array holding data of any type.


A Series can be created using the Series() method.

I. Creation of the Series

1. Creation of Empty Series

Ex: 1

# creation of an empty series

import pandas as pd

s1=pd.Series()

print (s1)

Output

Series([], dtype: object)
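Some pandas versions show a warning when an empty Series is created without a data type. A minimal sketch, assuming you want to state the type explicitly through the dtype parameter (the name s_empty is used only for this illustration):

```python
import pandas as pd

# Empty Series with an explicit data type
s_empty = pd.Series(dtype="float64")

print(s_empty)
print(len(s_empty))   # number of elements in the Series
```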

2. Creation of Series by using a list

Ex.: 2

# creation of a series by a list

import pandas as pd

s1=pd.Series([1,2,3,4,5])

print (s1)

Output: 

0    1

1    2

2    3

3    4

4    5

dtype: int64

The numbers on the left side are called the index. If no index is given, pandas numbers the elements starting from 0.
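The index and the data can also be inspected separately. A small sketch, using the same list as Ex. 2:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])

print(s1.index)    # the default index: positions 0 to 4
print(s1.values)   # the data as a NumPy array
```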

Ex.3 

# creation of a series by a list

import pandas as pd

s1=pd.Series(["A","B","C","D","E"])

print (s1)

Output:

0    A

1    B

2    C

3    D

4    E

dtype: object

Ex.4

# creation of a series by a list with a custom index

import pandas as pd

s1=pd.Series(["A","B","C","D","E"], index=(1,3,5,7,9))

print (s1)

Output:

1    A

3    B

5    C

7    D

9    E

dtype: object 

We can also use letters or strings as indexes:

Ex. 5

# creation of a series by a list with string indexes

import pandas as pd

s1=pd.Series([31,28,31,30,31], index=("jan", "feb", "March", "April", "May"))

print (s1)

Output:

jan      31

feb      28

March    31

April    30

May      31

dtype: int64

3. Creation of Series by NumPy array

We can also create a Series by using a 1D NumPy array.

What is NumPy?

Ans. NumPy is a Python library used for working with arrays.

A NumPy array is a grid of values, all of the same data type, indexed by a tuple of non-negative integers. The number of dimensions is the rank of the array. The shape of an array is a tuple of integers giving the size of the array along each dimension.
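To make rank and shape concrete, here is a short sketch. The array a2 is an extra 2D array added only for comparison:

```python
import numpy as np

a1 = np.array([10, 20, 30, 40, 50])     # 1D array
a2 = np.array([[1, 2, 3], [4, 5, 6]])   # 2D array: 2 rows, 3 columns

print(a1.ndim, a1.shape)   # number of dimensions and size along each dimension
print(a2.ndim, a2.shape)
```

Only a 1D array such as a1 can be passed directly as the data of a Series.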

Ex.6 

# creation of a series by a NumPy array

import pandas as pd

import numpy as np

a1=np.array([10,20,30,40,50])

s1=pd.Series(a1)

print (s1)

Output:


0    10

1    20

2    30

3    40

4    50

dtype: int32

(The dtype may be shown as int64 instead, depending on the platform and the NumPy version.)


Ex.7 

# creation of a series by a NumPy array with letters or strings as index

import pandas as pd

import numpy as np

a1=np.array([10,20,30,40,50])

s1=pd.Series(a1, index=['a', 'b', 'c', 'd','e'])

print (s1)

Output:

a    10

b    20

c    30

d    40

e    50

dtype: int32

Note: The number of index labels must be the same as the length of the array. If it is not, as in the following example:

Ex. 8

# here 5 index labels are given for an array of length 4

import pandas as pd

import numpy as np

a1=np.array([10,20,30,40])

s1=pd.Series(a1, index=['a', 'b', 'c', 'd','e'])

print (s1)

then it will produce an error called ValueError:

ValueError: Length of values (4) does not match length of index (5)
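If the program should keep running when the lengths do not match, the error can be caught. A minimal sketch, assuming a try/except block is acceptable here (the names labels and message are used only for this illustration):

```python
import pandas as pd
import numpy as np

a1 = np.array([10, 20, 30, 40])
labels = ['a', 'b', 'c', 'd', 'e']   # one label too many

try:
    s1 = pd.Series(a1, index=labels)
except ValueError as err:
    # The Series was not created; keep the error text for display
    s1 = None
    message = str(err)
    print("Could not create the Series:", message)
```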

4. Creation of Series using dictionaries

When a Series is created using a dictionary, the keys of the dictionary become the index of the Series, so there is no need to declare the indexes as a separate list.

Ex. 9:

# creation of a series by using a dictionary

import pandas as pd

s1=pd.Series({"jan": 31, "feb": 28, "March":31})

print (s1)

Output:

jan      31

feb      28

March    31

dtype: int64 

OR

# creation of a series by using a dictionary stored in a variable

import pandas as pd

a1={"jan": 31, "feb": 28, "March":31}

s1=pd.Series(a1)

print (s1)


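If an index is also passed along with the dictionary, the values are picked by matching the index labels against the dictionary keys. Labels that have no matching key get NaN (keys are case-sensitive, so "JAN" does not match "jan"):

# creation of a series by using a dictionary and a separate index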

import pandas as pd

a1={"jan": 31, "feb": 28, "March":31}

s1=pd.Series(a1,index=["JAN", "FEB", "MARCH"])

print (s1)

Output:

JAN     NaN

FEB     NaN

MARCH   NaN

dtype: float64
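The NaN values above appear because index labels are matched against the dictionary keys, and the match is case-sensitive. A small sketch with one label that is not in the dictionary ("April" is chosen only for illustration):

```python
import pandas as pd

a1 = {"jan": 31, "feb": 28, "March": 31}

# "jan" and "feb" match dictionary keys; "April" does not
s1 = pd.Series(a1, index=["jan", "feb", "April"])

print(s1)
print(s1.isna())   # True where no matching key was found
```

Because NaN is a floating-point value, the dtype of the resulting Series becomes float64.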

5. Creation of Series using Mathematical Expression

Ex: 10

# creation of a series by using an expression ([a1]*4 repeats the list a1 four times)

import pandas as pd

a1=[5,10,15,20]

s1=pd.Series(data=[a1]*4, index=a1)

print (s1)

Output:

5     [5, 10, 15, 20]

10    [5, 10, 15, 20]

15    [5, 10, 15, 20]

20    [5, 10, 15, 20]

dtype: object
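Note that in Ex. 10 the expression [a1]*4 repeats the list, so every element of the Series is the whole list. If the aim is arithmetic on the values themselves, one possible approach (an assumption about the intent, not part of the original example) is to use a NumPy array:

```python
import pandas as pd
import numpy as np

a1 = np.array([5, 10, 15, 20])

# a1 * 4 multiplies every element by 4 (element-wise arithmetic)
s1 = pd.Series(data=a1 * 4, index=a1)

print(s1)
```

Here each value of the Series is four times its index label.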

II. How to modify the indexes

For example:

# creation of a series with a custom index

import pandas as pd

d1=[5,10,15,20]

a1=[1,3,5,7]

s1=pd.Series(d1, index=a1)

print (s1)

Output: 

1     5

3    10

5    15

7    20

dtype: int64

Now, to modify the indexes, create the Series again with the new index values:

# re-creation of the series with new indexes

import pandas as pd

d1=[5,10,15,20]

a1=[1,3,5,7]

s1=pd.Series(d1, index=a1)

a1=[2,4,6,8]

s1=pd.Series(d1, index=a1)

print (s1)

Output: 

2     5

4    10

6    15

8    20

dtype: int64
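The index of an existing Series can also be replaced without creating the Series again. A minimal sketch, assuming the new list has the same length as the Series:

```python
import pandas as pd

s1 = pd.Series([5, 10, 15, 20], index=[1, 3, 5, 7])

# Assign new labels to the existing Series; the length must match
s1.index = [2, 4, 6, 8]

print(s1)
```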


III. How to access the elements of the Series

There are two ways to access the elements: indexing and slicing.

1. Indexing: Elements of a Series can be accessed in two ways:

  • Accessing an element by row position
  • Accessing an element by row label (index)

Accessing an Element from a Series by Position

iloc[]: iloc is used for indexing or selection based on position, i.e. we have to specify an integer position for the selection.

For example:

import pandas as pd

s=pd.Series([1,2,3,4,5], index=['a', 'b', 'c', 'd', 'e'])

print (s.iloc[1:4])

gives output as

b    2

c    3

d    4

dtype: int64


Accessing an Element from a Series by Label

loc[]: loc is used for indexing or selection based on name, i.e. we have to specify the row label (index) for the selection.
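A short sketch of label-based selection, using the same Series as in the iloc example:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(s.loc['b'])        # single element by label
print(s.loc['b':'d'])    # label slice: includes both 'b' and 'd'
```

Unlike a position slice, a label slice includes its end label, so 'b':'d' returns three elements.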

2. Slicing:

In order to access multiple elements from a Series, we use the slice operation, which is written with a colon (:).

  • [:Index] prints elements from the beginning up to (but not including) a position
  • [-Index:] prints the last elements, counting from the end
  • [Index:] prints elements from a specific position till the end
  • [Start Index:End Index] prints elements within a range
  • [:] prints the whole Series
  • [::-1] prints the whole Series in reverse order

For example:

import pandas as pd

s= pd.Series([1,2,3,4,5], index=['a' , 'b', 'c' , 'd', 'e'])

print(s.iloc[0])   # s[0] by position on a labelled index is deprecated in recent pandas versions

print(s[:3])

print(s[3:])

print(s[-3:])

print(s[:])

print(s[1:4])

print(s[0::2])
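The slices above select by position. A minimal sketch that makes this explicit with iloc and also shows the reverse-order slice (the variable names are chosen only for this illustration):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

first_three = s.iloc[:3]      # positions 0, 1, 2
last_three = s.iloc[-3:]      # the last three elements
every_second = s.iloc[0::2]   # positions 0, 2, 4
reversed_s = s.iloc[::-1]     # the whole Series in reverse order

print(first_three)
print(last_three)
print(every_second)
print(reversed_s)
```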

