Tuesday, August 8, 2023

Python Tips - Chain Methods In Pandas

 

Scenario

To get a dataset ready for analysis, we usually need to carry out a sequence of processing steps after loading it from a file or database. Pandas, a widely used library, comes with powerful data processing capabilities. Because most DataFrame methods return a new DataFrame, we can also chain a number of method calls, which lets us streamline those processing steps and group them by function. This makes Pandas code more concise and improves its readability. Please check out the reference for a detailed view. This post demonstrates how this works with a simple example, in which we'll chain Pandas's built-in methods, a user-defined function, and a lambda function.
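To make the chaining idea concrete, here is a minimal sketch that compares a step-by-step pipeline with its chained equivalent. The small DataFrame and its values are hypothetical and exist only for illustration.

```python
import pandas as pd

# Small hypothetical DataFrame used only for illustration
df = pd.DataFrame({'Name': ['Ann Lee', 'Bob Ray', 'Cy Fox'],
                   'Dividend': [5000, 2000, 4000],
                   'Divisor': [100, 50, 400]})

# Step-by-step version: each intermediate result gets its own variable
step1 = df.query("Dividend > 3000")
step2 = step1.assign(CalcField=lambda x: x.Divisor / x.Dividend)
stepwise = step2.sort_values(by='CalcField')

# Chained version: each method returns a new DataFrame,
# so the next method is called on that result directly
chained = (
    df.query("Dividend > 3000")
      .assign(CalcField=lambda x: x.Divisor / x.Dividend)
      .sort_values(by='CalcField')
)

# Both pipelines produce the same DataFrame
print(chained.equals(stepwise))  # True
```

The chained form avoids the throwaway names step1 and step2 and reads top to bottom in the order the steps are applied.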


Example

This is the CSV file we’ll use for the demonstration.

testdata.csv

The user-defined function, splitname, splits the full name in the Name column into two new columns, FirstName and LastName, and then drops the original Name column.
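In the chain, splitname is applied through DataFrame.pipe, which calls the function with the current DataFrame and returns whatever the function returns. The sketch below uses hypothetical sample rows to check that people.pipe(splitname) gives the same result as splitname(people). It splits on the first space only (n=1), an assumption made here so that a name with more than two words still produces exactly two columns.

```python
import pandas as pd

def splitname(df):
    # Split "First Last" on the first space into two new columns
    firstlast = df['Name'].str.split(' ', n=1, expand=True)
    firstlast.columns = ['FirstName', 'LastName']
    # Attach the new columns and drop the original Name column
    return pd.concat([df, firstlast], axis=1).drop(columns='Name')

# Hypothetical sample data
people = pd.DataFrame({'Name': ['Ann Lee', 'Bob Ray'], 'Dividend': [5000, 4000]})

# df.pipe(func) is equivalent to func(df), but it can sit inside a method chain
piped = people.pipe(splitname)
print(piped.equals(splitname(people)))  # True
print(piped.columns.tolist())           # ['Dividend', 'FirstName', 'LastName']
```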

The lambda function calculates Divisor / Dividend and saves the result to a new column, CalcField.
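DataFrame.assign accepts a callable as a column value, and Pandas calls it with the DataFrame produced by the previous step. That is why the lambda can refer to columns without an intermediate variable. Here is a minimal sketch with hypothetical values, using the column names that appear in the post's code.

```python
import pandas as pd

# Hypothetical sample data with the Dividend and Divisor columns
df = pd.DataFrame({'Dividend': [5000, 2000, 4000],
                   'Divisor': [100, 50, 400]})

# assign() calls the lambda with the frame returned by query(),
# so x is the already-filtered DataFrame, not the original df
result = (
    df.query("Dividend > 3000")
      .assign(CalcField=lambda x: x.Divisor / x.Dividend)
)
print(result)

# assign() returns a new DataFrame; the original df is left unchanged
print('CalcField' in df.columns)  # False
```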

We’ll execute the Python code in JupyterLab.


The output produced by JupyterLab is shown below.


The source code is also included here.


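import pandas as pd

def splitname(df):
    # Split "First Last" on the first space; n=1 keeps any extra words in LastName
    firstlast = df['Name'].str.split(' ', n=1, expand=True)
    firstlast.columns = ['FirstName', 'LastName']
    # Attach the new columns and drop the original Name column
    return pd.concat([df, firstlast], axis=1).drop(columns='Name')

# Wrapping the chain in parentheses avoids trailing backslashes;
# a plain relative path works on Windows, macOS, and Linux
df = (
    pd.read_csv('testdata.csv', low_memory=False)
      .query("Dividend > 3000")                             # keep rows with Dividend above 3000
      .assign(CalcField=lambda x: x.Divisor / x.Dividend)   # add a computed column
      .pipe(splitname)                                      # apply the user-defined function
      .sort_values(by='CalcField', ascending=True)          # order by the computed column
)
print(df)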
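Since testdata.csv itself isn't reproduced in text form here, the sketch below runs the same chain against a small in-memory CSV read through io.StringIO. The rows are hypothetical and are not the post's actual data, so the printed result will differ from the JupyterLab output shown above.

```python
import io
import pandas as pd

def splitname(df):
    # Split "First Last" on the first space into FirstName and LastName
    firstlast = df['Name'].str.split(' ', n=1, expand=True)
    firstlast.columns = ['FirstName', 'LastName']
    return pd.concat([df, firstlast], axis=1).drop(columns='Name')

# Hypothetical stand-in for testdata.csv (not the post's actual data)
csv_text = """Name,Dividend,Divisor
Ann Lee,5000,100
Bob Ray,2000,50
Cy Fox,4000,400
Di Moss,8000,80
"""

df = (
    pd.read_csv(io.StringIO(csv_text))
      .query("Dividend > 3000")
      .assign(CalcField=lambda x: x.Divisor / x.Dividend)
      .pipe(splitname)
      .sort_values(by='CalcField', ascending=True)
)
print(df)
```

Replacing io.StringIO(csv_text) with the path to the real file restores the original behavior.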
