+tech Blog: [Tips] Python - Chain Methods In Pandas

Tuesday, August 8, 2023

[Tips] Python - Chain Methods In Pandas

Scenario

To get a dataset ready for analysis, we usually need to run a sequence of processing steps on it after loading it from a file or database. Pandas, a widely used library, offers powerful data processing capabilities. It also supports method chaining: methods such as query, assign and sort_values return a new DataFrame, so the next method can be called on the result directly. This lets us streamline the processing steps and group them by function, which makes Pandas code more concise and easier to read. Please check out the reference for a detailed view. This post demonstrates how this works with a simple example in which we chain Pandas built-in methods, a user-defined function, and a lambda function.
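To make the readability point concrete, here is a minimal sketch that performs the same two steps with and without chaining. The small DataFrame and its values are hypothetical, made up for illustration; only the column names Dividend and Divisor come from this post.

```python
import pandas as pd

# Hypothetical sample data (not the post's testdata.csv)
df = pd.DataFrame({
    "Dividend": [2500, 4000, 5000],
    "Divisor": [100, 200, 1000],
})

# Step by step: each intermediate result needs its own variable
step1 = df.query("Dividend > 3000")
step2 = step1.sort_values(by="Divisor")

# Chained: the same steps read top to bottom with no temporary variables
chained = (
    df.query("Dividend > 3000")
      .sort_values(by="Divisor")
)

print(chained.equals(step2))  # True
```

Wrapping the chain in parentheses lets each method sit on its own line without backslash line continuations.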

Example

This is the CSV file we’ll use for the demonstration.

testdata.csv

The user-defined function, splitname, splits the full name in the Name column into a first name and a last name (the new FirstName and LastName columns) and then drops the Name column.
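One detail worth knowing about this kind of split: with expand=True, str.split returns one column per piece, so a name containing two spaces would produce three columns, and assigning just two column names would raise an error. Limiting the split to the first space with n=1 avoids that. A minimal sketch with hypothetical names:

```python
import pandas as pd

# Hypothetical names; the last one has a multi-word surname
people = pd.DataFrame({"Name": ["Ada Lovelace", "Alan Turing", "John von Neumann"]})

# n=1 splits on the first space only, so expand=True yields two columns here
parts = people.Name.str.split(" ", n=1, expand=True)
parts.columns = ["FirstName", "LastName"]
print(parts)
```

Everything after the first space ends up in LastName, which is usually the safer choice for surnames such as "von Neumann".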

The lambda function calculates Divisor / Dividend and saves the result to a new field, CalcField.
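Inside a chain, assign calls the lambda with the DataFrame produced by the previous step, which is why the lambda refers to its argument x rather than to a named variable. The sketch below uses hypothetical values to show that CalcField is computed only for the rows that survived the query, and that assign returns a new DataFrame instead of modifying the original.

```python
import pandas as pd

# Hypothetical data using the post's column names
df = pd.DataFrame({"Dividend": [2000, 4000, 8000], "Divisor": [100, 1000, 4000]})

result = (
    df.query("Dividend > 3000")
      # x is the filtered DataFrame, so CalcField is computed only for the kept rows
      .assign(CalcField=lambda x: x.Divisor / x.Dividend)
)
print(result)
```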

We’ll execute the Python code in JupyterLab.

The output produced by JupyterLab is shown below.

The source code is listed below.

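import pandas as pd

def splitname(df):
    # Split Name on the first space into two new columns
    firstlast = df.Name.str.split(' ', n=1, expand=True)
    firstlast.columns = ['FirstName', 'LastName']
    # Append the new columns (rows align on the index) and drop the original
    df = pd.concat([df, firstlast], axis=1)
    df = df.drop('Name', axis=1)
    return df

# Wrapping the chain in parentheses avoids backslash line continuations
df = (
    pd.read_csv('testdata.csv', low_memory=False)
    .query("Dividend > 3000")                            # keep rows with Dividend > 3000
    .assign(CalcField=lambda x: x.Divisor / x.Dividend)  # add CalcField via a lambda
    .pipe(splitname)                                     # apply the user-defined function
    .sort_values(by='CalcField', ascending=True)         # sort by CalcField, smallest first
)
print(df)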
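pipe also forwards extra positional and keyword arguments to the function: df.pipe(f, a, b=c) calls f(df, a, b=c). That makes it easy to generalize a helper like splitname. The sketch below uses a hypothetical split_column function (not part of the original code) that takes the column to split and the names of the new columns; the sample data is also hypothetical.

```python
import pandas as pd

def split_column(df, column, new_names):
    # Hypothetical generalization of splitname: split `column` on the first space
    parts = df[column].str.split(" ", n=1, expand=True)
    parts.columns = new_names
    # Drop the source column and append the new ones (rows align on the index)
    return pd.concat([df.drop(columns=column), parts], axis=1)

df = pd.DataFrame({
    "Name": ["Ada Lovelace", "Alan Turing", "Grace Hopper"],
    "Dividend": [4000, 2000, 5000],
    "Divisor": [2000, 100, 500],
})

result = (
    df.query("Dividend > 3000")
      .assign(CalcField=lambda x: x.Divisor / x.Dividend)
      # pipe calls split_column(<current DataFrame>, "Name", new_names=[...])
      .pipe(split_column, "Name", new_names=["FirstName", "LastName"])
      .sort_values(by="CalcField", ascending=True)
)
print(result)
```

Because the column name and output names are parameters, the same helper can be reused in other chains without writing a new function for each column.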

Reference

How to use method chaining in Pandas
