Scenario
To get a dataset ready for analysis, we usually need to carry out a sequence of processing steps after loading it from a file or database. Pandas, a widely used library, comes with powerful data processing capabilities. It also supports method chaining, so we can streamline those processing steps and group them by function. This makes Pandas code more concise and easier to read. Please check out the reference for a detailed view. This post demonstrates how this works with a simple example, in which we chain Pandas’s built-in methods with a user-defined function and a lambda function.
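To illustrate what chaining buys us, here is a minimal sketch that runs the same filter, derive and sort steps twice: once with a named variable for every intermediate result, and once as a single chained expression. The small DataFrame is hypothetical sample data standing in for testdata.csv; only the column names Name, Divisor and Dividend come from the post.

```python
import pandas as pd

# Hypothetical sample data; the post itself reads testdata.csv instead.
raw = pd.DataFrame({
    "Name": ["Ann Lee", "Bob Stone", "Cy Park"],
    "Divisor": [100, 300, 200],
    "Dividend": [2000, 4000, 5000],
})

# Step by step: every intermediate result gets its own variable.
step1 = raw.query("Dividend > 3000")
step2 = step1.assign(CalcField=step1.Divisor / step1.Dividend)
stepwise = step2.sort_values(by="CalcField")

# Chained: the same operations written as one expression.
chained = (
    raw.query("Dividend > 3000")
       .assign(CalcField=lambda d: d.Divisor / d.Dividend)
       .sort_values(by="CalcField")
)

print(chained.equals(stepwise))  # True
```

Both versions produce the same DataFrame; the chained one simply avoids the throwaway names step1 and step2.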
Example
This is the CSV file, testdata.csv, that we’ll use for the demonstration.
The user-defined function, splitname, splits the full name in the Name column into two new columns, FirstName and LastName, and then drops the original Name column.
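The function can be tried on its own before it goes into a chain. The sketch below is a compact version of splitname that passes `n=1` to `str.split`, so only the first space acts as the separator and a three-part name still yields exactly two columns. The `people` DataFrame is hypothetical sample data.

```python
import pandas as pd

def splitname(df):
    # n=1 splits only at the first space, so "Mary Ann Smith"
    # becomes "Mary" and "Ann Smith" (always two columns).
    firstlast = df.Name.str.split(" ", n=1, expand=True)
    firstlast.columns = ["FirstName", "LastName"]
    # Add the new columns, then drop the original Name column.
    return pd.concat([df, firstlast], axis=1).drop("Name", axis=1)

people = pd.DataFrame({"Name": ["Ann Lee", "Mary Ann Smith"]})  # hypothetical data
print(splitname(people))
```

Because splitname takes a DataFrame and returns a DataFrame, it can be passed to `DataFrame.pipe` like any built-in step.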
The lambda function calculates Divisor / Dividend and saves the result to a new field, CalcField.
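When `assign` receives a callable, it calls it with the DataFrame produced by the previous step of the chain. That is why a lambda works here: it sees the already filtered rows without needing a variable name for them. A minimal sketch with hypothetical data:

```python
import pandas as pd

raw = pd.DataFrame({"Divisor": [100, 300, 200], "Dividend": [2000, 4000, 5000]})  # hypothetical data

# x is the output of query(), i.e. only the rows with Dividend > 3000.
result = raw.query("Dividend > 3000").assign(CalcField=lambda x: x.Divisor / x.Dividend)
print(result)
```

Filtering on Dividend first also keeps rows with a zero or small Dividend out of the division.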
We’ll execute the Python code in JupyterLab.
The source code is listed below.
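import pandas as pd

def splitname(df):
    # Split Name at the first space into FirstName and LastName
    firstlast = df.Name.str.split(' ', n=1, expand=True)
    firstlast.columns = ['FirstName', 'LastName']
    # Append the new columns and drop the original Name column
    df = pd.concat([df, firstlast], axis=1)
    df = df.drop('Name', axis=1)
    return df

# Load the file, then filter, derive, split and sort in one chain
df = (
    pd.read_csv('testdata.csv', low_memory=False)
    .query("Dividend > 3000")                             # keep rows with Dividend above 3000
    .assign(CalcField=lambda x: x.Divisor / x.Dividend)   # new column computed on the filtered rows
    .pipe(splitname)                                      # apply the user-defined function
    .sort_values(by='CalcField', ascending=True)          # smallest CalcField first
)
print(df)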
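The chain can also be parameterised. `DataFrame.pipe` forwards any extra positional or keyword arguments to the function, and `query` can refer to a local Python variable with the `@` prefix. The sketch below assumes a hypothetical generalisation of splitname, `split_column`, whose column name and separator are parameters, and uses hypothetical sample data with a `FullName` column.

```python
import pandas as pd

def split_column(df, column="Name", sep=" "):
    # Hypothetical variant of splitname: column and separator are parameters.
    parts = df[column].str.split(sep, n=1, expand=True)
    parts.columns = ["FirstName", "LastName"]
    return pd.concat([df, parts], axis=1).drop(column, axis=1)

raw = pd.DataFrame({
    "FullName": ["Ann Lee", "Bob Stone", "Cy Park"],
    "Divisor": [100, 300, 200],
    "Dividend": [2000, 4000, 5000],
})  # hypothetical sample data

min_dividend = 3000
df = (
    raw.query("Dividend > @min_dividend")              # @ refers to the local variable
       .assign(CalcField=lambda x: x.Divisor / x.Dividend)
       .pipe(split_column, column="FullName")          # keyword argument is passed on to split_column
       .sort_values(by="CalcField", ascending=True)
)
print(df)
```

Keeping the threshold and column name outside the chain lets the same processing steps be reused on a differently shaped file.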