Scenario
The context is that Pandas is used for data preprocessing in a data analytical process or a machine learning application. A relational database like Oracle serves as the persistent layer to keep the data.
In this post, we look at how to load data in a DataFrame object into a table defined in Oracle. This post is based on the following technical article with modifications to handle a DataFrame object.
Quick Start: Developing Python Applications for Oracle Database
Example
We use iris dataset in scikit-learn as the example. The purpose is to save the dataset to a table in Oracle. To do this, we follow the steps as shown below.
- Create the DataFrame instance
- Prepare the dataset for insertion
- Establish database connection
- Create a table in Oracle
- Execute data insertion
- Fetch data back from the table to validate the result
Output:
A part of rows retrieved from the table show the operation is successfully executed.
import numpy as npimport pandas as pdfrom sklearn.datasets import load_irisimport getpassimport oracledb
iris = load_iris()X, y = iris.data, iris.targetcolumns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']df = pd.DataFrame(np.hstack([X, y.reshape(len(y), 1)]), columns=columns)
# get class name# 0: setosa# 1: versicolor# 2: virginicadef get_class_name(id):if id == 0:class_name = 'setosa'elif id == 1:class_name = 'versicolor'elif id == 2:class_name = 'virginica'else:class_name = np.nanreturn class_namedf['class'] = df['class'].apply(get_class_name)rows = []for dfi in df.values:rows.append(dfi.tolist())
pw = getpass.getpass("Enter password: ")conn = oracledb.connect(user="test",password=pw,dsn="localhost/xepdb1")
cursor = conn.cursor()# Create a tablecursor.execute("""beginexecute immediate 'drop table iris';exception when others then if sqlcode <> -942 then raise; end if;end;""")cursor.execute("""create table iris (sepal_length number(10, 2),sepal_width number(10, 2),petal_length number(10, 2),petal_width number(10, 2),class varchar2(64))""")
# Insert iris datasetinsertsql = "insert into iris values(:1, :2, :3, :4, :5)"cursor.executemany(insertsql, rows)print(cursor.rowcount, "Rows Inserted")conn.commit()
# Now query the rows backfor row in cursor.execute('select * from iris'):if (row[1]):print(row, "is done")else:print(row, "is NOT done")
No comments:
Post a Comment