{
"cells": [
{
"cell_type": "markdown",
"id": "1029c2f4",
"metadata": {},
"source": [
"# Seekwellpandas: Advanced Usage\n",
"\n",
"This notebook demonstrates advanced usage of the SEEKWELLPANDAS library, showcasing SQL-like operations on pandas DataFrames."
]
},
{
"cell_type": "markdown",
"id": "085e3751",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"Import the necessary libraries and create sample data for advanced operations."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "788dabe8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"( id name age city\n",
" 0 1 Alice 25 New York\n",
" 1 2 Bob 30 London\n",
" 2 3 Charlie 35 Paris\n",
" 3 4 David 28 Tokyo\n",
" 4 5 Eve 22 Sydney,\n",
" id name age city\n",
" 0 4 David 28 Tokyo\n",
" 1 5 Eve 22 Sydney\n",
" 2 6 Frank 40 Berlin\n",
" 3 7 Grace 33 Moscow\n",
" 4 8 Henry 45 Rome)"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import seekwellpandas\n",
"\n",
"# Create sample data\n",
"data1 = pd.DataFrame({\n",
" 'id': range(1, 6),\n",
" 'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],\n",
" 'age': [25, 30, 35, 28, 22],\n",
" 'city': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney']\n",
"})\n",
"\n",
"data2 = pd.DataFrame({\n",
" 'id': range(4, 9),\n",
" 'name': ['David', 'Eve', 'Frank', 'Grace', 'Henry'],\n",
" 'age': [28, 22, 40, 33, 45],\n",
" 'city': ['Tokyo', 'Sydney', 'Berlin', 'Moscow', 'Rome']\n",
"})\n",
"\n",
"data1, data2"
]
},
{
"cell_type": "markdown",
"id": "77d12118",
"metadata": {},
"source": [
"## TRUNCATE\n",
"\n",
"Remove all rows from a DataFrame while retaining its structure."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "112ab48c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" name | \n",
" age | \n",
" city | \n",
"
\n",
" \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
"Empty DataFrame\n",
"Columns: [id, name, age, city]\n",
"Index: []"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data1.TRUNCATE()"
]
},
{
"cell_type": "markdown",
"id": "6a2769f6",
"metadata": {},
"source": [
"## ALTER TABLE\n",
"\n",
"Add or drop columns in a DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "9a2ee9b4",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" name | \n",
" city | \n",
" country | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" Alice | \n",
" New York | \n",
" Unknown | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" Bob | \n",
" London | \n",
" Unknown | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" Charlie | \n",
" Paris | \n",
" Unknown | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" David | \n",
" Tokyo | \n",
" Unknown | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" Eve | \n",
" Sydney | \n",
" Unknown | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id name city country\n",
"0 1 Alice New York Unknown\n",
"1 2 Bob London Unknown\n",
"2 3 Charlie Paris Unknown\n",
"3 4 David Tokyo Unknown\n",
"4 5 Eve Sydney Unknown"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data1.ALTER_TABLE(add_columns={'country': 'Unknown'}, drop_columns=['age'])"
]
},
{
"cell_type": "markdown",
"id": "12c20d58",
"metadata": {},
"source": [
"## MERGE\n",
"\n",
"Merge two DataFrames based on a common column."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "08a4d8d6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" name_x | \n",
" age_x | \n",
" city_x | \n",
" country | \n",
" name_y | \n",
" age_y | \n",
" city_y | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 4 | \n",
" David | \n",
" 28 | \n",
" Tokyo | \n",
" Unknown | \n",
" David | \n",
" 28 | \n",
" Tokyo | \n",
"
\n",
" \n",
" 1 | \n",
" 5 | \n",
" Eve | \n",
" 22 | \n",
" Sydney | \n",
" Unknown | \n",
" Eve | \n",
" 22 | \n",
" Sydney | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id name_x age_x city_x country name_y age_y city_y\n",
"0 4 David 28 Tokyo Unknown David 28 Tokyo\n",
"1 5 Eve 22 Sydney Unknown Eve 22 Sydney"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data1.MERGE(data2, on='id', how='inner')"
]
},
{
"cell_type": "markdown",
"id": "49fcf2dc",
"metadata": {},
"source": [
"## DISTINCT\n",
"\n",
"Remove duplicate rows from a DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d42693b6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" name | \n",
" age | \n",
" city | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 4 | \n",
" David | \n",
" 28 | \n",
" Tokyo | \n",
"
\n",
" \n",
" 1 | \n",
" 5 | \n",
" Eve | \n",
" 22 | \n",
" Sydney | \n",
"
\n",
" \n",
" 2 | \n",
" 6 | \n",
" Frank | \n",
" 40 | \n",
" Berlin | \n",
"
\n",
" \n",
" 3 | \n",
" 7 | \n",
" Grace | \n",
" 33 | \n",
" Moscow | \n",
"
\n",
" \n",
" 4 | \n",
" 8 | \n",
" Henry | \n",
" 45 | \n",
" Rome | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id name age city\n",
"0 4 David 28 Tokyo\n",
"1 5 Eve 22 Sydney\n",
"2 6 Frank 40 Berlin\n",
"3 7 Grace 33 Moscow\n",
"4 8 Henry 45 Rome"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data2.DISTINCT()"
]
},
{
"cell_type": "markdown",
"id": "30181cf9",
"metadata": {},
"source": [
"## INTERSECT\n",
"\n",
"Find the intersection of two DataFrames."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "3238e9c6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" name | \n",
" age | \n",
" city | \n",
" country | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 4 | \n",
" David | \n",
" 28 | \n",
" Tokyo | \n",
" Unknown | \n",
"
\n",
" \n",
" 1 | \n",
" 5 | \n",
" Eve | \n",
" 22 | \n",
" Sydney | \n",
" Unknown | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id name age city country\n",
"0 4 David 28 Tokyo Unknown\n",
"1 5 Eve 22 Sydney Unknown"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data1.INTERSECT(data2)"
]
},
{
"cell_type": "markdown",
"id": "44d55422",
"metadata": {},
"source": [
"## DIFFERENCE\n",
"\n",
"Find the difference between two DataFrames."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "5aa6c3f5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" name | \n",
" age | \n",
" city | \n",
" country | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" Alice | \n",
" 25 | \n",
" New York | \n",
" Unknown | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" Bob | \n",
" 30 | \n",
" London | \n",
" Unknown | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" Charlie | \n",
" 35 | \n",
" Paris | \n",
" Unknown | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" David | \n",
" 28 | \n",
" Tokyo | \n",
" Unknown | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" Eve | \n",
" 22 | \n",
" Sydney | \n",
" Unknown | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id name age city country\n",
"0 1 Alice 25 New York Unknown\n",
"1 2 Bob 30 London Unknown\n",
"2 3 Charlie 35 Paris Unknown\n",
"3 4 David 28 Tokyo Unknown\n",
"4 5 Eve 22 Sydney Unknown"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data1.DIFFERENCE(data2)"
]
},
{
"cell_type": "markdown",
"id": "206f06a1",
"metadata": {},
"source": [
"## ADD COLUMN\n",
"\n",
"Add a new column to a DataFrame based on an expression."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "aada152b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" name | \n",
" age | \n",
" city | \n",
" country | \n",
" age_group | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" Alice | \n",
" 25 | \n",
" New York | \n",
" Unknown | \n",
" 20 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" Bob | \n",
" 30 | \n",
" London | \n",
" Unknown | \n",
" 30 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" Charlie | \n",
" 35 | \n",
" Paris | \n",
" Unknown | \n",
" 30 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" David | \n",
" 28 | \n",
" Tokyo | \n",
" Unknown | \n",
" 20 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" Eve | \n",
" 22 | \n",
" Sydney | \n",
" Unknown | \n",
" 20 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id name age city country age_group\n",
"0 1 Alice 25 New York Unknown 20\n",
"1 2 Bob 30 London Unknown 30\n",
"2 3 Charlie 35 Paris Unknown 30\n",
"3 4 David 28 Tokyo Unknown 20\n",
"4 5 Eve 22 Sydney Unknown 20"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data1.ADD_COLUMN('age_group', 'age // 10 * 10')"
]
},
{
"cell_type": "markdown",
"id": "3f4df7db",
"metadata": {},
"source": [
"## DELETE\n",
"\n",
"Remove rows from a DataFrame based on a condition."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "92d852ec",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" name | \n",
" age | \n",
" city | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 4 | \n",
" David | \n",
" 28 | \n",
" Tokyo | \n",
"
\n",
" \n",
" 1 | \n",
" 5 | \n",
" Eve | \n",
" 22 | \n",
" Sydney | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id name age city\n",
"0 4 David 28 Tokyo\n",
"1 5 Eve 22 Sydney"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data2.DELETE('age > 30')"
]
},
{
"cell_type": "markdown",
"id": "db7e75ca",
"metadata": {},
"source": [
"## UPDATE\n",
"\n",
"Update values in a DataFrame based on a condition."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "0a081072",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" name | \n",
" age | \n",
" city | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 4 | \n",
" David | \n",
" 28 | \n",
" Tokyo | \n",
"
\n",
" \n",
" 1 | \n",
" 5 | \n",
" Eve | \n",
" 22 | \n",
" Sydney | \n",
"
\n",
" \n",
" 2 | \n",
" 6 | \n",
" Frank | \n",
" 40 | \n",
" Updated City | \n",
"
\n",
" \n",
" 3 | \n",
" 7 | \n",
" Grace | \n",
" 33 | \n",
" Updated City | \n",
"
\n",
" \n",
" 4 | \n",
" 8 | \n",
" Henry | \n",
" 45 | \n",
" Updated City | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id name age city\n",
"0 4 David 28 Tokyo\n",
"1 5 Eve 22 Sydney\n",
"2 6 Frank 40 Updated City\n",
"3 7 Grace 33 Updated City\n",
"4 8 Henry 45 Updated City"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data2.UPDATE('age > 30', {'city': 'Updated City'})"
]
},
{
"cell_type": "markdown",
"id": "13130e0e",
"metadata": {},
"source": [
"## INSERT\n",
"\n",
"Insert new rows into a DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "1d55f1c5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" name | \n",
" age | \n",
" city | \n",
" country | \n",
" age_group | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" Alice | \n",
" 25 | \n",
" New York | \n",
" Unknown | \n",
" 20.0 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" Bob | \n",
" 30 | \n",
" London | \n",
" Unknown | \n",
" 30.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" Charlie | \n",
" 35 | \n",
" Paris | \n",
" Unknown | \n",
" 30.0 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" David | \n",
" 28 | \n",
" Tokyo | \n",
" Unknown | \n",
" 20.0 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" Eve | \n",
" 22 | \n",
" Sydney | \n",
" Unknown | \n",
" 20.0 | \n",
"
\n",
" \n",
" 5 | \n",
" 9 | \n",
" Ivy | \n",
" 27 | \n",
" Madrid | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 6 | \n",
" 10 | \n",
" Jack | \n",
" 31 | \n",
" Toronto | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id name age city country age_group\n",
"0 1 Alice 25 New York Unknown 20.0\n",
"1 2 Bob 30 London Unknown 30.0\n",
"2 3 Charlie 35 Paris Unknown 30.0\n",
"3 4 David 28 Tokyo Unknown 20.0\n",
"4 5 Eve 22 Sydney Unknown 20.0\n",
"5 9 Ivy 27 Madrid NaN NaN\n",
"6 10 Jack 31 Toronto NaN NaN"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_rows = pd.DataFrame({\n",
" 'id': [9, 10],\n",
" 'name': ['Ivy', 'Jack'],\n",
" 'age': [27, 31],\n",
" 'city': ['Madrid', 'Toronto']\n",
"})\n",
"data1.INSERT(new_rows)"
]
},
{
"cell_type": "markdown",
"id": "12c8f9a9",
"metadata": {},
"source": [
"## SQL-Like Aggregation Functions\n",
"\n",
"Demonstrate the SQL-like aggregation functions: COUNT, SUM, AVG, MIN, and MAX."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "2b149b1f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" product | \n",
" region | \n",
" units_sold | \n",
" price | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" A | \n",
" North | \n",
" 100 | \n",
" 10 | \n",
"
\n",
" \n",
" 1 | \n",
" B | \n",
" North | \n",
" 200 | \n",
" 15 | \n",
"
\n",
" \n",
" 2 | \n",
" A | \n",
" South | \n",
" 150 | \n",
" 10 | \n",
"
\n",
" \n",
" 3 | \n",
" C | \n",
" North | \n",
" 300 | \n",
" 20 | \n",
"
\n",
" \n",
" 4 | \n",
" B | \n",
" South | \n",
" 250 | \n",
" 15 | \n",
"
\n",
" \n",
" 5 | \n",
" A | \n",
" South | \n",
" 175 | \n",
" 10 | \n",
"
\n",
" \n",
" 6 | \n",
" C | \n",
" South | \n",
" 225 | \n",
" 20 | \n",
"
\n",
" \n",
" 7 | \n",
" B | \n",
" North | \n",
" 180 | \n",
" 15 | \n",
"
\n",
" \n",
" 8 | \n",
" D | \n",
" South | \n",
" 220 | \n",
" 25 | \n",
"
\n",
" \n",
" 9 | \n",
" D | \n",
" North | \n",
" 190 | \n",
" 25 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" product region units_sold price\n",
"0 A North 100 10\n",
"1 B North 200 15\n",
"2 A South 150 10\n",
"3 C North 300 20\n",
"4 B South 250 15\n",
"5 A South 175 10\n",
"6 C South 225 20\n",
"7 B North 180 15\n",
"8 D South 220 25\n",
"9 D North 190 25"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Make sure we import the aggregate module\n",
"import seekwellpandas\n",
"import pandas as pd\n",
"\n",
"# Create a new DataFrame with more data points for better demonstration\n",
"sales_data = pd.DataFrame({\n",
" 'product': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'D', 'D'],\n",
" 'region': ['North', 'North', 'South', 'North', 'South', 'South', 'South', 'North', 'South', 'North'],\n",
" 'units_sold': [100, 200, 150, 300, 250, 175, 225, 180, 220, 190],\n",
" 'price': [10, 15, 10, 20, 15, 10, 20, 15, 25, 25]\n",
"})\n",
"sales_data"
]
},
{
"cell_type": "markdown",
"id": "8028de34",
"metadata": {},
"source": [
"### COUNT\n",
"\n",
"Count records by group using the SQL-like COUNT function."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "f3d6efe3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" region | \n",
" count | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" North | \n",
" 5 | \n",
"
\n",
" \n",
" 1 | \n",
" South | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" region count\n",
"0 North 5\n",
"1 South 5"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Count products sold in each region\n",
"sales_data.GROUP_BY('region').COUNT()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "97b309ea",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" region | \n",
" product | \n",
" product_count | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" North | \n",
" A | \n",
" 1 | \n",
"
\n",
" \n",
" 1 | \n",
" North | \n",
" B | \n",
" 2 | \n",
"
\n",
" \n",
" 2 | \n",
" North | \n",
" C | \n",
" 1 | \n",
"
\n",
" \n",
" 3 | \n",
" North | \n",
" D | \n",
" 1 | \n",
"
\n",
" \n",
" 4 | \n",
" South | \n",
" A | \n",
" 2 | \n",
"
\n",
" \n",
" 5 | \n",
" South | \n",
" B | \n",
" 1 | \n",
"
\n",
" \n",
" 6 | \n",
" South | \n",
" C | \n",
" 1 | \n",
"
\n",
" \n",
" 7 | \n",
" South | \n",
" D | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" region product product_count\n",
"0 North A 1\n",
"1 North B 2\n",
"2 North C 1\n",
"3 North D 1\n",
"4 South A 2\n",
"5 South B 1\n",
"6 South C 1\n",
"7 South D 1"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Count specific product types sold in each region\n",
"sales_data.GROUP_BY(['region', 'product']).COUNT('units_sold', 'product_count')"
]
},
{
"cell_type": "markdown",
"id": "1485fc6d",
"metadata": {},
"source": [
"### SUM\n",
"\n",
"Sum values by group using the SQL-like SUM function."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "9e019d99",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" region | \n",
" units_sold_sum | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" North | \n",
" 970 | \n",
"
\n",
" \n",
" 1 | \n",
" South | \n",
" 1020 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" region units_sold_sum\n",
"0 North 970\n",
"1 South 1020"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Sum of units sold by region\n",
"sales_data.GROUP_BY('region').SUM('units_sold')"
]
},
{
"cell_type": "markdown",
"id": "db875ea2",
"metadata": {},
"source": [
"### AVG\n",
"\n",
"Calculate averages by group using the SQL-like AVG function."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "6fb4773a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" product | \n",
" average_price | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" A | \n",
" 10.0 | \n",
"
\n",
" \n",
" 1 | \n",
" B | \n",
" 15.0 | \n",
"
\n",
" \n",
" 2 | \n",
" C | \n",
" 20.0 | \n",
"
\n",
" \n",
" 3 | \n",
" D | \n",
" 25.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" product average_price\n",
"0 A 10.0\n",
"1 B 15.0\n",
"2 C 20.0\n",
"3 D 25.0"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Average price by product\n",
"sales_data.GROUP_BY('product').AVG('price', 'average_price')"
]
},
{
"cell_type": "markdown",
"id": "2d15f482",
"metadata": {},
"source": [
"### MIN and MAX\n",
"\n",
"Find minimum and maximum values using the SQL-like MIN and MAX functions."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "e34308ae",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" region | \n",
" min_units | \n",
" max_units | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" North | \n",
" 100 | \n",
" 300 | \n",
"
\n",
" \n",
" 1 | \n",
" South | \n",
" 150 | \n",
" 250 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" region min_units max_units\n",
"0 North 100 300\n",
"1 South 150 250"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Min and Max units sold by region\n",
"min_results = sales_data.GROUP_BY('region').MIN('units_sold', 'min_units')\n",
"max_results = sales_data.GROUP_BY('region').MAX('units_sold', 'max_units')\n",
"min_results.MERGE(max_results, on='region')"
]
},
{
"cell_type": "markdown",
"id": "a1909954",
"metadata": {},
"source": [
"### Complex Example: Sales Analysis\n",
"\n",
"Demonstrate a more complex query chain combining multiple SQL-like operations."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0ee761f1",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" product | \n",
" region | \n",
" total_revenue | \n",
"
\n",
" \n",
" \n",
" \n",
" 2 | \n",
" C | \n",
" North | \n",
" 6000 | \n",
"
\n",
" \n",
" 0 | \n",
" B | \n",
" North | \n",
" 5700 | \n",
"
\n",
" \n",
" 5 | \n",
" D | \n",
" South | \n",
" 5500 | \n",
"
\n",
" \n",
" 4 | \n",
" D | \n",
" North | \n",
" 4750 | \n",
"
\n",
" \n",
" 3 | \n",
" C | \n",
" South | \n",
" 4500 | \n",
"
\n",
" \n",
" 1 | \n",
" B | \n",
" South | \n",
" 3750 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" product region total_revenue\n",
"2 C North 6000\n",
"0 B North 5700\n",
"5 D South 5500\n",
"4 D North 4750\n",
"3 C South 4500\n",
"1 B South 3750"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
" sales_data\n",
" .ADD_COLUMN('revenue', 'units_sold * price')\n",
" .WHERE('revenue > 2000')\n",
" .GROUP_BY(['product', 'region'])\n",
" .SUM('revenue', 'total_revenue')\n",
" .ORDER_BY('total_revenue', ascending=False)\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}