{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# seekwellpandas: Basic Usage\n", "\n", "This notebook demonstrates the basic usage of the seekwellpandas library, which extends pandas with SQL-like functionality.\n", "\n", "## Setup\n", "\n", "First, let's import the necessary libraries and create some sample data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnameagecity
01Alice25New York
12Bob30London
23Charlie35Paris
34David28Tokyo
45Eve22Sydney
56Frank40Berlin
67Grace33Moscow
78Henry45Rome
89Ivy27Madrid
910Jack31Toronto
\n", "
" ], "text/plain": [ " id name age city\n", "0 1 Alice 25 New York\n", "1 2 Bob 30 London\n", "2 3 Charlie 35 Paris\n", "3 4 David 28 Tokyo\n", "4 5 Eve 22 Sydney\n", "5 6 Frank 40 Berlin\n", "6 7 Grace 33 Moscow\n", "7 8 Henry 45 Rome\n", "8 9 Ivy 27 Madrid\n", "9 10 Jack 31 Toronto" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import seekwellpandas\n", "\n", "# Create sample data\n", "people = pd.DataFrame({\n", " 'id': range(1, 11),\n", " 'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Henry', 'Ivy', 'Jack'],\n", " 'age': [25, 30, 35, 28, 22, 40, 33, 45, 27, 31],\n", " 'city': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney', 'Berlin', 'Moscow', 'Rome', 'Madrid', 'Toronto']\n", "})\n", "people" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic Operations\n", "\n", "### Select\n", "\n", "The `select` method allows you to choose specific columns from the DataFrame.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameage
0Alice25
1Bob30
2Charlie35
3David28
4Eve22
5Frank40
6Grace33
7Henry45
8Ivy27
9Jack31
\n", "
" ], "text/plain": [ " name age\n", "0 Alice 25\n", "1 Bob 30\n", "2 Charlie 35\n", "3 David 28\n", "4 Eve 22\n", "5 Frank 40\n", "6 Grace 33\n", "7 Henry 45\n", "8 Ivy 27\n", "9 Jack 31" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "people.select('name', 'age')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Negative selections are also supported." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameagecity
0Alice25New York
1Bob30London
2Charlie35Paris
3David28Tokyo
4Eve22Sydney
5Frank40Berlin
6Grace33Moscow
7Henry45Rome
8Ivy27Madrid
9Jack31Toronto
\n", "
" ], "text/plain": [ " name age city\n", "0 Alice 25 New York\n", "1 Bob 30 London\n", "2 Charlie 35 Paris\n", "3 David 28 Tokyo\n", "4 Eve 22 Sydney\n", "5 Frank 40 Berlin\n", "6 Grace 33 Moscow\n", "7 Henry 45 Rome\n", "8 Ivy 27 Madrid\n", "9 Jack 31 Toronto" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "people.select('-id')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Where\n", "\n", "Use `where` to filter rows based on a condition." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnameagecity
23Charlie35Paris
56Frank40Berlin
67Grace33Moscow
78Henry45Rome
910Jack31Toronto
\n", "
" ], "text/plain": [ " id name age city\n", "2 3 Charlie 35 Paris\n", "5 6 Frank 40 Berlin\n", "6 7 Grace 33 Moscow\n", "7 8 Henry 45 Rome\n", "9 10 Jack 31 Toronto" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "people.where_('age > 30')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Group By\n", "\n", "Group the data by a specific column." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
age
city
Berlin40.0
London30.0
Madrid27.0
Moscow33.0
New York25.0
Paris35.0
Rome45.0
Sydney22.0
Tokyo28.0
Toronto31.0
\n", "
" ], "text/plain": [ " age\n", "city \n", "Berlin 40.0\n", "London 30.0\n", "Madrid 27.0\n", "Moscow 33.0\n", "New York 25.0\n", "Paris 35.0\n", "Rome 45.0\n", "Sydney 22.0\n", "Tokyo 28.0\n", "Toronto 31.0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "people.group_by('city').agg({'age': 'mean'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Order By\n", "\n", "Sort the DataFrame based on one or more columns." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnameagecity
78Henry45Rome
56Frank40Berlin
23Charlie35Paris
67Grace33Moscow
910Jack31Toronto
12Bob30London
34David28Tokyo
89Ivy27Madrid
01Alice25New York
45Eve22Sydney
\n", "
" ], "text/plain": [ " id name age city\n", "7 8 Henry 45 Rome\n", "5 6 Frank 40 Berlin\n", "2 3 Charlie 35 Paris\n", "6 7 Grace 33 Moscow\n", "9 10 Jack 31 Toronto\n", "1 2 Bob 30 London\n", "3 4 David 28 Tokyo\n", "8 9 Ivy 27 Madrid\n", "0 1 Alice 25 New York\n", "4 5 Eve 22 Sydney" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "people.order_by('age', ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Limit\n", "\n", "Limit the number of rows returned." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnameagecity
45Eve22Sydney
01Alice25New York
89Ivy27Madrid
34David28Tokyo
12Bob30London
\n", "
" ], "text/plain": [ " id name age city\n", "4 5 Eve 22 Sydney\n", "0 1 Alice 25 New York\n", "8 9 Ivy 27 Madrid\n", "3 4 David 28 Tokyo\n", "1 2 Bob 30 London" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "people.order_by('age').limit(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Advanced Operations\n", "\n", "### Join\n", "\n", "Demonstrate joining two DataFrames." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnameagecitycountry
01Alice25New YorkUSA
12Bob30LondonUK
23Charlie35ParisFrance
34David28TokyoJapan
45Eve22SydneyAustralia
\n", "
" ], "text/plain": [ " id name age city country\n", "0 1 Alice 25 New York USA\n", "1 2 Bob 30 London UK\n", "2 3 Charlie 35 Paris France\n", "3 4 David 28 Tokyo Japan\n", "4 5 Eve 22 Sydney Australia" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "countries = pd.DataFrame({\n", " 'city': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney'],\n", " 'country': ['USA', 'UK', 'France', 'Japan', 'Australia']\n", "})\n", "\n", "people.join_(countries, on='city')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Union\n", "\n", "Combine two DataFrames vertically." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnameagecity
01Alice25New York
12Bob30London
23Charlie35Paris
34David28Tokyo
45Eve22Sydney
56Frank40Berlin
67Grace33Moscow
78Henry45Rome
89Ivy27Madrid
910Jack31Toronto
1011Karen29Chicago
1112Leo36Dublin
1213Mike41Amsterdam
1314Nina24Oslo
1415Oscar38Stockholm
\n", "
" ], "text/plain": [ " id name age city\n", "0 1 Alice 25 New York\n", "1 2 Bob 30 London\n", "2 3 Charlie 35 Paris\n", "3 4 David 28 Tokyo\n", "4 5 Eve 22 Sydney\n", "5 6 Frank 40 Berlin\n", "6 7 Grace 33 Moscow\n", "7 8 Henry 45 Rome\n", "8 9 Ivy 27 Madrid\n", "9 10 Jack 31 Toronto\n", "10 11 Karen 29 Chicago\n", "11 12 Leo 36 Dublin\n", "12 13 Mike 41 Amsterdam\n", "13 14 Nina 24 Oslo\n", "14 15 Oscar 38 Stockholm" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "other_people = pd.DataFrame({\n", " 'id': range(11, 16),\n", " 'name': ['Karen', 'Leo', 'Mike', 'Nina', 'Oscar'],\n", " 'age': [29, 36, 41, 24, 38],\n", " 'city': ['Chicago', 'Dublin', 'Amsterdam', 'Oslo', 'Stockholm']\n", "})\n", "all_people = people.union(other_people)\n", "all_people" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### With Column\n", "\n", "Add a new column based on an expression." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnameagecityage_group
01Alice25New York20
12Bob30London30
23Charlie35Paris30
34David28Tokyo20
45Eve22Sydney20
56Frank40Berlin40
67Grace33Moscow30
78Henry45Rome40
89Ivy27Madrid20
910Jack31Toronto30
1011Karen29Chicago20
1112Leo36Dublin30
1213Mike41Amsterdam40
1314Nina24Oslo20
1415Oscar38Stockholm30
\n", "
" ], "text/plain": [ " id name age city age_group\n", "0 1 Alice 25 New York 20\n", "1 2 Bob 30 London 30\n", "2 3 Charlie 35 Paris 30\n", "3 4 David 28 Tokyo 20\n", "4 5 Eve 22 Sydney 20\n", "5 6 Frank 40 Berlin 40\n", "6 7 Grace 33 Moscow 30\n", "7 8 Henry 45 Rome 40\n", "8 9 Ivy 27 Madrid 20\n", "9 10 Jack 31 Toronto 30\n", "10 11 Karen 29 Chicago 20\n", "11 12 Leo 36 Dublin 30\n", "12 13 Mike 41 Amsterdam 40\n", "13 14 Nina 24 Oslo 20\n", "14 15 Oscar 38 Stockholm 30" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_people.with_column('age_group', 'age // 10 * 10')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Rename Column\n", "\n", "Rename an existing column." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnameagelocationage_group
01Alice25New York20
12Bob30London30
23Charlie35Paris30
34David28Tokyo20
45Eve22Sydney20
56Frank40Berlin40
67Grace33Moscow30
78Henry45Rome40
89Ivy27Madrid20
910Jack31Toronto30
1011Karen29Chicago20
1112Leo36Dublin30
1213Mike41Amsterdam40
1314Nina24Oslo20
1415Oscar38Stockholm30
\n", "
" ], "text/plain": [ " id name age location age_group\n", "0 1 Alice 25 New York 20\n", "1 2 Bob 30 London 30\n", "2 3 Charlie 35 Paris 30\n", "3 4 David 28 Tokyo 20\n", "4 5 Eve 22 Sydney 20\n", "5 6 Frank 40 Berlin 40\n", "6 7 Grace 33 Moscow 30\n", "7 8 Henry 45 Rome 40\n", "8 9 Ivy 27 Madrid 20\n", "9 10 Jack 31 Toronto 30\n", "10 11 Karen 29 Chicago 20\n", "11 12 Leo 36 Dublin 30\n", "12 13 Mike 41 Amsterdam 40\n", "13 14 Nina 24 Oslo 20\n", "14 15 Oscar 38 Stockholm 30" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_people.rename_column('city', 'location')" ] } ], "metadata": { "kernelspec": { "display_name": "Python (seekwellpandas)", "language": "python", "name": "seekwellpandas" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.6" } }, "nbformat": 4, "nbformat_minor": 2 }