Seekwellpandas: basic usage¶

This notebook demonstrates the basic usage of the SEEKWELLPANDAS library, which extends pandas with SQL-like functionality.

Setup¶

First, let’s import the necessary libraries and create some sample data.

[16]:

import pandas as pd
import seekwellpandas

# Create sample data
people = pd.DataFrame({
    'id': range(1, 11),
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Henry', 'Ivy', 'Jack'],
    'age': [25, 30, 35, 28, 22, 40, 33, 45, 27, 31],
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney', 'Berlin', 'Moscow', 'Rome', 'Madrid', 'Toronto']
})
people

[16]:

	id	name	age	city
0	1	Alice	25	New York
1	2	Bob	30	London
2	3	Charlie	35	Paris
3	4	David	28	Tokyo
4	5	Eve	22	Sydney
5	6	Frank	40	Berlin
6	7	Grace	33	Moscow
7	8	Henry	45	Rome
8	9	Ivy	27	Madrid
9	10	Jack	31	Toronto

Basic Operations¶

SELECT¶

The SELECT method allows you to choose specific columns from the DataFrame.

[17]:

people.SELECT('name', 'age')

[17]:

	name	age
0	Alice	25
1	Bob	30
2	Charlie	35
3	David	28
4	Eve	22
5	Frank	40
6	Grace	33
7	Henry	45
8	Ivy	27
9	Jack	31

Negative selections are also supported.

[18]:

people.SELECT('-id')

[18]:

	name	age	city
0	Alice	25	New York
1	Bob	30	London
2	Charlie	35	Paris
3	David	28	Tokyo
4	Eve	22	Sydney
5	Frank	40	Berlin
6	Grace	33	Moscow
7	Henry	45	Rome
8	Ivy	27	Madrid
9	Jack	31	Toronto

WHERE¶

Use WHERE to filter rows based on a condition.

[19]:

people.WHERE('age > 30')

[19]:

	id	name	age	city
2	3	Charlie	35	Paris
5	6	Frank	40	Berlin
6	7	Grace	33	Moscow
7	8	Henry	45	Rome
9	10	Jack	31	Toronto

GROUP BY¶

Group the data by a specific column.

[20]:

# Corrected example without FROM
people.SELECT(
    columns=['city'],
    aggregate={'age': 'mean'},
    alias='mean_age'
).GROUP_BY('city')

[20]:

	mean_age
city
Berlin	40.0
London	30.0
Madrid	27.0
Moscow	33.0
New York	25.0
Paris	35.0
Rome	45.0
Sydney	22.0
Tokyo	28.0
Toronto	31.0

Order By¶

Sort the DataFrame based on one or more columns.

[21]:

people.ORDER_BY('age', ascending=False)

[21]:

	id	name	age	city
7	8	Henry	45	Rome
5	6	Frank	40	Berlin
2	3	Charlie	35	Paris
6	7	Grace	33	Moscow
9	10	Jack	31	Toronto
1	2	Bob	30	London
3	4	David	28	Tokyo
8	9	Ivy	27	Madrid
0	1	Alice	25	New York
4	5	Eve	22	Sydney

Limit¶

Limit the number of rows returned.

[22]:

people.ORDER_BY('age').LIMIT(5)

[22]:

	id	name	age	city
4	5	Eve	22	Sydney
0	1	Alice	25	New York
8	9	Ivy	27	Madrid
3	4	David	28	Tokyo
1	2	Bob	30	London

Advanced Operations¶

Join¶

Demonstrate joining two DataFrames.

[23]:

countries = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney'],
    'country': ['USA', 'UK', 'France', 'Japan', 'Australia']
})

people.JOIN(countries, on='city')

[23]:

	id	name	age	city	country
0	1	Alice	25	New York	USA
1	2	Bob	30	London	UK
2	3	Charlie	35	Paris	France
3	4	David	28	Tokyo	Japan
4	5	Eve	22	Sydney	Australia

Union¶

Combine two DataFrames vertically.

[24]:

other_people = pd.DataFrame({
    'id': range(11, 16),
    'name': ['Karen', 'Leo', 'Mike', 'Nina', 'Oscar'],
    'age': [29, 36, 41, 24, 38],
    'city': ['Chicago', 'Dublin', 'Amsterdam', 'Oslo', 'Stockholm']
})
all_people = people.UNION(other_people)
all_people

[24]:

	id	name	age	city
0	1	Alice	25	New York
1	2	Bob	30	London
2	3	Charlie	35	Paris
3	4	David	28	Tokyo
4	5	Eve	22	Sydney
5	6	Frank	40	Berlin
6	7	Grace	33	Moscow
7	8	Henry	45	Rome
8	9	Ivy	27	Madrid
9	10	Jack	31	Toronto
10	11	Karen	29	Chicago
11	12	Leo	36	Dublin
12	13	Mike	41	Amsterdam
13	14	Nina	24	Oslo
14	15	Oscar	38	Stockholm

Add Column¶

Add a new column based on an expression.

[26]:

all_people.ADD_COLUMN('age_group', 'age // 10 * 10')

[26]:

	id	name	age	city	age_group
0	1	Alice	25	New York	20
1	2	Bob	30	London	30
2	3	Charlie	35	Paris	30
3	4	David	28	Tokyo	20
4	5	Eve	22	Sydney	20
5	6	Frank	40	Berlin	40
6	7	Grace	33	Moscow	30
7	8	Henry	45	Rome	40
8	9	Ivy	27	Madrid	20
9	10	Jack	31	Toronto	30
10	11	Karen	29	Chicago	20
11	12	Leo	36	Dublin	30
12	13	Mike	41	Amsterdam	40
13	14	Nina	24	Oslo	20
14	15	Oscar	38	Stockholm	30

Rename Column¶

Rename an existing column.

[27]:

all_people.RENAME_COLUMN('city', 'location')

[27]:

	id	name	age	location	age_group
0	1	Alice	25	New York	20
1	2	Bob	30	London	30
2	3	Charlie	35	Paris	30
3	4	David	28	Tokyo	20
4	5	Eve	22	Sydney	20
5	6	Frank	40	Berlin	40
6	7	Grace	33	Moscow	30
7	8	Henry	45	Rome	40
8	9	Ivy	27	Madrid	20
9	10	Jack	31	Toronto	30
10	11	Karen	29	Chicago	20
11	12	Leo	36	Dublin	30
12	13	Mike	41	Amsterdam	40
13	14	Nina	24	Oslo	20
14	15	Oscar	38	Stockholm	30

SQL-Like Aggregation Functions¶

SEEKWELLPANDAS includes SQL-like aggregation functions that can be applied after grouping data.

[ ]:

# Create a dataset with more rows for better demonstration
sales = pd.DataFrame({
    'product': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B'],
    'region': ['North', 'North', 'South', 'North', 'South', 'South', 'South', 'North'],
    'units': [100, 200, 150, 300, 250, 175, 225, 180],
    'price': [10, 15, 10, 20, 15, 10, 20, 15]
})
sales

COUNT¶

Count the number of records in each group.

[ ]:

# Count records by region
sales.GROUP_BY('region').COUNT()

SUM¶

Calculate the sum of a column for each group.

[ ]:

# Sum units by product
sales.GROUP_BY('product').SUM('units')

AVG¶

Calculate the average of a column for each group.

[ ]:

# Average price by region
sales.GROUP_BY('region').AVG('price')

MIN and MAX¶

Find the minimum and maximum values in each group.

[ ]:

# Minimum and maximum units by region
min_units = sales.GROUP_BY('region').MIN('units')
max_units = sales.GROUP_BY('region').MAX('units')

# Display results side by side
pd.merge(min_units, max_units, on='region')

Complex Example¶

Combining SQL-like operations for advanced analysis.

[ ]:

# Calculate revenue and filter for high-value sales
result = (
    sales.ADD_COLUMN('revenue', 'units * price')
         .WHERE('revenue > 2000')
         .GROUP_BY('product')
         .SUM('revenue', 'total_revenue')
         .ORDER_BY('total_revenue', ascending=False)
)
result