seekwellpandas: Basic Usage

This notebook demonstrates the basic usage of the seekwellpandas library, which extends pandas with SQL-like functionality.

Setup

First, let’s import the necessary libraries and create some sample data.

[1]:
import pandas as pd
import seekwellpandas

# Create sample data
people = pd.DataFrame({
    'id': range(1, 11),
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Henry', 'Ivy', 'Jack'],
    'age': [25, 30, 35, 28, 22, 40, 33, 45, 27, 31],
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney', 'Berlin', 'Moscow', 'Rome', 'Madrid', 'Toronto']
})
people
[1]:
id name age city
0 1 Alice 25 New York
1 2 Bob 30 London
2 3 Charlie 35 Paris
3 4 David 28 Tokyo
4 5 Eve 22 Sydney
5 6 Frank 40 Berlin
6 7 Grace 33 Moscow
7 8 Henry 45 Rome
8 9 Ivy 27 Madrid
9 10 Jack 31 Toronto

Basic Operations

Select

The select method allows you to choose specific columns from the DataFrame.

[2]:
people.select('name', 'age')
[2]:
name age
0 Alice 25
1 Bob 30
2 Charlie 35
3 David 28
4 Eve 22
5 Frank 40
6 Grace 33
7 Henry 45
8 Ivy 27
9 Jack 31

Negative selections are also supported.

[3]:
people.select('-id')
[3]:
name age city
0 Alice 25 New York
1 Bob 30 London
2 Charlie 35 Paris
3 David 28 Tokyo
4 Eve 22 Sydney
5 Frank 40 Berlin
6 Grace 33 Moscow
7 Henry 45 Rome
8 Ivy 27 Madrid
9 Jack 31 Toronto

Where

Use where to filter rows based on a condition.

[4]:
people.where_('age > 30')
[4]:
id name age city
2 3 Charlie 35 Paris
5 6 Frank 40 Berlin
6 7 Grace 33 Moscow
7 8 Henry 45 Rome
9 10 Jack 31 Toronto

Group By

Group the data by a specific column.

[5]:
people.group_by('city').agg({'age': 'mean'})
[5]:
age
city
Berlin 40.0
London 30.0
Madrid 27.0
Moscow 33.0
New York 25.0
Paris 35.0
Rome 45.0
Sydney 22.0
Tokyo 28.0
Toronto 31.0

Order By

Sort the DataFrame based on one or more columns.

[6]:
people.order_by('age', ascending=False)
[6]:
id name age city
7 8 Henry 45 Rome
5 6 Frank 40 Berlin
2 3 Charlie 35 Paris
6 7 Grace 33 Moscow
9 10 Jack 31 Toronto
1 2 Bob 30 London
3 4 David 28 Tokyo
8 9 Ivy 27 Madrid
0 1 Alice 25 New York
4 5 Eve 22 Sydney

Limit

Limit the number of rows returned.

[7]:
people.order_by('age').limit(5)
[7]:
id name age city
4 5 Eve 22 Sydney
0 1 Alice 25 New York
8 9 Ivy 27 Madrid
3 4 David 28 Tokyo
1 2 Bob 30 London

Advanced Operations

Join

Demonstrate joining two DataFrames.

[8]:
countries = pd.DataFrame({
    'city': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney'],
    'country': ['USA', 'UK', 'France', 'Japan', 'Australia']
})

people.join_(countries, on='city')
[8]:
id name age city country
0 1 Alice 25 New York USA
1 2 Bob 30 London UK
2 3 Charlie 35 Paris France
3 4 David 28 Tokyo Japan
4 5 Eve 22 Sydney Australia

Union

Combine two DataFrames vertically.

[9]:
other_people = pd.DataFrame({
    'id': range(11, 16),
    'name': ['Karen', 'Leo', 'Mike', 'Nina', 'Oscar'],
    'age': [29, 36, 41, 24, 38],
    'city': ['Chicago', 'Dublin', 'Amsterdam', 'Oslo', 'Stockholm']
})
all_people = people.union(other_people)
all_people
[9]:
id name age city
0 1 Alice 25 New York
1 2 Bob 30 London
2 3 Charlie 35 Paris
3 4 David 28 Tokyo
4 5 Eve 22 Sydney
5 6 Frank 40 Berlin
6 7 Grace 33 Moscow
7 8 Henry 45 Rome
8 9 Ivy 27 Madrid
9 10 Jack 31 Toronto
10 11 Karen 29 Chicago
11 12 Leo 36 Dublin
12 13 Mike 41 Amsterdam
13 14 Nina 24 Oslo
14 15 Oscar 38 Stockholm

With Column

Add a new column based on an expression.

[10]:
all_people.with_column('age_group', 'age // 10 * 10')
[10]:
id name age city age_group
0 1 Alice 25 New York 20
1 2 Bob 30 London 30
2 3 Charlie 35 Paris 30
3 4 David 28 Tokyo 20
4 5 Eve 22 Sydney 20
5 6 Frank 40 Berlin 40
6 7 Grace 33 Moscow 30
7 8 Henry 45 Rome 40
8 9 Ivy 27 Madrid 20
9 10 Jack 31 Toronto 30
10 11 Karen 29 Chicago 20
11 12 Leo 36 Dublin 30
12 13 Mike 41 Amsterdam 40
13 14 Nina 24 Oslo 20
14 15 Oscar 38 Stockholm 30

Rename Column

Rename an existing column.

[11]:
all_people.rename_column('city', 'location')
[11]:
id name age location age_group
0 1 Alice 25 New York 20
1 2 Bob 30 London 30
2 3 Charlie 35 Paris 30
3 4 David 28 Tokyo 20
4 5 Eve 22 Sydney 20
5 6 Frank 40 Berlin 40
6 7 Grace 33 Moscow 30
7 8 Henry 45 Rome 40
8 9 Ivy 27 Madrid 20
9 10 Jack 31 Toronto 30
10 11 Karen 29 Chicago 20
11 12 Leo 36 Dublin 30
12 13 Mike 41 Amsterdam 40
13 14 Nina 24 Oslo 20
14 15 Oscar 38 Stockholm 30