seekwellpandas: Basic Usage¶
This notebook demonstrates the basic usage of the seekwellpandas library, which extends pandas with SQL-like functionality.
Setup¶
First, let’s import the necessary libraries and create some sample data.
[1]:
import pandas as pd
import seekwellpandas
# Create sample data
people = pd.DataFrame({
'id': range(1, 11),
'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Henry', 'Ivy', 'Jack'],
'age': [25, 30, 35, 28, 22, 40, 33, 45, 27, 31],
'city': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney', 'Berlin', 'Moscow', 'Rome', 'Madrid', 'Toronto']
})
people
[1]:
id | name | age | city | |
---|---|---|---|---|
0 | 1 | Alice | 25 | New York |
1 | 2 | Bob | 30 | London |
2 | 3 | Charlie | 35 | Paris |
3 | 4 | David | 28 | Tokyo |
4 | 5 | Eve | 22 | Sydney |
5 | 6 | Frank | 40 | Berlin |
6 | 7 | Grace | 33 | Moscow |
7 | 8 | Henry | 45 | Rome |
8 | 9 | Ivy | 27 | Madrid |
9 | 10 | Jack | 31 | Toronto |
Basic Operations¶
Select¶
The select
method allows you to choose specific columns from the DataFrame.
[2]:
people.select('name', 'age')
[2]:
name | age | |
---|---|---|
0 | Alice | 25 |
1 | Bob | 30 |
2 | Charlie | 35 |
3 | David | 28 |
4 | Eve | 22 |
5 | Frank | 40 |
6 | Grace | 33 |
7 | Henry | 45 |
8 | Ivy | 27 |
9 | Jack | 31 |
Negative selections are also supported.
[3]:
people.select('-id')
[3]:
name | age | city | |
---|---|---|---|
0 | Alice | 25 | New York |
1 | Bob | 30 | London |
2 | Charlie | 35 | Paris |
3 | David | 28 | Tokyo |
4 | Eve | 22 | Sydney |
5 | Frank | 40 | Berlin |
6 | Grace | 33 | Moscow |
7 | Henry | 45 | Rome |
8 | Ivy | 27 | Madrid |
9 | Jack | 31 | Toronto |
Where¶
Use where
to filter rows based on a condition.
[4]:
people.where_('age > 30')
[4]:
id | name | age | city | |
---|---|---|---|---|
2 | 3 | Charlie | 35 | Paris |
5 | 6 | Frank | 40 | Berlin |
6 | 7 | Grace | 33 | Moscow |
7 | 8 | Henry | 45 | Rome |
9 | 10 | Jack | 31 | Toronto |
Group By¶
Group the data by a specific column.
[5]:
people.group_by('city').agg({'age': 'mean'})
[5]:
age | |
---|---|
city | |
Berlin | 40.0 |
London | 30.0 |
Madrid | 27.0 |
Moscow | 33.0 |
New York | 25.0 |
Paris | 35.0 |
Rome | 45.0 |
Sydney | 22.0 |
Tokyo | 28.0 |
Toronto | 31.0 |
Order By¶
Sort the DataFrame based on one or more columns.
[6]:
people.order_by('age', ascending=False)
[6]:
id | name | age | city | |
---|---|---|---|---|
7 | 8 | Henry | 45 | Rome |
5 | 6 | Frank | 40 | Berlin |
2 | 3 | Charlie | 35 | Paris |
6 | 7 | Grace | 33 | Moscow |
9 | 10 | Jack | 31 | Toronto |
1 | 2 | Bob | 30 | London |
3 | 4 | David | 28 | Tokyo |
8 | 9 | Ivy | 27 | Madrid |
0 | 1 | Alice | 25 | New York |
4 | 5 | Eve | 22 | Sydney |
Limit¶
Limit the number of rows returned.
[7]:
people.order_by('age').limit(5)
[7]:
id | name | age | city | |
---|---|---|---|---|
4 | 5 | Eve | 22 | Sydney |
0 | 1 | Alice | 25 | New York |
8 | 9 | Ivy | 27 | Madrid |
3 | 4 | David | 28 | Tokyo |
1 | 2 | Bob | 30 | London |
Advanced Operations¶
Join¶
Demonstrate joining two DataFrames.
[8]:
countries = pd.DataFrame({
'city': ['New York', 'London', 'Paris', 'Tokyo', 'Sydney'],
'country': ['USA', 'UK', 'France', 'Japan', 'Australia']
})
people.join_(countries, on='city')
[8]:
id | name | age | city | country | |
---|---|---|---|---|---|
0 | 1 | Alice | 25 | New York | USA |
1 | 2 | Bob | 30 | London | UK |
2 | 3 | Charlie | 35 | Paris | France |
3 | 4 | David | 28 | Tokyo | Japan |
4 | 5 | Eve | 22 | Sydney | Australia |
Union¶
Combine two DataFrames vertically.
[9]:
other_people = pd.DataFrame({
'id': range(11, 16),
'name': ['Karen', 'Leo', 'Mike', 'Nina', 'Oscar'],
'age': [29, 36, 41, 24, 38],
'city': ['Chicago', 'Dublin', 'Amsterdam', 'Oslo', 'Stockholm']
})
all_people = people.union(other_people)
all_people
[9]:
id | name | age | city | |
---|---|---|---|---|
0 | 1 | Alice | 25 | New York |
1 | 2 | Bob | 30 | London |
2 | 3 | Charlie | 35 | Paris |
3 | 4 | David | 28 | Tokyo |
4 | 5 | Eve | 22 | Sydney |
5 | 6 | Frank | 40 | Berlin |
6 | 7 | Grace | 33 | Moscow |
7 | 8 | Henry | 45 | Rome |
8 | 9 | Ivy | 27 | Madrid |
9 | 10 | Jack | 31 | Toronto |
10 | 11 | Karen | 29 | Chicago |
11 | 12 | Leo | 36 | Dublin |
12 | 13 | Mike | 41 | Amsterdam |
13 | 14 | Nina | 24 | Oslo |
14 | 15 | Oscar | 38 | Stockholm |
With Column¶
Add a new column based on an expression.
[10]:
all_people.with_column('age_group', 'age // 10 * 10')
[10]:
id | name | age | city | age_group | |
---|---|---|---|---|---|
0 | 1 | Alice | 25 | New York | 20 |
1 | 2 | Bob | 30 | London | 30 |
2 | 3 | Charlie | 35 | Paris | 30 |
3 | 4 | David | 28 | Tokyo | 20 |
4 | 5 | Eve | 22 | Sydney | 20 |
5 | 6 | Frank | 40 | Berlin | 40 |
6 | 7 | Grace | 33 | Moscow | 30 |
7 | 8 | Henry | 45 | Rome | 40 |
8 | 9 | Ivy | 27 | Madrid | 20 |
9 | 10 | Jack | 31 | Toronto | 30 |
10 | 11 | Karen | 29 | Chicago | 20 |
11 | 12 | Leo | 36 | Dublin | 30 |
12 | 13 | Mike | 41 | Amsterdam | 40 |
13 | 14 | Nina | 24 | Oslo | 20 |
14 | 15 | Oscar | 38 | Stockholm | 30 |
Rename Column¶
Rename an existing column.
[11]:
all_people.rename_column('city', 'location')
[11]:
id | name | age | location | age_group | |
---|---|---|---|---|---|
0 | 1 | Alice | 25 | New York | 20 |
1 | 2 | Bob | 30 | London | 30 |
2 | 3 | Charlie | 35 | Paris | 30 |
3 | 4 | David | 28 | Tokyo | 20 |
4 | 5 | Eve | 22 | Sydney | 20 |
5 | 6 | Frank | 40 | Berlin | 40 |
6 | 7 | Grace | 33 | Moscow | 30 |
7 | 8 | Henry | 45 | Rome | 40 |
8 | 9 | Ivy | 27 | Madrid | 20 |
9 | 10 | Jack | 31 | Toronto | 30 |
10 | 11 | Karen | 29 | Chicago | 20 |
11 | 12 | Leo | 36 | Dublin | 30 |
12 | 13 | Mike | 41 | Amsterdam | 40 |
13 | 14 | Nina | 24 | Oslo | 20 |
14 | 15 | Oscar | 38 | Stockholm | 30 |