# Usage¶

## API¶

StupiDB’s user facing API is heavily inspired by dplyr.

## Constructing a Relation¶

You can construct a relation (a table) by calling the table function with a list of mappings

```
>>> from stupidb import *
>>> from datetime import date, timedelta
>>> today = date(2019, 2, 9)
>>> days = timedelta(days=1)
>>> rows = [
... {"name": "Alice", "balance": 400, "date": today},
... {"name": "Alice", "balance": 300, "date": today + 1 * days},
... {"name": "Alice", "balance": 100, "date": today + 2 * days},
... {"name": "Bob", "balance": -150, "date": today - 4 * days},
... {"name": "Bob", "balance": 200, "date": today - 3 * days},
... ]
>>> t = table(rows)
>>> t
name balance date
------ --------- ----------
Alice 400 2019-02-09
Alice 300 2019-02-10
Alice 100 2019-02-11
Bob -150 2019-02-05
Bob 200 2019-02-06
```

Since every `Relation`

in StupiDB implements the
iterator protocol (see
`stupidb.stupidb.Relation.__iter__()`

), you can

materialize the rows of a relation by calling `list`

on the relation.

Note

The `Row`

objects that make up the elements of the
`list`

above are a very thin layer on top of `dict`

, allowing
two things:

Column access by attribute

User friendly handling of ambiguous column naming in

`Join`

relations.

One design goal of StupiDB is that it allows you to use any mapping you want to
to represent a row, as long as it conforms to the `typing.Mapping[str, T]`

interface (where `T`

is an instance of `typing.TypeVar`

).

## Operations on Relations¶

StupiDB provides standard operations over relations:

Projection (column selection, SELECT,

`select()`

).Selection (row filtering, WHERE,

`sift()`

).Simple aggregation, using

`aggregate()`

.Window functions (including standard aggregate functions, and

`lead()`

,`lag()`

, etc.).Group by (aggregate by a key, GROUP BY,

`group_by()`

)Order by (sorting a relation by one or more columns, ORDER BY,

`order_by()`

)Join (match rows in one table to another, INNER JOIN, LEFT JOIN, etc., e.g.,

`left_join()`

)Set operations (UNION [ALL], INTERSECT [ALL], EXCEPT [ALL], using

`union()`

,`union_all()`

,`intersect()`

,`intersect_all()`

,`difference()`

,`difference_all()`

)

We’ll briefly describe each of these in turn and and show how to use them in the stupidest way.

## Projection (`SELECT`

)¶

```
>>> name_and_bal = (
... table(rows) >> select(n=lambda r: r.name, b=lambda r: r.balance)
... )
>>> bal_times_2 = name_and_bal >> mutate(bal2=lambda r: r.b * 2)
>>> bal_times_2
n b bal2
----- ---- ------
Alice 400 800
Alice 300 600
Alice 100 200
Bob -150 -300
Bob 200 400
```

The `mutate()`

function preserves the child table in the
result, while `select()`

does not.

## Selection (`WHERE`

)¶

Filtering rows is done with the `sift()`

function.

```
>>> alice = table(rows) >> sift(lambda r: r.name == "Alice")
>>> alice
name balance date
------ --------- ----------
Alice 400 2019-02-09
Alice 300 2019-02-10
Alice 100 2019-02-11
```

## Simple Aggregation¶

```
>>> agg = table(rows) >> aggregate(
... my_sum=sum(lambda r: r.balance),
... my_avg=mean(lambda r: r.balance)
... )
>>> agg
my_sum my_avg
-------- --------
850 170
```

`GROUP BY`

¶

```
>>> gb = (
... table(rows) >> group_by(name=lambda r: r.name)
... >> aggregate(bal_over_time=sum(lambda r: r.balance))
... )
>>> gb
name bal_over_time
------ ---------------
Alice 800
Bob 50
```

`ORDER BY`

¶

To sort in ascending order of the specified columns:

```
>>> ob = table(rows) >> order_by(lambda r: r.name, lambda r: r.date)
>>> ob
name balance date
------ --------- ----------
Alice 400 2019-02-09
Alice 300 2019-02-10
Alice 100 2019-02-11
Bob -150 2019-02-05
Bob 200 2019-02-06
```

Currently there is no convenient way to sort descending if your order by values are not numeric.

## Joins¶

`CROSS JOIN`

¶

For two relations \(L\) and \(R\), the cross join, denoted \(\times\), is defined as:

It’s worth noting that all joins can be defined as variations and filters on a cross join.

In stupidb this is:

```
>>> L >> cross_join(R)
```

`INNER JOIN`

¶

Given the definition of a cross join and two relations \(L\) and \(R\) and a predicate \(p\left(l, r\right)\rightarrow\mbox{bool}\), which is a function that takes a tuple \(l\in{L}\) and a tuple \(r\in{R}\) the inner join is defined as:

In stupidb this is:

```
>>> L >> inner_join(R, lambda left, right: left.name == right.name)
```

`LEFT JOIN`

¶

The left join is the set of rows from an inner join of two relations, plus the rows from the left relation that are not in the inner join, substituting NULL values for those attributes that are missing in the inner join.

In stupidb this is:

```
>>> L >> left_join(R, lambda left, right: left.name == right.name)
```

`RIGHT JOIN`

¶

The right join follows the same logic as the left join, with the tables reversed.

In stupidb this is:

```
>>> L >> right_join(R, lambda left, right: left.name == right.name)
```

## Set Operations¶

`UNION`

¶

The union of two relations \(L\) and \(R\) is defined as:

that is, tuples that are in either \(L\) or \(R\).

In stupidb this is:

```
>>> L >> union(R)
```

`INTERSECT`

¶

The intersection of two relations \(L\) and \(R\) is defined as:

that is, tuples that are in both \(L\) and \(R\).

In stupidb this is:

```
>>> L >> intersect(R)
```

`DIFFERENCE`

¶

The difference of two relations \(L\) and \(R\) is defined as:

that is, tuples that are in \(L\) and not in \(R\).

In stupidb this is:

```
>>> L >> difference(R)
```

## Aggregations¶

StupiDB is focused on creating the right abstractions. Aggregations are no exception. To that end there is really one goal:

**Easy creation of custom aggregates, including window functions.**

The UD(A)F interface is heavily inspired by SQLite’s aggregate function interface, so there isn’t anything new here with respect to the API.