Usage

API

StupiDB’s user facing API is heavily inspired by dplyr.

Constructing a Relation

You can construct a relation (a table) by calling the table function with a list of mappings

>>> from stupidb import *
>>> from datetime import date, timedelta
>>> today = date(2019, 2, 9)
>>> days = timedelta(days=1)
>>> rows = [
...     {"name": "Alice", "balance": 400, "date": today},
...     {"name": "Alice", "balance": 300, "date": today + 1 * days},
...     {"name": "Alice", "balance": 100, "date": today + 2 * days},
...     {"name": "Bob", "balance": -150, "date": today - 4 * days},
...     {"name": "Bob", "balance": 200, "date": today - 3 * days},
... ]
>>> t = table(rows)
>>> t
name      balance  date
------  ---------  ----------
Alice         400  2019-02-09
Alice         300  2019-02-10
Alice         100  2019-02-11
Bob          -150  2019-02-05
Bob           200  2019-02-06

Since every Relation in StupiDB implements the iterator protocol (see stupidb.stupidb.Relation.__iter__()), you can

materialize the rows of a relation by calling list on the relation.

Note

The Row objects that make up the elements of the list above are a very thin layer on top of dict, allowing two things:

  • Column access by attribute

  • User friendly handling of ambiguous column naming in Join relations.

One design goal of StupiDB is that it allows you to use any mapping you want to to represent a row, as long as it conforms to the typing.Mapping[str, T] interface (where T is an instance of typing.TypeVar).

Operations on Relations

StupiDB provides standard operations over relations:

  1. Projection (column selection, SELECT, select()).

  2. Selection (row filtering, WHERE, sift()).

  3. Simple aggregation, using aggregate().

  4. Window functions (including standard aggregate functions, and lead(), lag(), etc.).

  5. Group by (aggregate by a key, GROUP BY, group_by())

  6. Order by (sorting a relation by one or more columns, ORDER BY, order_by())

  7. Join (match rows in one table to another, INNER JOIN, LEFT JOIN, etc., e.g., left_join())

  8. Set operations (UNION [ALL], INTERSECT [ALL], EXCEPT [ALL], using union(), union_all(), intersect(), intersect_all(), difference(), difference_all())

We’ll briefly describe each of these in turn and and show how to use them in the stupidest way.

Projection (SELECT)

>>> name_and_bal = (
...     table(rows) >> select(n=lambda r: r.name, b=lambda r: r.balance)
... )
>>> bal_times_2 = name_and_bal >> mutate(bal2=lambda r: r.b * 2)
>>> bal_times_2
n         b    bal2
-----  ----  ------
Alice   400     800
Alice   300     600
Alice   100     200
Bob    -150    -300
Bob     200     400

The mutate() function preserves the child table in the result, while select() does not.

Selection (WHERE)

Filtering rows is done with the sift() function.

>>> alice = table(rows) >> sift(lambda r: r.name == "Alice")
>>> alice
name      balance  date
------  ---------  ----------
Alice         400  2019-02-09
Alice         300  2019-02-10
Alice         100  2019-02-11

Simple Aggregation

>>> agg = table(rows) >> aggregate(
...     my_sum=sum(lambda r: r.balance),
...     my_avg=mean(lambda r: r.balance)
... )
>>> agg
  my_sum    my_avg
--------  --------
     850       170

GROUP BY

>>> gb = (
...     table(rows) >> group_by(name=lambda r: r.name)
...                 >> aggregate(bal_over_time=sum(lambda r: r.balance))
... )
>>> gb
name      bal_over_time
------  ---------------
Alice               800
Bob                  50

ORDER BY

To sort in ascending order of the specified columns:

>>> ob = table(rows) >> order_by(lambda r: r.name, lambda r: r.date)
>>> ob
name      balance  date
------  ---------  ----------
Alice         400  2019-02-09
Alice         300  2019-02-10
Alice         100  2019-02-11
Bob          -150  2019-02-05
Bob           200  2019-02-06

Currently there is no convenient way to sort descending if your order by values are not numeric.

Joins

CROSS JOIN

For two relations \(L\) and \(R\), the cross join, denoted \(\times\), is defined as:

\[L\times{R} = \left\{l \cup r \mid l \in L\mbox{ and }r \in R\right\}\]

It’s worth noting that all joins can be defined as variations and filters on a cross join.

In stupidb this is:

>>> L >> cross_join(R)

INNER JOIN

Given the definition of a cross join and two relations \(L\) and \(R\) and a predicate \(p\left(l, r\right)\rightarrow\mbox{bool}\), which is a function that takes a tuple \(l\in{L}\) and a tuple \(r\in{R}\) the inner join is defined as:

\[\left\{l\cup{r}\mid l\in{L}\mbox{ and }r\in{R}\mbox{ if }p\left(l, r\right)\right\}\]

In stupidb this is:

>>> L >> inner_join(R, lambda left, right: left.name == right.name)

LEFT JOIN

The left join is the set of rows from an inner join of two relations, plus the rows from the left relation that are not in the inner join, substituting NULL values for those attributes that are missing in the inner join.

In stupidb this is:

>>> L >> left_join(R, lambda left, right: left.name == right.name)

RIGHT JOIN

The right join follows the same logic as the left join, with the tables reversed.

In stupidb this is:

>>> L >> right_join(R, lambda left, right: left.name == right.name)

Set Operations

UNION

The union of two relations \(L\) and \(R\) is defined as:

\[L\cup{R}\]

that is, tuples that are in either \(L\) or \(R\).

In stupidb this is:

>>> L >> union(R)

INTERSECT

The intersection of two relations \(L\) and \(R\) is defined as:

\[L\cap{R}\]

that is, tuples that are in both \(L\) and \(R\).

In stupidb this is:

>>> L >> intersect(R)

DIFFERENCE

The difference of two relations \(L\) and \(R\) is defined as:

\[L - R\]

that is, tuples that are in \(L\) and not in \(R\).

In stupidb this is:

>>> L >> difference(R)

Aggregations

StupiDB is focused on creating the right abstractions. Aggregations are no exception. To that end there is really one goal:

Easy creation of custom aggregates, including window functions.

The UD(A)F interface is heavily inspired by SQLite’s aggregate function interface, so there isn’t anything new here with respect to the API.