stupidb.api

StupiDB user-facing API.

Note

The join functions all take right, predicate and then left as arguments, in that order.

This is intentional, and is the way the functions must be written to enable currying. Currying is the technique that allows us to use the right shift operator (>>) to chain operations.

Functions

aggregate(**aggregations)

Aggregate values from the child operator using aggregations.

const(x)

Return a function that returns x regardless of input.

count(x)

Count the number of non-NULL values of x.

cov_pop(x, y)

Compute the population covariance of two columns.

cov_samp(x, y)

Compute the sample covariance of two columns.

dense_rank()

Rank the rows of a relation based on the ordering key given in over.

exists(relation)

Compute whether any of the rows in relation are truthy.

first(x)

Compute the first row of x over a window.

get(name)

Return a function that gets the name field from a row.

group_by(**group_by)

Group the rows of the child operator according to group_by.

lag(x[, n, default])

Lag a column x by n rows, using default for NULL values.

last(x)

Compute the last row of x over a window.

lead(x[, n, default])

Lead a column x by n rows, using default for NULL values.

max(x)

Compute the maximum of a column.

mean(x)

Compute the average of a column.

min(x)

Compute the minimum of a column.

mutate(**mutators)

Add new columns specified by mutators.

nth(x, i)

Compute the i-th row of x over a window.

order_by(*order_by[, nulls])

Order the rows of the child operator according to order_by.

rank()

Rank the rows of a relation based on the ordering key given in over.

row_number()

Compute the row number over a window.

select(**projectors)

Subset or compute new columns from projectors.

stdev_pop(x)

Compute the population standard deviation of a column.

stdev_samp(x)

Compute the sample standard deviation of a column.

sum(x)

Compute the sum of x, with an empty column summing to NULL.

total(x)

Compute the sum of x, with an empty column summing to zero.

var_pop(x)

Compute the population variance of a column.

var_samp(x)

Compute the sample variance of a column.

Classes

shiftable(*args, **kwargs)

Shiftable curry.

stupidb.api.aggregate(**aggregations)[source]

Aggregate values from the child operator using aggregations.

Parameters

aggregations (AggregateSpecification) – A mapping from str column names to AggregateSpecification instances.

Examples

Compute the average of a column:

>>> from stupidb import aggregate, group_by, mean, table
>>> rows = [
...     dict(name="Bob", age=30, timezone="America/New_York"),
...     dict(name="Susan", age=20, timezone="America/New_York"),
...     dict(name="Joe", age=41, timezone="America/Los_Angeles"),
...     dict(name="Alice", age=39, timezone="America/Los_Angeles"),
... ]
>>> average_age = table(rows) >> aggregate(avg_age=mean(lambda r: r.age))
>>> average_age
  avg_age
---------
     32.5

Compute the average a column, grouped by another column:

>>> average_age_by_timezone = (
...     table(rows) >> group_by(tz=lambda r: r.timezone)
...                 >> aggregate(avg_age=mean(lambda r: r.age))
... )
>>> average_age_by_timezone
tz                     avg_age
-------------------  ---------
America/New_York            25
America/Los_Angeles         40

See also

group_by

Return type

Aggregation

stupidb.api.const(x)[source]

Return a function that returns x regardless of input.

stupidb.api.count(x)[source]

Count the number of non-NULL values of x.

Parameters

x – A column getter.

stupidb.api.cov_pop(x, y)[source]

Compute the population covariance of two columns.

Parameters
  • x – A column selector.

  • y – A column selector.

stupidb.api.cov_samp(x, y)[source]

Compute the sample covariance of two columns.

Parameters
  • x – A column selector.

  • y – A column selector.

stupidb.api.cross_join[source]

Return the Cartesian product of tuples from left and right.

Parameters
  • right – A relation

  • left – A relation

Examples

>>> from stupidb import cross_join, select, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=400),
...     dict(name="Alice", balance=700),
... ]
>>> t = table(rows)
>>> s = table(rows)
>>> crossed = cross_join(t, s)
>>> crossed >> select(
...     left_name=lambda row: row.left["name"],
...     left_balance=lambda row: row.left["balance"],
...     right_name=lambda row: row.right["name"],
...     right_balance=lambda row: row.right["balance"]
... )
left_name      left_balance  right_name      right_balance
-----------  --------------  ------------  ---------------
Bob                    -300  Bob                      -300
Bob                    -300  Bob                      -100
Bob                    -300  Alice                     400
Bob                    -300  Alice                     700
Bob                    -100  Bob                      -300
Bob                    -100  Bob                      -100
Bob                    -100  Alice                     400
Bob                    -100  Alice                     700
Alice                   400  Bob                      -300
Alice                   400  Bob                      -100
stupidb.api.dense_rank()[source]

Rank the rows of a relation based on the ordering key given in over.

Return type

AggregateSpecification

stupidb.api.difference[source]

Compute the set difference of left and right.

Parameters
  • right – A relation

  • left – A relation

stupidb.api.difference_all[source]

Compute the set difference of left and right, preserving duplicates.

Parameters
  • right – A relation

  • left – A relation

stupidb.api.exists(relation)[source]

Compute whether any of the rows in relation are truthy.

This is useful for computing semi-joins.

Return type

bool

stupidb.api.first(x)[source]

Compute the first row of x over a window.

Parameters

x – A column getter.

stupidb.api.full_join[source]

Full outer join.

stupidb.api.get(name)[source]

Return a function that gets the name field from a row.

stupidb.api.group_by(**group_by)[source]

Group the rows of the child operator according to group_by.

Parameters

group_by (Callable[[AbstractRow], Hashable]) – A mapping of str column names to functions that compute grouping keys.

Notes

Iterating over the rows of the result of this function is not very useful, since its __iter__() method just yields the rows of its child. A call to this function is best followed by a call to aggregate().

Examples

>>> from stupidb import aggregate, group_by, mean, table
>>> rows = [
...     dict(name="Bob", age=30, timezone="America/New_York"),
...     dict(name="Susan", age=20, timezone="America/New_York"),
...     dict(name="Joe", age=41, timezone="America/Los_Angeles"),
...     dict(name="Alice", age=39, timezone="America/Los_Angeles"),
... ]
>>> average_age_by_timezone = (
...     table(rows) >> group_by(tz=lambda r: r.timezone)
...                 >> aggregate(avg_age=mean(lambda r: r.age))
... )
>>> average_age_by_timezone
tz                     avg_age
-------------------  ---------
America/New_York            25
America/Los_Angeles         40

See also

aggregate

Return type

GroupBy

stupidb.api.inner_join[source]

Join left and right relations using predicate.

Drop rows if predicate returns False.

Parameters
  • right – A relation

  • predicate – A callable taking two arguments and returning a bool.

Examples

>>> from stupidb import inner_join, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=400),
...     dict(name="Alice", balance=700),
... ]
>>> t = table(rows)
>>> s = table(rows)
>>> t >> inner_join(
...     s,
...     lambda left, right: left["balance"] < right["balance"]
... ) >> select(name=lambda r: r.left["name"], bal=lambda r: r.left["balance"])
name      bal
------  -----
Bob      -300
Bob      -300
Bob      -300
Bob      -100
Bob      -100
Alice     400
stupidb.api.intersect[source]

Compute the intersection of left and right, ignoring duplicate rows.

Parameters
  • right – A relation

  • left – A relation

See also

intersect_all

stupidb.api.intersect_all[source]

Compute the intersection of left and right, preserving duplicates.

Parameters
  • right – A relation

  • left – A relation

See also

intersect

stupidb.api.lag(x, n=<function const.<locals>.<lambda>>, default=<function const.<locals>.<lambda>>)[source]

Lag a column x by n rows, using default for NULL values.

Parameters
  • x – A column selector.

  • n – A callable computing the number of rows to lag. Defaults to a lag of 1 row. The callable takes the current row as input and thus the lag can be computed relative to the current row.

  • default – A callable computing the default value for the lag if the row would produce a NULL value when lagged. The callable takes the current row as input and thus the default can be computed relative to the current row.

stupidb.api.last(x)[source]

Compute the last row of x over a window.

Parameters

x – A column getter.

stupidb.api.lead(x, n=<function const.<locals>.<lambda>>, default=<function const.<locals>.<lambda>>)[source]

Lead a column x by n rows, using default for NULL values.

Parameters
  • x – A column selector.

  • n – A callable computing the number of rows to lead. Defaults to a lead of 1 row. The callable takes the current row as input and thus the lead can be computed relative to the current row.

  • default – A callable computing the default value for the lead if the row would produce a NULL value when led. The callable takes the current row as input and thus the default can be computed relative to the current row.

stupidb.api.left_join[source]

Join left and right relations using predicate.

Drop rows if predicate returns False. Returns at least one of every row from left.

Parameters
  • right – A relation

  • predicate – A callable taking two arguments and returning a bool.

stupidb.api.limit[source]

Return the rows in relation starting from offset up to limit.

Parameters
  • limit – The number of rows starting from offset to produce

  • relation – Relation whose rows to limit

  • offset – The number of rows to skip before yielding

stupidb.api.max(x)[source]

Compute the maximum of a column.

Parameters

x – A column selector.

stupidb.api.mean(x)[source]

Compute the average of a column.

Parameters

x – A column selector.

stupidb.api.min(x)[source]

Compute the minimum of a column.

Parameters

x – A column selector.

stupidb.api.mutate(**mutators)[source]

Add new columns specified by mutators.

Parameters

projectors – A mapping from str to FullProjector instances.

Notes

Columns are appended, unlike select().

Examples

>>> from stupidb import mutate, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Alice", balance=400),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=700),
... ]
>>> rows = table(rows) >> mutate(lower_name=lambda r: r.name.lower())
>>> rows
name      balance  lower_name
------  ---------  ------------
Bob          -300  bob
Alice         400  alice
Bob          -100  bob
Alice         700  alice

See also

select

stupidb.api.nth(x, i)[source]

Compute the i-th row of x over a window.

Parameters
  • x – Column selector.

  • i – Callable to compute the row offset of the window to return.

stupidb.api.order_by(*order_by, nulls=Nulls.FIRST)[source]

Order the rows of the child operator according to order_by.

Parameters
  • order_by (Callable[[AbstractRow], Comparable[~T]]) – A sequence of OrderBy instances

  • nulls (Nulls) – One of Nulls indicating how to treat nulls when sorting. FIRST treats nulls as less than every other value, and LAST treats them as greater than every other value.

Examples

>>> from stupidb import order_by, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Alice", balance=400),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=700),
... ]
>>> ordered = table(rows) >> order_by(lambda r: r.balance)
>>> balances = [row.balance for row in ordered]
>>> balances
[-300, -100, 400, 700]
Return type

SortBy

stupidb.api.over[source]

Construct a window aggregate.

Parameters
  • window – A FrameClause instance constructed from rows or range.

  • child – The aggregation to compute over window

Notes

This is one of the few user-facing functions that does not return a Relation. The behavior of materializing the rows of the result of calling this function is undefined.

Examples

>>> from stupidb import Window, over, mean, select, table
>>> from datetime import date, timedelta
>>> today = date(2019, 2, 9)
>>> days = timedelta(days=1)
>>> rows = [
...     {"name": "Alice", "balance": 400, "date": today},
...     {"name": "Alice", "balance": 300, "date": today + 1 * days},
...     {"name": "Alice", "balance": 100, "date": today + 2 * days},
...     {"name": "Bob", "balance": -150, "date": today - 4 * days},
...     {"name": "Bob", "balance": 200, "date": today - 3 * days},
... ]
>>> t = table(rows)
>>> window = Window.range(
...     partition_by=[lambda r: r.name],
...     order_by=[lambda r: r.date],
...     preceding=lambda r: 2 * days  # two days behind + the current row
... )
>>> avg_balance_per_person = table(rows) >> select(
...     name=lambda r: r.name,
...     avg_balance=mean(lambda r: r.balance) >> over(window),
...     balance=lambda r: r.balance,
...     date=lambda r: r.date,
... ) >> order_by(lambda r: r.name, lambda r: r.date)
>>> avg_balance_per_person
name      balance  date          avg_balance
------  ---------  ----------  -------------
Alice         400  2019-02-09        400
Alice         300  2019-02-10        350
Alice         100  2019-02-11        266.667
Bob          -150  2019-02-05       -150
Bob           200  2019-02-06         25
stupidb.api.pretty[source]

Pretty-format a relation.

Parameters
  • rows – The relation to print

  • n – The number of rows to format; A value of None will pretty format all rows

  • tablefmt – The kind of table to use for formatting

  • headers – A string indicating how to compute column names

  • kwargs – Additional keyword arguments passed to the tabulate.tabulate function

Returns

Pretty-formatted relation

Return type

str

See also

stupidb.api.show

stupidb.api.rank()[source]

Rank the rows of a relation based on the ordering key given in over.

Return type

AggregateSpecification

stupidb.api.right_join[source]

Join left and right relations using predicate.

Drop rows if predicate returns False. Returns at least one of every row from right.

Parameters
  • right – A relation

  • predicate – A callable taking two arguments and returning a bool.

stupidb.api.row_number()[source]

Compute the row number over a window.

Return type

AggregateSpecification

stupidb.api.select(**projectors)[source]

Subset or compute new columns from projectors.

Parameters

projectors – A mapping from str to FullProjector instances.

Examples

>>> from stupidb import select, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Alice", balance=400),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=700),
... ]
>>> names = table(rows) >> select(lower_name=lambda r: r.name.lower())
>>> names = [row.lower_name for row in names]
>>> names
['bob', 'alice', 'bob', 'alice']

See also

mutate

stupidb.api.show[source]

Pretty-print a relation.

Parameters
  • rows – The relation to print

  • kwargs – Additional keyword arguments passed to the stupidb.api.pretty function

stupidb.api.sift[source]

Filter rows in child according to predicate.

Parameters

predicate – A callable of one argument taking an AbstractRow and returning a bool.

Examples

>>> from stupidb import sift, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Alice", balance=400),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=700),
... ]
>>> rows = table(rows) >> sift(lambda r: r.name.lower().startswith("a"))
>>> rows
name      balance
------  ---------
Alice         400
Alice         700
stupidb.api.stdev_pop(x)[source]

Compute the population standard deviation of a column.

Parameters

x – A column selector.

stupidb.api.stdev_samp(x)[source]

Compute the sample standard deviation of a column.

Parameters

x – A column selector.

stupidb.api.sum(x)[source]

Compute the sum of x, with an empty column summing to NULL.

Parameters

x – A column getter.

stupidb.api.table[source]

Construct a relation from an iterable of mappings.

Parameters

rows – An iterable of mappings whose keys are str instances.

Examples

>>> from stupidb import table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=400),
...     dict(name="Alice", balance=700),
... ]
>>> t = table(rows)
>>> t  
name      balance
------  ---------
Bob          -300
Bob          -100
Alice         400
Alice         700
stupidb.api.total(x)[source]

Compute the sum of x, with an empty column summing to zero.

Parameters

x – A column getter.

stupidb.api.union[source]

Compute the union of left and right, ignoring duplicate rows.

Parameters
  • right – A relation

  • left – A relation

See also

union_all

stupidb.api.union_all[source]

Compute the union of left and right, preserving duplicate rows.

Parameters
  • right – A relation

  • left – A relation

See also

union

stupidb.api.var_pop(x)[source]

Compute the population variance of a column.

Parameters

x (Callable[[AbstractRow], ~R]) – A column selector.

Return type

AggregateSpecification

stupidb.api.var_samp(x)[source]

Compute the sample variance of a column.

Parameters

x – A column selector.