stupidb.api¶

StupiDB user-facing API.

Note

The join functions all take right, predicate and then left as arguments, in that order.

This is intentional, and is the way the functions must be written to enable currying. Currying is the technique that allows us to use the right shift operator (>>) to chain operations.

Functions

`aggregate`(**aggregations)	Aggregate values from the child operator using aggregations.
`const`(x)	Return a function that returns x regardless of input.
`count`(x)	Count the number of non-NULL values of x.
`cov_pop`(x, y)	Compute the population covariance of two columns.
`cov_samp`(x, y)	Compute the sample covariance of two columns.
`dense_rank`()	Rank the rows of a relation based on the ordering key given in over.
`exists`(relation)	Compute whether any of the rows in relation are truthy.
`first`(x)	Compute the first row of x over a window.
`get`(name)	Return a function that gets the name field from a row.
`group_by`(**group_by)	Group the rows of the child operator according to group_by.
`lag`(x[, n, default])	Lag a column x by n rows, using default for NULL values.
`last`(x)	Compute the last row of x over a window.
`lead`(x[, n, default])	Lead a column x by n rows, using default for NULL values.
`max`(x)	Compute the maximum of a column.
`mean`(x)	Compute the average of a column.
`min`(x)	Compute the minimum of a column.
`mutate`(**mutators)	Add new columns specified by mutators.
`nth`(x, i)	Compute the i-th row of x over a window.
`order_by`(*order_by[, nulls])	Order the rows of the child operator according to order_by.
`rank`()	Rank the rows of a relation based on the ordering key given in over.
`row_number`()	Compute the row number over a window.
`select`(**projectors)	Subset or compute new columns from projectors.
`stdev_pop`(x)	Compute the population standard deviation of a column.
`stdev_samp`(x)	Compute the sample standard deviation of a column.
`sum`(x)	Compute the sum of x, with an empty column summing to NULL.
`total`(x)	Compute the sum of x, with an empty column summing to zero.
`var_pop`(x)	Compute the population variance of a column.
`var_samp`(x)	Compute the sample variance of a column.

Classes

shiftable(*args, **kwargs)

Shiftable curry.

stupidb.api.aggregate(**aggregations)[source]¶

Aggregate values from the child operator using aggregations.

Parameters: aggregations (AggregateSpecification) – A mapping from str column names to AggregateSpecification instances.

Examples

Compute the average of a column:

>>> from stupidb import aggregate, group_by, mean, table
>>> rows = [
...     dict(name="Bob", age=30, timezone="America/New_York"),
...     dict(name="Susan", age=20, timezone="America/New_York"),
...     dict(name="Joe", age=41, timezone="America/Los_Angeles"),
...     dict(name="Alice", age=39, timezone="America/Los_Angeles"),
... ]
>>> average_age = table(rows) >> aggregate(avg_age=mean(lambda r: r.age))
>>> average_age
  avg_age
---------
     32.5

Compute the average a column, grouped by another column:

>>> average_age_by_timezone = (
...     table(rows) >> group_by(tz=lambda r: r.timezone)
...                 >> aggregate(avg_age=mean(lambda r: r.age))
... )
>>> average_age_by_timezone
tz                     avg_age
-------------------  ---------
America/New_York            25
America/Los_Angeles         40

See also

group_by

Return type: Aggregation

stupidb.api.const(x)[source]¶: Return a function that returns x regardless of input.

stupidb.api.count(x)[source]¶

Count the number of non-NULL values of x.

Parameters: x – A column getter.

stupidb.api.cov_pop(x, y)[source]¶

Compute the population covariance of two columns.

Parameters

x – A column selector.
y – A column selector.

stupidb.api.cov_samp(x, y)[source]¶

Compute the sample covariance of two columns.

Parameters

x – A column selector.
y – A column selector.

stupidb.api.cross_join[source]¶

Return the Cartesian product of tuples from left and right.

Parameters

right – A relation
left – A relation

Examples

>>> from stupidb import cross_join, select, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=400),
...     dict(name="Alice", balance=700),
... ]
>>> t = table(rows)
>>> s = table(rows)
>>> crossed = cross_join(t, s)
>>> crossed >> select(
...     left_name=lambda row: row.left["name"],
...     left_balance=lambda row: row.left["balance"],
...     right_name=lambda row: row.right["name"],
...     right_balance=lambda row: row.right["balance"]
... )
left_name      left_balance  right_name      right_balance
-----------  --------------  ------------  ---------------
Bob                    -300  Bob                      -300
Bob                    -300  Bob                      -100
Bob                    -300  Alice                     400
Bob                    -300  Alice                     700
Bob                    -100  Bob                      -300
Bob                    -100  Bob                      -100
Bob                    -100  Alice                     400
Bob                    -100  Alice                     700
Alice                   400  Bob                      -300
Alice                   400  Bob                      -100

stupidb.api.dense_rank()[source]¶

Rank the rows of a relation based on the ordering key given in over.

Return type: AggregateSpecification

stupidb.api.difference[source]¶

Compute the set difference of left and right.

Parameters

right – A relation
left – A relation

stupidb.api.difference_all[source]¶

Compute the set difference of left and right, preserving duplicates.

Parameters

right – A relation
left – A relation

stupidb.api.exists(relation)[source]¶

Compute whether any of the rows in relation are truthy.

This is useful for computing semi-joins.

Return type: bool

stupidb.api.first(x)[source]¶

Compute the first row of x over a window.

Parameters: x – A column getter.

stupidb.api.full_join[source]¶: Full outer join.

stupidb.api.get(name)[source]¶: Return a function that gets the name field from a row.

stupidb.api.group_by(**group_by)[source]¶

Group the rows of the child operator according to group_by.

Parameters: group_by (Callable[[AbstractRow], Hashable]) – A mapping of str column names to functions that compute grouping keys.

Notes

Iterating over the rows of the result of this function is not very useful, since its __iter__() method just yields the rows of its child. A call to this function is best followed by a call to aggregate().

Examples

>>> from stupidb import aggregate, group_by, mean, table
>>> rows = [
...     dict(name="Bob", age=30, timezone="America/New_York"),
...     dict(name="Susan", age=20, timezone="America/New_York"),
...     dict(name="Joe", age=41, timezone="America/Los_Angeles"),
...     dict(name="Alice", age=39, timezone="America/Los_Angeles"),
... ]
>>> average_age_by_timezone = (
...     table(rows) >> group_by(tz=lambda r: r.timezone)
...                 >> aggregate(avg_age=mean(lambda r: r.age))
... )
>>> average_age_by_timezone
tz                     avg_age
-------------------  ---------
America/New_York            25
America/Los_Angeles         40

See also

aggregate

Return type: GroupBy

stupidb.api.inner_join[source]¶

Join left and right relations using predicate.

Drop rows if predicate returns False.

Parameters

right – A relation
predicate – A callable taking two arguments and returning a bool.

Examples

>>> from stupidb import inner_join, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=400),
...     dict(name="Alice", balance=700),
... ]
>>> t = table(rows)
>>> s = table(rows)
>>> t >> inner_join(
...     s,
...     lambda left, right: left["balance"] < right["balance"]
... ) >> select(name=lambda r: r.left["name"], bal=lambda r: r.left["balance"])
name      bal
------  -----
Bob      -300
Bob      -300
Bob      -300
Bob      -100
Bob      -100
Alice     400

stupidb.api.intersect[source]¶

Compute the intersection of left and right, ignoring duplicate rows.

Parameters

right – A relation
left – A relation

See also

intersect_all

stupidb.api.intersect_all[source]¶

Compute the intersection of left and right, preserving duplicates.

Parameters

right – A relation
left – A relation

See also

intersect

stupidb.api.lag(x, n=<function const.<locals>.<lambda>>, default=<function const.<locals>.<lambda>>)[source]¶

Lag a column x by n rows, using default for NULL values.

Parameters

x – A column selector.
n – A callable computing the number of rows to lag. Defaults to a lag of 1 row. The callable takes the current row as input and thus the lag can be computed relative to the current row.
default – A callable computing the default value for the lag if the row would produce a NULL value when lagged. The callable takes the current row as input and thus the default can be computed relative to the current row.

stupidb.api.last(x)[source]¶

Compute the last row of x over a window.

Parameters: x – A column getter.

stupidb.api.lead(x, n=<function const.<locals>.<lambda>>, default=<function const.<locals>.<lambda>>)[source]¶

Lead a column x by n rows, using default for NULL values.

Parameters

x – A column selector.
n – A callable computing the number of rows to lead. Defaults to a lead of 1 row. The callable takes the current row as input and thus the lead can be computed relative to the current row.
default – A callable computing the default value for the lead if the row would produce a NULL value when led. The callable takes the current row as input and thus the default can be computed relative to the current row.

stupidb.api.left_join[source]¶

Join left and right relations using predicate.

Drop rows if predicate returns False. Returns at least one of every row from left.

Parameters

right – A relation
predicate – A callable taking two arguments and returning a bool.

stupidb.api.limit[source]¶

Return the rows in relation starting from offset up to limit.

Parameters

limit – The number of rows starting from offset to produce
relation – Relation whose rows to limit
offset – The number of rows to skip before yielding

stupidb.api.max(x)[source]¶

Compute the maximum of a column.

Parameters: x – A column selector.

stupidb.api.mean(x)[source]¶

Compute the average of a column.

Parameters: x – A column selector.

stupidb.api.min(x)[source]¶

Compute the minimum of a column.

Parameters: x – A column selector.

stupidb.api.mutate(**mutators)[source]¶

Add new columns specified by mutators.

Parameters: projectors – A mapping from str to FullProjector instances.

Notes

Columns are appended, unlike select().

Examples

>>> from stupidb import mutate, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Alice", balance=400),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=700),
... ]
>>> rows = table(rows) >> mutate(lower_name=lambda r: r.name.lower())
>>> rows
name      balance  lower_name
------  ---------  ------------
Bob          -300  bob
Alice         400  alice
Bob          -100  bob
Alice         700  alice

See also

select

stupidb.api.nth(x, i)[source]¶

Compute the i-th row of x over a window.

Parameters

x – Column selector.
i – Callable to compute the row offset of the window to return.

stupidb.api.order_by(*order_by, nulls=Nulls.FIRST)[source]¶

Order the rows of the child operator according to order_by.

Parameters

order_by (Callable[[AbstractRow], Comparable[~T]]) – A sequence of OrderBy instances
nulls (Nulls) – One of Nulls indicating how to treat nulls when sorting. FIRST treats nulls as less than every other value, and LAST treats them as greater than every other value.

Examples

>>> from stupidb import order_by, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Alice", balance=400),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=700),
... ]
>>> ordered = table(rows) >> order_by(lambda r: r.balance)
>>> balances = [row.balance for row in ordered]
>>> balances
[-300, -100, 400, 700]

Return type: SortBy

stupidb.api.over[source]¶

Construct a window aggregate.

Parameters

window – A FrameClause instance constructed from rows or range.
child – The aggregation to compute over window

Notes

This is one of the few user-facing functions that does not return a Relation. The behavior of materializing the rows of the result of calling this function is undefined.

Examples

>>> from stupidb import Window, over, mean, select, table
>>> from datetime import date, timedelta
>>> today = date(2019, 2, 9)
>>> days = timedelta(days=1)
>>> rows = [
...     {"name": "Alice", "balance": 400, "date": today},
...     {"name": "Alice", "balance": 300, "date": today + 1 * days},
...     {"name": "Alice", "balance": 100, "date": today + 2 * days},
...     {"name": "Bob", "balance": -150, "date": today - 4 * days},
...     {"name": "Bob", "balance": 200, "date": today - 3 * days},
... ]
>>> t = table(rows)
>>> window = Window.range(
...     partition_by=[lambda r: r.name],
...     order_by=[lambda r: r.date],
...     preceding=lambda r: 2 * days  # two days behind + the current row
... )
>>> avg_balance_per_person = table(rows) >> select(
...     name=lambda r: r.name,
...     avg_balance=mean(lambda r: r.balance) >> over(window),
...     balance=lambda r: r.balance,
...     date=lambda r: r.date,
... ) >> order_by(lambda r: r.name, lambda r: r.date)
>>> avg_balance_per_person
name      balance  date          avg_balance
------  ---------  ----------  -------------
Alice         400  2019-02-09        400
Alice         300  2019-02-10        350
Alice         100  2019-02-11        266.667
Bob          -150  2019-02-05       -150
Bob           200  2019-02-06         25

stupidb.api.pretty[source]¶

Pretty-format a relation.

Parameters

rows – The relation to print
n – The number of rows to format; A value of None will pretty format all rows
tablefmt – The kind of table to use for formatting
headers – A string indicating how to compute column names
kwargs – Additional keyword arguments passed to the tabulate.tabulate function

Returns

Pretty-formatted relation

Return type

str

See also

stupidb.api.show

stupidb.api.rank()[source]¶

Rank the rows of a relation based on the ordering key given in over.

Return type: AggregateSpecification

stupidb.api.right_join[source]¶

Join left and right relations using predicate.

Drop rows if predicate returns False. Returns at least one of every row from right.

Parameters

right – A relation
predicate – A callable taking two arguments and returning a bool.

stupidb.api.row_number()[source]¶

Compute the row number over a window.

Return type: AggregateSpecification

stupidb.api.select(**projectors)[source]¶

Subset or compute new columns from projectors.

Parameters: projectors – A mapping from str to FullProjector instances.

Examples

>>> from stupidb import select, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Alice", balance=400),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=700),
... ]
>>> names = table(rows) >> select(lower_name=lambda r: r.name.lower())
>>> names = [row.lower_name for row in names]
>>> names
['bob', 'alice', 'bob', 'alice']

See also

mutate

stupidb.api.show[source]¶

Pretty-print a relation.

Parameters

rows – The relation to print
kwargs – Additional keyword arguments passed to the stupidb.api.pretty function

See also

stupidb.api.pretty

stupidb.api.sift[source]¶

Filter rows in child according to predicate.

Parameters: predicate – A callable of one argument taking an AbstractRow and returning a bool.

Examples

>>> from stupidb import sift, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Alice", balance=400),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=700),
... ]
>>> rows = table(rows) >> sift(lambda r: r.name.lower().startswith("a"))
>>> rows
name      balance
------  ---------
Alice         400
Alice         700

stupidb.api.stdev_pop(x)[source]¶

Compute the population standard deviation of a column.

Parameters: x – A column selector.

stupidb.api.stdev_samp(x)[source]¶

Compute the sample standard deviation of a column.

Parameters: x – A column selector.

stupidb.api.sum(x)[source]¶

Compute the sum of x, with an empty column summing to NULL.

Parameters: x – A column getter.

stupidb.api.table[source]¶

Construct a relation from an iterable of mappings.

Parameters: rows – An iterable of mappings whose keys are str instances.

Examples

>>> from stupidb import table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=400),
...     dict(name="Alice", balance=700),
... ]
>>> t = table(rows)
>>> t  
name      balance
------  ---------
Bob          -300
Bob          -100
Alice         400
Alice         700

stupidb.api.total(x)[source]¶

Compute the sum of x, with an empty column summing to zero.

Parameters: x – A column getter.

stupidb.api.union[source]¶

Compute the union of left and right, ignoring duplicate rows.

Parameters

right – A relation
left – A relation

See also

union_all

stupidb.api.union_all[source]¶

Compute the union of left and right, preserving duplicate rows.

Parameters

right – A relation
left – A relation

See also

union

stupidb.api.var_pop(x)[source]¶

Compute the population variance of a column.

Parameters: x (Callable[[AbstractRow], ~R]) – A column selector.
Return type: AggregateSpecification

stupidb.api.var_samp(x)[source]¶

Compute the sample variance of a column.

Parameters: x – A column selector.

stupidb.api¶

stupidb

Navigation

Related Topics