stupidb.api
StupiDB user-facing API.
Note
The join functions all take right, predicate and then left as arguments, in that order.
This is intentional, and is the way the functions must be written to enable
currying. Currying is the technique that allows us to use the right shift
operator (>>) to chain operations.
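To make the argument order concrete, here is a minimal sketch of how a curried callable can cooperate with `>>`. It is not StupiDB's implementation: `ToyCurry`, `_keep_matching` and `toy_join` are hypothetical names, and plain lists stand in for relations.

```python
class ToyCurry:
    """Hypothetical stand-in: wraps a function and remembers leading arguments."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __call__(self, *args):
        # Collect more arguments without calling the wrapped function yet.
        return ToyCurry(self.func, *self.args, *args)

    def __rrshift__(self, left):
        # `left >> self`: the left operand is supplied as the final argument.
        return self.func(*self.args, left)


def _keep_matching(right, predicate, left):
    # Argument order mirrors the note above: right, predicate, then left.
    return [(lhs, rhs) for lhs in left for rhs in right if predicate(lhs, rhs)]


toy_join = ToyCurry(_keep_matching)

pairs = [1, 2, 3] >> toy_join([2, 3, 4], lambda lhs, rhs: lhs == rhs)
print(pairs)  # [(2, 2), (3, 3)]
```

Because the left operand of `>>` arrives last, the wrapped function has to accept it as its final parameter, which is why `left` comes after `right` and `predicate`.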
Functions
Aggregate values from the child operator using aggregations.
Return a function that returns x regardless of input.
Count the number of non-NULL values of x.
Compute the population covariance of two columns.
Compute the sample covariance of two columns.
Rank the rows of a relation based on the ordering key given in over.
Compute whether any of the rows in relation are truthy.
Compute the first row of x over a window.
Return a function that gets the name field from a row.
Group the rows of the child operator according to group_by.
Lag a column x by n rows, using default for NULL values.
Compute the last row of x over a window.
Lead a column x by n rows, using default for NULL values.
Compute the maximum of a column.
Compute the average of a column.
Compute the minimum of a column.
Add new columns specified by mutators.
Compute the i-th row of x over a window.
Order the rows of the child operator according to order_by.
Rank the rows of a relation based on the ordering key given in over.
Compute the row number over a window.
Subset or compute new columns from projectors.
Compute the population standard deviation of a column.
Compute the sample standard deviation of a column.
Compute the sum of x, with an empty column summing to NULL.
Compute the sum of x, with an empty column summing to zero.
Compute the population variance of a column.
Compute the sample variance of a column.
Classes
Shiftable curry.
- stupidb.api.aggregate(**aggregations)
Aggregate values from the child operator using aggregations.
- Parameters
aggregations (AggregateSpecification) – A mapping from str column names to AggregateSpecification instances.
Examples
Compute the average of a column:
>>> from stupidb import aggregate, group_by, mean, table
>>> rows = [
...     dict(name="Bob", age=30, timezone="America/New_York"),
...     dict(name="Susan", age=20, timezone="America/New_York"),
...     dict(name="Joe", age=41, timezone="America/Los_Angeles"),
...     dict(name="Alice", age=39, timezone="America/Los_Angeles"),
... ]
>>> average_age = table(rows) >> aggregate(avg_age=mean(lambda r: r.age))
>>> average_age
  avg_age
---------
     32.5
Compute the average of a column, grouped by another column:
>>> average_age_by_timezone = (
...     table(rows) >> group_by(tz=lambda r: r.timezone)
...                 >> aggregate(avg_age=mean(lambda r: r.age))
... )
>>> average_age_by_timezone
tz                     avg_age
-------------------  ---------
America/New_York            25
America/Los_Angeles         40
- stupidb.api.count(x)
Count the number of non-NULL values of x.
- Parameters
x – A column getter.
- stupidb.api.cov_pop(x, y)
Compute the population covariance of two columns.
- Parameters
x – A column selector.
y – A column selector.
- stupidb.api.cov_samp(x, y)
Compute the sample covariance of two columns.
- Parameters
x – A column selector.
y – A column selector.
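The difference between the population and sample covariance is only the divisor. The sketch below uses the textbook definitions on plain lists; `covariance` and its `ddof` parameter are hypothetical helpers for illustration, not part of StupiDB.

```python
def covariance(xs, ys, ddof):
    """Textbook covariance: ddof=0 for population, ddof=1 for sample."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of the deviations from each mean.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - ddof)


ages = [30, 20, 41, 39]
balances = [100, 50, 300, 250]

population = covariance(ages, balances, ddof=0)
sample = covariance(ages, balances, ddof=1)
print(population, sample)  # 825.0 1100.0
```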
- stupidb.api.cross_join
Return the Cartesian product of tuples from left and right.
- Parameters
right – A relation
left – A relation
Examples
>>> from stupidb import cross_join, select, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=400),
...     dict(name="Alice", balance=700),
... ]
>>> t = table(rows)
>>> s = table(rows)
>>> crossed = cross_join(t, s)
>>> crossed >> select(
...     left_name=lambda row: row.left["name"],
...     left_balance=lambda row: row.left["balance"],
...     right_name=lambda row: row.right["name"],
...     right_balance=lambda row: row.right["balance"]
... )
left_name      left_balance  right_name      right_balance
-----------  --------------  ------------  ---------------
Bob                    -300  Bob                      -300
Bob                    -300  Bob                      -100
Bob                    -300  Alice                     400
Bob                    -300  Alice                     700
Bob                    -100  Bob                      -300
Bob                    -100  Bob                      -100
Bob                    -100  Alice                     400
Bob                    -100  Alice                     700
Alice                   400  Bob                      -300
Alice                   400  Bob                      -100
- stupidb.api.dense_rank()
Rank the rows of a relation based on the ordering key given in over.
- stupidb.api.difference
Compute the set difference of left and right.
- Parameters
right – A relation
left – A relation
- stupidb.api.difference_all
Compute the set difference of left and right, preserving duplicates.
- Parameters
right – A relation
left – A relation
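As a rough illustration of "set" versus "duplicate-preserving" difference, the sketch below follows the usual SQL EXCEPT and EXCEPT ALL conventions on plain lists. That StupiDB matches these conventions in every detail is an assumption, and the variable names are made up for the example.

```python
from collections import Counter

left = ["a", "a", "b"]
right = ["a"]

# Set difference: a value is dropped entirely if it appears in right.
set_difference = sorted(set(left) - set(right))

# Multiset difference: each occurrence in right cancels one occurrence in left.
bag_difference = sorted((Counter(left) - Counter(right)).elements())

print(set_difference)  # ['b']
print(bag_difference)  # ['a', 'b']
```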
- stupidb.api.exists(relation)
Compute whether any of the rows in relation are truthy.
This is useful for computing semi-joins.
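A semi-join keeps each left row at most once when at least one matching right row exists. The following sketch shows the idea with plain dictionaries and `any`; the data and the `has_order` helper are hypothetical and do not use StupiDB itself.

```python
customers = [{"name": "Bob"}, {"name": "Alice"}, {"name": "Joe"}]
orders = [
    {"customer": "Bob", "amount": 10},
    {"customer": "Bob", "amount": 5},
    {"customer": "Alice", "amount": 7},
]


def has_order(customer):
    # "exists": true as soon as one matching order is found.
    return any(order["customer"] == customer["name"] for order in orders)


# Bob has two orders but appears only once in the result.
semi_joined = [customer for customer in customers if has_order(customer)]
print(semi_joined)  # [{'name': 'Bob'}, {'name': 'Alice'}]
```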
- stupidb.api.first(x)
Compute the first row of x over a window.
- Parameters
x – A column getter.
- stupidb.api.group_by(**group_by)
Group the rows of the child operator according to group_by.
- Parameters
group_by (Callable[[AbstractRow], Hashable]) – A mapping of str column names to functions that compute grouping keys.
Notes
Iterating over the rows of the result of this function is not very useful, since its __iter__() method just yields the rows of its child. A call to this function is best followed by a call to aggregate().
Examples
>>> from stupidb import aggregate, group_by, mean, table
>>> rows = [
...     dict(name="Bob", age=30, timezone="America/New_York"),
...     dict(name="Susan", age=20, timezone="America/New_York"),
...     dict(name="Joe", age=41, timezone="America/Los_Angeles"),
...     dict(name="Alice", age=39, timezone="America/Los_Angeles"),
... ]
>>> average_age_by_timezone = (
...     table(rows) >> group_by(tz=lambda r: r.timezone)
...                 >> aggregate(avg_age=mean(lambda r: r.age))
... )
>>> average_age_by_timezone
tz                     avg_age
-------------------  ---------
America/New_York            25
America/Los_Angeles         40
- stupidb.api.inner_join
Join left and right relations using predicate.
Drop rows if predicate returns False.
- Parameters
right – A relation
predicate – A callable taking two arguments and returning a bool.
Examples
>>> from stupidb import inner_join, select, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=400),
...     dict(name="Alice", balance=700),
... ]
>>> t = table(rows)
>>> s = table(rows)
>>> t >> inner_join(
...     s,
...     lambda left, right: left["balance"] < right["balance"]
... ) >> select(name=lambda r: r.left["name"], bal=lambda r: r.left["balance"])
name      bal
------  -----
Bob      -300
Bob      -300
Bob      -300
Bob      -100
Bob      -100
Alice     400
- stupidb.api.intersect
Compute the intersection of left and right, ignoring duplicate rows.
- Parameters
right – A relation
left – A relation
- stupidb.api.intersect_all
Compute the intersection of left and right, preserving duplicates.
- Parameters
right – A relation
left – A relation
- stupidb.api.lag(x, n=<function const.<locals>.<lambda>>, default=<function const.<locals>.<lambda>>)
Lag a column x by n rows, using default for NULL values.
- Parameters
x – A column selector.
n – A callable computing the number of rows to lag. Defaults to a lag of 1 row. The callable takes the current row as input and thus the lag can be computed relative to the current row.
default – A callable computing the default value for the lag if the row would produce a NULL value when lagged. The callable takes the current row as input and thus the default can be computed relative to the current row.
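The shifting behaviour can be sketched on a plain list. In this illustration `n` and `default` are ordinary values rather than callables of the current row, and `lag_values` and `lead_values` are hypothetical helpers, so treat it as a simplified model of the semantics rather than StupiDB's code.

```python
def lag_values(values, n=1, default=None):
    """Shift values back by n rows (n >= 0), filling the gap with default."""
    return [values[i - n] if i - n >= 0 else default for i in range(len(values))]


def lead_values(values, n=1, default=None):
    """Shift values forward by n rows (n >= 0), filling the gap with default."""
    return [
        values[i + n] if i + n < len(values) else default
        for i in range(len(values))
    ]


balances = [400, 300, 100]
lagged = lag_values(balances)           # first row has no predecessor
led = lead_values(balances, default=0)  # last row has no successor

print(lagged)  # [None, 400, 300]
print(led)     # [300, 100, 0]
```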
- stupidb.api.last(x)
Compute the last row of x over a window.
- Parameters
x – A column getter.
- stupidb.api.lead(x, n=<function const.<locals>.<lambda>>, default=<function const.<locals>.<lambda>>)
Lead a column x by n rows, using default for NULL values.
- Parameters
x – A column selector.
n – A callable computing the number of rows to lead. Defaults to a lead of 1 row. The callable takes the current row as input and thus the lead can be computed relative to the current row.
default – A callable computing the default value for the lead if the row would produce a NULL value when led. The callable takes the current row as input and thus the default can be computed relative to the current row.
- stupidb.api.left_join
Join left and right relations using predicate.
Drop rows if predicate returns False. Returns at least one of every row from left.
- Parameters
right – A relation
predicate – A callable taking two arguments and returning a bool.
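"At least one of every row from left" means that an unmatched left row is still emitted, paired with a missing right side. The sketch below contrasts this with an inner join on plain lists; `toy_inner_join` and `toy_left_join` are hypothetical helpers, and representing the missing side as `None` is an assumption made for the example.

```python
def toy_inner_join(left, right, predicate):
    # Only pairs that satisfy the predicate survive.
    return [(lhs, rhs) for lhs in left for rhs in right if predicate(lhs, rhs)]


def toy_left_join(left, right, predicate):
    result = []
    for lhs in left:
        matches = [(lhs, rhs) for rhs in right if predicate(lhs, rhs)]
        # An unmatched left row is kept once, with None for the right side.
        result.extend(matches or [(lhs, None)])
    return result


left = [1, 2, 3]
right = [2, 3, 3]


def same(lhs, rhs):
    return lhs == rhs


inner = toy_inner_join(left, right, same)
outer = toy_left_join(left, right, same)

print(inner)  # [(2, 2), (3, 3), (3, 3)]
print(outer)  # [(1, None), (2, 2), (3, 3), (3, 3)]
```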
- stupidb.api.limit
Return the rows in relation starting from offset up to limit.
- Parameters
limit – The number of rows starting from offset to produce
relation – Relation whose rows to limit
offset – The number of rows to skip before yielding
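The interaction of limit and offset can be sketched with `itertools.islice`. `toy_limit` is a hypothetical helper, and its default of `offset=0` is an assumption for the example, not a statement about StupiDB's signature.

```python
from itertools import islice


def toy_limit(rows, limit, offset=0):
    """Skip `offset` rows, then yield at most `limit` rows."""
    return list(islice(rows, offset, offset + limit))


page = toy_limit(range(10), limit=3, offset=2)
print(page)  # [2, 3, 4]
```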
- stupidb.api.mutate(**mutators)
Add new columns specified by mutators.
- Parameters
mutators – A mapping from str to FullProjector instances.
Notes
Columns are appended, unlike select().
Examples
>>> from stupidb import mutate, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Alice", balance=400),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=700),
... ]
>>> rows = table(rows) >> mutate(lower_name=lambda r: r.name.lower())
>>> rows
name      balance  lower_name
------  ---------  ------------
Bob          -300  bob
Alice         400  alice
Bob          -100  bob
Alice         700  alice
- stupidb.api.nth(x, i)
Compute the i-th row of x over a window.
- Parameters
x – Column selector.
i – Callable to compute the row offset of the window to return.
- stupidb.api.order_by(*order_by, nulls=Nulls.FIRST)
Order the rows of the child operator according to order_by.
- Parameters
order_by (Callable[[AbstractRow], Comparable[~T]]) – A sequence of OrderBy instances
nulls (Nulls) – One of Nulls indicating how to treat nulls when sorting. FIRST treats nulls as less than every other value, and LAST treats them as greater than every other value.
Examples
>>> from stupidb import order_by, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Alice", balance=400),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=700),
... ]
>>> ordered = table(rows) >> order_by(lambda r: r.balance)
>>> balances = [row.balance for row in ordered]
>>> balances
[-300, -100, 400, 700]
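The effect of the nulls option can be modelled with a sort key that decides where `None` lands. This is a sketch on a plain list; `sort_with_nulls` and its `nulls_first` flag are hypothetical and only mimic the FIRST and LAST behaviour described above.

```python
def sort_with_nulls(values, nulls_first=True):
    def key(value):
        is_null = value is None
        # The first tuple element places nulls before or after real values;
        # the second orders the real values (0 is a placeholder for None).
        placement = (not is_null) if nulls_first else is_null
        return (placement, 0 if is_null else value)

    return sorted(values, key=key)


nulls_at_start = sort_with_nulls([3, None, 1])
nulls_at_end = sort_with_nulls([3, None, 1], nulls_first=False)

print(nulls_at_start)  # [None, 1, 3]
print(nulls_at_end)    # [1, 3, None]
```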
- stupidb.api.over
Construct a window aggregate.
- Parameters
window – A FrameClause instance constructed from rows or range.
child – The aggregation to compute over window
Notes
This is one of the few user-facing functions that does not return a Relation. The behavior of materializing the rows of the result of calling this function is undefined.
Examples
>>> from stupidb import Window, mean, order_by, over, select, table
>>> from datetime import date, timedelta
>>> today = date(2019, 2, 9)
>>> days = timedelta(days=1)
>>> rows = [
...     {"name": "Alice", "balance": 400, "date": today},
...     {"name": "Alice", "balance": 300, "date": today + 1 * days},
...     {"name": "Alice", "balance": 100, "date": today + 2 * days},
...     {"name": "Bob", "balance": -150, "date": today - 4 * days},
...     {"name": "Bob", "balance": 200, "date": today - 3 * days},
... ]
>>> t = table(rows)
>>> window = Window.range(
...     partition_by=[lambda r: r.name],
...     order_by=[lambda r: r.date],
...     preceding=lambda r: 2 * days  # two days behind + the current row
... )
>>> avg_balance_per_person = table(rows) >> select(
...     name=lambda r: r.name,
...     avg_balance=mean(lambda r: r.balance) >> over(window),
...     balance=lambda r: r.balance,
...     date=lambda r: r.date,
... ) >> order_by(lambda r: r.name, lambda r: r.date)
>>> avg_balance_per_person
name      balance  date          avg_balance
------  ---------  ----------  -------------
Alice         400  2019-02-09        400
Alice         300  2019-02-10        350
Alice         100  2019-02-11        266.667
Bob          -150  2019-02-05       -150
Bob           200  2019-02-06         25
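To see where the averages in the example come from, the range frame can be recomputed by hand: for each row, average the balances of the same person whose date falls within the two preceding days up to the current row. The sketch below uses plain tuples and a hypothetical `range_mean` helper; it reproduces the arithmetic only, not StupiDB's windowing machinery.

```python
from datetime import date, timedelta

today = date(2019, 2, 9)
day = timedelta(days=1)

# (name, balance, date) tuples mirroring the rows in the example above.
rows = [
    ("Alice", 400, today),
    ("Alice", 300, today + 1 * day),
    ("Alice", 100, today + 2 * day),
    ("Bob", -150, today - 4 * day),
    ("Bob", 200, today - 3 * day),
]


def range_mean(rows, preceding):
    means = []
    for name, _, when in rows:
        # Frame: same partition (name), dates in [when - preceding, when].
        frame = [
            balance
            for other, balance, other_date in rows
            if other == name and when - preceding <= other_date <= when
        ]
        means.append(round(sum(frame) / len(frame), 3))
    return means


averages = range_mean(rows, preceding=2 * day)
print(averages)  # [400.0, 350.0, 266.667, -150.0, 25.0]
```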
- stupidb.api.pretty
Pretty-format a relation.
- Parameters
rows – The relation to print
n – The number of rows to format; a value of None formats all rows
tablefmt – The kind of table to use for formatting
headers – A string indicating how to compute column names
kwargs – Additional keyword arguments passed to the tabulate.tabulate function
- Returns
Pretty-formatted relation
- stupidb.api.rank()
Rank the rows of a relation based on the ordering key given in over.
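rank and dense_rank share a description here, so the usual SQL distinction may help: rank leaves gaps after ties, dense_rank does not. The sketch below uses 1-based numbering as in SQL; whether StupiDB numbers ranks the same way is an assumption, and `sql_rank` and `sql_dense_rank` are hypothetical helpers.

```python
def sql_rank(keys):
    ordered = sorted(keys)
    # Position of the first occurrence in sorted order, so ties leave gaps.
    return [ordered.index(key) + 1 for key in keys]


def sql_dense_rank(keys):
    distinct = sorted(set(keys))
    # Position among the distinct values, so ties leave no gaps.
    return [distinct.index(key) + 1 for key in keys]


balances = [-300, -100, -100, 400]
ranks = sql_rank(balances)
dense_ranks = sql_dense_rank(balances)

print(ranks)        # [1, 2, 2, 4]
print(dense_ranks)  # [1, 2, 2, 3]
```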
- stupidb.api.right_join
Join left and right relations using predicate.
Drop rows if predicate returns False. Returns at least one of every row from right.
- Parameters
right – A relation
predicate – A callable taking two arguments and returning a bool.
- stupidb.api.select(**projectors)
Subset or compute new columns from projectors.
- Parameters
projectors – A mapping from str to FullProjector instances.
Examples
>>> from stupidb import select, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Alice", balance=400),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=700),
... ]
>>> names = table(rows) >> select(lower_name=lambda r: r.name.lower())
>>> names = [row.lower_name for row in names]
>>> names
['bob', 'alice', 'bob', 'alice']
- stupidb.api.show
Pretty-print a relation.
- Parameters
rows – The relation to print
kwargs – Additional keyword arguments passed to the stupidb.api.pretty function
- stupidb.api.sift
Filter rows in child according to predicate.
- Parameters
predicate – A callable of one argument taking an AbstractRow and returning a bool.
Examples
>>> from stupidb import sift, table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Alice", balance=400),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=700),
... ]
>>> rows = table(rows) >> sift(lambda r: r.name.lower().startswith("a"))
>>> rows
name      balance
------  ---------
Alice         400
Alice         700
- stupidb.api.stdev_pop(x)
Compute the population standard deviation of a column.
- Parameters
x – A column selector.
- stupidb.api.stdev_samp(x)
Compute the sample standard deviation of a column.
- Parameters
x – A column selector.
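For reference, the population and sample standard deviations differ only in dividing by n or by n - 1 before taking the square root. The sketch below uses Python's standard `statistics` module on a plain list rather than StupiDB itself.

```python
import statistics

ages = [30, 20, 41, 39]

# Population: divide the squared deviations by n.
population_sd = statistics.pstdev(ages)
# Sample: divide by n - 1, which gives a slightly larger value.
sample_sd = statistics.stdev(ages)

print(round(population_sd, 3), round(sample_sd, 3))
```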
- stupidb.api.sum(x)
Compute the sum of x, with an empty column summing to NULL.
- Parameters
x – A column getter.
- stupidb.api.table
Construct a relation from an iterable of mappings.
- Parameters
rows – An iterable of mappings whose keys are str instances.
Examples
>>> from stupidb import table
>>> rows = [
...     dict(name="Bob", balance=-300),
...     dict(name="Bob", balance=-100),
...     dict(name="Alice", balance=400),
...     dict(name="Alice", balance=700),
... ]
>>> t = table(rows)
>>> t
name      balance
------  ---------
Bob          -300
Bob          -100
Alice         400
Alice         700
- stupidb.api.total(x)
Compute the sum of x, with an empty column summing to zero.
- Parameters
x – A column getter.
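The only difference between sum and total is the result for an empty column. The sketch below models NULL as `None` on plain lists; `sql_sum` and `sql_total` are hypothetical helpers, and skipping NULL inputs follows the usual SQL convention, which is assumed rather than confirmed for StupiDB.

```python
def sql_sum(values):
    non_null = [value for value in values if value is not None]
    # An empty column sums to NULL (None).
    return sum(non_null) if non_null else None


def sql_total(values):
    non_null = [value for value in values if value is not None]
    # An empty column sums to zero.
    return sum(non_null) if non_null else 0


print(sql_sum([]), sql_total([]))          # None 0
print(sql_sum([1, 2]), sql_total([1, 2]))  # 3 3
```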
- stupidb.api.union
Compute the union of left and right, ignoring duplicate rows.
- Parameters
right – A relation
left – A relation
- stupidb.api.union_all
Compute the union of left and right, preserving duplicate rows.
- Parameters
right – A relation
left – A relation
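The difference between the two union variants can be sketched with plain lists of tuples. The row order shown is a property of this sketch only; no ordering guarantee is implied for StupiDB.

```python
left = [("Bob", -300), ("Alice", 400)]
right = [("Bob", -300), ("Joe", 50)]

# Preserving duplicates: simply concatenate both inputs.
union_all_rows = left + right

# Ignoring duplicates: keep the first occurrence of each distinct row.
union_rows = list(dict.fromkeys(left + right))

print(len(union_all_rows))  # 4
print(union_rows)           # [('Bob', -300), ('Alice', 400), ('Joe', 50)]
```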
- stupidb.api.var_pop(x)
Compute the population variance of a column.
- Parameters
x (Callable[[AbstractRow], ~R]) – A column selector.