stupidb.functions.associative

Segment tree and corresponding aggregate function implementations.

The segment tree implementation is based on Leis, 2015

This segment tree implementation uses AssociativeAggregate instances as its nodes. The leaves of the tree are computed by calling the step() method once when the tree is initialized. The fanout of the tree is adjustable.

From the leaves, the traversal continues breadth-first and bottom-up during which each interior node’s current aggregation state is combined with those of its children by calling the combine() method. This method takes another instance of the same aggregation as input and combines the calling instance’s aggregation state with the input instance’s aggregation state.

Each interior node therefore contains the combined aggregation value of all of its children. This makes it possible to compute a range query in \(O\left(\log{N}\right)\) time rather than \(O\left(N\right)\).

Here’s an example of a segment tree for the Sum aggregation, constructed with the following leaves and a fanout of 2:

[1, 2, 3, 4, 5, 6, 7, 8]

Blue indicates that a node was just aggregated into its parent, and red indicates a node that contains the aggregate value of all of its children.

../_images/main.gif

Note

You can generate this GIF yourself by typing:

$ python -m stupidb.associative.animate > segment_tree.gif

at the command line, after you’ve installed stupidb. The code used to generate the GIF lives in stupidb.associative.animate.

Using segment trees in this way results in window aggregations having \(O\left(N\log{N}\right)\) worst case behavior rather than \(O\left(N^{2}\right)\), which is the complexity of naive window aggregation implementations.

A previous iteration of stupidb had \(O\left(N\right)\) worst case behavior for some aggregations such as those based on sum. The segment tree implementation provides a generic solution for any associative aggregate, including min and max as well as the typical sum based aggregations, that gives a worst case runtime complexity of \(O\left(N\log{N}\right)\).

A future iteration might determine the aggregation algorithm based on the specific aggregate to achieve optimal behavior from all aggregates.

Classes

Count()

Count column values.

Covariance(*, ddof)

Base class modeling the covariance of two columns.

Max()

Maximum of column values.

Mean()

Average values in a column.

Min()

Minimum of column values.

MinMax(*, comparator)

Base class modeling min/max order statistics.

PopulationCovariance()

Population covariance of two columns.

PopulationStandardDeviation()

Population standard deviation of a column.

PopulationVariance()

Population variance of a column.

SampleCovariance()

Sample covariance of two columns.

SampleStandardDeviation()

Sample standard deviation of a column.

SampleVariance()

Sample variance of a column.

StandardDeviation(ddof)

Base class modeling the standard deviation of a column.

Sum()

Sum column values, ignoring nulls.

Total()

Sum column values, preserving nulls.

Variance(ddof)

Base class modeling the variance of a column.

class stupidb.functions.associative.Count[source]

Count column values.

combine(other)[source]

Combine two Count instances.

Return type

None

finalize()[source]

Return the count.

step(input1)[source]

Add one to the count if input1 is not None.

class stupidb.functions.associative.Covariance(*, ddof)[source]

Base class modeling the covariance of two columns.

combine(other)[source]

Combine two BinaryAssociativeAggregate instances.

Return type

None

finalize()[source]

Compute the value of the aggregation from its current state.

step(x, y)[source]

Perform a single step of the aggregation.

class stupidb.functions.associative.Max[source]

Maximum of column values.

class stupidb.functions.associative.Mean[source]

Average values in a column.

finalize()[source]

Compute the value of the aggregation from its current state.

class stupidb.functions.associative.Min[source]

Minimum of column values.

class stupidb.functions.associative.MinMax(*, comparator)[source]

Base class modeling min/max order statistics.

combine(other)[source]

Combine two UnaryAssociativeAggregate instances.

Return type

None

finalize()[source]

Compute the value of the aggregation from its current state.

step(input1)[source]

Perform a single step of the aggregation.

class stupidb.functions.associative.PopulationCovariance[source]

Population covariance of two columns.

class stupidb.functions.associative.PopulationStandardDeviation[source]

Population standard deviation of a column.

class stupidb.functions.associative.PopulationVariance[source]

Population variance of a column.

class stupidb.functions.associative.SampleCovariance[source]

Sample covariance of two columns.

class stupidb.functions.associative.SampleStandardDeviation[source]

Sample standard deviation of a column.

class stupidb.functions.associative.SampleVariance[source]

Sample variance of a column.

class stupidb.functions.associative.StandardDeviation(ddof)[source]

Base class modeling the standard deviation of a column.

finalize()[source]

Compute the value of the aggregation from its current state.

class stupidb.functions.associative.Sum[source]

Sum column values, ignoring nulls.

combine(other)[source]

Combine two UnaryAssociativeAggregate instances.

Return type

None

finalize()[source]

Compute the value of the aggregation from its current state.

step(input1)[source]

Perform a single step of the aggregation.

class stupidb.functions.associative.Total[source]

Sum column values, preserving nulls.

finalize()[source]

Compute the value of the aggregation from its current state.

class stupidb.functions.associative.Variance(ddof)[source]

Base class modeling the variance of a column.

combine(other)[source]

Combine two UnaryAssociativeAggregate instances.

Return type

None

finalize()[source]

Compute the value of the aggregation from its current state.

step(x)[source]

Perform a single step of the aggregation.

Modules

stupidb.functions.associative.core