You can kinda think of stepping as “React for backends”.

In ye olde jQuery days, you would:

  • Render the page.
  • Set up a load of listeners that twiddled with the elements on the rest of the page.

This often turned into a spaghetti mess, as a change of input might need to update many other components and you had competing callbacks happening asynchronously.

React (in theory at least) solved this by allowing you to do:

  • Render the page as f(state)
  • <input onInput=mutateState(...) >
  • Re-render the page by recomputing f(state)

Deciding which bits of the page to twiddle is taken care of by React.

Using stepping involves a similar shift, but on the backend: from twiddling with cached data in the database to declaratively describing outputs = f(inputs) and letting stepping handle efficient updates.



The Python backend you’re currently building probably has a really simple “interview-question” version along the lines of:


def process_data(
    inputs: list[Input],
    t: int
) -> Output:

    output = f(inputs[:t])
    return output

Where t is time and the Output is the state of the system considering all the inputs up to and including that time.

There are many reasons why your production system is more complex than this. Often, computing process_data(...) at request-time would be prohibitively expensive, and if you squint, a lot of backend code exists to surmount this problem by writing to various caches.

stepping’s aim is to let you write your production system as something closer to process_data(...) – you describe a rich, declarative function of all your inputs, feed it changes, and it tells you what changed in the output.
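
To make "feed it changes, and it tells you what changed in the output" a bit more concrete, here is a minimal sketch in plain Python. It is not stepping's API: total_per_user and feed are hypothetical names invented for illustration, and the input shape (user, amount) is an assumption. It gets the right answer the naive way, by recomputing f and diffing the results, which is the cost you would like to avoid in production.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical input type for illustration: (user, amount) pairs.
Input = tuple[str, int]


def total_per_user(inputs: list[Input]) -> dict[str, int]:
    """The "interview-question" f(inputs): total amount per user."""
    totals: Counter[str] = Counter()
    for user, amount in inputs:
        totals[user] += amount
    return dict(totals)


def feed(
    inputs: list[Input], change: Input
) -> dict[str, tuple[int | None, int | None]]:
    """Append one input and report changed outputs as {key: (old, new)}.

    Naive on purpose: recomputes f over everything, twice, then diffs.
    """
    before = total_per_user(inputs)
    inputs.append(change)
    after = total_per_user(inputs)
    return {
        key: (before.get(key), after.get(key))
        for key in before.keys() | after.keys()
        if before.get(key) != after.get(key)
    }


inputs: list[Input] = [("alice", 10), ("bob", 5)]
changed = feed(inputs, ("alice", 7))
print(changed)  # {'alice': (10, 17)}
```

The interface is the interesting part: one change goes in, and only the affected slice of the output comes back. The work done here still grows with the total number of inputs.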

There are some example applications here.

Incremental View Maintenance

In most SQL dbs there are two ways of declaratively describing outputs = f(inputs), each with different pros and cons:

  • VIEWs – the output is always up to date, but can be slow to SELECT from as the data has to be recomputed each query.
  • MATERIALIZED VIEWs – SELECT is quick because the data has been precomputed, but the data is only as fresh as the most recent REFRESH MATERIALIZED VIEW (which might itself be an expensive operation).

Incremental View Maintenance is an attempt to have one’s cake and eat it - fresh data, quickly.
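
As a rough illustration of the idea, the sketch below keeps the equivalent of SELECT key, SUM(value) ... GROUP BY key up to date by applying only the rows that changed, rather than rescanning the table. It is plain Python, not stepping's API or any real engine's: IncrementalSum and apply are hypothetical names, and representing each change as a row plus a weight of +1 (insert) or -1 (delete) is an assumption made for the example.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical change format: (key, value, weight), weight +1 insert, -1 delete.
Change = tuple[str, int, int]


class IncrementalSum:
    """Maintains per-key sums under inserts and deletes."""

    def __init__(self) -> None:
        self.totals: dict[str, int] = defaultdict(int)
        self.counts: dict[str, int] = defaultdict(int)  # live rows per key

    def _current(self, key: str) -> int | None:
        # A group with no rows has no total at all (None), not a total of 0.
        return self.totals[key] if self.counts[key] else None

    def apply(
        self, delta: list[Change]
    ) -> dict[str, tuple[int | None, int | None]]:
        """Apply a batch of changes; return {key: (old_total, new_total)}."""
        old: dict[str, int | None] = {}
        for key, value, weight in delta:
            if key not in old:
                old[key] = self._current(key)  # remember the pre-batch total
            self.totals[key] += value * weight
            self.counts[key] += weight

        changed = {}
        for key, old_total in old.items():
            new_total = self._current(key)
            if new_total != old_total:
                changed[key] = (old_total, new_total)
            if new_total is None:  # group is empty: drop its state
                del self.totals[key]
                del self.counts[key]
        return changed


view = IncrementalSum()
first = view.apply([("alice", 10, +1), ("bob", 5, +1)])
second = view.apply([("alice", 7, +1)])
third = view.apply([("bob", 5, -1)])
print(first, second, third)
```

Each call to apply does work proportional to the size of the batch, not the size of the table, and reads of view.totals are always fresh. Joins and other operators need more machinery than this, which is where a general-purpose engine earns its keep.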

Existing Incremental View Maintenance software

There are numerous existing pieces of Incremental View Maintenance software, notably:

Jamie Brandon has written a nice taxonomy of them.

Then why write stepping?

The niche stepping tries to sit in is:

  • Less focus on big-data pipelines, more focus on application development.
  • Allows describing the computation in Python not SQL.
  • Can sit next to existing applications, potentially sharing Postgres databases/transactions.
  • Provide an educational example of DBSP - about 3000 lines of pure Python at time of writing.

What about Event Sourcing?

Event Sourcing has many meanings depending on who you speak to. For example: the classic Martin Fowler definition, Martin Kleppmann’s influential talk.

In practice, these systems often amount to many services broadcasting changes to each other over message buses. This can lead to some problems:

  • The developer ergonomics can be bad.
  • Replaying messages after changes to the code (and handling the downstream ramifications) is often an afterthought. (stepping will in future try to tackle this in an opinionated manner with stepping manager.)
  • Often no easy way to express JOINs/GROUP BYs.
  • Shifting all the messages around over the wire often incurs significant performance cost – see below.

Should I use stepping?

Probably not, at least right now:

  • For most applications, storing all the data in normalised form with suitable indexes, then computing everything at request-time in a single thread will outperform stepping (or any other Event Sourcing approach for that matter). Profile your code!
  • stepping is currently very much in Alpha form.
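
For reference, the "normalised data, suitable indexes, compute at request-time" baseline from the first point might look like the sketch below. The schema, the index and total_for_user are hypothetical, and SQLite (via the standard library's sqlite3 module) stands in for whatever database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users (id),
        amount INTEGER NOT NULL
    );
    -- The index that keeps the per-user lookup cheap.
    CREATE INDEX orders_user_id ON orders (user_id);
    """
)
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)", [(1, "alice"), (2, "bob")]
)
conn.executemany(
    "INSERT INTO orders (user_id, amount) VALUES (?, ?)",
    [(1, 10), (2, 5), (1, 7)],
)


def total_for_user(conn: sqlite3.Connection, user_id: int) -> int:
    """Compute the answer at request-time: no caches, nothing to invalidate."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return row[0]


print(total_for_user(conn, 1))  # 17
```

If a profile shows queries like this are fast enough, there is nothing to cache and nothing to keep in sync.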