Lab note #052 Epx for everything

Lab note #052 Epx for everything

This past week, I worked through more confusion on my part.

  • Consolidated the representation of an algebraic effect with the React-like components (Fibers)
  • Worked out a solution for resuming fibers after getting a result back from from the algebraic effect. (Then block)
  • Use a data description of all cells (Epx)

I'll run through a brief of each, but first some nomenclature.

Naming things

Sometimes, part of resolving confusion is just figuring out what to name something. Once you know what something is, the name becomes more apparent.

This is a Cell, which describes a part of a computational DAG and pipeline. This particular component fetches data from an API and yields the result in one of its composite cells. It also composes other cells, such as ones that fetches the sunrise/sunset times, as well as a cell that fetches Chuck Norris jokes. If you've done any amount of React with hooks, this should look familiar to you.

@Epx
def CompositeEffectCell(*_children, **_args):
    catfacts = yield from useFetchJson("https://cat-fact.herokuapp.com/facts")

    return Fragment(
        SunRiseAndSetCell(),
        ChuckNorrisJokeCell(),
        Yield(
            f"fact: {catfacts[0]['text']}"
        )
    )
  • Cells: These are the computable blocks inside of a computational notebook. However, working at the level of a runtime, a cell is akin to a React component. It's a declarative and functional description of a part of a computation that needs to take place.
  • Epx: Cells, when called in Python, don't actually execute. They return Epx, which is a description of what is to be run. I ended up calling this representation Epx, for "expression" but also as a nod to Jsx.Element.
  • Hook: Like React, we have hooks. But they're implemented as algebraic effects. When a hook is called, we're actually raising an algebraic effect. When the handler returns a value, that value resumes where the algebraic effect was raised. A raised algebraic effect is also sent to the runtime as Epx.
  • Functional Cells: These cells don't use hooks and hence never yield. That means when evaluated, they immediately return the Epx representing the cell's composition.
  • Generator Cells: These are cells that raise algebraic effects using generators. That means they don't immediately return the Epx representing the cell like Functional Cells. Instead they yield Epx representing raised algebraic effect until they finally represent the Epx for the cell. Because Python doesn't have true continuations, we're currently limited with single-shot resumption.
  • Fiber: This is a unified unit of work that the runtime uses to evaluate the computation. Each Epx is attached to a fiber, and it's the fiber that keeps the state across render frames.
  • Render Frame: The runtime runs at a certain rate (target 60fps), in which the scheduler evaluates as many fibers it can do within 1/60th of a second.

Consolidated unit of work

I originally had two separate units of work for algebraic effects and the cell evaluation. I unified them into the same unit of work, borrowing a term of use from React and called it a Fiber.

I'm breaking up the computation to be done into bits and pieces, so I can have detailed control of what to execute and when. But why not execute Epx directly, instead of wrapping it in a fiber? An Epx is immutable, whereas the fiber will hold state. I anticipate useState hooks to need to store state across render frames in the execution of a cell's Epx.

Consolidating the system to a single unit of work makes it simpler to tell what's going on in an execution trace.

Resumption after algebraic effect

I mentioned earlier that after an algebraic effect is raised, the effect handler handles the effect and returns the result. Algebraic effects are represented by an Epx, which also translates to a fiber. This fiber is sent to the scheduler queue to wait in line for evaluation.

That also means in that we can't await the result of a algebraic effect eval while we're evaluating a generator cell , because otherwise, that would block the runtime's iteration over the queue of fibers.

I needed some way for fibers to refer to each other. Currently, they don't reference each other, and their only relationships are indirect, through the composition of the Epx inside of a fiber. I figured I'll need to do some kind of linking between fibers to mirror the parent/child relationships in the Epx.

I was a little stumped, until I remembered you don't have to await a promise. I could just use the `then()` block after a promise to resume the result from the handler and resume the fiber.

I'll eventually need to relate the different fibers, but for now, this works as a workaround.

Cells always return Epx

The return value for a cell is always a nested composite of cells, just like in React. I decided not to go with the angle brackets syntax to keep from needing a preprocessor. I went with a nested function DSL, like this:

@Epx
def SomeCell(*children, **args):
    return Fragment(
    	*(children + (
          SomeOtherCell(foo = "sun"),
          YetAnotherCell(bar = "moon"))
        )
    )

Implementation of this DSL is such that the functions are executed, returning its Epx. However, not all of the cells are functions. Some of them could be generators, and they need to be driven to return Epx. And even then, it's only the final Epx of a generator cell that represents it. All the Epx yield in the middle are algebraic effects. In addition, children of a cell could sometimes be lambdas. All this variation meant that the runtime had to handle all these different cases, and it hand to do that across the Javascript Runtime/Python Kernel boundary.

While I found that I could send Python generators to the Javascript side and drive them from there, I thought it was better to only send Epx across this boundary. Why not use a thunk and a generator instead?

Epx is only data, and being only data, it gives me the flexibility to have remote kernels, because I wouldn't be able to send functions across a network. In addition, it simplified the runtime to have a unified way to handle expressions. It also gave me a way to represent delayed computation for all cells, rather than only having delayed computation for generator cells.

To achieve this DSL, I used the @Epx decorator using the trick I learned about using it to "quote" a function. The implementation for @Epx doesn't use thunks, but returns Epx instead, but uses the same idea.

So far the progress is good, but I think I'm going too slow. Hopefully, but laying the ground work, some later stuff will go faster. But for now, there are other design issues to work through, so it will be at least another week of working through confusion.