Lab note #069 Partial implementation and prompt prompting

It's been two or three weeks since the last lab note. I've been on vacation with my family for two weeks, and then caught a cold this last week when I got back. I've been doing two things:

  • Continuing the implementation of DBSP
  • Trying to see whether generating a prompt for vibe coding would work.

A blurb on what DBSP is

The idea behind DBSP is to build a computational circuit that can be incrementalized. Hence, when the input changes a little, we can compute exactly how the output changes a little, rather than recomputing from scratch. DBSP differs from previous formulations of incrementalization.[1] Rather than requiring the user to define a differential version of every operator they use, DBSP borrows from Digital Signal Processing (DSP) and uses a delay operator to compute differentials.

From this basic delay, along with the unary and binary lifting operators, we can compose more complicated operators. However, we must adhere to very specific constraints: the operators are linear and time-invariant, the incrementalization is homomorphic over the circuit algebra, and the elements of the data stream form an abelian group. That's a lot to live up to! But if we can keep to those constraints, we can compose them together to build more complicated circuits like differentiation, integration, and distinct.
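To make the composition concrete, here's a sketch (again in Python, as an illustration rather than the project's implementation) of building the differentiator and integrator from delay plus lifted addition and subtraction, and checking that they invert each other:

```python
def delay(stream, zero=0):
    """z^-1: shift the stream one step, prepending the zero element."""
    return [zero] + stream[:-1]

def lift2(f):
    """Lift a binary function to operate pointwise over two streams."""
    return lambda a, b: [f(x, y) for x, y in zip(a, b)]

sub = lift2(lambda x, y: x - y)

def differentiate(s):
    """D(s) = s - z^-1(s): the stream of per-step changes."""
    return sub(s, delay(s))

def integrate(s):
    """I(s): running sums, i.e. a feedback loop of addition and delay."""
    out, acc = [], 0
    for x in s:
        acc += x
        out.append(acc)
    return out

s = [3, 1, 4, 1, 5]
assert differentiate(integrate(s)) == s  # D undoes I
assert integrate(differentiate(s)) == s  # and vice versa
```

The pleasant part is that both operators are just small compositions of delay and lifted group operations, which is what the homomorphism constraint buys you.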

Do the constraints limit us to useless operators? Apparently not. The paper was able to build incrementalized versions of all the major parts of SQL. This means you can build incremental SQL queries with DBSP.

I've been able to build out the basic operators, as well as diff, integral, and distinct. However, I had to do a lot of refactoring as my understanding matured. Right now, I'm pulling any state out of the description of the circuit, so that I can operate on pure values representing the computation. The talks on DBSP gloss over a compilation step that optimizes the incremental circuit. While not strictly necessary, it makes the approach practical. So that's where I'm at the moment.
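The shape of that refactor can be sketched like this (hypothetical names, in Python): the circuit description is immutable data, and an interpreter owns whatever state (such as delay registers) a run needs.

```python
from dataclasses import dataclass

# Hypothetical sketch: circuit nodes are pure, frozen values with no
# state of their own; the interpreter keeps all per-node state.
@dataclass(frozen=True)
class Map:
    fn: callable

@dataclass(frozen=True)
class Delay:
    zero: int = 0

def run(circuit, stream):
    state = {}  # delay registers live here, keyed by node position
    out = []
    for x in stream:
        for i, node in enumerate(circuit):
            if isinstance(node, Map):
                x = node.fn(x)
            elif isinstance(node, Delay):
                x, state[i] = state.get(i, node.zero), x
        out.append(x)
    return out

print(run([Map(lambda v: v * 2), Delay()], [1, 2, 3]))  # [0, 2, 4]
```

Because the description is a pure value, a compilation pass can rewrite it (fuse maps, push delays around) before any state exists.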

Prompting to generate prompts

This particular refactor touches a lot of things. Recently I was lamenting that I'd love to be able to vibe code this stuff, like Geordi La Forge, but my experience has been that LLMs just aren't very good if you're anywhere off the beaten path. And that's where I am currently: implementing new types of reactive or incremental code, sometimes from papers, in a functional style.

When I was building a sample AI app that you can chat with about Zig, I was able to one-shot the web interface. It was great! But trying to one-shot some of these off-the-beaten path refactors or implementations, it just wasn't very good. Currently, I can't tell if the capabilities aren't up to snuff, or if it's my prompting skills. Back when I was evaluating GPT-3, I had written it off because I wasn't aware of RAG techniques. So I'm always wondering if I'm doing something wrong. I'm willing to admit that there's a new way of working with code, and willing to try it out.

This time, I have an extensive CLAUDE.md/CODEX.md file with instructions and information about the current project. I also used o3 to write a prompt to give to OpenAI Codex and Claude Code.

The generated prompt is more extensive than the usual "do this" or "refactor that." I don't know if I need to format it more like a PRD, but this is more detailed than my usual prompting. I also found that I needed to restart two or three times, when the agent started doing something I hadn't anticipated and I needed to update my prompt.

The results? Better than when I was doing interactive prompting. I gave them free rein to just run with any changes they wanted and to run tests. However, none of them really made it to the finish line.

Codex got through some of the refactor for basic components, but then became afraid of breaking tests with further changes. It simply kept talking about submitting PRs and wouldn't make patches, even though I told it I'd tip it and that it should just do it.

Claude actually made it through the refactor and got all the tests to pass. However, its taste for the separation between the description of the circuit and the evaluation is still not very good: the two are quite tangled with each other. Still, I think for easier tasks, I can probably almost treat it like a co-worker. Too bad it's 3x as expensive as Codex. But I'd rather pay more for something better and waste less time.

I haven't tried it out with Llama yet, as I suspect it'll do worse. But I can't wait until the models I can run on my own machine get good enough to edit code, especially if I can run them on other machines. It'll be an accelerator, and the problem will then be how to coordinate the work. Until then, I think the overhead is too much; it's just that we're starting to see that it's possible.

So far, I find that writing out the prompt, even if it's generated, is more work than just trying to do it myself. I also don't enjoy having to babysit the output. And when I don't babysit it every step of the way, I don't enjoy later discovering subtle differences between my mental model or the API and the one the LLM had, which I then need to correct. It's hard to catch all of those in diffs that are too large or too small. I haven't figured out how to communicate the right size of chunk to edit to the LLMs yet.

I burned about $40 of credits in total across both. If I had actually gotten a usable changeset out of this, it would have been worth it. But given that I burned a lot of time and have no changeset, that's pretty expensive.

The only value I derived was seeing the output for a limited aspect of the design that was fuzzy to me, which was the whole reason for reaching for the LLM in the first place. It's just that its other decisions made it hard for it to proceed and complete the task.

Hence, I think the value of LLMs for me right now will go back to interpreting error messages and doing design sketches given the repo context, without expecting them to make code changes, with the exception of run-of-the-mill web code.

Lastly, being sick this past week made me take a step back for a moment. The original problem is finding a way to incrementally maintain a collection with reactive functions. For that problem, I don't need DBSP. Diving into DBSP was certainly indulgent, and now I wonder if I should keep going if it doesn't serve my immediate goals. We'll see what happens this next week.


I haven't been sitting at my computer a lot these past two weeks, so I'll have a list of links next week.

[1] That I'm aware of. I haven't taken a survey of the landscape. I've only picked out what looks good and promising in search of a solution.
