Lab note #070 Vibe-coding is half a skill issue

I've been working on an implementation of DBSP, which is a way of doing incremental computation built up from some basic concepts in digital signal processing. While I use LLMs in my daily work to ask questions, spitball ideas, and do some basic tasks, I hardly ever ask them to explicitly write code for me.
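
For context, the core of DBSP is surprisingly small: streams of values, a delay operator, and differentiation and integration built on top of it. Here's a minimal Python sketch of that idea, just to show the shape of it; the names and the use of plain integers are mine for illustration, not from my implementation or any particular DBSP library.

```python
# Minimal sketch of the DBSP basics, using plain integers for simplicity.
# In DBSP proper, stream values are elements of an abelian group
# (e.g. Z-sets), but the operator algebra looks the same.

def delay(stream, zero=0):
    """z^-1: shift a stream one step forward in time, padding with zero."""
    return [zero] + stream[:-1]

def differentiate(stream, zero=0):
    """D(s)[t] = s[t] - s[t-1]: turn a stream of snapshots into deltas."""
    return [cur - prev for cur, prev in zip(stream, delay(stream, zero))]

def integrate(stream):
    """I(s)[t] = s[0] + ... + s[t]: turn deltas back into snapshots."""
    out, acc = [], 0
    for v in stream:
        acc += v
        out.append(acc)
    return out

# D and I are inverses of each other, which is what incrementalization
# leans on: compute on deltas, recover snapshots when you need them.
snapshots = [1, 4, 6, 6, 9]
deltas = differentiate(snapshots)   # [1, 3, 2, 0, 3]
assert integrate(deltas) == snapshots
```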

But the fact that some people (not just beginners, but programmers who are better than me) have had success with it made me wonder if I could leverage it to speed me up too. However, the last three weeks have mostly been a struggle to get AI to write code for me.

I had written the DBSP implementation and got to a point where I realized it would be much better if the description of the circuit were separated from the state of the circuit during execution. So the task was a refactor to separate these two. Codex had just come out, so I thought I'd give it a shot.
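
To make the shape of that refactor concrete, here's roughly the kind of split I was after: the circuit description as plain, immutable data you can build and inspect, and the mutable per-operator state living in a separate object created at execution time. The names and the toy `step` interpretation below are made up for illustration, not my actual code.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class CircuitDescription:
    """What the circuit *is*: a chain of (op_name, params) nodes."""
    nodes: tuple = ()

    def with_node(self, op_name, **params):
        return CircuitDescription(self.nodes + ((op_name, params),))

@dataclass
class CircuitRuntime:
    """What the circuit *holds* while running: per-node state, e.g. the
    value currently sitting inside each delay operator."""
    description: CircuitDescription
    state: dict = field(default_factory=dict)

    def step(self, value):
        # Toy interpretation: each 'delay' node emits its stored value
        # and stores the incoming one for the next step.
        for i, (op_name, _params) in enumerate(self.description.nodes):
            if op_name == "delay":
                value, self.state[i] = self.state.get(i, 0), value
        return value

desc = CircuitDescription().with_node("delay")
run_a = CircuitRuntime(desc)   # two executions of the same description
run_b = CircuitRuntime(desc)   # never share any state
```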

The short version of the story is that it simply didn't work with the way I approached it. I would write what I wanted in the agent CLI, and then try to review each change the agent made. It didn't matter if I switched to Claude Code, or if I wrote a codex.md/CLAUDE.md file to give it overarching guidelines. Even when it got to the point where the code worked, it was just slightly wrong in subtle ways that didn't take into account where I was going with it later on.

Are the great results limited to certain domains? Was it the nature of the work I was doing? Was there not enough training data for reactive and incremental code? Or was this a skill issue on my part? Do I not know how to prompt or scaffold it? The entire experience was not only frustrating, but also demoralizing. I'd lose days just trying to get it to work, with nothing to show for it and $100 in the hole.

What did finally work is something I'd heard mentioned before: writing product requirements documents (PRDs). Except that I wouldn't write them myself. Originally, when I heard about doing PRDs, I thought it'd be faster to just implement things myself. But for some reason, I hadn't thought about getting LLMs to write them for me. Here's what I did.

  • I'd first spitball with o3 on the design of the next thing I wanted to implement or fix.
  • And then after I was satisfied with all the design decisions that we made, I'd have it write me a PRD.
  • Then I'd make sure I was starting from a fresh commit in the repo, so I could come back to it if I needed to.
  • Then I'd paste the PRD into a new chat in an agent on its own repo or git worktree branch and just let it go to town with auto-accept all the way until it was done.
  • Then I'd review the code and run it against my tests (or ask it to fix any bugs that failed the tests, making sure it didn't change the tests just to make them pass).
  • If it didn't work after three tries, I'd just blow the whole implementation away and try again fresh.
  • If it worked, I'd checkpoint it by checking it into version control.
  • Finally, if it almost worked but was a little bit off, I'd correct it myself rather than prompting it again, since another round of prompting might break other things.

The workflow felt a little odd, since I don't usually work at large companies, so I've never had to write docs like this. But I now believe it's a sort of skill issue. There are also devs out there who are willing to just try and steer it again and again, which builds their intuition for what works and what doesn't.

This reminds me of when Sri and I were doing the Technium Podcast in 2022, and we had the idea of turning the transcripts into social media posts. But we found the results lackluster back in the GPT-3 days, and we believed the models simply weren't good enough yet. I had no idea that RAG existed, and I didn't push the prompting either. And guess what? spiral.computer is exactly that idea.

On the other hand, in my work doing illustrations for forestfriends.tech, I spent 8 hours a day for two or three weeks just generating hundreds of illustrations. By the end, I had a pretty good idea of what would work and what wouldn't.

My point is that back then we didn't push prompting hard enough to get it to work, and I'm probably not pushing prompting hard enough to get vibe-coding to work for me now. That said, I don't think people should rely on it completely. There are some things it's just faster (and more fun) to implement yourself.

And I think Jensen Huang is wrong about not needing engineers in the future. It seems like he does more press tours than running a company nowadays. For things I have taste in, such as programming, there is still a wide gap between what LLMs can do and what I can do. I'm sure it'll get better, but I think we'll always need Geordi La Forges in the future. "Computer" can't do it by itself, and Data is on the bridge.

So my takeaways:

  1. Have a reasoning model rubberduck with you and then have it write the PRD prompt as a result of the conversation. It knows how to prompt itself better than you do.
  2. Don't be afraid to blow the results away and start over when it's not working.
  3. If it got most of the way there, just checkpoint it and follow up with corrections later.
  4. Don't abdicate your seat at the thinking table. The moment you do, it's a spiral of despair: you'll spend money on tokens and time chasing the goal with nothing to show for it.

So I think it's half and half. The reason I couldn't get vibe-coding to work for me before is that half of it is a skill issue, and half of it is that LLMs aren't very good at the domain I'm working in and need a lot of scaffolding to get decent results on a task with a very limited scope.

Finally, I found a thoughtful piece on this prompting stuff, and on why career developers have such disparate experiences with vibe-coding: The gap through which we praise the machine.


I've not been reading as much since I've just been focused on the DBSP implementation. But no worries, I have a backlog.
