Lab note #057: Effects and R1
I'm a little indisposed this week, but I did find some time here and there to do research into effects. I spent the time reading a post and watching a talk about effects by Oleg Kiselyov. The short version is that side-effects can be formulated as a communication protocol between a program and its context.
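To make the "protocol" framing concrete, here's a minimal sketch of the idea in Python (my choice of language; Kiselyov's own examples use Haskell and OCaml, and the names here are my own invention). The program is a generator that suspends with a request whenever it needs an effect; the context is a handler loop that interprets each request and resumes the program with a reply:

```python
def program():
    # The program performs no effects itself; it only *requests* them.
    x = yield ("get", "x")              # request: read "x" from the context
    y = yield ("get", "y")              # request: read "y" from the context
    yield ("print", f"sum = {x + y}")   # request: output a message
    return x + y

def run(prog, env):
    """The context: interpret each request, send a response back."""
    gen = prog()
    try:
        request = next(gen)             # run until the first request
        while True:
            op, arg = request
            if op == "get":
                request = gen.send(env[arg])  # reply with the value
            elif op == "print":
                print(arg)
                request = gen.send(None)      # acknowledge, no payload
            else:
                raise ValueError(f"unknown request: {op}")
    except StopIteration as done:
        return done.value               # the program's final result

result = run(program, {"x": 2, "y": 3})  # prints "sum = 5", returns 5
```

The point of the exercise is that the same `program` can be run under different handlers (a test handler could record the prints instead of performing them), which is exactly the sense in which effects become a negotiable protocol rather than baked-in behavior.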
While I watched the talk and read the post, I asked GPT-4o any questions I had. I've found this to be a great way to learn about subjects that are just slightly out of reach. It was really helpful for looking up various jargon and for interpreting and reading various details of the sample code. Lastly, it was great for extracting the core idea. I find impenetrable texts usually have a single simple core idea, once you get past the jargon.
It's not the first time I've conversed with GPT to understand papers and posts, but I've never produced an artifact for the effort. This time, I asked GPT to summarize our conversation and write an outline of the key insights. And based on the outline, I asked it to write a blog post. I figured that other people would gain from a clarified explanation of neat ideas just out of reach. I've marked these kinds of posts with a callout box at the top, so any readers will know that I didn't write the post, but that it stems from a Socratic conversation I had while learning about the topic. The blog post will go up tomorrow morning.
This week has also seen a bunch of hubbub about Deepseek's R1. I think it reinforces the fact that companies in the model layer really have no moat, and everyone's scrambling to find alpha. In the meantime, it's the "GPT wrappers" that benefit from all the competition among the models while building out their own moats. So what's the deal with R1? I'll let others talk about it. Here are some collected opinions.
- 2501.12948 DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- 7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient
- Explainer: What's R1 & Everything Else? - Tim Kellogg
- Edward Kmett on X: "Since DeepSeek-V3-Base dropped on Christmas, a lot of folks have asked me why the models that DeepSeek builds are so good, why they were able to train it so cheaply and what this means. With DeepSeek-R1 and its distillations performing as well as they are, it is worth noting"
- Andrej Karpathy on X: "I don't have too too much to add on top of this earlier post on V3 and I think it applies to R1 too (which is the more recent, thinking equivalent). I will say that Deep Learning has a legendary ravenous appetite for compute, like no other algorithm that has ever been developed"
- The Illustrated Deepseek