Now to our super interesting interview.
“Good developers are always looking to optimize their inner loop, but this is a new inner loop that everyone is still figuring out.”
I like this line from Michael Bolin, lead for open-source Codex at OpenAI. It captures a shift that is easy to miss if you only look at benchmarks and model releases. Before OpenAI, Bolin spent years building developer tools and infrastructure across Google and Meta, including work associated publicly with projects like Buck, Nuclide, and DotSlash, so he comes to this moment with a long view on how developer workflows actually change.
The popular narrative around AI coding tools is this: the model writes the code. But when you talk to the people building and using these systems every day, the story becomes more nuanced. The bottleneck is no longer just model capability. Increasingly, it is the environment around the model – the tools it can call, the constraints it operates under, the structure of the repository it navigates, and the feedback loops that let it improve. That emerging layer is often called harness engineering.
In our conversation, Michael walks through how this shift is playing out inside Codex: what the harness actually does, how coding agents are changing developer workflows, why repositories suddenly need to be more legible, and where the balance between model capability and harness design might ultimately land.
We also move to a more personal question that many developers are asking themselves: what does it mean to be a programmer when you are no longer typing most of the code? Bolin has spent two decades writing software, yet he describes this transition not as a loss but as a shift toward shaping systems and artifacts at a higher level – with agents accelerating experimentation, prototypes becoming cheaper, and engineering taste becoming more important than raw typing speed.
I really enjoyed learning from Michael.
Subscribe to our YouTube channel, or listen the interview on Spotify / Apple
We prepared a transcript for reference, but the full experience is in the video. And as always: like and comment. It helps us grow on YouTube and bring you more insights.
Ksenia:
Hello, everyone. I’m happy to have Michael Bolin today. He’s the lead for open-source Codex. Michael, thank you for joining me.
Michael:
Thank you, it’s great to be here.
Ksenia:
People often think the story of AI coding is just: the model writes code. But a lot of teams building agents say the real shift is designing the environment around the model. What side are you on?
Michael:
The model is going to dominate the experience, for sure. But we’ve found there’s still a lot of room for innovation in the harness. It’s not a pure research problem. For our team in particular, it’s been about the relationship between the engineering side and the research side – co-developing the agent together, making sure the harness lets the agent shine and do the best things it can do. Then giving the agent the right tooling, making sure that tooling gets used in training so that things are in-distribution when we ship it as a product.
Ksenia:
Let’s define the harness and why it’s become so important.
Michael:
Sure. The harness is what we sometimes call the agent loop – the bit that calls out to the model, samples it, and gives it context: here’s what I’m trying to do, here are the tools available to you, tell me what to do next. Then it gets a response from the model – often a tool call – that says: here’s the tool I want to call with these arguments, let me know what came back.
Sometimes these tools are pretty straightforward, like just run this executable and tell me what stdout was and what the exit code was. We’ve done a lot more experiments with more sophisticated tools for controlling the machine, for controlling the user’s laptop – more like an interactive terminal session rather than simple command shelling. Or it could say, do this web search, or various other things.
For Codex specifically, because it is primarily a coding agent and we care tremendously about security and sandboxing, a lot of what the harness does is take shell commands or computer-use commands from the model and ensure they run in a sandbox or under whatever policy the user has given the agent. There turns out to be a lot of complexity in that area. It’s critical that we not only expose all the intelligence of the model, but do it safely on the user’s machine.
Ksenia:
How do you handle safety when you’re open-sourcing Codex?
Michael:
You can actually see all of it because it’s in our repo. We do different things for each operating system. On macOS, there’s a technology called Seatbelt. On Linux, we use a collection of libraries – something called Bubblewrap, seccomp, and Landlock. On Windows, we’ve actually built our own sandbox. Some of these things, like Seatbelt, are part of macOS, so they’re not in the open-source repo – just how we call it. But our Windows sandbox code is in the open-source repo. We orchestrate all these calls to go through the sandbox in the appropriate way for each different tool call.
Ksenia:
So when people fork Codex, all the safety rules are baked in?
Michael:
Right – though I should clarify a detail. Safety and security get used interchangeably in AI, but they are subtly different. What I’m describing is more on the security side: yes, you can run this tool, but you can only read these folders or write to these folders, that sort of thing.
What most people in the industry would call safety is actually happening more on the backend – making sure the tool calls the model suggests in the first place are appropriate to run. From the harness’s perspective, it’s following orders in a certain sense: faithfully executing the tool calls. But the decisions about what tool calls are safe or appropriate to run are made by the model.
So if you forked Codex and you’re still talking to our models and relying on our model’s safety, then yes, you get that. If you’re running someone else’s model, it’s a little more up in the air.
Ksenia:
Since you launched Codex, how has it performed? What are you seeing?
Michael:
The response has been very positive. Usage is up roughly five times from the start of the year. We launched in April of last year as part of the o3 and o4 mini launch – we were using reasoning models, but tool calling and instruction following wasn’t quite where we wanted it to be. Then in August, when GPT-5 came out, we did a refresh of the CLI, and that’s really when it started moving. We had growth before, but it really started jumping up. Then we launched the VS Code extension later that summer and into the fall, and people really gravitated toward that – I believe VS Code overtook CLI usage. And then we launched the app at the start of this year, and that has really taken off. I think it’s genuinely the first of its kind in a lot of ways.
Ksenia:
What’s so new about it?
Michael:
Developers have historically spent most of their time in their IDE, so it makes sense to meet users where they are. Some users are in the terminal – that’s why we have the CLI. A lot of users are in an IDE – that’s why we’re in VS Code, and now integrated into JetBrains and Xcode as well. Those are obvious, natural places to go.
With the Codex app, we’ve actually established a new surface. I like to think of it as a mission control interface – now I’m managing many conversations in parallel. But it still has key pieces you’d expect from a traditional IDE: you can browse the diff the agent has made, you can pop open the terminal with Command-J without switching to a different window if you want to do something ad hoc. It’s really breaking the expectation that you have to have all your code in front of you at all times. For a lot of people, there’s more value in being able to organize and work across multiple agents simultaneously. That’s what we bring front and center.
or better watch the full video on YouTube
And here is also the second part ↓