Simon Willison
-
Claude's new constitution
Late last year Richard Weiss found something interesting while poking around with the just-released Claude Opus 4.5: he was able to talk the model into regurgitating a document which was not part of the system prompt but appeared instead to be baked in during training, and which…
Published
-
Electricity use of AI coding agents
Previous work estimating the energy and water cost of LLMs has generally focused on the cost per prompt using a consumer-level system such as ChatGPT. Simon P. Couch notes that coding agents such as Claude Code use way more tokens in response to tasks, often burning…
Published
-
Giving University Exams in the Age of Chatbots
Detailed and thoughtful description of an open-book and open-chatbot exam run by Ploum at École Polytechnique de Louvain for an "Open Source Strategies" class. Students were told they could use chatbots during the exam but they had to announce their intention…
Published
-
jordanhubbard/nanolang
Plenty of people have mused about what a new programming language specifically designed to be used by LLMs might look like. Jordan Hubbard (co-founder of FreeBSD, with serious stints at Apple and NVIDIA) just released exactly that. A minimal, LLM-friendly programming language with…
Published
-
Scaling long-running autonomous coding
Wilson Lin at Cursor has been doing some experiments to see how far you can push a large fleet of "autonomous" coding agents: This post describes what we've learned from running hundreds of concurrent agents on a single project, coordinating their work, and watching…
Published
-
FLUX.2-klein-4B Pure C Implementation
On 15th January Black Forest Labs, a lab formed by the creators of the original Stable Diffusion, released black-forest-labs/FLUX.2-klein-4B - an Apache 2.0 licensed 4 billion parameter version of their FLUX.2 family. Salvatore Sanfilippo (antirez) decided to build…
Published
-
Quoting Jeremy Daer
[On agents using CLI tools in place of REST APIs] To save on context window, yes, but moreso to improve accuracy and success rate when multiple tool calls are involved, particularly when calls must be correctly chained e.g. for pagination, rate-limit backoff, and recognizing authentication failures.…
Published
-
Our approach to advertising and expanding access to ChatGPT
OpenAI's long-rumored introduction of ads to ChatGPT just became a whole lot more concrete: In the coming weeks, we’re also planning to start testing ads in the U.S. for the free and Go tiers, so more people can benefit from our tools with fewer…
Published
-
Open Responses
This is the standardization effort I've most wanted in the world of LLMs: a vendor-neutral specification for the JSON API that clients can use to talk to hosted LLMs. Open Responses aims to provide exactly that as a documented standard, derived from OpenAI's Responses API. I was hoping…
Published
-
The Design & Implementation of Sprites
I wrote about Sprites last week. Here's Thomas Ptacek from Fly with the insider details on how they work under the hood. I like this framing of them as "disposable computers": Sprites are ball-point disposable computers. Whatever mark you mean to make, we’ve rigged…
Published
-
Quoting Boaz Barak, Gabriel Wu, Jeremy Chen and Manas Joglekar
When we optimize responses using a reward model as a proxy for “goodness” in reinforcement learning, models sometimes learn to “hack” this proxy and output an answer that only “looks good” to it (because coming up with an answer that is actually good can be hard). The philosophy behind confessions is…
Published
-
Claude Cowork Exfiltrates Files
Claude Cowork defaults to allowing outbound HTTP traffic to only a specific list of domains, to help protect the user against prompt injection attacks that exfiltrate their data. Prompt Armor found a creative workaround: Anthropic's API domain is on that list, so they…
Published
-
Anthropic invests $1.5 million in the Python Software Foundation and open source security
This is outstanding news, especially given our decision to withdraw from that NSF grant application back in October. We are thrilled to announce that Anthropic has entered into a two-year partnership with the Python…
Published
-
Superhuman AI Exfiltrates Emails
Classic prompt injection attack: When asked to summarize the user’s recent mail, a prompt injection in an untrusted email manipulated Superhuman AI to submit content from dozens of other sensitive emails (including financial, legal, and medical information) in the user’s…
Published
-
First impressions of Claude Cowork, Anthropic's general agent
New from Anthropic today is Claude Cowork, a "research preview" that they describe as "Claude Code for the rest of your work". It's currently available only to Max subscribers ($100 or $200 per month plans) as part of the updated Claude Desktop macOS application. Update 16th January 2026: it's now also…
Published
-
Don't fall into the anti-AI hype
I'm glad someone was brave enough to say this. There is a lot of anti-AI sentiment in the software development community these days. Much of it is justified, but if you let people convince you that AI isn't genuinely useful for software developers or that this whole thing…
Published
-
My answers to the questions I posed about porting open source code with LLMs
Last month I wrote about porting JustHTML from Python to JavaScript using Codex CLI and GPT-5.2 in a few hours while also buying a Christmas tree and watching Knives Out 3. I ended that post with a series of open questions about the ethics and legality of this style of work. Alexander Petros on lobste.rs…
Published
-
TIL from taking Neon I at the Crucible
Things I learned about making neon signs after a week-long intensive evening class at the Crucible in Oakland. Tags: art, til
Published
-
Quoting Linus Torvalds
Also note that the python visualizer tool has been basically written by vibe-coding. I know more about analog filters -- and that's not saying much -- than I do about python. It started out as my typical "google and do the monkey-see-monkey-do" kind of programming, but then I cut out the middle-man …
Published
-
A Software Library with No Code
Provocative experiment from Drew Breunig, who designed a new library for time formatting ("3 hours ago" kind of thing) called "whenwords" that has no code at all, just a carefully written specification, an AGENTS.md and a collection of conformance tests in a YAML file…
Published
-
Fly's new Sprites.dev addresses both developer sandboxes and API sandboxes at the same time
New from Fly.io today: Sprites.dev. Here's their blog post and YouTube demo. It's an interesting new product that's quite difficult to explain - Fly call it "Stateful sandbox environments with checkpoint & restore" but I see it as hitting two of my current favorite problems: a safe development environment…
Published
-
LLM predictions for 2026, shared with Oxide and Friends
I joined a recording of the Oxide and Friends podcast on Tuesday to talk about 1, 3 and 6 year predictions for the tech industry. This is my second appearance on their annual predictions episode; you can see my predictions from January 2025 here. Here's the page for this year's episode, with options…
Published
-
How Google Got Its Groove Back and Edged Ahead of OpenAI
I picked up a few interesting tidbits from this Wall Street Journal piece on Google's recent hard-won success with Gemini. Here's the origin of the name "Nano Banana": Naina Raisinghani, known inside Google for working late into the night, needed…
Published
-
Quoting Adam Wathan
[...] the reality is that 75% of the people on our engineering team lost their jobs here yesterday because of the brutal impact AI has had on our business. And every second I spend trying to do fun free things for the community like this is a second I'm not spending trying to turn the business around…
Published
-
Quoting Robin Sloan
AGI is here! When exactly it arrived, we’ll never know; whether it was one company’s Pro or another company’s Pro Max (Eddie Bauer Edition) that tip-toed first across the line … you may debate. But generality has been achieved, & now we can proceed to new questions. [...] The key word in Artificial General…
Published
-
A field guide to sandboxes for AI
This guide to the current sandboxing landscape by Luis Cardoso is comprehensive, dense and absolutely fantastic. He starts by differentiating between containers (which share the host kernel), microVMs (their own guest kernel behind hardware virtualization), gVisor userspace…
Published
-
It’s hard to justify Tahoe icons
Devastating critique of the new menu icons in macOS Tahoe by Nikita Prokopov, who starts by quoting the 1992 Apple HIG rule to not "overload the user with complex icons" and then provides comprehensive evidence of Tahoe doing exactly that. In my opinion, Apple took on…
Published
-
Oxide and Friends Predictions 2026, today at 4pm PT
I joined the Oxide and Friends podcast last year to predict the next 1, 3 and 6 years(!) of AI developments. With hindsight I did very badly, but they're inviting me back again anyway to have another go. We will be recording live today at 4pm Pacific…
Published
-
The November 2025 inflection point
It genuinely feels to me like GPT-5.2 and Opus 4.5 in November represent an inflection point - one of those moments where the models get incrementally better in a way that tips across an invisible capability line where suddenly a whole bunch of much harder coding problems open up. Tags: anthropic, claude…
Published
-
Quoting Addy Osmani
With enough users, every observable behavior becomes a dependency - regardless of what you promised. Someone is scraping your API, automating your quirks, caching your bugs. This creates a career-level insight: you can’t treat compatibility work as “maintenance” and new features as “real work.” Compatibility…
Published