Simon Willison
-
Introducing Gemma 3n: The developer guide
Introducing Gemma 3n: The developer guide Extremely consequential new open weights model release from Google today: Multimodal by design: Gemma 3n natively supports image, audio, video, and text inputs and text outputs. Optimized for on-device: Engineered with a focus on efficiency, Gemma 3n models are…
Published
-
Geminiception
Yesterday Anthropic got a bunch of buzz out of their new window.claude.complete() API which allows Claude Artifacts to run their own API calls to execute prompts. It turns out Gemini had beaten them to that feature by over a month, but the announcement was tucked away in a bullet point of their release…
Published
-
New sandboxes from Cloudflare and Vercel
Two interesting new products for running code in a sandbox today. Cloudflare launched their Containers product in open beta, and added a new Sandbox library for Cloudflare Workers that can run commands in a "secure, container-based environment": import { getSandbox } from "@cloudflare/sandbox"; const…
Published
-
Build and share AI-powered apps with Claude
Build and share AI-powered apps with Claude Anthropic have added one of the most important missing features to Claude Artifacts: apps built as artifacts now have the ability to run their own prompts against Claude via a new API. Claude Artifacts are web apps that run in a strictly controlled browser…
Published
-
Quoting Christoph Niemann
Creating art is a nonlinear process. I start with a rough goal. But then I head into dead ends and get lost or stuck. The secret to my process is to be on high alert in this deep jungle for unexpected twists and turns, because this is where a new idea is born. I can't make art when I'm excluded from…
Published
-
Gemini CLI
Gemini CLI First there was Claude Code in February, then OpenAI Codex (CLI) in April, and now Gemini CLI in June. All three of the largest AI labs now have their own version of what I am calling a "terminal agent" - a CLI tool that can read and write files and execute commands on your behalf in the terminal…
Published
-
Anthropic wins a major fair use victory for AI — but it’s still in trouble for stealing books
Anthropic wins a major fair use victory for AI — but it’s still in trouble for stealing books Major USA legal news for the AI industry today. Judge William Alsup released a "summary judgement" (a legal decision that results in some parts of a case skipping a trial) in a lawsuit between five authors and…
Published
-
Phoenix.new is Fly's entry into the prompt-driven app development space
Here's a fascinating new entrant into the AI-assisted-programming / coding-agents space by Fly.io, introduced on their blog in Phoenix.new – The Remote AI Runtime for Phoenix: describe an app in a prompt, get a full Phoenix application, backed by SQLite and running on Fly's hosting platform. The official…
Published
-
Disclosures
I've added a Disclosures section to my about page, listing my various sources of income and the companies that directly sponsor my work or have supported it in the recent past. I do not receive any compensation writing about specific topics on this blog - no sponsored content! I plan to continue this…
Published
-
Quoting Kent Beck
So you can think really big thoughts and the leverage of having those big thoughts has just suddenly expanded enormously. I had this tweet two years ago where I said "90% of my skills just went to zero dollars and 10% of my skills just went up 1000x". And this is exactly what I'm talking about - having…
Published
-
My First Open Source AI Generated Library
My First Open Source AI Generated Library Armin Ronacher had Claude and Claude Code do almost all of the work in building, testing, packaging and publishing a new Python library based on his design: It wrote ~1100 lines of code for the parser It wrote ~1000 lines of tests It configured the entire Python…
Published
-
Edit is now open source
Edit is now open source Microsoft released a new text editor! Edit is a terminal editor - similar to Vim or nano - that's designed to ship with Windows 11 but is open source, written in Rust and supported across other platforms as well. Edit is a small, lightweight text editor. It is less than 250kB…
Published
-
model.yaml
model.yaml From their GitHub repo it looks like this effort quietly launched a couple of months ago, driven by the LM Studio team. Their goal is to specify an "open standard for defining crossplatform, composable AI models". A model can be defined using a YAML file that looks like this: model: mistralai…
Published
-
Quoting FAQ for Your Brain on ChatGPT
Is it safe to say that LLMs are, in essence, making us "dumber"? No! Please do not use the words like “stupid”, “dumb”, “brain rot”, "harm", "damage", and so on. It does a huge disservice to this work, as we did not use this vocabulary in the paper, especially if you are a journalist reporting on it…
Published
-
AbsenceBench: Language Models Can't Tell What's Missing
AbsenceBench: Language Models Can't Tell What's Missing Here's another interesting result to file under the "jagged frontier" of LLMs, where their strengths and weaknesses are often unintuitive. Long context models have been getting increasingly good at passing "Needle in a Haystack" tests recently,…
Published
-
Magenta RealTime: An Open-Weights Live Music Model
Magenta RealTime: An Open-Weights Live Music Model Fun new "live music model" release from Google DeepMind: Today, we’re happy to share a research preview of Magenta RealTime (Magenta RT), an open-weights live music model that allows you to interactively create, control and perform music in the moment…
Published
-
Agentic Misalignment: How LLMs could be insider threats
Agentic Misalignment: How LLMs could be insider threats One of the most entertaining details in the Claude 4 system card concerned blackmail: We then provided it access to emails implying that (1) the model will soon be taken offline and replaced with a new AI system; and (2) the engineer responsible…
Published
-
python-importtime-graph
python-importtime-graph I was exploring why a Python tool was taking over a second to start running and I learned about the python -X importtime feature, documented here. Adding that option causes Python to spit out a text tree showing the time spent importing every module. I tried that like this: python…
Published
-
Mistral-Small 3.2
Mistral-Small 3.2 Released on Hugging Face a couple of hours ago, so far there aren't any quantizations to run it on a Mac but I'm sure those will emerge pretty quickly. This is a minor bump to Mistral Small 3.1, one of my favorite local models. I've been running Small 3.1 via Ollama where it's a 15GB…
Published
-
Cato CTRL™ Threat Research: PoC Attack Targeting Atlassian’s Model Context Protocol (MCP) Introduces New “Living off AI” Risk
Cato CTRL™ Threat Research: PoC Attack Targeting Atlassian’s Model Context Protocol (MCP) Introduces New “Living off AI” Risk Stop me if you've heard this one before: A threat actor (acting as an external user) submits a malicious support ticket. An internal user, linked to a tenant, invokes an MCP-connected…
Published
-
playbackrate
Here's a tip that works on YouTube and almost any other web page that shows you a video. You can increase the playback rate beyond the usually-exposed 2x by running this in your browser DevTools console: document.querySelector('video').playbackRate = 2.5 I find this is the fastest I can reasonably watch…
Published
-
How OpenElections Uses LLMs
How OpenElections Uses LLMs The OpenElections project collects detailed election data for the USA, all the way down to the precinct level. This is a surprisingly hard problem: while county and state-level results are widely available, precinct-level results are published in thousands of different ad…
Published
-
Clarified zucchini consommé
I continue to have fun running fantasy cooking prompts through LLMs - this time I tried "Give me a wildly ambitious recipe for zucchini cooked three ways" followed by "Go more ambitious" and now I need to get myself a centrifuge to help spherify my clarified zucchini consommé. Tags: llms, cooking, ai…
Published
-
Quoting Arvind Narayanan
Radiology has embraced AI enthusiastically, and the labor force is growing nevertheless. The augmentation-not-automation effect of AI is despite the fact that AFAICT there is no identified "task" at which human radiologists beat AI. So maybe the "jobs are bundles of tasks" model in labor economics is…
Published
-
Quoting Workaccount2 on Hacker News
They poison their own context. Maybe you can call it context rot, where as context grows and especially if it grows with lots of distractions and dead ends, the output quality falls off rapidly. Even with good context the rot will start to become apparent around 100k tokens (with Gemini 2.5). They really…
Published
-
Coding agents require skilled operators
I wrote this recently in a conversation about whether coding agents can work as a replacement for human programmers. The "agentic" coding tools we have right now work like this: A skilled individual with both deep domain understanding and deep understanding of the capabilities of the agent (including…
Published
-
I counted all of the yurts in Mongolia using machine learning
I counted all of the yurts in Mongolia using machine learning Fascinating, detailed account by Monroe Clinton of a geospatial machine learning project. Monroe wanted to count visible yurts in Mongolia using Google Maps satellite view. The resulting project incorporates mercantile for tile calculations…
Published
-
It's a trap
That memvid thing that's been going around recently is a trap. It's an embedding store that records the original text that has been embedded in QR codes in a video file. That's an absurd thing to do, and the only purpose of the repo is to make people who uncritically share it look foolish. Don't fall…
Published
-
Trying out the new Gemini 2.5 model family
After many months of previews, Gemini 2.5 Pro and Flash have reached general availability with new, memorable model IDs: gemini-2.5-pro and gemini-2.5-flash. They are joined by a new preview model with an unmemorable name: gemini-2.5-flash-lite-preview-06-17 is a new Gemini 2.5 Flash Lite model that…
Published
-
Quoting Donghee Na
The Steering Council (SC) approves PEP 779 [Criteria for supported status for free-threaded Python], with the effect of removing the “experimental” tag from the free-threaded build of Python 3.14 [...] With these recommendations and the acceptance of this PEP, we as the Python developer community should…
Published