Simon Willison

  1. Introducing Gemma 3n: The developer guide

    Extremely consequential new open weights model release from Google today: Multimodal by design: Gemma 3n natively supports image, audio, video, and text inputs and text outputs. Optimized for on-device: Engineered with a focus on efficiency, Gemma 3n models are…

  2. Geminiception

    Yesterday Anthropic got a bunch of buzz out of their new window.claude.complete() API which allows Claude Artifacts to run their own API calls to execute prompts. It turns out Gemini had beaten them to that feature by over a month, but the announcement was tucked away in a bullet point of their release…

  3. New sandboxes from Cloudflare and Vercel

    Two interesting new products for running code in a sandbox today. Cloudflare launched their Containers product in open beta, and added a new Sandbox library for Cloudflare Workers that can run commands in a "secure, container-based environment": import { getSandbox } from "@cloudflare/sandbox"; const…

  4. Build and share AI-powered apps with Claude

    Anthropic have added one of the most important missing features to Claude Artifacts: apps built as artifacts now have the ability to run their own prompts against Claude via a new API. Claude Artifacts are web apps that run in a strictly controlled browser…

  5. Quoting Christoph Niemann

    Creating art is a nonlinear process. I start with a rough goal. But then I head into dead ends and get lost or stuck. The secret to my process is to be on high alert in this deep jungle for unexpected twists and turns, because this is where a new idea is born. I can't make art when I'm excluded from…

  6. Gemini CLI

    First there was Claude Code in February, then OpenAI Codex (CLI) in April, and now Gemini CLI in June. All three of the largest AI labs now have their own version of what I am calling a "terminal agent" - a CLI tool that can read and write files and execute commands on your behalf in the terminal…

  7. Anthropic wins a major fair use victory for AI — but it’s still in trouble for stealing books

    Major USA legal news for the AI industry today. Judge William Alsup released a "summary judgement" (a legal decision that results in some parts of a case skipping a trial) in a lawsuit between five authors and…

  8. Phoenix.new is Fly's entry into the prompt-driven app development space

    Here's a fascinating new entrant into the AI-assisted-programming / coding-agents space by Fly.io, introduced on their blog in Phoenix.new – The Remote AI Runtime for Phoenix: describe an app in a prompt, get a full Phoenix application, backed by SQLite and running on Fly's hosting platform. The official…

  9. Disclosures

    I've added a Disclosures section to my about page, listing my various sources of income and the companies that directly sponsor my work or have supported it in the recent past. I do not receive any compensation for writing about specific topics on this blog - no sponsored content! I plan to continue this…

  10. Quoting Kent Beck

    So you can think really big thoughts and the leverage of having those big thoughts has just suddenly expanded enormously. I had this tweet two years ago where I said "90% of my skills just went to zero dollars and 10% of my skills just went up 1000x". And this is exactly what I'm talking about - having…

  11. My First Open Source AI Generated Library

    Armin Ronacher had Claude and Claude Code do almost all of the work in building, testing, packaging and publishing a new Python library based on his design: It wrote ~1100 lines of code for the parser. It wrote ~1000 lines of tests. It configured the entire Python…

  12. Edit is now open source

    Microsoft released a new text editor! Edit is a terminal editor - similar to Vim or nano - that's designed to ship with Windows 11 but is open source, written in Rust and supported across other platforms as well. Edit is a small, lightweight text editor. It is less than 250kB…

  13. model.yaml

    From their GitHub repo it looks like this effort quietly launched a couple of months ago, driven by the LM Studio team. Their goal is to specify an "open standard for defining crossplatform, composable AI models". A model can be defined using a YAML file that looks like this: model: mistralai…

  14. Quoting FAQ for Your Brain on ChatGPT

    Is it safe to say that LLMs are, in essence, making us "dumber"? No! Please do not use the words like “stupid”, “dumb”, “brain rot”, "harm", "damage", and so on. It does a huge disservice to this work, as we did not use this vocabulary in the paper, especially if you are a journalist reporting on it…

  15. AbsenceBench: Language Models Can't Tell What's Missing

    Here's another interesting result to file under the "jagged frontier" of LLMs, where their strengths and weaknesses are often unintuitive. Long context models have been getting increasingly good at passing "Needle in a Haystack" tests recently,…

  16. Magenta RealTime: An Open-Weights Live Music Model

    Fun new "live music model" release from Google DeepMind: Today, we’re happy to share a research preview of Magenta RealTime (Magenta RT), an open-weights live music model that allows you to interactively create, control and perform music in the moment…

  17. Agentic Misalignment: How LLMs could be insider threats

    One of the most entertaining details in the Claude 4 system card concerned blackmail: We then provided it access to emails implying that (1) the model will soon be taken offline and replaced with a new AI system; and (2) the engineer responsible…

  18. python-importtime-graph

    I was exploring why a Python tool was taking over a second to start running and I learned about the python -X importtime feature, documented here. Adding that option causes Python to spit out a text tree showing the time spent importing every module. I tried that like this: python…

  19. Mistral-Small 3.2

    Released on Hugging Face a couple of hours ago, so far there aren't any quantizations to run it on a Mac but I'm sure those will emerge pretty quickly. This is a minor bump to Mistral Small 3.1, one of my favorite local models. I've been running Small 3.1 via Ollama where it's a 15GB…

  20. Cato CTRL™ Threat Research: PoC Attack Targeting Atlassian’s Model Context Protocol (MCP) Introduces New “Living off AI” Risk

    Stop me if you've heard this one before: A threat actor (acting as an external user) submits a malicious support ticket. An internal user, linked to a tenant, invokes an MCP-connected…

  21. playbackrate

    Here's a tip that works on YouTube and almost any other web page that shows you a video. You can increase the playback rate beyond the usually-exposed 2x by running this in your browser DevTools console: document.querySelector('video').playbackRate = 2.5 I find this is the fastest I can reasonably watch…

  22. How OpenElections Uses LLMs

    The OpenElections project collects detailed election data for the USA, all the way down to the precinct level. This is a surprisingly hard problem: while county and state-level results are widely available, precinct-level results are published in thousands of different ad…

  23. Clarified zucchini consommé

    I continue to have fun running fantasy cooking prompts through LLMs - this time I tried "Give me a wildly ambitious recipe for zucchini cooked three ways" followed by "Go more ambitious" and now I need to get myself a centrifuge to help spherify my clarified zucchini consommé. Tags: llms, cooking, ai…

  24. Quoting Arvind Narayanan

    Radiology has embraced AI enthusiastically, and the labor force is growing nevertheless. The augmentation-not-automation effect of AI is despite the fact that AFAICT there is no identified "task" at which human radiologists beat AI. So maybe the "jobs are bundles of tasks" model in labor economics is…

  25. Quoting Workaccount2 on Hacker News

    They poison their own context. Maybe you can call it context rot, where as context grows and especially if it grows with lots of distractions and dead ends, the output quality falls off rapidly. Even with good context the rot will start to become apparent around 100k tokens (with Gemini 2.5). They really…

  26. Coding agents require skilled operators

    I wrote this recently in a conversation about whether coding agents can work as a replacement for human programmers. The "agentic" coding tools we have right now work like this: A skilled individual with both deep domain understanding and deep understanding of the capabilities of the agent (including…

  27. I counted all of the yurts in Mongolia using machine learning

    Fascinating, detailed account by Monroe Clinton of a geospatial machine learning project. Monroe wanted to count visible yurts in Mongolia using Google Maps satellite view. The resulting project incorporates mercantile for tile calculations…

  28. It's a trap

    That memvid thing that's been going around recently is a trap. It's an embedding store that records the original text that has been embedded in QR codes in a video file. That's an absurd thing to do, and the only purpose of the repo is to make people who uncritically share it look foolish. Don't fall…

  29. Trying out the new Gemini 2.5 model family

    After many months of previews, Gemini 2.5 Pro and Flash have reached general availability with new, memorable model IDs: gemini-2.5-pro and gemini-2.5-flash. They are joined by a new preview model with an unmemorable name: gemini-2.5-flash-lite-preview-06-17 is a new Gemini 2.5 Flash Lite model that…

  30. Quoting Donghee Na

    The Steering Council (SC) approves PEP 779 [Criteria for supported status for free-threaded Python], with the effect of removing the “experimental” tag from the free-threaded build of Python 3.14 [...] With these recommendations and the acceptance of this PEP, we as the Python developer community should…
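
The `python -X importtime` option mentioned in the python-importtime-graph entry above writes one line per imported module to stderr. As a minimal sketch, assuming the `import time: self [us] | cumulative | imported package` line layout and two-space indentation per nesting level, that output could be parsed along these lines. The names `parse_importtime` and `slowest_imports` are hypothetical helpers for illustration and are not part of the python-importtime-graph tool.

```python
import re
import subprocess
import sys

# Matches data lines such as "import time:       150 |        780 |   encodings".
# The header line has no digits in its first column, so it does not match.
LINE_RE = re.compile(r"^import time:\s+(\d+) \|\s+(\d+) \| (\s*)(\S+)\s*$")


def parse_importtime(stderr_text):
    """Turn `-X importtime` output into a list of dicts, one per module."""
    rows = []
    for line in stderr_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # header or unrelated stderr output
        self_us, cumulative_us, indent, name = match.groups()
        rows.append(
            {
                "name": name,
                "self_us": int(self_us),
                "cumulative_us": int(cumulative_us),
                # Assumption: nesting is shown as two spaces per level.
                "depth": len(indent) // 2,
            }
        )
    return rows


def slowest_imports(module, limit=5):
    """Import `module` in a fresh interpreter and rank modules by self time."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = parse_importtime(result.stderr)
    return sorted(rows, key=lambda row: row["self_us"], reverse=True)[:limit]
```

Calling `slowest_imports("json")` would start a new interpreter and return up to five modules ranked by the time spent in each module itself, excluding its nested imports. The timings vary from run to run and between machines.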
