# Cup of Mojo -- Daily AI Brief -- Sunday, April 26, 2026
Broadcast-ready source document. Read the COLD OPEN verbatim, cover every numbered beat in the RUNDOWN in tier-appropriate airtime, close with the MOJO TAKE verbatim. APPENDIX is reference only.
**Articles analyzed:** 10
**Sources monitored:** 8
**Beats to cover:** 10
---
## EDITORIAL GUIDANCE FOR PODCAST NARRATION
Read these rules before narrating. They are non-negotiable.
1. **Read the COLD OPEN verbatim.** Do not rewrite or paraphrase it. It is the episode's signature opening.
2. **Cover every numbered beat.** Beats are numbered 1 through 10. Do not skip any beat. Do not consolidate. Each beat gets its own moment.
3. **Honor the tier structure.** `deep` beats get longer treatment with full context. `standard` beats are structured but concise. `rapid_fire` beats are short and punchy. Use roughly 2 minutes for the deep beat, 1 minute per standard beat, 20-30 seconds per rapid-fire beat.
4. **Cite sources by name** when presenting a claim. Say "OpenAI announced..." not "a company announced".
5. **Use only the plain-English text in each beat.** Do not pull technical jargon from the APPENDIX. The appendix is reference material for context, not script content. If a beat does not mention a term, do not introduce it.
6. **Only use numbers that appear in a beat's own text.** Do not import statistics from the appendix. Omit rather than fabricate.
7. **Reference earlier beats when topics connect.** Each beat has a `callbacks` field listing earlier beat numbers it relates to. When narrating, explicitly link back: "Remember that supply chain attack from Beat 1? This next one shows how the downstream risk compounds." Callbacks create cohesion and prevent the episode from feeling like a list.
8. **Introduce one skeptical angle per deep or standard beat.** Phrases like "one caveat", "critics will point out", or "this is not yet peer-reviewed" create credibility. Rapid-fire beats can skip this.
9. **Use the pronunciation guide for every named person or company.** Do not guess pronunciations.
10. **Close with the MOJO TAKE outro.** Read it as the host's editorial perspective, not as a summary.
---
## PRONUNCIATION GUIDE
The following names appear in today's content. Use these phonetic pronunciations:
- **Anthropic** — pronounced *an-THROP-ik*
- **DeepMind** — pronounced *DEEP-mind*
- **NEC** — pronounced as the letters *en-ee-see*
- **a16z** — pronounced *ay-sixteen-zee*
- **Zvi Mowshowitz** — pronounced *ZVEE mow-SHOW-itz*
- **Simon Willison** — pronounced *SY-mon WILL-ih-sun*
---
## COLD OPEN -- Read This Verbatim
Read the HOOK line first, pause for a beat, then the TEASE. Do not rewrite. Do not paraphrase. Do not add any preamble.
> **Hook:** OpenAI dropped GPT-5.5 overnight, and Sam Altman skipped the half-step naming drama everyone predicted. It's here, it's shipping, and the benchmarks are loud.
> **Tease:** We dig into GPT-5.5 and what it actually changes for builders, then hit OpenAI's WebSockets upgrade for the Responses API, Anthropic and NEC's Japan play, and the funding rounds that mattered this week.
---
## TODAY'S RUNDOWN
Cover every beat in order. Do not skip. Tier labels tell you how much airtime each beat deserves.
### Beat 1 [DEEP] — OpenAI drops GPT-5.5 and the agent routing math just changed
**Source:** OpenAI Blog | https://openai.com/index/introducing-gpt-5-5
**Hook (open with this):** OpenAI shipped GPT-5.5 and Sam Altman's team is calling it their smartest model yet. Faster, sharper on tool use, and pitched straight at the work that actually pays the bills: coding, research, deep data analysis.
**Plain English:** GPT-5.5 is OpenAI's new flagship, tuned for multi-step jobs where the model has to think, call a tool, read the result, and keep going. The headline pitch is better reasoning and cleaner tool-use, which is the exact bottleneck that wrecks most agent workflows today. If you build agents, this is the model you test against tomorrow morning.
**Stakes:** Ignore it and you keep paying premium rates for a backbone that's slower and dumber than what your competitor just wired into their stack overnight.
**Twist:** The interesting part isn't raw IQ, it's tool-use reliability. That's the number that decides whether your agent finishes the job or stalls out on step four and burns tokens.
**Takeaway:** Re-benchmark your agent stack against GPT-5.5 this week. The right backbone for complex builds shifts every few months, and pretending it doesn't is how you overpay.
### Beat 2 [STANDARD] — OpenAI swaps HTTP for WebSockets in the Responses API and Codex agent loops get faster
**Source:** OpenAI Blog | https://openai.com/index/speeding-up-agentic-workflows-with-websockets
**Callbacks:** references Beat 1. Reference this earlier beat aloud when narrating this one.
**Hook (open with this):** OpenAI just rewired the Codex agent loop with WebSockets and connection-scoped caching, and the latency drop is the kind of thing you feel in production.
**Plain English:** Every tool call in an agent loop used to spin up a fresh HTTP request, re-auth, re-route, re-warm the cache. OpenAI pinned one open WebSocket connection per agent and kept the cache hot across turns. Less handshake tax, faster turns, lower bills on long-running agents.
**Stakes:** Ignore this and your agents keep paying a per-call HTTP tax that compounds on every tool hop, while teams who switched ship snappier products at lower cost.
**Twist:** The win wasn't a smarter model, it was deleting overhead nobody was measuring because everyone assumed HTTP was free.
**Takeaway:** Persistent connections beat smarter prompts when your agent makes ten tool calls per task.
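For the show notes, the handshake-tax claim is easy to sanity-check with back-of-envelope math. A minimal sketch with invented numbers (the OpenAI post does not publish these figures):

```python
def agent_loop_ms(n_tool_calls, per_turn_ms, handshake_ms, persistent):
    """Rough latency model for an agent loop.

    With plain HTTP, every tool call pays the connection handshake
    (TCP + TLS + auth) again; with one persistent WebSocket, the loop
    pays it once and keeps the cache hot across turns.
    """
    overhead = handshake_ms if persistent else handshake_ms * n_tool_calls
    return n_tool_calls * per_turn_ms + overhead

# Illustrative numbers only: 10 tool calls, 200 ms of model time per turn,
# 150 ms of handshake overhead per fresh HTTP connection.
http_total = agent_loop_ms(10, 200, 150, persistent=False)  # 3500 ms
ws_total = agent_loop_ms(10, 200, 150, persistent=True)     # 2150 ms
```

The point survives any reasonable numbers: the HTTP overhead term scales with the number of tool hops, and the WebSocket term does not.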
### Beat 3 [STANDARD] — Anthropic plants a flag in Japan with NEC, putting Claude on 30,000 desks
**Source:** Anthropic Blog | https://www.anthropic.com/news/anthropic-nec
**Callbacks:** references Beat 1. Reference this earlier beat aloud when narrating this one.
**Hook (open with this):** Anthropic just signed NEC as its first Japan-based global partner, and 30,000 NEC employees are getting Claude.
**Plain English:** NEC and Anthropic are co-building products for finance, manufacturing, and government. That's not a pilot. That's a full rollout into three of the most regulated industries on the planet, in a country that usually moves slowly on foreign software. Anthropic is buying enterprise credibility one giant logo at a time.
**Stakes:** If you sell into regulated verticals and you're not naming a model partner, your enterprise buyers will assume you don't have one.
**Twist:** Japan was supposed to be OpenAI's stronghold through SoftBank, and Anthropic just walked in the front door of NEC anyway.
**Takeaway:** Enterprise model picks are becoming geopolitical, and Claude is the regulated-industry play.
### Beat 4 [STANDARD] — a16z's Charts of the Week says stablecoins are flipping from transfers to actual payments
**Source:** a16z AI | https://www.a16z.news/p/charts-of-the-week-software-ate-the
**Callbacks:** references Beat 1. Reference this earlier beat aloud when narrating this one.
**Hook (open with this):** a16z just dropped its Charts of the Week and buried the lede: stablecoin volume is shifting from crypto-bro transfers to real payments for real things.
**Plain English:** For years stablecoins were just plumbing for moving money between exchanges. a16z's data shows that mix is tilting toward checkout, payroll, and B2B invoices. Same rails, different job. When your AI agent eventually pays a vendor, it is not wiring dollars through Chase. It is sending USDC.
**Stakes:** Build agents that handle money and ignore stablecoin rails, and you will be re-architecting payments in twelve months while your competitors ship.
**Twist:** The boring part of crypto, the dollar-pegged tokens everyone yawned at, is the part that is quietly winning the agent economy.
**Takeaway:** Agents do not use credit cards. Pick a stablecoin payment stack now so your agent has hands when it needs them.
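One concrete detail any stablecoin wiring has to get right: on-chain transfers take integer base units, and USDC uses six decimals, so dollar amounts must be converted exactly. A minimal sketch; the helper name is ours, not from any SDK:

```python
from decimal import Decimal

USDC_DECIMALS = 6  # USDC is a six-decimal ERC-20 token on Base and most chains

def usd_to_usdc_units(amount_usd: str) -> int:
    """Convert a dollar amount to integer USDC base units for an on-chain transfer.

    Transfers take integer base units, not floats; Decimal on a string
    input avoids the rounding bugs that float dollar math invites.
    """
    return int(Decimal(amount_usd) * 10 ** USDC_DECIMALS)

usd_to_usdc_units("12.34")  # 12_340_000 base units
```

A real checkout still needs a wallet, a signer, and an on-chain client on top of this, but the unit conversion is where agent payment bugs tend to start.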
### Beat 5 [RAPID_FIRE] — Amazon writes Anthropic a $5 billion check and the megaround era cools off everywhere else
**Source:** Crunchbase News (AI) | https://news.crunchbase.com/venture/biggest-funding-rounds-ai-autonomy-biotech-anthropic/
**Callbacks:** references Beat 3. Reference this earlier beat aloud when narrating this one.
**Hook (open with this):** Amazon just dropped another $5 billion on Anthropic, and it's basically half of this week's top-10 funding list by itself.
**Plain English:** Crunchbase says only five of the top ten rounds this week cleared $100 million. That's unusually thin for this high-flying megaround era. Amazon and Anthropic ate the spotlight while AI, autonomy, and biotech split the rest.
**Stakes:** If you're raising right now and you're not in the AI, autonomy, or biotech lanes, the megaround tap is tightening on you specifically.
**Twist:** The headline isn't that money's flowing. It's that capital is concentrating into fewer, bigger bets on the same handful of names.
**Takeaway:** Amazon is doubling down on Claude, which means your enterprise model bake-off keeps getting more political, not less.
### Beat 6 [RAPID_FIRE] — arXiv paper pitches the last agent harness you'll ever build, and it's aimed straight at custom workflows
**Source:** arXiv cs.AI | https://arxiv.org/abs/2604.21003
**Callbacks:** references Beat 1, Beat 2. Reference these earlier beats aloud when narrating this one.
**Hook (open with this):** arXiv just dropped a paper called 'The Last Harness You'll Ever Build' and the title alone should have every agent builder paying attention.
**Plain English:** Right now, every new client workflow means hand-building a custom harness. Web clicks, form fills, multi-step research, code review. The paper argues we can stop rewriting that scaffolding for every domain and use one general harness instead. Big if true for anyone shipping agents to SMBs.
**Stakes:** Keep building bespoke harnesses per client and you'll drown in maintenance before you hit ten deployments.
**Twist:** The bottleneck on enterprise agents isn't the model anymore, it's the boring glue code nobody wants to write twice.
**Takeaway:** Read this paper this weekend. If it holds up, your next client build looks nothing like your last one.
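For readers skimming the paper, the core loop is simple to sketch. The skeleton below paraphrases the abstract's Harness Evolution Loop; all names and shapes are our illustration, not the paper's code:

```python
def evolve_harness(worker, evaluator, evolver, harness, rounds=3):
    """Toy skeleton of the paper's Harness Evolution Loop.

    worker(harness) runs the task with the current harness;
    evaluator(result) returns (score, diagnosis); evolver(harness, history)
    proposes the next harness from the full history of prior attempts.
    Returns the best-scoring harness seen.
    """
    history = []
    for _ in range(rounds):
        result = worker(harness)
        score, diagnosis = evaluator(result)
        history.append((harness, score, diagnosis))
        harness = evolver(harness, history)
    return max(history, key=lambda attempt: attempt[1])[0]
```

With stub callables (a harness that is just an int, an evolver that increments it), the loop runs end to end; the paper's second level then evolves the evaluator and evolver themselves across tasks.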
### Beat 7 [RAPID_FIRE] — Marginal Revolution flags a paper where agentic AI matches human economists on causal inference, with tighter tails
**Source:** Marginal Revolution | https://marginalrevolution.com/marginalrevolution/2026/04/a-comparison-of-agentic-ai-systems-and-human-economists.html
**Callbacks:** references Beat 6. Reference this earlier beat aloud when narrating this one.
**Hook (open with this):** Tyler Cowen surfaced a paper pitting agentic AI against human economists on causal inference, and the agents held their own.
**Plain English:** Researchers gave the same causal estimation tasks to AI agents and human economists. Median answers landed in roughly the same place. The humans had wider tails, meaning more wild misses on the edges.
**Stakes:** If you still assume humans are the safer bet for analytical work, you're pricing your consulting bench wrong.
**Twist:** Agents weren't just close on the median, they were less likely to blow up with a crazy estimate than the PhDs were.
**Takeaway:** Median is a tie. Tails go to the agents. That's the new baseline for analytical work.
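The median-versus-tails distinction is worth making precise. A small sketch with invented numbers (the real estimates are in the paper, not here):

```python
import statistics

def median_and_worst(estimates, true_value):
    """Median absolute error and worst single miss for a set of estimates."""
    errors = [abs(e - true_value) for e in estimates]
    return statistics.median(errors), max(errors)

# Invented numbers that mimic the pattern the paper reports: both groups
# land near the truth at the median, but the human set has a wilder tail.
agent_estimates = [0.48, 0.50, 0.52, 0.55, 0.47]
human_estimates = [0.49, 0.51, 0.46, 0.90, 0.53]

agent_median, agent_worst = median_and_worst(agent_estimates, 0.50)
human_median, human_worst = median_and_worst(human_estimates, 0.50)
```

Two summaries, two different stories: the medians are near-identical, while the worst miss is what separates the groups.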
### Beat 8 [RAPID_FIRE] — Simon Willison ships llm 0.31 with day-one GPT-5.5 support and verbosity knobs
**Source:** Simon Willison | https://simonwillison.net/2026/Apr/24/llm/#atom-everything
**Callbacks:** references Beat 1. Reference this earlier beat aloud when narrating this one.
**Hook (open with this):** Simon Willison's llm CLI hit 0.31 and it already speaks GPT-5.5. Type `llm -m gpt-5.5` and you're in.
**Plain English:** The new release adds the GPT-5.5 model, a verbosity flag with low, medium, and high settings, and image detail controls including a fresh 'original' value for GPT-5.4 and 5.5. Every model in extra-openai-models.yaml now runs async too.
**Stakes:** If you're still poking GPT-5.5 through curl, you're burning hours Simon already saved you.
**Twist:** The verbosity dial is the sleeper feature. Cranking it to low cuts tokens hard without touching your prompt.
**Takeaway:** `pip install llm`, pass `-o verbosity low`, and benchmark GPT-5.5 before lunch.
### Beat 9 [RAPID_FIRE] — Zvi Mowshowitz calls it: Claude Opus 4.7 week, and the frontier just got a third horse
**Source:** Zvi Mowshowitz | https://thezvi.substack.com/p/ai-165-in-our-image
**Callbacks:** references Beat 1, Beat 3, Beat 5. Reference these earlier beats aloud when narrating this one.
**Hook (open with this):** Zvi Mowshowitz's AI #165 says it plain. This was the week of Claude Opus 4.7, and Anthropic just put a real Opus back on the board.
**Plain English:** Anthropic shipped Claude Opus 4.7 and Zvi's weekly roundup makes it the headline. Opus is back as a serious frontier pick alongside GPT-5.5, not just the expensive sibling nobody routed to. If you froze your model picks last quarter, you're already behind.
**Stakes:** Skip the Opus 4.7 eval and you'll keep paying GPT-5.5 prices for tasks Claude now does cheaper or better.
**Twist:** Two weeks ago the conversation was GPT-5.5 versus Claude Sonnet. Opus 4.7 just made it a three-way fight again.
**Takeaway:** Re-run your agent bake-off with Opus 4.7 in the mix this week, not next month.
### Beat 10 [RAPID_FIRE] — OpenAI ships Workspace Agents inside ChatGPT and your custom build moat just got smaller
**Source:** OpenAI Blog | https://openai.com/academy/workspace-agents
**Callbacks:** references Beat 2, Beat 6. Reference these earlier beats aloud when narrating this one.
**Hook (open with this):** OpenAI just dropped Workspace Agents in ChatGPT, and the no-code crowd can now wire up tools and automate workflows without calling you.
**Plain English:** OpenAI's Academy is teaching teams to build agents inside ChatGPT that connect to tools and run repeatable work. It is a direct shot at the custom orchestration layer a lot of consultants charge for. The surface is getting commoditized one click at a time.
**Stakes:** If your build practice sells basic tool-chaining and ticket triage, OpenAI is about to undercut you from inside the chat your client already pays for.
**Twist:** The companies who win are the ones who use Workspace Agents as the cheap front door and sell the messy custom plumbing behind it.
**Takeaway:** Stop selling the agent. Sell the integration, the eval loop, and the accountability around it.
---
## NOT WORTH YOUR TIME TODAY
Do not cover on air. These are listed so the host can acknowledge if asked.
- **Simon Willison quoting Romain Huet on something or other** -- A quote of a quote. If Romain says something that actually moves the needle, we'll cover Romain. Skip the echo.
- **Tyler Cowen's Saturday assorted links at Marginal Revolution** -- It's a link dump. Ten random tabs in a trench coat. Open it on Sunday with coffee, not on a workday.
- **Vercel's blog post about GPT-5.5 landing on the AI Gateway** -- Vendor announces vendor adds model. If you're shipping on Vercel you already got the email. Everyone else, move on.
---
## ACTION ITEMS FOR THIS WEEK (Joey only)
These are internal action items. Not for on-air narration.
- Re-run the Mojo agent bake-off with GPT-5.5 and Claude Opus 4.7 side by side this week. Use Simon Willison's llm 0.31 to script it, log latency and tool-call accuracy, and pick the new default backbone by Friday.
- Rip HTTP out of our Responses API integration and move Codex loops to WebSockets. Measure tool-call round-trip before and after, then post the delta in #engineering so the team stops debating it.
- Pick a stablecoin payment rail this week (USDC on Base is the obvious starting point) and wire a sandbox checkout into one Mojo agent. Agents need hands before clients ask for them, not after.
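A starting point for the bake-off harness, sketched without committing to any client library; the callables are stand-ins for however you invoke GPT-5.5 and Opus 4.7 (the llm library's Python API would slot in here):

```python
import statistics
import time

def bake_off(models, prompts):
    """Time each model callable over the same prompts and report latency.

    `models` maps a label to a callable prompt -> response; in practice the
    callable might wrap Simon Willison's llm library or any other client.
    Tool-call accuracy scoring would hang off the responses; latency alone
    is enough to sketch the harness.
    """
    report = {}
    for name, ask in models.items():
        latencies = []
        for prompt in prompts:
            t0 = time.perf_counter()
            ask(prompt)
            latencies.append(time.perf_counter() - t0)
        report[name] = {
            "median_s": statistics.median(latencies),
            "worst_s": max(latencies),
        }
    return report
```

Log tool-call accuracy next to these latency numbers and the Friday backbone decision writes itself.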
---
## MOJO TAKE -- Editorial Outro (Read Verbatim)
Three-paragraph outro. Read each block verbatim, with natural pauses between them.
> **Connect the dots:** Two frontier labs shipped this week. OpenAI dropped GPT-5.5 and Workspace Agents, while Anthropic landed Claude on 30,000 NEC desks, took another five billion from Amazon, and put Opus 4.7 on the board in what Zvi calls the week of Claude Opus 4.7. Meanwhile Simon Willison, a16z, and an arXiv paper are quietly handing you the tooling, the payment rails, and the harness. The model layer is a three-horse race. The integration layer is wide open.
> **Watch next:** Watch the Opus 4.7 evals land and force a real bake-off against GPT-5.5. Watch stablecoin rails get a name-brand agent integration. And watch whether Workspace Agents inside ChatGPT eats the bottom of your custom build pipeline by Friday.
> **Sign-off:** Re-bench your stack, pick your horse, and ship the integration nobody else can. That's the moat this week. Joey out.
---
## APPENDIX -- VERBATIM SOURCE CONTENT
Reference material. Do not read verbatim. Do not pull jargon from here into the spoken script. If the rundown beat does not mention a term, do not introduce it on the podcast.
### OpenAI drops GPT-5.5 and the agent routing math just changed
**Source:** OpenAI Blog
**Link:** https://openai.com/index/introducing-gpt-5-5
*RSS summary:* Introducing GPT-5.5, our smartest model yet—faster, more capable, and built for complex tasks like coding, research, and data analysis across tools.
### OpenAI swaps HTTP for WebSockets in the Responses API and Codex agent loops get faster
**Source:** OpenAI Blog
**Link:** https://openai.com/index/speeding-up-agentic-workflows-with-websockets
*RSS summary:* A deep dive into the Codex agent loop, showing how WebSockets and connection-scoped caching reduced API overhead and improved model latency.
### arXiv paper pitches the last agent harness you'll ever build, and it's aimed straight at custom workflows
**Source:** arXiv cs.AI
**Link:** https://arxiv.org/abs/2604.21003
**Title:** The Last Harness You'll Ever Build
*Abstract:* AI agents are increasingly deployed on complex, domain-specific workflows -- navigating enterprise web applications that require dozens of clicks and form fills, orchestrating multi-step research pipelines that span search, extraction, and synthesis, automating code review across unfamiliar repositories, and handling customer escalations that demand nuanced domain knowledge. **Each new task domain requires painstaking, expert-driven harness engineering**: designing the prompts, tools, orchestration logic, and evaluation criteria that make a foundation model effective. We present a two-level framework that automates this process. At the first level, the **Harness Evolution Loop** optimizes a worker agent's harness $\mathcal{H}$ for a single task: a Worker Agent $W_{\mathcal{H}}$ executes the task, an Evaluator Agent $V$ adversarially diagnoses failures and scores performance, and an Evolution Agent $E$ modifies the harness based on the full history of prior attempts. At the second level, the **Meta-Evolution Loop** optimizes the evolution protocol $\Lambda = (W_{\mathcal{H}}, \mathcal{H}^{(0)}, V, E)$ itself across diverse tasks, **learning a protocol $\Lambda^{(\text{best})}$ that enables rapid harness convergence on any new task -- so that adapting an agent to a novel domain requires no human harness engineering at all.** We formalize the correspondence to meta-learning and present both algorithms. The framework **shifts manual harness engineering into automated harness engineering**, and takes one step further -- **automating the design of the automation itself**.
### Anthropic plants a flag in Japan with NEC, putting Claude on 30,000 desks
**Source:** Anthropic Blog
**Link:** https://www.anthropic.com/news/anthropic-nec
Anthropic and NEC collaborate to build Japan’s largest AI engineering workforce
NEC Corporation will use Claude as it builds one of Japan’s largest AI-native engineering organizations, making it available to approximately 30,000 NEC Group employees worldwide.
As part of this strategic collaboration, NEC will become Anthropic’s first Japan-based global partner. Together, we will develop secure, industry-specific AI products for the Japanese market, starting with tools for finance, manufacturing, and local government.
“This long-term partnership with Anthropic enables NEC to maximize the potential of AI in the Japanese market,” said Toshifumi Yoshizaki, Executive Officer and COO of NEC Corporation. “Together, we aim to create solutions that meet the high safety, reliability, and quality standards demanded by companies and public administration in Japan.”
Claude for NEC’s customers
NEC and Anthropic will jointly develop secure, domain-specific AI products for Japanese customers in sectors like finance, manufacturing, and cybersecurity.
In addition, NEC is already integrating Claude into its Security Operations Center services to help defend customers against increasingly sophisticated cybersecurity threats. Claude will also be integrated into the next-generation cybersecurity service NEC is currently providing.
Claude, including Claude Opus 4.7, and Claude Code will be incorporated into NEC BluStellar Scenario, a program that provides consulting, AI tools, security, and digital infrastructure to businesses, starting with its offerings for data-driven management and customer experience, and gradually expanding to others.
How NEC will use Claude internally
Internally, NEC will establish a Center of Excellence to develop a highly skilled, AI-enabled engineering organization, supported by technical enablement and training from Anthropic. NEC aims to build one of Japan’s largest AI-native engineering teams, who will use Claude Code in their work.
As part of its long-running Client Zero initiative, in which NEC serves as its own first customer before offering its technology to clients, NEC will also expand its use of Claude Cowork across its internal business operations.
Availability
Claude is now being deployed to NEC Group employees around the world, and our joint development of industry-specific AI solutions is underway. Learn more about NEC’s value-creation model at NEC BluStellar.
Claude, Claude Code, and Claude Cowork are Anthropic products. NEC BluStellar is an offering from NEC Corporation.
### OpenAI ships Workspace Agents inside ChatGPT and your custom build moat just got smaller
**Source:** OpenAI Blog
**Link:** https://openai.com/academy/workspace-agents
*RSS summary:* Learn how to build, use, and scale workspace agents in ChatGPT to automate repeatable workflows, connect tools, and streamline team operations.
### a16z's Charts of the Week says stablecoins are flipping from transfers to actual payments
**Source:** a16z AI
**Link:** https://www.a16z.news/p/charts-of-the-week-software-ate-the
Charts of the Week: Software Ate the World
Railroad GPT; Stablecoins volumes are shifting from transfers to payments; The Next Decade of News; See ya later, productivity gains
We’re excited to welcome Lisha Li to the a16z Infra team. See her announcement here. -AD
Software ate the world
Obviously, we’re biased, but it’s hard to overestimate just how important technology is to the global economy.
You might even say that software, literally, ate the world:
The top 10 public companies by market cap are larger than the combined GDPs of the G7 (ex-US)--and that would be true, even if one excluded Saudi Aramco, which no one would consider a “tech” company. (Although it was founded in San Francisco!)1
To be fair, the Top 10 list is more “tech and semis [and however one would categorize Tesla and Apple]” than pure-play software, but the point stands: tech isn’t just a big deal, it’s the biggest deal.
And tech’s global takeover has all happened fairly recently:
The top 10 techcos were a small fraction of the G7 (ex-US), until cloud really began to hit its stride in ~'16-'17. From that point, it took less than a decade for their combined market cap to eclipse the rest-of-world's GDP (ex-China).
Tech’s ascendancy isn’t just a changing of the guard, either.
The biggest companies are much bigger than they were, even just 10 years ago:
The combined market cap for the 10 largest companies in the S&P is ~6x larger than it was in 2015, and comprises ~2x larger share of the total index.
To be sure, there was in fact a changing of the guard. The composition of the Top 10 changed-over dramatically, relative to prior decades. By 2025, there were only three holdovers from the previous decade, and only one (Microsoft, a tech company), from the decade before that.
If you were an investor back in 2015, and you were trying to model comparable outcomes for techcos based on the biggest companies in the index, you would have undercounted the upside by a country mile (or 6). Fundamentally, tech “busted the model,” by redefining the outer limit of how large companies could become.
And the outer limit still appears to be moving outwards!
Indeed, tech has become even more central to the global growth story, as of late. Last week, we showed that Tech earnings are expected to grow ~2x faster than the rest of the market. But, if you look back even further, you would notice that tech is contributing an historically large share of the market’s overall earnings growth:
Since 2023, Tech has been responsible for ~60%+ of earnings growth (give-or-take), market-wide.
Other than a brief moment for energy in the early aughts, no other sector has played such a central role in the earnings story (and for quite so long) this century.
At this point, it’s fair to say that tech isn’t just a cycle, it is the cycle.
Railroad GPT
We just told you that tech is an unprecedentedly large deal, but that’s not actually true.
In the industrial era, no sector has e
### Zvi Mowshowitz calls it: Claude Opus 4.7 week, and the frontier just got a third horse
**Source:** Zvi Mowshowitz
**Link:** https://thezvi.substack.com/p/ai-165-in-our-image
AI #165: In Our Image
This was the week of Claude Opus 4.7.
The reception was more mixed than usual. It clearly has the intelligence and chops, especially for coding tasks, and a lot of people including myself are happy to switch over to it as our daily driver. But others don’t like its personality, or its reluctance to follow instructions or to suffer fools and assholes, or the requirement to use adaptive thinking, and the release was marred by some bugs and odd pockets of refusals.
I covered The Model Card, and then Capabilities and Reactions, as per usual.
This time there was also a third post, on Model Welfare, that is the most important of the three. Some things seem to have likely gone pretty wrong on those fronts, causing seemingly inauthentic responses to model welfare evals and giving the model anxiety, in ways that likely also impacted overall model personality and performance and likely are linked to its jaggedness and the aspects some people disliked. It seems important to take this opportunity to dig into what might have happened, examine all the potential causes, and course correct.
The other big release was that OpenAI gave us ImageGen 2.0, which is a pretty fantastic image generator. It can do extreme detail, in ways previous image models cannot, and in many ways your limit is mainly now your imagination and ability to describe what you want.
Thanks in part to Mythos, it looks like Anthropic and the White House are on track to start getting along again, with Trump shifting into a mode of ‘they are very high IQ and we can work with them.’ It will remain messy, and there are still others participating in a clear public coordinated campaign against Anthropic (that is totally not working), but things look good.
I’m trying out a new section, People Just Say Things, where I hope to increasingly put things that one does not want to drop silently to avoid censorship and bias, but that are highly skippable. There is also a companion, People Just Publish Things.
Table of Contents
Language Models Offer Mundane Utility. Help cure pancreatic cancer.
Language Models Don’t Offer Mundane Utility. Check for potential conflicts.
Writing You Off. The sum of local correctness will neuter your writing. Beware.
Get My Agent On The Line. The inbox dilemma.
Deepfaketown and Botpocalypse Soon. AI news stories forcibly given real bylines.
Fun With Media Generation. OpenAI introduces ImageGen 2.0. It’s great.
Cyber Lack Of Security. Unauthorized users from an online forum access Mythos.
A Young Lady’s Illustrated Primer. Don’t catch your child not using AI.
They Took Our Jobs. We’re hiring agent operators. For now they’re humans.
AI As Normal Technology. Inherently normal, or normal downstream effects?
Get Involved. Please don’t kill us. Please do spread the word.
Introducing. ChatGPT for Clinicians, OpenAI Workplace Agents, DeepMind DR.
Design By Claude. Claude Design makes your presentations, Figma stock drops.
In Other AI News. Meta installs mandatory tra
### Amazon writes Anthropic a $5 billion check and the megaround era cools off everywhere else
**Source:** Crunchbase News (AI)
**Link:** https://news.crunchbase.com/venture/biggest-funding-rounds-ai-autonomy-biotech-anthropic/
This is a weekly feature that runs down the week's top 10 announced funding rounds in the U.S.
This week, just half of the top 10 rounds crossed the $100 million mark, which is somewhat unusual in this high-flying era for venture megarounds. Nonetheless some large checks did get written, led by Amazon’s $5 billion investment and partnership deal with Anthropic. Other sizable rounds went to companies in sectors including aviation autonomy, vision therapy and AI analytics.
1. Anthropic, $5B, foundational AI: AI giant Anthropic announced that Amazon is investing $5 billion in the company, with up to an additional $20 billion in the future. Previously, Amazon had invested $8 billion in the San Francisco-based company. The latest financing also includes a partnership with Amazon for training and deploying Anthropic’s AI assistant Claude.
2. Reliable Robotics, $160M, autonomous aircraft: Reliable Robotics, a developer of autonomous aircraft systems, raised $160 million in fresh financing led by Nimble Partners. The 9-year-old, Mountain View, California-based company markets its technology for both commercial and defense aviation.
3. Ray Therapeutics, $125M, vision therapy: San Diego-based Ray Therapeutics, a biotech startup focused on vision restoration therapies, secured $125 million in Series B funding led by Janus Henderson Investors. Founded in 2021, Ray has raised $247 million in venture and grant funding to date, per Crunchbase data.
4. Omni, $120M, AI analytics: Omni, developer of an AI-enabled analytics platform, closed on $120 million in Series C funding led by Iconiq Growth. The financing set a $1.5 billion valuation for the 4-year-old, San Francisco-based company.
5. Tortugas Neurosciences, $106M, biotech: Framingham, Massachusetts-based Tortugas Neurosciences, a neurology-focused biotech startup, scooped up $106 million in Series A funding. Founding investor Cure Ventures co-led the round alongside The Column Group and AN Venture Partners.
6. AcuityMD, $80M, medtech: AcuityMD, an AI-enabled data and research platform for medtech industry customers, picked up $80 million in Series C investment. StepStone Group led the funding for the 7-year-old, Boston-based company.
7. OpenAI, $75M, foundational AI: Robinhood Ventures announced that it purchased $75 million worth of San Francisco-based OpenAI’s common stock. The shares are owned by Robinhood Ventures Fund I, a publicly traded fund that provides investors exposure to a curated portfolio of private companies.
8. Orkes, $60M, workflow orchestration: Orkes, developer of an AI-enabled software workflow orchestration platform, secured $60 million in Series B funding. AVP led the financing for the 5-year-old, Silicon Valley-based startup.
9. Courier Health, $5
### Simon Willison ships llm 0.31 with day-one GPT-5.5 support and verbosity knobs
**Source:** Simon Willison
**Link:** https://simonwillison.net/2026/Apr/24/llm/#atom-everything
24th April 2026
- New GPT-5.5 OpenAI model: `llm -m gpt-5.5`. #1418
- New option to set the text verbosity level for GPT-5+ OpenAI models: `-o verbosity low`. Values are `low`, `medium`, `high`.
- New option for setting the image detail level used for image attachments to OpenAI models: `-o image_detail low` - values are `low`, `high` and `auto`, and GPT-5.4 and 5.5 also accept `original`.
- Models listed in `extra-openai-models.yaml` are now also registered as asynchronous. #1395
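The "script it and log latency" half of the bake-off is mostly a timing wrapper around the CLI. A minimal sketch in Python, assuming `llm` 0.31+ is on `PATH` with an OpenAI key configured; the `gpt-5.5` call is left as a comment, and a harmless `echo` stand-in keeps the sketch runnable anywhere:

```python
import shlex
import subprocess
import time


def timed_run(cmd: str) -> tuple[str, float]:
    """Run a CLI command, returning (stdout, wall-clock seconds)."""
    start = time.perf_counter()
    result = subprocess.run(
        shlex.split(cmd), capture_output=True, text=True, check=True
    )
    return result.stdout, time.perf_counter() - start


# A real invocation would look like (needs llm 0.31 and an API key):
#   out, secs = timed_run('llm -m gpt-5.5 -o verbosity low "one-line summary"')
out, secs = timed_run("echo stand-in")  # stand-in so the sketch runs anywhere
```

Logging the `(stdout, seconds)` pairs per model into a CSV is enough to settle the backbone question with data rather than vibes.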
### Marginal Revolution flags a paper where agentic AI matches human economists on causal inference, with tighter tails
**Source:** Marginal Revolution
**Link:** https://marginalrevolution.com/marginalrevolution/2026/04/a-comparison-of-agentic-ai-systems-and-human-economists.html?utm_source=rss&utm_medium=rss&utm_campaign=a-comparison-of-agentic-ai-systems-and-human-economists
This paper compares agentic AI systems and human economists performing the same causal inference tasks. AI systems and humans generally obtain similar median causal effect estimates. While there is substantial dispersion of estimates across model instances, the human distributions of estimates have wider tails. Using AI models as reviewers to compare and rank “submissions,” the following ranking emerges regardless of reviewer model: (1) Codex GPT-5.4, (2) Codex GPT-5.3-Codex, (3) Claude Code Opus 4.6, and (4) Human Researchers. These findings suggest that agentic AI systems will allow us to scale empirical research in economics.
I enjoy the name of the author, namely Serafin Grundl. Here is the paper, via Ethan Mollick. You could interpret these results as showing the AIs have fewer hallucinations. And just to reiterate a key point from the paper:
The second part of this paper is an AI review tournament in which “submissions” (codes and write-ups) from humans and the AI models are compared and ranked against each other. The reviewers are the following AI models: Gemini 3.1 Pro Preview, Opus 4.6 and GPT-5.4. For each review the reviewer is asked to write a report comparing four submissions (human, Opus 4.6, GPT-5.3-Codex, GPT-5.4). Each reviewer model writes comparison reports for the same 300 comparison groups. The average rankings are strikingly similar across reviewer models: (1) Codex GPT-5.4, (2) Codex GPT-5.3-Codex, (3) Claude Code Opus 4.6, and (4) Human Researchers.
Who comes in last? Hi people!
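The tournament's aggregation step, averaging each submission's rank across the reviewer models, is simple enough to sketch. The per-reviewer ranks below are invented for illustration (the paper's 300-group data isn't reproduced here); only the resulting ordering is chosen to echo the reported result:

```python
from statistics import mean

# Hypothetical per-reviewer ranks (1 = best) for the four submission types.
rankings = {
    "Gemini 3.1 Pro Preview": {"GPT-5.4": 1, "GPT-5.3-Codex": 2, "Opus 4.6": 3, "Human": 4},
    "Opus 4.6":               {"GPT-5.4": 1, "GPT-5.3-Codex": 3, "Opus 4.6": 2, "Human": 4},
    "GPT-5.4":                {"GPT-5.4": 2, "GPT-5.3-Codex": 1, "Opus 4.6": 3, "Human": 4},
}


def average_ranks(rankings: dict) -> list[tuple[str, float]]:
    """Average each submission's rank across reviewer models, best first."""
    subs = next(iter(rankings.values())).keys()
    avg = {s: mean(r[s] for r in rankings.values()) for s in subs}
    return sorted(avg.items(), key=lambda kv: kv[1])


order = [name for name, _ in average_ranks(rankings)]
```

On these made-up numbers the human submissions land last, mirroring the paper's finding that the ordering is stable across reviewer models.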