Stratechery Plus Update

  • 2025.20: Product Dreams and Marketplace Realities

    (Photo by Jesse Grant/Getty Images for Airbnb)

    Welcome back to This Week in Stratechery!

    As a reminder, each week, every Friday, we’re sending out this overview of content in the Stratechery bundle; highlighted links are free for everyone. Additionally, you have complete control over what we send to you. If you don’t want to receive This Week in Stratechery emails (there is no podcast), please uncheck the box in your delivery settings.

    On that note, here were a few of our favorites this week.

    1. Airbnb and the Limits of Founder Mode. Wednesday’s Daily Update was Stratechery at its very best. The focus was this week’s news that Airbnb is relaunching its app and expanding its purview to include a new Services offering and a revamped, curated approach to selling Experiences, all in the hopes of making the Airbnb app more than just a once or twice-a-year destination for users around the world. Responding to those ambitions, Ben synthesizes analysis of Airbnb’s past experiments, comments from Founder and CEO Brian Chesky, honesty about what the Airbnb experience entails today, and reflections on the gap between how Chesky wants his company to be seen and where market forces have taken it instead. The piece is filled with themes that apply more broadly across tech, and if you missed it this week, it’s worth your time over the weekend. Andrew Sharp

    2. Inside the NBA Lottery in Chicago. The NBA Playoffs continue apace, but this week also brought high drama for the dregs of the league, as the NBA held its annual Draft Lottery in Chicago. My co-host Ben Golliver was inside the drawing room for the Washington Post as the Dallas Mavericks(!), with 1.8% odds(!!), took home the number one pick. Greatest of All Talk recapped the whole evening earlier this week, including a look at how the NBA conducts this business every year, abject heartbreak for my Washington Wizards, and conspiracy theories that were ubiquitous and unavoidable after another team lost its once-in-a-generation superstar to Los Angeles and was rewarded with a chance at a franchise player (in this case, Duke’s Cooper Flagg). All of it is the height of absurdity, which is to say, it’s exactly the sort of event that makes for a great podcast. AS

    3. Do You Know What a Femtosecond Is? If you don’t, fear not, because 72 hours ago I was right there with you. Then I watched Jon Yu’s video explaining a) what a femtosecond is (spoiler: one quadrillionth of a second); b) what femtosecond lasers are; and c) the ways in which those lasers are used today. A field of study that began with measuring horses galloping and a machine called a phosphoroscope later spawned micro-chemistry, and today is dominated by lasers that are integral to everything from Lasik surgery to semiconductor manufacturing. This is a video that will make anyone smarter, but what I appreciated was the reminder to be amazed and delighted by the long (and continuing!) history of human curiosity and ingenuity. Check it out below, or, if you prefer it in podcast form, Stratechery subscribers can access the entire Asianometry catalog here. AS

    Stratechery Articles and Updates

    Dithering with Ben Thompson and Daring Fireball’s John Gruber

    Asianometry with Jon Yu

    Sharp China with Andrew Sharp and Sinocism’s Bill Bishop

    Greatest of All Talk with Andrew Sharp and WaPo’s Ben Golliver

    Sharp Tech with Andrew Sharp and Ben Thompson

    This week’s Stratechery video is on Apple and the Ghosts of Companies Past.


    Get notified about new Articles


  • Platform Power Is Underrated

    Listen to this post:

    I cannot accept your canon that we are to judge Pope and King unlike other men, with a favourable presumption that they did no wrong. If there is any presumption it is the other way against holders of power, increasing as the power increases. Historic responsibility has to make up for the want of legal responsibility. Power tends to corrupt and absolute power corrupts absolutely.
    Lord Acton, Letter to Bishop Creighton

    In 2015, Felix Salmon wrote about The Ingredients of a Great Newsletter, and used yours truly as an example. While that page no longer exists, the annotations Salmon made to my February 5, 2015 Update are still on Genius.

    The first thing you might notice is that while the Update was from 2015, I had the wrong year in my email; I guess the positive spin is that that was due to the duct-tape-and-wire nature of my publishing system back then, but it was an embarrassing enough error that I never did link to Salmon’s piece. The reason I mention it now, however, is that while Salmon had positive thing to say about my coverage of net neutrality and Microsoft’s then-new Outlook app, he was mostly bemused by my coverage of the App Store:

    Thompson is proud to have obsessions, and one of his geeky obsessions is the arcane set of rules surrounding apps in Apple’s app store. His conclusion is also a way of continuing a thread which runs through many past and future updates: as such it’s a way of rewarding loyal readers.

    Again, this was 2015, not 2014, but I had indeed been obsessed with the App Store from the beginning of Stratechery; one of my earliest set of Articles was a 2013 three-piece series asking Why Doesn’t Apple Enable Sustainable Businesses on the App Store? The big change since then is that the App Store long ago stopped being a geeky obsession on Stratechery, and instead became one of the biggest stories in tech, culminating in Apple being referred to prosecutors for potential criminal contempt of court charges.

    I am not writing this Article, however, to say “I told you so”; rather, what strikes me about my takes at the time, including the one that Salmon highlighted, is what I got wrong, and how much the nature of my errors bums me out.

    Apple Power

    The anticompetitive nature of Apple’s approach to the App Store revealed itself very early; John Gruber was writing about The App Store’s Exclusionary Policies just months after the App Store’s 2008 launch. The prompt was Apple’s decision to not approve an early podcasting app because “it duplicate[d] the functionality of the Podcast section of iTunes”; Gruber fretted:

    The App Store concept has trade-offs. There are pros and cons to this model versus the wide-open nature of Mac OS X. There are reasonable arguments to be made on both sides. But blatantly anti-competitive exclusion of apps that compete with Apple’s own? There is no trade-off here. No one benefits from such a policy, not even Apple. If this is truly Apple’s policy, it’s a disaster for the platform. And if it’s not Apple’s policy, then Podcaster’s exclusion is proof that the approval process is completely broken.

    Apple eventually started allowing podcast apps a year or so later (without any formal announcement), but the truth is that there wasn’t any evidence that Apple was facing any sort of disaster for the platform. Gruber himself recognized this reality two years later in an Article about Adobe’s unhappiness with the App Store:

    It’s folly to pretend there aren’t trade-offs involved — that for however much is lost, squashed by Apple’s control, that different things have not been gained. Apple’s control over the App Store gives it competitive advantages. Users have a system where they can install apps with zero worries about misconfiguration or somehow doing something wrong. That Adobe and other developers benefit least from this new scenario is not Apple’s concern. Apple first, users second, developers last — those are Apple’s priorities.

    Gruber has returned to this point about Apple’s priority stack regularly over the years, even as some of the company’s more egregious App Store policies seemed to benefit no one but Apple itself. Who benefits from needing to go to Amazon in the browser to buy Kindle books, or there being no “subscription” option in Netflix?1 Judge Yvonne Gonzalez Rogers argued this lack of user consideration extended to Apple’s anti-steering provision, which forbade developers from telling users about better offers on their websites, and linking to them; from Gonzalez Rogers’ original opinion:

    Looking at the combination of the challenged restrictions and Apple’s justifications, and lack thereof, the Court finds that common threads run through Apple’s practices which unreasonably restrains competition and harm consumers, namely the lack of information and transparency about policies which effect consumers’ ability to find cheaper prices, increased customer service, and options regarding their purchases. Apple employs these policies so that it can extract supracompetitive commissions from this highly lucrative gaming industry. While the evidence remains thin as to other developers, the conclusion can likely be extended.

    More specifically, by employing anti-steering provisions, consumers do not know what developers may be offering on their websites, including lower prices. Apple argues that consumers can provide emails to developers. However, there is no indication that consumers know that the developer does not already have the email or what the benefits are if the email was provided. For instance, Apple does not disclose that it serves as the sole source of communication for topics like refunds and other product-related issues and that direct registration through the web would also mean direct communication. Consumers do not know that if they subscribe to their favorite newspaper on the web, all the proceeds go to the newspaper, rather than the reduced amount by subscribing on the iOS device.

    While some consumers may want the benefits Apple offers (e.g., one-stop shopping, centralization of and easy access to all purchases, increased security due to centralized billing), Apple actively denies them the choice. These restrictions are also distinctly different from the brick-and-mortar situations. Apple created an innovative platform but it did not disclose its rules to the average consumer. Apple has used this lack of knowledge to exploit its position. Thus, loosening the restrictions will increase competition as it will force Apple to compete on the benefits of its centralized model or it will have to change its monetization model in a way that is actually tied to the value of its intellectual property.

    This all seems plausible, and, thanks to Judge Gonzalez Rogers’ latest ruling, is set to be tested in a major way: apps like Spotify have already been updated to inform users about offers on their websites, complete with external links. Moreover, those links don’t have to follow Apple’s proposed link entitlement rules, which means they can be tokenized to the app user, facilitating a fairly seamless checkout experience, without the need to login separately.

    Still, there are strong arguments to be made that many apps may be disappointed in their web purchase experience; Apple’s in-app purchase flow is so seamless and integrated — and critically, linked to an up-to-date payment method — that it will almost certainly convert better than web-based flows. At the same time, a 30% margin difference is a strong incentive to close that gap; the 15% margin difference for subscriptions is smaller, but at the same time, the payoffs from a web-based subscription — no Apple tax for the entire lifetime of the user — are so significant that the incentives might even be stronger.

    Those incentives are likely to accrue to users: Spotify could, for example, experiment with offering some number of months free, or lower prices for a year, or just straight up lower prices overall; this is in addition to the ability to offer obvious products that have previously been impossible, like individual e-books. This is good for users, and it’s good for Spotify.

    What is notable — and what I got wrong all those years ago — is the extent to which this is an unequivocally bad thing for Apple. They will, most obviously, earn less App Store revenue than they might have otherwise; while not every purchase on the web is one not made in the App Store — see previously impossible products, like individual e-books — the vast majority of web-based revenue earned by app makers will be a direct substitute for revenue Apple previously took a 15–30% cut of. Apple could, of course, lower their take rate, but that makes the point!

    At the same time, I highly doubt that web-based purchases will lead to any increase in Apple selling more iPhones (they might, however, sell more advertising). This is the inverse of Gruber’s long-ago concern about Apple’s policies being “a disaster for the platform”, or my insistence that the company’s policies were “unsustainable”. In fact, they were quite sustainable, and extremely profitable.

    The Chicken-and-Egg Problem

    Before I started Stratechery, I worked at Microsoft recruiting developers for the Windows App Store; the discussion then was about the “chicken-and-egg problem” of building out a new platform: to get users you needed apps, but to get developers to build those apps you needed users that they wished to reach. Microsoft tried to cold-start this conundrum on the only side where they had a hope of exerting influence, which was developers: this meant lots of incentives for app makers, up-to-and-including straight up paying them to build for the platform.

    This made no difference at all: most developers said no, cognizant that the true cost of building an app for a new platform was the ongoing maintenance of said app for a limited number of people, and those that said yes put forth minimal effort. Even if they had built the world’s greatest apps, however, I don’t think it would have mattered.

    The reality is that platforms are not chicken-and-egg problems: it is very clear what comes first, and that is users. Once there are users there is demand for applications, and that is the only thing that incentivizes developers to build. Moreover, that incentive is so strong that it really doesn’t matter how many obstacles need to be overcome to reach those users: that is why Apple’s longstanding App Store policies, egregious though they may have been, ultimately did nothing to prevent the iPhone from having a full complement of apps, and, by extension, did nothing to diminish the attractiveness of the iPhone to end users.

    Indeed, you could imagine a counterfactual where another judge in another universe decided that Apple should actually lock down the App Store even further, and charge an even higher commission: I actually think that this would have no meaningful difference on the perceived number of apps or on overall iPhone sales. Sure, developers would suffer, and some number of apps would prove to be unviable or, particularly in the case of apps that depend on advertising for downloads, less successful given their decreased ROAS, but the market is so large and liquid that the overall user experience and perceived value of apps would be largely the same.

    This stark reality does, perhaps surprisingly, give me some amount of sympathy for Apple’s App Store intransigence. The fact of the matter is that everyone demanding more leniency in the App Store, whether that be in terms of commission rates or steering provisions or anything else, are appealing to nothing more than Apple’s potential generosity. The company’s self interest — and fiduciary duty to shareholders — has been 100% on the side of keeping the App Store locked down and commissions high.

    Products → Platforms

    That, by extension, is what bums me out about this entire affair. I would prefer to live in the tech world I and so many others mythologize, where platforms enable developers to make new apps, with those apps driving value to the underlying platform and making it even more attractive and profitable. That is what happened with the PC, and the creation of applications like VisiCalc and Photoshop.

    That was also a much smaller market than today. VisiCalc came out in 1979, when 40,000–50,000 computers were sold; Photoshop launched on the Mac in 1990, with an addressable market of around a million Macs. The Vision Pro, meanwhile, is a flop for having sold only 500,000 units in 2024, nowhere near enough to attract a killer app, even if Apple’s App Store policies were not a hindrance.

    None other than Meta CEO Mark Zuckerberg seems to recognize this new reality; one of my longest running critiques of Zuckerberg has been his continual obsession with building a platform a la Bill Gates and Windows, but as he told me last week, that’s not necessarily the primary goal now, even for Quest devices:

    If you continue to deliver on [value] long term, is it still okay if that long term doesn’t include a platform, if you’re just an app?

    MZ: It depends on what you’re saying. I think early on, I really looked up to Microsoft and I think that that shaped my thinking that, “Okay, building a developer platform is really cool”.

    It is cool.

    MZ: Yeah, but it’s not really the kind of company fundamentally that we have been historically. At this point, I actually see the tension between being primarily a consumer company and primarily a developer company, so I’m less focused on that at this point.

    Now, obviously we do have developer surfaces in terms of all the stuff in Reality Labs, our developer platforms. We need to empower developers to build the content to make the devices good. The Llama stuff, we obviously want to empower people to use that and get as much of the world on open source as possible because that has this virtuous flywheel of effects that make it so that the more developers that are using Llama, the more Nvidia optimizes for Llama, the more that makes all our stuff better and drives costs down, because people are just designing stuff to work well with our systems and making their efficiency improvements to that. So, that’s all good.

    But I guess the thing that I really care about at this point is just building the best stuff and the way to do that, I think, is by doing more vertical integration. When I think about why do I want to build glasses in the future, it’s not primarily to have a developer platform, it’s because I think that this is going to be the hardware platform that delivers the best ability to create this feeling of presence and the ultimate sense of technology delivering a social connection and I think glasses are going to be the best form factor for delivering AI because with glasses, you can let your AI assistant see what you see and hear what you hear and talk in your ear throughout the day, you can whisper to it or whatever. It’s just hard to imagine a better form factor for something that you want to be a personal AI that kind of has all the context about your life.

    The great irony of Zuckerberg’s evolution — which he has been resisting for over a decade — is that this actually makes it more likely he will get a platform in the end. It seems clear in retrospect that DOS/Windows was the exception, not the rule; platforms, at least when it comes to the consumer space, are symptoms of products that move the needle. The only way to be a platform company is to be a product company first, and acquire the users that incentivize developers.

    Takings and the Public Interest

    Notice, however, the implication of this reality: an honest accounting of modern platforms, including iOS, is not simply that Apple, or whoever the platform provider is, owns intellectual property for which they have a right to be compensated; as I noted on Friday Apple has a viable appeal predicated on arguing Judge Gonzalez Rogers is “taking” their IP without compensation. What they actually own that is of the most value is user demand itself; to put it another way, Apple could charge a high commission and have stringent rules because developers wanted to be on their platform regardless. Demand draws supply, no matter the barriers.

    This, in the end, is the oddity of this case, and the true “takings” violation: what is actually being taken from Apple is simply money. I don’t think anything is going to change about iPhone sales or app maker motivation; the former will simply be less profitable, and the latter more so. Small wonder Apple has fought to keep its position so strenuously!

    This also leaves me more conflicted about Judge Gonzalez Rogers’ decision than I expected: I don’t like depriving Apple of their earned rewards, or diminishing the incentive to pursue this most difficult of goals — building a viable platform — in any way. Yes, Apple has made tons of money on the App Store, but the iPhone and associated ecosystem is of tremendous value.

    At the same time, everything is a trade-off, and the fact that products that produce demand are the key to creating platforms significantly increases the public interest in regulating platform policies. I don’t think it is to society’s benefit to effectively delegate all innovation to platform providers, given how few there inevitably are; what needs incentivizing is experimentation on top of platforms, and that means recognizing that modern platform providers are in fact incentivized to tax that out of existence.

    In fact you could, from a business perspective, make the case that Microsoft’s biggest mistake with Windows, at least from a shareholder perspective, was actually not harvesting as much value as they should have: two-sided network effects are so powerful that, once established, you can skim off as much money as you want with no ill effects. Apple certainly showed that was the case with the iPhone, and Google followed them; Meta has similar policies for Quest.

    To that end, Congress should be prepared to act if Judge Gonzalez Rogers’ order is overturned on appeal; in fact, they should act anyways. My proposed law is clear and succinct:

    • A platform is a product with an API that runs 3rd-party applications.
    • A platform has 25 million+ U.S. users.
    • 3rd-party applications should have the right, but not the compulsion, to (1) conduct commerce as they choose and (2) publish speech as they choose.

    That’s it! If you want the benefit of 3rd-party applications (which are real — there’s an app for that!) then you have to offer fundamental economic and political freedom. This is in the American interest for the exact same reason that this wouldn’t kill the incentive to build the sort of product that leads to a platforms in the first place: platforms are so powerful that everyone in tech has, for decades, been both obsessed with them even as they underrated them.


    1. You can subscribe to Netflix using the App Store by downloading one of Netflix’s games. 


    Get notified about new Articles


  • Apple and the Ghosts of Companies Past

    Listen to this post:

    Apple is not doomed, although things were feeling pretty shaky a couple of weeks ago, when the so-called “Liberation Day” tariffs were poised to make the company’s manufacturing model massively more expensive; the Trump administration granted Apple a temporary reprieve, and, for the next couple of months, things are business as usual.

    Of course that’s not Apple’s only problem: a month ago the company had to admit that it couldn’t deliver on the AI promises it made at last year’s WWDC, leading John Gruber to declare that Something Is Rotten in the State of Cupertino. Still, just because Apple can’t ship a Siri that works is not necessarily cause for short-term concern: one of the Siri features Apple did ship was its ChatGPT integration, and you can run all of the best models as apps on your iPhone.

    So no, Apple is not doomed, at least not for now. There is, however, real cause for concern: just as tech success is built years in advance, so is failure, and there are three historical examples of once-great companies losing the future that Apple and its board ought to consider carefully.

    Microsoft and the Internet

    I bet you think you already know the point I’m going to make here: Microsoft and the Internet is like Apple and AI. And you would be right! What may surprise you, however, is that I think this is actually good news for Apple, at least in part.

    The starting point for the Internet is considered to be either 1991, when Tim Berners-Lee created the World Wide Web, or 1993 when Mosaic released the first consumer-accessible browser. In other words, Bill Gates’ famous memo about The Internet Tidal Wave was either two or four years late. This is from his opening:

    Developments on the Internet over the next several years will set the course of our industry for a long time to come. Perhaps you have already seen memos from me or others here about the importance of the Internet. I have gone through several stages of increasing my views of its importance. Now I assign the Internet the highest level of importance. In this memo I want to make clear that our focus on the Internet is crucial to every part of our business. The Internet is the most important single development to come along since the IBM PC was introduced in 1981. It is even more important than the arrival of the graphical user interface (GUI). The PC analogy is apt for many reasons. The PC wasn’t perfect. Aspects of the PC were arbitrary or even poor. However a phenomena grew up around the IBM PC that made it a key element of everything that would happen for the next 15 years. Companies that tried to fight the PC standard often had good reasons for doing so but they failed because the phenomena overcame any weaknesses that resisters identified.

    It’s unfair to call this memo “late”: it’s actually quite prescient, and Microsoft pivoted hard into the Internet — so hard that just a few years later they faced a DOJ lawsuit that was primarily centered around Internet Explorer. In fact, you could make a counterintuitive argument that Microsoft actually suffered from Gates’ prescience; this was what he wrote about Netscape:

    A new competitor “born” on the Internet is Netscape. Their browser is dominant, with 70% usage share, allowing them to determine which network extensions will catch on. They are pursuing a multi-platform strategy where they move the key API into the client to commoditize the underlying operating system. They have attracted a number of public network operators to use their platform to offer information and directory services. We have to match and beat their offerings including working with MCI, newspapers, and other who are considering their products.

    Microsoft beat Netscape, but to what end? The client was in fact commoditized — the Internet Explorer team actually introduced the API that made web apps possible — but that was OK for business because everyone used Windows already.

    What actually mattered was openness, in two regards: first, because the web was open, Microsoft ultimately could not contain it to just its platform. Second, because Windows was open, it didn’t matter: Netscape, to take the most pertinent example, was a Windows app; so was Firefox, which dethroned Internet Explorer after Microsoft lost interest, and so is Chrome, which dominates the web today.

    That’s not to say that the Internet didn’t matter to Microsoft’s long-term prospects, because it was a bridge to the paradigm that Microsoft actually fumbled, which was mobile. Last fall I wrote The Gen AI Bridge to the Future, where I made the argument that paradigm shifts in hardware were enabled by first building “bridges” at the application layer. Here is the section on Windows and the Internet:

    PCs underwent their own transformation over their two decades of dominance, first in terms of speed and then in form factor, with the rise of laptops. The key innovation at the application layer, however, was the Internet:

    A drawing of The Shift to the Internet

    The Internet differed from traditional applications by virtue of being available on every PC, facilitating communication between PCs, and by being agnostic to the actual device it was accessed on. This, in turn, provided the bridge to the next device paradigm, the smartphone, with its touch interface:

    A drawing of The Internet Bridge to Smartphones

    I’ve long noted that Microsoft did not miss mobile; their error was in trying to extend the PC paradigm to mobile. This not only led to a focus on the wrong interface (WIMP via stylus and built-in keyboard), but also an assumption that the application layer, which Windows dominated, would be a key differentiator.

    Apple, famously, figured out the right interface for the smartphone, and built an entirely new operating system around touch. Yes, iOS is based on macOS at a low level, but it was a completely new operating system in a way that Windows Mobile was not; at the same time, because iOS was based on macOS, it was far more capable than smartphone-only alternatives like BlackBerry OS or PalmOS. The key aspect of this capability was that the iPhone could access the real Internet… that was the key factor in reinventing the phone, because it was the bridge that linked a device in your pocket to the world of computing writ large.

    To reiterate Microsoft’s failure, the company attempted to win in mobile by extending the Windows interface and applications to smartphones; what the company should have done is “pursu[e] a multi-platform strategy where they move the key API into the client to commoditize the underlying operating system.” In other words, Microsoft should have embraced and leveraged the Netscape threat, instead of trying to neutralize it.


    Apple and the iPhone is analogous to Microsoft and Windows, for better and for worse: the better part is that there are many more smartphones sold than PCs, which means that Apple, even though it controls less than half the market, has more iOS devices than there are Windows devices. That’s the “for worse” part, however: Apple exerts more control on iOS than Microsoft ever did on Windows, but also doesn’t have a monopoly like Microsoft did.

    The most obvious consequence of smartphones being a duopoly is that Apple can’t unilaterally control the entire industry’s layer like Microsoft wanted to. However, you can look at this in a different way: Microsoft couldn’t have dared to exert Apple-like control of Windows because it was a monopoly; the Windows API was, as I noted above, an open one, and that meant that the Internet largely happened on Windows PCs.

    Consider this in the context of AI: the iPhone does have AI apps from everyone, including ChatGPT, Claude, Gemini, DeepSeek, etc. The system-wide assistant interface, however, is not open: you’re stuck with Siri. Imagine how much more attractive the iPhone would be as an AI device if it were a truly open platform: the fact that Siri stinks wouldn’t matter, because everyone would be running someone else’s model.

    Where this might matter more is the next device paradigm: the point of The Gen AI Bridge to the Future is in the title:

    We already established above that the next paradigm is wearables. Wearables today, however, are very much in the pre-iPhone era. On one hand you have standalone platforms like Oculus, with its own operating system, app store, etc.; the best analogy is a video game console, which is technically a computer, but is not commonly thought of as such given its singular purpose. On the other hand, you have devices like smart watches, AirPods, and smart glasses, which are extensions of the phone; the analogy here is the iPod, which provided great functionality but was not a general computing device.

    Now Apple might dispute this characterization in terms of the Vision Pro specifically, which not only has a PC-class M2 chip, along with its own visionOS operating system and apps, but can also run iPad apps. In truth, though, this makes the Vision Pro akin to Microsoft Mobile: yes, it is a capable device, but it is stuck in the wrong paradigm, i.e. the previous one that Apple dominated. Or, to put it another way, I don’t view “apps” as the bridge between mobile and wearables; apps are just the way we access the Internet on mobile, and the Internet was the old bridge, not the new one.

    The new bridge is a user interface that gives you exactly what you need when you need it, and disappears otherwise; it is based on AI, not apps. The danger for Apple is that trying to keep AI in a box in its current paradigm will one day be seen like Microsoft trying to keep the Internet locked to its devices: fruitless to start, and fatal in the end.

    Intel and the Foundry Model

    Intel was the other company that dominated the PC era: while AMD existed, they were more of an annoyance than an actual threat (thanks in part to Intel’s own anticompetitive behavior). And, like Microsoft, Intel also missed mobile, for somewhat similar reasons: they were over-indexed on the lessons of the PC.

    Back in the 1980s and 1990s, when PCs were appearing on every desk and in every home, the big limitation was performance; Intel, accordingly, was focused on exactly that: every generation of Intel chips was massively faster than the previous one, and the company delivered so regularly that developers learned to build for the future, and not waste time optimizing for the soon-to-be-obsolete present.

    Mobile, however, meant battery power, and Intel just wasn’t that concerned about efficiency; while the popular myth is that Intel turned Apple down when it came to building chips for the iPhone, Tony Fadell told me in a Stratechery Interview that they were never under consideration:

    The new dimension that always came in with embedded computing was always the power element, because on battery-operated devices, you have to rethink how you do your interrupt structures, how you do your networking, how you do your memory. You have to think about so many other parameters when you think about power and doing enough processing effectively, while having long battery life. So everything for me was about long, long battery life…when you take that microscopic view of what you’re building, you look at the world very differently.

    For me, when it came to Intel at the time, back in the mid-2000s, they were always about, “Well, we’ll just repackage what we have on the desktop for the laptop and then we’ll repackage that again for embedding.” It reminded me of Windows saying, “I’m going to do Windows and then I’m going to do Windows Mobile and I’m going to do Windows embedded.” It was using those same cores and kernels and trying to slim them down…”We’re just going to have Moore’s Law take over” and so in a way that locks you into a path and that’s why Intel, not under the Pat days but previous to the Pat days, was all driven by manufacturing capability and legal. It wasn’t driven by architectural decisions.

    Missing mobile was a big problem for Intel’s integrated device manufacturing model: the company, in the long run, would not have the volume and the associated financial support of mobile customers to keep up with TSMC. Today the company is struggling to turn itself into a foundry — a company that manufactures chips for external customers — and would like nothing more than to receive a contract from the likes of Apple, not for an Intel chip, but for an ARM-based one.

    What is notable about this example, however, is how long it took to play out. One of my first Articles on Stratechery was 2013’s The Intel Opportunity, where I urged the company to get into the foundry business, a full six years after the iPhone came out; I thought I was late. In fact, Intel’s stock nearly reached its dot-com era highs in 2020, after steady growth in the seven years following that Article:

    Intel's stock grew in the decade where the company's fate was sealed

    The reason for that growth was, paradoxically enough, mobile: the rise of smartphones was mirrored by the rise of cloud computing, for which Intel made the processors. Better yet, those Xeon processors were much more expensive than PC processors (much less mobile ones), which meant margins kept growing; investors didn’t seem to care that Intel’s decline — so apparent today — was already locked in.


    While Microsoft and the Internet is more directly analogous to Apple and AI, it’s the collective blindness of Intel shareholders and management to the company’s long-term risks that offers a lesson for the iPhone maker. To summarize the Intel timeline:

    • Intel missed mobile because it was focused on the wrong thing (performance over efficiency).
    • Intel failed to leverage its greatest strength (manufacturing) into an alternative position in mobile (being a foundry).
    • Intel’s manufacturing fell behind the industry’s collective champion (TSMC), which raised challenges to Intel’s core business (AMD server chips are now better than Intel’s).

    Now, a decade-and-a-half after that first mistake, Intel is on the ropes, despite all of the money it made and stock market increases it enjoyed in the meantime.

    If a similar story unfolds for Apple, it might look like this:

    • Apple misses AI because it’s focused on the wrong thing (privacy).
    • Apple fails to leverage its greatest strength (the iPhone platform) into an alternative position in AI (being the platform for the best model makers).
    • Apple’s platform falls behind the industry’s collective champion (Android or perhaps TBD), which raises challenges to Apple’s core business (AI is so important that the iPhone has a worse user experience).

    The questions about Apple’s privacy focus being a hindrance in AI are longstanding ones; I raised them in this 2015 Update when I noted that the company’s increasingly strident stance on data collection ran the risk of diminishing product quality as machine learning rose in importance.

    In fact, those fears turned out to be overblown for a good long while; many would argue that Apple’s stance (strategy credit or not) was a big selling point. I think it’s fair to wonder, however, if those concerns were not wrong but simply early:

    • An Apple completely unconcerned with privacy would have access to a vast trove of exclusive user data on which to train models.
    • An Apple that refused to use user data for training could nonetheless deliver a superior experience by building out its AI as a fully scaled cloud service, instead of the current attempt to use on-device processing and a custom-built private cloud compute infrastructure that, by necessity, has to rely on less capable models and worse performance.
    • An Apple that embraced third party model providers could, as noted above, open up its operating systems so that users could replace Siri with the model of their choice.

    Apple’s absolutist and paternalistic approach to privacy have taken all of these options off the table, leaving the company to provide platform-level AI functionality on its own with a hand tied behind its back, and to date the company has not been able to deliver; given how different AI is than building hardware or operating systems, it’s fair to wonder if they ever will.

    And, critically, this won’t matter for a long time: Apple’s AI failures will not impact iPhone sales for years, and most AI use cases will happen in apps that run on the iPhone. What won’t happen, however, is the development of the sort of platform capabilities that will build that bridge to the future.

    This, in the end, was Intel’s ultimate failing: today there is massive demand for foundry capacity, but not for mobile; what the world wants is more AI chips, particularly from a company (Nvidia) which has regularly been willing to dual source its supply. Intel, though, has yet to meet the call; the cost of the company not opening itself up after its mobile miss is that it wasn’t prepared for the next opportunity that came along.

    Apple and China

    This last analogy is, I admit, the shakiest, but perhaps the most important: it’s Apple itself. From the New York Times:

    In 1983, Mr. Jobs oversaw the construction of a state-of-the-art plant where the new Macintosh computer would be built. Reporters who toured it early on were told that the plant, located just across San Francisco Bay from Apple’s headquarters, was so advanced that factory labor would account for 2 percent of the cost of making a Macintosh. Ultimately, the Macintosh factory closed in 1992, in part because it never realized the production volume that Mr. Jobs had envisioned — such sales numbers for the Mac would only come later…

    That failure taught Mr. Jobs the lesson. He returned to Apple in 1997, and the next year, he hired Tim Cook as Apple’s senior vice president for worldwide operations. Mr. Cook had mastered the art of global manufacturing supply chains, first in IBM’s personal computer business and then at Compaq Computer.

    It was admirable that Jobs wanted to build in America, but realistically the company needed to follow the rest of the tech industry to Asia if it wanted to survive, much less thrive, and Cook, just as much as Jobs, both saved the company and set it on the course for astronomical growth.

    The challenge today is that that growth has been mirrored by China itself, and the current administration is determined to decouple the U.S. from China; that potentially increases Apple’s most existential threat, which is a war over Taiwan. This is a very different problem than what has long concerned Cook; from a 2008 profile in Fortune:

    Almost from the time he showed up at Apple, Cook knew he had to pull the company out of manufacturing. He closed factories and warehouses around the world and instead established relationships with contract manufacturers. As a result, Apple’s inventory, measured by the amount of time it sat on the company’s balance sheet, quickly fell from months to days. Inventory, Cook has said, is “fundamentally evil,” and he has been known to observe that it declines in value by 1% to 2% a week in normal times, faster in tough times like the present. “You kind of want to manage it like you’re in the dairy business,” he has said. “If it gets past its freshness date, you have a problem.” This logistical discipline has given Apple inventory management comparable with Dell’s, then as now the gold standard for computer-manufacturing efficiency.

    There are things worse than dairy going bad: it’s cows being blown up. Evil? Absolutely. Possible? Much more so today than at any other point in Cook’s tenure.

    This, then, is the analogy: the Apple that Cook arrived at in 1998 was at existential risk from its supply chain; so is Apple today. Everything else is different, including the likelihood of disaster; Apple’s China risk may be elevated, whereas Apple’s bankruptcy in the 1990’s seemed a matter of when, not if:

    At the same time, that also means that Apple has cash flow, and power; what is necessary now is not making obvious choices out of necessity, but making uncertain ones out of prudence. Cook built the Apple machine in China; the challenge now will be in dismantling it.

    The Cook Question

    Cook is the common variable across all of these analogies:

    • Cook has led the company as it has continually closed down iOS, controlling developers through the stick of market size instead of the carrot of platform opportunity.
    • Cook has similarly been at the forefront of Apple’s absolutist approach to privacy, which has only increased in intensity and impact, not just on 3rd parties but also on Apple itself.
    • Cook, as I just documented, built Apple’s dependency on China, and has adroitly managed the politics of that reality, both with China and the U.S.

    All of these decisions — even the ones I have most consistently disagreed with — were defensible and, in some cases, essential to Apple’s success; Cook has been a very effective CEO for Apple and its shareholders. And, should he stay on for several more years, the company would probably seem fine (assuming nothing existential happens with China and Taiwan), particularly in terms of the stock price.

    Tech fortunes, however, are cast years in advance; Apple is not doomed, but it is, for the first time in a long time, fair to wonder about the long-term: the questions I have about the company are not about 2025, but 2035, and the decisions that will answer those questions will be made now. I certainly have my point of view:

    • Apple should execute an AI Platform Pivot, enabling developers to build with AI instead of trying to do everything itself; more broadly, it should increase the opportunities for developers economically and technically.
    • Apple should not abandon its privacy brand, but rather accept the reality that all of computing is ultimately about trust: the device will always have root. To that end, users do trust Apple, not because Apple is so strident about user data that they make their products worse, but because the company’s business model is aligned with users, with a multi-decade track record of doing right by them; in this case, doing right by users means doing what is necessary to have an actually useful AI offering.
    • Whereas I once thought it was reasonable for Apple to maintain its position in China — the costs of hedging would be so large that it would be better to take the minuscule risk of war, which Apple itself minimized through its position in China — that position no longer seems feasible; at a minimum Apple needs to rapidly accelerate its diversification efforts. This doesn’t just mean building up final assembly in places like India and Brazil, but also reversing its long-running attempts to undercut non-Chinese suppliers with Chinese alternatives.

    All of these run counter to the decisions Cook has made over the last three decades, but again, it’s not that Cook was wrong at the time he made them; rather, times change, and Apple needs to change before the time comes where the necessity for change is obvious, because that means the right time for that change has already passed.



    Get notified about new Articles


  • American Disruption

    Listen to this post:

    I have a few informal guidelines that govern my writing on Stratechery, including “Don’t post more than one front-page Article a week”, “Don’t talk about my writing process”, and “Don’t start Articles with ‘I’”; it’s an extraordinary week, though, so I’m breaking a few rules.

    There are three old Stratechery Articles that, after reflection, missed the mark in different ways.

    • The most proximate cause for my rule-breaking was Monday’s Trade, Tariffs, and Tech; I stand by everything I wrote, but it was incomplete, lacking an overall framework and satisfying conclusion. That’s not surprising given the current uncertainty, but that means I should have waited to publish a front-page Article until I had more clarity (I do much more musing in my Updates, which that Article is now categorized as). Now I have to break my rule and write another Article.
    • The second Article to revisit is November’s A Chance to Build. This Article was in fact deeply pessimistic about President Trump’s promised trade regime, particularly in terms of what it meant for tech; the title and conclusion, however, tried to find some positives. Clearly that was a mistake; that Article was predictive of what was happening, but I obscured the prediction.
    • The third Article to revisit is January 2021’s Internet 3.0 and the Beginning of (Tech) History. This Article was right about tech exiting an economically-defined era — the Aggregation era — and entering a new politically-defined era. It was, however, four years too early, and misdiagnosed the reason for the transition. The driver is not foreign countries closing their doors to America; it’s America closing its door to the world.

    The proximate cause of all of this reflection is of course Trump’s disastrous “liberation day” tariffs. The secondary cause is what I wrote about Monday: the U.S. has a genuine problem on its hands thanks to its inability to make things pertinent to modern warfare and high tech. The root cause, however, is very much in Stratechery’s wheelhouse, and worthy of another Article: it’s disruption.

    The Disruption of American Manufacturing

    The late Professor Clayton Christensen — one of my personal heroes and inspirations — coined the term disruption in a seminal paper called Disruptive Technologies: Catching the Wave, which he expanded to book-length in The Innovator’s Dilemma. However, Christensen’s most concise summary comes from this 20-year retrospective in Harvard Business Review:

    “Disruption” describes a process whereby a smaller company with fewer resources is able to successfully challenge established incumbent businesses. Specifically, as incumbents focus on improving their products and services for their most demanding (and usually most profitable) customers, they exceed the needs of some segments and ignore the needs of others. Entrants that prove disruptive begin by successfully targeting those overlooked segments, gaining a foothold by delivering more-suitable functionality—frequently at a lower price. Incumbents, chasing higher profitability in more-demanding segments, tend not to respond vigorously. Entrants then move upmarket, delivering the performance that incumbents’ mainstream customers require, while preserving the advantages that drove their early success. When mainstream customers start adopting the entrants’ offerings in volume, disruption has occurred.

    This is almost a perfect summary of what has happened in manufacturing, and, as I noted in that November article, it started with chips:

    That history starts in 1956, when William Shockley founded the Shockley Semiconductor Laboratory to commercialize the transistor that he had helped invent at Bell Labs; he chose Mountain View to be close to his ailing mother. A year later the so-called “Traitorous Eight”, led by Robert Noyce, left and founded Fairchild Semiconductor down the road. Six years after that Fairchild Semiconductor opened a facility in Hong Kong to assemble and test semiconductors. Assembly required manually attaching wires to a semiconductor chip, a labor-intensive and monotonous task that was difficult to do economically with American wages, which ran about $2.50/hour; Hong Kong wages were a tenth of that. Four years later Texas Instruments opened a facility in Taiwan, where wages were $0.19/hour; two years after that Fairchild Semiconductor opened another facility in Singapore, where wages were $0.11/hour.

    In other words, you can make the case that the classic story of Silicon Valley isn’t completely honest. Chips did have marginal costs, but that marginal cost was, within single digit years of the founding of Silicon Valley, exported to Asia.

    Notice what did still happen in the United States, at least back then: actual chip fabrication. That was where innovation happened, and where margins were captured, so of course U.S. chip companies kept that for themselves. It was the tedious and labor-intensive assembly and testing that was available to poor Asian economies led by authoritarian governments eager to provide some sort of alternative to communism.

    One important point about new market disruption — which Asian manufacturing was — is that it is downstream of a technological change that fundamentally changes cost structures. In the case of the Asian manufacturing market, there were actually three; from 2016’s The Brexit Possibility:

    In the years leading up to the 1970s, three technological advances completely transformed the meaning of globalization:

    • In 1963 Boeing produced the 707-320B, the first jet airliner capable of non-stop service from the continental United States to Asia; in 1970 the 747 made this routine.
    • In 1964 the first transpacific telephone cable between the United States and Japan was completed; over the next several years it would be extended throughout Asia.
    • In 1968 ISO 668 standardized shipping containers, dramatically increasing the efficiency with which goods could be shipped over the ocean in particular.

    These three factors in combination, for the first time, enabled a new kind of trade. Instead of manufacturing products in the United States (or Europe or Japan or anywhere else) and trading them to other countries, multinational corporations could invert themselves: design products in their home markets, then communicate those designs to factories in other countries, and ship finished products back to their domestic market. And, thanks to the dramatically lower wages in Asia (supercharged by China’s opening in 1978), it was immensely profitable to do just that.

    Christensen, somewhat confusingly, actually has two theories of disruption; the other one is called “low-end disruption”, but it is also pertinent to this story. From The Innovator’s Solution:

    The pressure of competing along this new trajectory of improvement [(speed, convenience, and customization)] forces a gradual evolution in product architecture, as depicted in Figure 5-1 — away from the interdependent, proprietary architectures that had the advantage in the not-good-enough era toward modular designs in the era of performance surplus. Modular architectures help companies to compete on the dimensions that matter in the lower-right portions of the disruption diagram. Companies can introduce new products faster because they can upgrade individual subsystems without having to redesign everything. Although standard interfaces invariably force compromise in system performance, firms have the slack to trade away some performance with these customers because functionality is more than good enough.

    Modularity has a profound impact on industry structure because it enables independent, nonintegrated organizations to sell, buy, and assemble components and subsystems. Whereas in the interdependent world you had to make all of the key elements of the system in order to make any of them, in a modular world you can prosper by outsourcing or by supplying just one element. Ultimately, the specifications for modular interfaces will coalesce as industry standards. When that happens, companies can mix and match components from best-of-breed suppliers in order to respond conveniently to the specific needs of individual customers. As depicted in Figure 5-1, these nonintegrated competitors disrupt the integrated leader.

    This is exactly what happened to categories like PCs: everything became modular, commoditized, and low margin — and thus followed chip test and assembly to Asia. One aspect that was under-discussed in Christensen’s theory, however, was scale, which mattered more than the customization point. It was less important that a customer be able to use any chip they wanted than it was that a lot of customers wanted to use the same chip. Moreover, this scale point applied up-and-down the stack, to both components and assemblers.

    Note also the importance of scale to the new market disruption above: while outsourcing got easier thanks to technology, it’s difficult to be easier than working locally; the best way to overcome those coordination costs is to operate at scale. This helps explain why manufacturing in Asia is fundamentally different than the manufacturing we remember in the United States decades ago: instead of firms with product-specific factories, China has flexible factories that accommodate all kinds of orders, delivering on that vector of speed, convenience, and customization that Christensen talked about.

    This scale has, as I noted last November, been particularly valuable for tech companies; software scales to the world, and Asian factories, particularly Chinese ones, scale with it, providing the hardware complements to American software. That is why every single tech company — even software ones — is damaged by these tariffs; more expensive complements means lower usage overall.

    The other scale point that is particularly pertinent to technology is chips. Every decrease in node size comes at increasingly astronomical costs; the best way to afford those costs is to have one entity making chips for everyone, and that has turned out to be TSMC. Indeed, one way to understand Intel’s struggles is that it was actually one of the last massive integrated manufacturers: Intel made chips almost entirely for itself. However, once the company missed mobile, it had no choice but to switch to a foundry model; the company is trying now, but really should have started fifteen years ago. Now the company is stuck, and I think they will need government help.

    iPhone Jobs

    There is one other very important takeaway from disruption: companies that go up-market find it impossible to go back down, and I think this too applies to countries. Start with the theory: Christensen had a chapter in The Innovator’s Dilemma entitled “What Goes Up, Can’t Go Down”:

    Three factors — the promise of upmarket margins, the simultaneous upmarket movement of many of a company’s customers, and the difficulty of cutting costs to move downmarket profitably — together create powerful barriers to downward mobility. In the internal debates about resource allocation for new product development, therefore, proposals to pursue disruptive technologies generally lose out to proposals to move upmarket. In fact, cultivating a systematic approach to weeding out new product development initiatives that would likely lower profits is one of the most important achievements of any well-managed company.

    Now consider this in the context of the United States: every single job in this country, even at the obsolete federal minimum wage of $7.25/hour, makes much more money than an iPhone factory line worker. And, critically, we have basically full employment; that is what makes this statement from White House Press Secretary Karoline Leavitt ridiculous; from 9to5Mac:

    In response to a question from Maggie Haberman of The New York Times about the types of jobs Trump hopes to create in the U.S. with these tariffs, Leavitt said:

    “The president wants to increase manufacturing jobs here in the United States of America, but he’s also looking at advanced technologies. He’s also looking at AI and emerging fields that are growing around the world that the United States needs to be a leader in as well. There’s an array of diverse jobs. More traditional manufacturing jobs, and also jobs in advanced technologies. The president is looking at all of those. He wants them to come back home.”

    Haberman followed up with a question about iPhone manufacturing specifically, asking whether Trump thinks this is “the kind of technology” that could move to the United States. Leavitt responded:

    “[Trump] believes we have the labor, we have the workforce, we have the resources to do it. As you know, Apple has invested $500 billion here in the United States. So, if Apple didn’t think the United States could do it, they probably wouldn’t have put up that big chunk of change.”

    So could Apple pay more to get U.S. workers? I suppose — leaving aside the questions of skills and whatnot — but there is also the question of desirability; the iPhone assembly work that is not automated is highly drudgerous, sitting in a factory for hours a day delicately assembling the same components over and over again. It’s a good job if the alternative is working in the fields or in a much more dangerous and uncomfortable factory, but it’s much worse than basically any sort of job that is available in the U.S. market.

    At the same time, it is important to note that this drudgerous final assembly work is a center of gravity for the components that actually need to be assembled, and these parts are all of significantly higher value, and far more likely to be produced through automation. As I noted yesterday, Apple has probably done more than any other company to move China up the curve in terms of the ability to manufacture components, often to the detriment of suppliers in the U.S., Taiwan, South Korea, Japan, etc.; from Apple’s perspective spending time and money to bring Chinese component suppliers online provides competition for its most important suppliers, giving them greater negotiating leverage. From the U.S.’s perspective this means that a host of technologies and capabilities downstream from the smartphone — which is to say nearly all electronics, including those with significant military applicability like drones — are being developed in China.

    Beyond Disruption

    Fortunately, while true disruption is often the ultimate death knell for an individual company with a specific value proposition, I don’t think it is a law of nature. Disruption is about supply, but success on the Internet, to take one example familiar to Stratechery readers, is about demand — and controlling demand is more important than controlling supply. I expanded on this in a 2015 Article called Beyond Disruption:

    The Internet has completely transformed business by making both distribution and transaction costs effectively free. In turn, this has completely changed the calculus when it comes to adding new customers: specifically, it is now possible to build businesses where every incremental customer has both zero marginal costs and zero opportunity costs. This has profound implications: instead of some companies serving the high end of a market with a superior experience while others serve the low-end with a “good-enough” offering, one company can serve everyone. And, given the choice between a superior experience and one that is “good-enough,” of course the superior experience will win.

    To be sure, it takes time to scale such a company, but given the end game of owning the entire market, the rational approach is not to start on the low-end, but rather the exact opposite. After all, while marginal costs may be zero, providing a superior experience in the age of the Internet entails significant upfront (fixed) costs, and while those fixed costs are minimized on a per-customer basis at scale, they can have a significant impact with a small customer base. Therefore, it makes sense to start at the high-end with customers who have a greater willingness-to-pay, and from there scale downwards, decreasing your price along with the decrease in your per-customer cost base (because of scale) as you go (and again, without accruing material marginal costs).

    A drawing of Decision-Making Effect of Owning a Market

    This is exactly what Uber has done: the company spent its early years building its core technology and delivering a high-end experience with significantly higher prices than incumbent taxi companies. Eventually, though, the exact same technology was deployed to deliver a lower-priced experience to a significantly broader customer base; said customer base was brought on board at zero marginal cost.

    I want to be careful not to draw too many lessons from Aggregation Theory in an Article about manufacturing, given there are by definition marginal costs involved in physical goods. However, I would note two things:

    • First, marginal manufacturing costs are, for many goods, going down over time, thanks to automation; indeed, this is why the U.S. still has a significant amount of manufacturing output even if an ever-decreasing number of people are employed in the manufacturing sector.
    • Second, the idea that demand matters most does still hold. The takeaway from that Article isn’t that Uber is a model for the rebirth of American manufacturing; rather it’s that you can leverage demand to fundamentally reshape supply.

    It’s not as if the Trump administration doesn’t know this: the entire premise of these tariffs is that everyone wants access to the U.S. market, and rightly so given the outsized buying power driven both by our wealth and by the capacity for borrowing afforded us by the dollar being the reserve currency. It’s also true that China has an excess of supply; given that supply is usually built with debt that means the country needs cash flow, and even if factories are paid off, the country needs the employment opportunties. China’s hand is not as strong as many of Trump’s strongest critics believe.

    The problem with these tariffs is that their scale and indiscriminate nature will have the effect of destroying demand and destroying the capability to develop alternative supply. I suppose if the only goal is to hurt China then shooting yourself in the foot, such that you no longer need to buy shoes for stumps, is a strategy you could choose, but that does nothing to help with what should be the primary motivation: shoring up the U.S. national security base.

    Those national security concerns are real. The final stage of disruption is when the entity that started on the bottom is uniquely equipped to deliver what is necessary for a new paradigm, and that is exactly what happened with electronics generally and drones specifically. Moreover, this capability is only going to grow more important with the rise of AI, which will be substantiated in the physical world through robotics. And, of course, robots will be the key to building other robots; if the U.S. wants to be competitive in the future, and not be dependent on China, it really does need to make changes — just not these ones.

    A Better Plan

    The key distinguishing feature of a better plan is that it doesn’t seek to own supply, but rather control it in a way the U.S. does not today.

    First, blanket tariffs are a mistake. I understand the motivation: a big reason why Chinese imports to the U.S. have actually shrunk over the last few years is because a lot of final assembly moved to countries like Vietnam, Thailand, Mexico, etc. Blanket tariffs stop this from happening, at least in theory.

    The problem, however, is that those final assembly jobs are the least desirable jobs in the value chain, at least for the American worker; assuming the Trump administration doesn’t want to import millions of workers — that seems rather counter to the foundation of his candidacy! — the United States needs to find alternative trustworthy countries for final assembly. This can be accomplished through selective tariffs (which is exactly what happened in the first Trump administration).

    Secondly, using trade flows to measure the health of the economic relationship with these countries — any country, really, but particularly final assembly countries — is legitimately stupid. Go back to the iPhone: the value-add of final assembly is in the single digit dollar range; the value-add of Apple’s software, marketing, distribution, etc. is in the hundreds of dollars. Simply looking at trade flows — where an imported iPhone is calculated as a trade deficit of several hundred dollars — completely obscures this reality. Moreover, the criteria for a final assembly country is that they have low wages, which by definition can’t pay for an equivalent amount of U.S. goods to said iPhone.

    At the same time, the overall value of final assembly does exceed its economic value, for the reasons noted above: final assembly is gravity for higher value components, and it’s those components that are the biggest national security problem. This is where component tariffs might be a useful tool: the U.S. could use a scalpel instead of a sledgehammer to incentivize buying components from trusted allies, or from the U.S. itself, or to build new capacity in trusted locations. This does, admittedly, start to sound a lot like central planning, but that is why the gravity argument is an important one: simply moving final assembly somewhere other than China is a win — but not if there are blanket tariffs, at which point you might as well leave the supply chain where it is.

    Third, the most important components for executing a fundamental shift in trade are those that go into building actual factories, or equipment for those factories. In the vast sea of stupidity that are these tariffs this is perhaps the stupidest detail of all: the U.S. is tariffing raw materials and components for factory equipment, like CNC machines. Consider this announcement from Haas:

    You can certainly make the case that things like castings and other machine components are of sufficient importance to the U.S. that they ought to be manufactured here, but you have to ramp up to that. What is much more problematic is that raw materials and components are now much cheaper for Haas’ foreign competitors; even if those competitors face tariffs in the United States, their cost of goods sold will be meaningfully lower than Haas, completely defeating the goal of encouraging the purchase of U.S. machine tools.

    I get the allure of blanket tariffs; politics is often the art of the possible, and the perfect is the enemy of the good. The problem is this approach simply isn’t good: it’s actively detrimental to what should be the U.S.’s goals. It’s also ignoring the power of demand: China would supply factories in the U.S., even if the point of those factories was to displace China, because supply needs to sell. This is how you move past disruption: you not only exert control on alternatives to China, you exert control on China itself.

    Fourth, there remains the problem of chips. Trump just declared economic war on China, which definitionally increases the possibility of kinetic war. A kinetic war, however, will mean the destruction of TSMC, leaving the U.S. bereft of chips at the very moment that A.I. is poised to create tremendous opportunities for growth and automation. And, even if A.I. didn’t exist, it’s enough to note that modern life would grind to a halt without chips. That’s why this is the area that most needs direct intervention from the federal government, particularly in terms of incentivizing demand for both leading and trailing edge U.S. chips.

    I do, as I noted on Monday, have more sympathy than many of Trump’s critics for the need to make fundamental changes to trade; that, however, doesn’t mean any change is ipso facto good: things could get a lot worse, and these “liberation day” tariffs will do exactly that.

    The Melancholy of Internet 3.0

    I started this essay being solipsistic, so let me conclude with some more navel-gazing: my prevailing emotion over the past week — one I didn’t fully come to grips with until interrogating why Monday’s Article failed to live up to my standards — is sadness over the end of an era in technology, and frustration-bordering-on-disillusionment over the demise of what I thought was a uniquely American spirit.

    The first emotion goes back to that January 2021 Article about Internet 3.0 and the Beginning of (Tech) History:

    • Internet 1.0 was about technology. This was the early web, when technology was made for technology’s sake. This was when we got standards like TCP/IP, DNS, HTTP, etc. This was obviously the best era, but one that was impossible to maintain once there was big money to be made on the Internet.
    • Internet 2.0 was about economics. This was the era of Aggregators — the era of Stratechery, in other words — when the Internet developed, for better or worse, in ways that made maximum economic sense. This was a massive boon for the U.S., which sits astride the world of technology; unfortunately none of the value that comes from that position is counted in the trade statistics, so the administration doesn’t seem to care.
    • Internet 3.0 is about politics. This is the era when countries make economically sub-optimal choices for reasons that can’t be measured in dollars and cents. In that Article I thought that Big Tech exercising its power against the President might be a spur for other countries to seek to wean themselves away from American companies; instead it is the U.S. that may be leaving other countries little choice but to retaliate against U.S. tech.

    One can certainly make the case that the Internet 2.0 era wasn’t ideal, or even actively detrimental; it’s similar to the case that while free trade might have made everyone — especially the U.S. — richer, it wasn’t worth national security sacrifices that we are only now waking up to. For me, though, it was the era that has defined my professional life, and I’m sad to see it slipping away. Stratechery has always been non-political; it bums me out if we are moving to an era where politics are inescapable — they certainly are this week.

    The second emotion — the frustration-bordering-on-disillusionment — is about the defeatist and backwards-looking way that the U.S. continues to approach China. These tariffs, particularly to the extent they are predicated on hurting China, are a great example: whether through malice or incompetence this particular tariff plan seems designed to inflict maximal pain, even though that means hurting the U.S. along the way. What is worse is that this is a bipartisan problem: Biden’s chip controls are similarly backwards looking, seeking to stay ahead by pulling up the ladder of U.S. technology, instead of trying to stay ahead through innovation.

    There is, admittedly, a hint of that old school American can-do attitude embedded in these tariffs: the Trump administration seems to believe the U.S. can overcome all of the naysayers and skeptics through sheer force of will. That force of will, however, would be much better spent pursuing a vision of a new world order in 2050, not trying to return to 1950. That is possible to do, by the way, but only if you accept 1950’s living standards, which weren’t nearly as attractive as nostalgia-colored glasses paint them, and if we’re not careful, 1950’s technology as well. I think we can do better than that; I know we can do better than this.



    Get notified about new Articles


  • YouTube TV, Wiz, and Why Monopolies Buy Innovation

    Listen to this post:

    While “March Madness” refers to the NCAA basketball tournaments, the maddest weekend of all is the first one, when fields of 641 are trimmed down to the Sweet 16; this means there are 16 games a day the first two days, and 8 games a day for the next two. Inevitably this means that multiple games are on at the same time, and Max has a solution for you; from The Streamable:

    Max's March Madness multiview

    The 2025 NCAA Men’s Basketball Tournament starts today, and just in time, Warner Bros. Discovery has announced the addition of some very modern features for games that stream on its on-demand service Max. Fans can use Max to stream all March Madness games on TNT, TBS, and truTV, and that viewing experience is about to improve in a big way.

    The new Max feature that fans will likely appreciate most while watching NCAA Men’s Basketball Tournament games is a multiview. This will allow fans to watch up to three games at once, ensuring they never miss a single bucket, block, or steal from the tournament.

    Except that’s not correct; Warner Bros. Discovery shares the rights to the NCAA Men’s Basketball Tournament with CBS, and there were times over the weekend when there were games on CBS and a Warner Bros. Discovery property — sometimes four at once. That means that Max multiview watchers were in fact missing buckets, blocks, and steals, and likely from the highest profile games, which were more likely to be on the broadcast network.

    Notice, however, that I specified Max multiview watchers; YouTube TV has offered multiview for the NCAA Tournament since last year. Critically, YouTube TV’s offering includes CBS, and, starting this upcoming weekend, will also let you watch the women’s tournament as well; from Sportico:

    Generally, events from the same leagues are kept together. On Friday, for instance, men’s and women’s multiviews will be offered separately. If you truly want to watch all of March Madness live, it’ll be time to break out that second screen again. However, in part due to user demand, YouTube TV says mixed gender multiviews will be available starting with the Sweet 16.

    The job of prioritizing selections has only gotten more complicated as interest in women’s hoops has boomed. Through the first two rounds in 2024, viewership of the women’s tourney was up 108% over the year prior. Though the “March Madness” brand is now used for both men’s and women’s competitions, separate media deals dictate their distribution. CBS and TNT Sports networks split the men’s games, including streaming on March Madness Live apps, while ESPN’s channels host women’s action. Disney+ will also carry the Final Four. Cable providers, then, are required for fans hoping to seamlessly hop back and forth between the two brackets, even as fans shift to a streaming-first future.

    That last sentence is the key: Warner Bros. Discovery only has access to the games it owns rights to; YouTube TV, by virtue of being a virtual Multichannel Video Programming Distributor (vMVPD), has access to every game that is on cable, which is all of them. That lets the service offer an objectively better multiview experience.

    YouTube TV’s Virtual Advantage

    Multiview isn’t a new idea; in 1983 George Schnurle III invented the MultiVision:

    The Multivision 1.1
    Mrmazda, CC-SA

    This image is of the MultiVision 1.1, which took in four composite inputs; the 3.1 model included two built-in tuners — you provided the antenna. The Multivision didn’t provide multiview a la YouTube TV, but rather picture-in-picture, support for which was eventually built into TVs directly.

    Picture-in-picture, however, assumed that consumers had easy access to TV signals; this was a reasonable assumption when signals came in over-the-air or via basic cable. That changed in the late 1990s with the shift to digital cable, which required a set-top box to decrypt; most TVs only had one, and the picture-in-picture feature faded away. This loss was made up in part by the addition of DVR functionality to most of those set-top boxes; with time-shifting you couldn’t watch two things at once, but you could watch two things that aired at the same time.

    Cable companies offered DVR functionality in response to the popularity of TiVo; when the first model launched in 1999 it too relied on the relative openness of TV signals. Later models needed cable cards, which were mandated by the FCC in 2007; that mandate was repealed in 2020, as the good-enough nature of cable set-top boxes effectively killed the market for TiVo and other 3rd-party tuners.

    The first vMVPD, meanwhile, was Sling TV, which launched in 2015.2 YouTube TV launched two years later, with an old Google trick: unlimited storage for your cloud DVR, which you could watch anywhere in the U.S. on any device. That was possible because the point of integration for YouTube TV, unlike traditional cable, was on Google’s servers, not a set-top box (which itself was a manifestation of traditional MVPD’s point of integration being the cable into your house).

    This point of integration also explains why it was YouTube TV that came up with the modern implementation of multiview: Google could create this new feature centrally and make it available to everyone without needing to install high-powered set-top boxes in people’s homes. Indeed, this explains one of the shortcomings of multiview: because Google can not rely on viewers having high powered devices capable of showing four independent video streams, Google actually pre-mixes the streams into a single video feed on their servers.

    YouTube TV + NFL Sunday Ticket

    I mentioned above that YouTube TV offered multiview for March Madness starting last year, but that’s not quite right: a subset of the consumer base actually got access for March Madness in 2023; that was a beta test for the real launch, which was the 2023 NFL season. That was the first year that Google had the rights to NFL Sunday Ticket, which lets subscribers view out-of-market games. NFL Sunday Ticket was a prerequisite for multiview, because without it you would have access to at most two football games at a time; once you could watch all of the games, the utility was obvious.

    The point of this Article is not multiview; it’s a niche use case for events like March Madness or football fanatics on Sunday afternoons. What is notable about the latter example, however, is that Google needed to first secure the rights to NFL Sunday Ticket. This, unlike March Madness, wasn’t a situation where every game was already on cable, and thus accessible to YouTube TV; Google needed to pay $2 billion/year to secure the necessary rights to make multiview work.

    That’s a high price, even if multiview is cool; it seems unlikely that Google will ever make its money back directly. That, though, is often the case with the NFL. Back in 1993 Rupert Murdoch shocked the world by buying NFL broadcasting rights for a then-unprecedented $395 million/year, $100 million/year more than CBS was offering for the same package. Sports Illustrated explained his reasoning:

    There are skeptics who think that Murdoch will lose his custom-made shirt over the NFL deal; one estimate has him losing $500 million over the next four years. Says Murdoch, “I’ve seen those outrageous numbers. We’ll lose a few million in the first year, but even if it was 40 or 50 million, it would be tax deductible. It was a cheap way of buying a network.”

    What Murdoch meant was that demand for the NFL — which had already built ESPN — would get Fox into the cities where it didn’t yet exist, and improve its affiliate station’s standing (many of which Murdoch owned) in cities where they were weak and buried on the inferior UHF band. And, of course, that is exactly what happened.

    NFL Sunday Ticket is not, to be sure, the same as regular NFL rights; it is much more of a niche product with a subscription business model. That, though, is actually a good thing from Google’s perspective: the company’s opportunity is not to build a TV station, but rather a TV Aggregator.

    YouTube TV’s Aggregation Potential

    Google announced the NFL deal a month after it launched Primetime Channels, a marketplace for streaming services along the lines of Amazon’s Prime Video Channels or Apple TV Channels; I wrote in early 2023:

    The missing piece has been — in contrast to Apple and Amazon in particular — other streaming services. Primetime Channels, though, is clearly an attempt to build up YouTube’s own alternative to the Apple TV App Store or Amazon Prime Video Marketplace. This, as I noted last month, is why I think YouTube’s extravagant investment in NFL Sunday Ticket makes sense: it is a statement of intent and commitment that the service wants to use to convince other streaming services to come on board. The idealized future is one where YouTube is the front-door of all video period, whether that be streaming, linear, or user-generated.

    YouTube’s big advantage, as I noted in that Update, is that it has exclusive access to YouTube content; it is the only service that can offer basically anything you might want to watch on TV:

    • YouTube TV has linear television, which remains important for sports
    • YouTube proper dominates user-generated content
    • Primetime Channels is a way to bring other streaming services on board

    The real potential with streaming channels, however, is to go beyond selling subscriptions on an ad-hoc basis and actually integrating them into a single interface to drive discoverability and on-demand conversions. How useful would it be to see everything that is on in one place, and be able to either watch with one click, or subscribe with two?

    This is going to be an increasingly pressing need as sports in particular move to streaming. It used to be that all of the sports you might watch were in a centralized place: the channel guide on your set-top box. Today, however, many sports are buried in apps. Prominent examples include Amazon Thursday Night Football and Peacock’s exclusive NFL playoff games, but as a Wisconsin fan I’ve already experienced the challenge of an increasing number of college basketball games being exclusively streamed on Peacock; the problem is only going to get worse next season when an increasing number of NBA games are on Amazon and Peacock, and when ESPN releases a standalone streaming app with all of its games.

    The challenge for any one of these services is the same one seen with Max’s multiview offering: any particular streaming service is limited to its own content. Sure, any one of these services could try and build this offering anyways — ESPN is reportedly considering it — but then they run into the problem of not being a platform or marketplace with a massive audience already in place.

    The reason why that is an essential prerequisite is that executing on this vision will require forming partnerships with all of the various streamers — or at least those with live events like sports. On one hand, of course each individual streamer wants to own the customer relationship; on the other hand, sports rights both cost a lot of money and also lose their value the moment an event happens. That means they are motivated to trade away customer control and a commission for more subscribers, which works to the benefit of whoever can marshal the most demand, and YouTube, thanks primarily to its user-generated content, has the largest audience of all, and thanks to YouTube TV, is the only service that can actually offer everything.

    Google’s Product Problem

    Two quick questions for the audience:

    1. Did you know that Primetime Channels existed?
    2. How do you subscribe to Primetime Channels?

    The answer to number 2 is convoluted, to say the least; on a PC, you click the hamburger button in the upper left, then click “Your movies & TV”, then click the “Browse” tab, and there you will finally find Primetime channels; on mobile the “Your movies & TV” is found by clicking your profile photo on the bottom right.

    And, once you finally figure this out, you see a pretty pathetic list:

    YouTube Primetime Channels offerings

    As the arrow indicates, there are more options, but the only one of prominence is Paramount+; there is no Disney+, Peacock, Amazon Prime Video, Apple TV+, or Netflix.

    • Netflix’s resistance to being aggregated is long-running; they were the stick in the mud when Apple tried to aggregate streaming a decade ago. The company gets away with it — and are right to resist — because they have the largest user base amongst subscription platforms. The biggest bull case for Netflix is that many of the other streamers throw in the towel and realize they are better off just selling content to Netflix.
    • Disney+ actually could pull off a fair bit of what YouTube is primed to do: no, Disney doesn’t have YouTube’s user-generated content, but the company does have Hulu Live, which gives a potential Aggregation offering access to content still on linear TV.
    • Amazon and Apple are Google’s most obvious competitors when it comes to building an Aggregator for streaming services, and they have the advantage of owning hardware to facilitate transactions.

    That leaves Peacock, and this is where I hold Google responsible. Peacock has large bills and a relatively small userbase; there is also a Peacock app for both Amazon devices (although you have to subscribe to Peacock directly) and Apple devices (where Apple enforces an in-app subscription offering). If Google is serious about Primetime Channels specifically, and being a streaming and sports Aggregator generally, then it should have Peacock available as an offering.

    That’s the thing, though: it’s not clear that Google has made any sort of progress in achieving the vision I perceived two years ago in the wake of the launch of Primetime Channels and the NFL Sunday Ticket deal. Yes, YouTube continues to grow, particularly on TVs, and yes, multivision is slowly getting better, but both of those are products of inertia; is Google so arthritic that it can’t make a play to dominate an entertainment industry that is getting religion about the need to acquire and keep customers profitably? That’s exactly why Aggregators gain power over suppliers: they solve their demand problem. And yet Primetime Channels might as well not even exist, given how buried it is, and it might as well be, given that Google hasn’t signed a meaningful new deal since launch.

    Google’s Wiz Acquisition

    This is all convoluted way to explain why I approve of Google’s decision to pay $32 billion in cash for Wiz, a cybersecurity firm that has absolutely nothing to do with the future of TV. From Bloomberg:

    Google parent Alphabet Inc. agreed to acquire cybersecurity firm Wiz Inc. for $32 billion in cash, reaching a deal less than a year after initial negotiations fell apart because the cloud-computing startup wanted to stay independent. Wiz will join the Google Cloud business once the deal closes, the companies said in a statement on Tuesday. The takeover is subject to regulatory approvals and is likely to close next year, they said.

    The deal, which would be Alphabet’s largest to date, comes after Wiz turned down a $23 billion bid from the internet search leader last year after several months of discussions. At the time, Wiz walked away after deciding it could ultimately be worth more by pursuing an initial public offering company. Concerns about regulatory challenges also influenced the decision. The companies have agreed to a breakup fee of about 10% of the deal value, or $3.2 billion, if the deal doesn’t close, according to a person familiar with the matter. Shares of Alphabet fell nearly 3% in New York on Tuesday.

    Wiz provides cybersecurity solutions for multi-cloud environments, and is growing fast. This makes it a natural fit for Google Cloud, which is a distant third place to AWS and Microsoft Azure. Google Cloud’s biggest opportunity for growth is to be a service that is used in addition to a large corporation’s existing cloud infrastructure, and Wiz provides both a beachhead into those organizations and also a solution to managing a multi-cloud setup.

    Google Cloud’s selling point — the reason it might expand beyond a Wiz beachhead — are Google’s AI offerings. Google continues to have excellent AI research and the best AI infrastructure; where the company is struggling is product, particularly in the consumer space, thanks to some combination of fear of disruption and, well, the fact that product capability seems to be the first casualty of a monopoly (Apple’s declining product chops, particularly in software and obviously AI, is another example).

    The company’s tortoise-like approach to TV lends credence to the latter explanation: Google is in an amazing position in TV, thanks to the long-ago acquisition of YouTube and the launch of YouTube TV, but it has accomplished little since then beyond agreeing to pay the NFL a lot of money. Arguably the ideal solution to this sort of malaise, at least from a shareholder perspective, would be to simply collect monopoly rents and return the money to shareholders at a much higher rate than Google has to date; absent that, buying product innovation seems like the best way to actually accomplish anything.

    In other words, while I understand the theory of people who think that Google ought to just build Wiz’s functionality instead of paying a huge revenue multiple for a still-unprofitable startup, I think the reality of a company like Google is that said theory would run into the morass that is product development in a monopoly. It simply would not ship, and would suck if it did. Might as well pay up for momentum in a market that has some hope of leveraging the still considerable strengths that exist beneath the flab.



    1. Technically 68; there are four games on Tuesday that trim the field to 64 

    2. The original Sling TV was a cable card device that allowed you to watch your TV from anywhere in the world; it was massively popular amongst expats here in Taiwan 


    Get notified about new Articles


  • Apple AI’s Platform Pivot Potential

    Listen to this post:

    It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way — in short, the period was so far like the present period that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only.
    Charles Dickens, A Tale of Two Cities

    Apple’s Bad Week

    Apple has had the worst of weeks when it comes to AI. Consider this commercial which the company was running incessantly last fall:

    In case you missed the fine print in the commercial, it reads:

    Apple Intelligence coming fall 2024 with Siri and device language set to U.S. English. Some features and languages will be coming over the next year.

    “Next year” is doing a lot of work, now that the specific feature detailed in this commercial — Siri’s ability to glean information from sources like your calendar — is officially delayed. Here is the statement Apple gave to John Gruber at Daring Fireball:

    Siri helps our users find what they need and get things done quickly, and in just the past six months, we’ve made Siri more conversational, introduced new features like type to Siri and product knowledge, and added an integration with ChatGPT. We’ve also been working on a more personalized Siri, giving it more awareness of your personal context, as well as the ability to take action for you within and across your apps. It’s going to take us longer than we thought to deliver on these features and we anticipate rolling them out in the coming year.

    It was a pretty big surprise, even at the time, that Apple, a company renowned for its secrecy, was so heavily advertising features that did not yet exist; I also, in full disclosure, thought it was all an excellent idea. From my post-WWDC Update:

    The key part here is the “understanding personal context” bit: Apple Intelligence will know more about you than any other AI, because your phone knows more about you than any other device (and knows what you are looking at whenever you invoke Apple Intelligence); this, by extension, explains why the infrastructure and privacy parts are so important.

    What this means is that Apple Intelligence is by-and-large focused on specific use cases where that knowledge is useful; that means the problem space that Apple Intelligence is trying to solve is constrained and grounded — both figuratively and literally — in areas where it is much less likely that the AI screws up. In other words, Apple is addressing a space that is very useful, that only they can address, and which also happens to be “safe” in terms of reputation risk. Honestly, it almost seems unfair — or, to put it another way, it speaks to what a massive advantage there is for a trusted platform. Apple gets to solve real problems in meaningful ways with low risk, and that’s exactly what they are doing.

    Contrast this to what OpenAI is trying to accomplish with its GPT models, or Google with Gemini, or Anthropic with Claude: those large language models are trying to incorporate all of the available public knowledge to know everything; it’s a dramatically larger and more difficult problem space, which is why they get stuff wrong. There is also a lot of stuff that they don’t know because that information is locked away — like all of the information on an iPhone. That’s not to say these models aren’t useful: they are far more capable and knowledgable than what Apple is trying to build for anything that does not rely on personal context; they are also all trying to achieve the same things.

    So is Apple more incompetent than these companies, or was my evaluation of the problem space incorrect? Much of the commentary this week assumes point one, but as Simon Willison notes, you shouldn’t discount point two:

    I have a hunch that this delay might relate to security. These new Apple Intelligence features involve Siri responding to requests to access information in applications and then performing actions on the user’s behalf. This is the worst possible combination for prompt injection attacks! Any time an LLM-based system has access to private data, tools it can call, and exposure to potentially malicious instructions (like emails and text messages from untrusted strangers) there’s a significant risk that an attacker might subvert those tools and use them to damage or exfiltrating a user’s data.

    Willison links to a previous piece of his on the risk of prompt injections; to summarize the problem, if your on-device LLM is parsing your emails, what happens if one of those emails contains malicious text perfectly tuned to make your on-device AI do something you don’t want it to? We intuitively get why code injections are bad news; LLMs expand the attack surface to text generally; Apple Intelligence, by being deeply interwoven into the system, expands the attack surface to your entire device, and all of that precious content it has unique access to.

    Needless to say, I regret not raising this point last June, but I’m sure my regret pales in comparison to Apple executives and whoever had to go on YouTube to pull that commercial over the weekend.

    Apple’s Great Week

    Apple has had the best of weeks when it comes to AI. Consider their new hardware announcements, particularly the Mac Studio and its available M3 Ultra; from the company’s press release:

    Apple today announced M3 Ultra, the highest-performing chip it has ever created, offering the most powerful CPU and GPU in a Mac, double the Neural Engine cores, and the most unified memory ever in a personal computer. M3 Ultra also features Thunderbolt 5 with more than 2x the bandwidth per port for faster connectivity and robust expansion. M3 Ultra is built using Apple’s innovative UltraFusion packaging architecture, which links two M3 Max dies over 10,000 high-speed connections that offer low latency and high bandwidth. This allows the system to treat the combined dies as a single, unified chip for massive performance while maintaining Apple’s industry-leading power efficiency. UltraFusion brings together a total of 184 billion transistors to take the industry-leading capabilities of the new Mac Studio to new heights.

    “M3 Ultra is the pinnacle of our scalable system-on-a-chip architecture, aimed specifically at users who run the most heavily threaded and bandwidth-intensive applications,” said Johny Srouji, Apple’s senior vice president of Hardware Technologies. “Thanks to its 32-core CPU, massive GPU, support for the most unified memory ever in a personal computer, Thunderbolt 5 connectivity, and industry-leading power efficiency, there’s no other chip like M3 Ultra.”

    That Apple released a new Ultra chip wasn’t a shock, given there was an M1 Ultra and M2 Ultra; almost everything about this specific announcement, however, was a surprise.

    Start with the naming. Apple chip names have two components: M_ refers to the core type, and the suffix to the configuration of those cores. Therefore, to use the M1 series of chips as an example:

    Perf Cores Efficiency Cores GPU Cores Max RAM Bandwidth
    M1 4 4 8 16GB 70 GB/s
    M1 Pro 8 4 16 32GB 200 GB/s
    M1 Max 8 2 32 64GB 400 GB/s
    M1 Ultra 16 4 64 128GB 800 GB/s

    The “M1” cores in question were the “Firestorm” high-performance core, “Icestorm” energy-efficient core, and a not-publicly-named GPU core; all three of these cores debuted first on the A14 Bionic chip, which shipped in the iPhone 12.

    The suffix, meanwhile, referred to some combination of increased core count (both CPU and GPU), as well as an increased number of memory controllers and associated bandwidth (and, in the case of the M1 series, faster RAM). The Ultra, notably, was simply two Max chips fused together; that’s why all of the numbers simply double.

    The M2 was broadly similar to the M1, at least in terms of the relative performance of the different suffixes. The M2 Ultra, for example, simply doubled up the M2 Max. The M3 Ultra, however, is unique when it comes to max RAM:

    Perf Cores Efficiency Cores GPU Cores Controllers Max RAM Bandwidth
    M3 4 4 10 8 32GB 100 GB/S
    M3 Pro 6 6 18 12 48GB 150 GB/s
    M3 Max 12 4 40 32 128GB 400 GB/s
    M3 Ultra 24 8 80 64 512GB 800 GB/s

    I can’t completely vouch for every number on this table (which was sourced from Wikipedia), as Apple hasn’t yet released the full technical details of the M3 Ultra, and it’s not yet available for testing. What seems likely, however, is that instead of simply doubling up the M3 Max, Apple also reworked the memory controllers to address double the memory. That also explains why the M3 Ultra came out so much later than the rest of the family — indeed, the Mac Studio base chip is actually the M4 Max.

    The wait was worth it, however: what makes Apple’s chip architecture unique is that that RAM is shared by the CPU and GPU, and not in the carve-out way like integrated graphics of old; rather, every part of the chip — including the Neural Processing Units, which I didn’t include on these tables — has full access to (almost1) all of the memory all of the time.

    What that means in practical terms is that Apple just shipped the best consumer-grade AI computer ever. A Mac Studio with an M3 Ultra chip and 512GB RAM can run a 4-bit quantized version of DeepSeek R1 — a state-of-the-art open-source reasoning model — right on your desktop. It’s not perfect — quantization reduces precision, and the memory bandwidth is a bottleneck that limits performance — but this is something you simply can’t do with a standalone Nvidia chip, pro or consumer. The former can, of course, be interconnected, giving you superior performance, but that costs hundreds of thousands of dollars all-in; the only real alternative for home use would be a server CPU and gobs of RAM, but that’s even slower, and you have to put it together yourself.

    Apple didn’t, of course, explicitly design the M3 Ultra for R1; the architectural decisions undergirding this chip were surely made years ago. In fact, if you want to include the critical decision to pursue a unified memory architecture, then your timeline has to extend back to the late 2000s, whenever the key architectural decisions were made for Apple’s first A4 chip, which debuted in the original iPad in 2010.

    Regardless, the fact of the matter is that you can make a strong case that Apple is the best consumer hardware company in AI, and this week affirmed that reality.

    Apple Intelligence vs. Apple Silicon

    It’s probably a coincidence that the delay in Apple Intelligence and the release of the M3 Ultra happened in the same week, but it’s worth comparing and contrasting why one looks foolish and one looks wise.

    Apple Silicon

    Start with the latter: Tony Fadell told me the origin story of Apple Silicon in a 2022 Stratechery Interview; the context of the following quote was his effusive praise for Samsung, which made the chips for the iPod and the first several models of the iPhone:

    Samsung was an incredible partner. Even though they got sued, they were an incredible partner, they had to exist for the iPod to be as successful and for the iPhone to even exist. That happened. During that time, obviously Samsung was rising up in terms of its smartphones and Android and all that stuff, and that’s where things fell apart.

    At the same time, there was the strategic thing going on with Intel versus ARM in the iPad, and then ultimately iPhone where there’s that fractious showdown that I had with various people at Apple, including Steve, which was Steve wanted to go Intel for the iPad and ultimately the iPhone because that’s the way we went with the Mac and that was successful. And I was saying, “No, no, no, no! Absolutely not!” And I was screaming about it and that’s when Steve was, well after Intel lost the challenge, that’s when Steve was like, “Well, we’re going to go do our own ARM.” And that’s where we bought P.A. Semi.

    So there was the Samsung thing happening, the Intel thing happening, and then it’s like we need to be the master of our own destiny. We can’t just have Samsung supplying our processors because they’re going to end up in their products. Intel can’t deliver low power embedded the way we would need it and have the culture of quick turns, they were much more standard product and non custom products and then we also have this, “We got to have our own strategy to best everyone”. So all of those things came together to make what happened happen to then ultimately say we need somebody like TSMC to build more and more of our chips. I just want to say, never any of these things are independently decisions, they were all these things tied together for that to pop out of the oven, so to speak.

    This is such a humbling story for me as a strategy analyst; I’d like to spin up this marvelous narrative about Apple’s foresight with Apple Silicon, but like so many things in business, it turns out the best consumer AI chips were born out of pragmatic realities like Intel not being competitive in mobile, and Samsung becoming a smartphone competitor.

    Ultimately, though, the effort is characterized by four critical qualities:

    Time: Apple has been working on Apple Silicon for 17 years.

    Motivation: Apple was motivated to build Apple Silicon because having competitive and differentiated mobile chips was deemed essential to their business.

    Differentiation: Apple’s differentiation has always been rooted in the integration of hardware and software, and controlling their own chips let them do exactly that, wringing out unprecedented efficiency in particular.

    Iteration: The M3 Ultra isn’t Apple’s first chip; it’s not even the first M chip; heck, it’s not even the first M3! It’s the result of 17 years of iteration and experimentation.

    Apple Intelligence

    Notice how these qualities differ when it comes to Apple Intelligence:

    Time: The number one phrase that has been used to characterize Apple’s response to the ChatGPT moment in November 2022 is flat-footed, and that matches what I have heard anecdotally. That, by extension, means that Apple has been working on Apple Intelligence for at most 28 months, and that is almost certainly generous, given that the company likely took a good amount of time to figure out what its approach would be. That not nothing — xAI went from company formation to Grok 3 in 19 months — but it’s certainly not 17 years!

    Motivation: If you look at Apple’s earnings calls in the wake of ChatGPT, February 2023, May 2023, and August 2023, all contain some variation of “AI and machine learning have been integrated into our products for years, and we’ll continue to be thoughtful about how we implement them”; finally in November 2023 CEO Tim Cook said the company was working on something new:

    In terms of generative AI, we have — obviously, we have work going on. I’m not going to get into details about what it is, because, as you know, we don’t — we really don’t do that. But you can bet that we’re investing, we’re investing quite a bit, we’re going to do it responsibly and it will — you will see product advancements over time that where the — those technologies are at the heart of them.

    First, this obviously has bearing on the “time” point above; secondly, one certainly gets the sense that Apple, after tons of industry hype and incessant questions from analysts, very much representing the concerns of shareholders, felt like they had no choice but to be doing something with generative AI. In other words — and yes, this is very much driving with the rearview mirror — Apple didn’t seem to be working on generative AI because they felt it was essential to their product vision, but rather because they had to keep up with what everyone else was doing.

    Differentiation: This is the most alluring part of the Apple Intelligence vision, which I myself hyped up from the beginning: Apple’s exclusive access to its users’ private information. What is interesting to consider, however, beyond the security implications, is the difference between “exclusivity” and “integration”.

    Consider your address book: the iOS SDK included the Contacts API, which gave any app on the system full access to your contacts without requiring explicit user permission. This was essential to the early success of services like WhatsApp, which cleverly bootstrapped your network by using phone numbers as unique IDs; this meant that pre-existing username-based networks like Skype and AIM were actually at a disadvantage on iOS. iMessage did the same thing when it launched in 2011, and then Apple started requiring user permission to access your contacts in 2012.

    Even this amount of access, however, paled in comparison to the Mac, where developers could access information from anywhere on the system. iOS, on the other hand, put apps in sandboxes, cut off from other apps and system information outside of APIs like the Contacts API, all of which have become more and more restricted over time. Apple made these decisions for very good reasons, to be clear: iOS is a much safer and secure environment than macOS; increased restrictions generally mean increased privacy, albeit at the cost of decreased competition.

    Still, it’s worth pointing out that exclusive access to data is downstream of a policy choice to exclude third parties; this is distinct from the sort of hardware and software integration that Apple can exclusively deliver in the pursuit of superior performance. This distinction is subtle, to be sure, but I think it’s notable that Apple Silicon’s differentiation was in the service of building a competitive moat, while Apple Intelligence’s differentiation was about maintaining one.

    Iteration: From one perspective, Apple Intelligence is the opposite of an evolved system: Apple put together an entire suite of generative AI capabilities, and aimed to launch them all in iOS 18. Some of these, like text manipulation and message summaries, were straightforward and made it out the door without a problem; others, particularly the reimagined Siri and its integration with 3rd party apps and your personal data, are now delayed. It appears Apple tried to do too much all at once.

    The Incumbent Advantage

    At the same time, it’s not as if Siri is new; the voice assistant launched in 2011, alongside iMessage. In fact, though, Siri has always tried to do too much too soon; I wrote last week about the differences between Siri and Alexa, and how Amazon was wise to focus their product development on the basics — speed and accuracy — while making Alexa “dumber” than Siri tried to be, particularly in its insistence on precise wording instead of attempting to figure out what you meant.

    To that end, this speaks to how Apple could have been more conservative in its generative AI approach (and, I fear, Amazon too, given my skepticism of Alexa+): simply make a Siri that works. The fact of the matter is that Siri has always struggled with delivering on its promised functionality, but a lot of its shortcomings could have been solved by generative AI. Apple, however, promised much more than this at last year’s WWDC: Siri wasn’t simply going to work better, it was actually going to understand and integrate your personal data and 3rd-party apps in a way that had never been done before.

    Again, I applauded this at the time, so this is very much Monday-morning quarterbacking. I increasingly suspect, however, we are seeing a symptom of big-company disease that I hadn’t previously considered: while one failure state in the face of new technology is moving too slowly, the opposite failure state is assuming you can do too much too quickly, when simply delivering the basics would be more than good enough.

    Consider home automation: the big three players in the space are Siri and Alexa and Google Assistant. What makes these companies important is not simply that they have devices you can put in your home and talk to, but also that there is an entire ecosystem of products which work with them. Given that, consider two possible products in the space:

    • OpenAI releases a ChatGPT speaker that you can talk to and interact with; it works brilliantly and controls, well, it doesn’t control anything, because the ecosystem hasn’t adopted it. OpenAI would need to work diligently to build out partnerships with everyone from curtain makers to smart light to locks and more; that’s hard enough in its own right, and even more difficult when you consider that many of these objects are only installed once and updated rarely.
    • Apple or Amazon or Google update their voice assistants with basic LLMs. Now, instead of needing to use precise language, you can just say whatever you want, and the assistant can figure it out, along with all of the other LLM niceties like asking about random factoids.

    In this scenario the Apple/Amazon/Google assistants are superior, even if their underlying LLMs are worse, or less capable than OpenAI’s offering, because what the companies are selling is not a standalone product but an ecosystem. That’s the benefit of being a big incumbent company: you have other advantages you can draw on beyond your product chops.

    What is striking about new Siri — and, I worry, Alexa+ — is the extent to which they are focused on being compelling products in their own right. It’s very clever for Siri to remember who I had coffee with; it’s very useful — and probably much more doable — to reliably turn my lights on and off. Apple (and I suspect Amazon) should have absolutely nailed the latter before promising to deliver the former.

    If you want to be generous to Apple you could make the case that this was what they were trying to deliver with the Siri Intents expansion: developers could already expose parts of their apps to Siri for things like music playback, and new Siri was to build on that framework to enhance its knowledge about a user’s context to provide useful answers. This, though, put Apple firmly in control of the interaction layer, diminishing and commoditizing apps; that’s what an Aggregator does, but what if Apple went in a different direction?

    An AI Platform

    While my clearest delineation of the difference between Aggregators and Platforms is probably in A Framework for Regulating Competition on the Internet, perhaps the most romantic was in Tech’s Two Philosophies:

    There is certainly an argument to be made that these two philosophies arise out of their historical context; it is no accident that Apple and Microsoft, the two “bicycle of the mind” companies, were founded only a year apart, and for decades had broadly similar business models: sure, Microsoft licensed software, while Apple sold software-differentiated hardware, but both were and are at their core personal computer companies and, by extension, platforms.

    A drawing of Platform Businesses Attract Customers by Third Parties

    Google and Facebook, on the other hand, are products of the Internet, and the Internet leads not to platforms but to Aggregators. While platforms need 3rd parties to make them useful and build their moat through the creation of ecosystems, Aggregators attract end users by virtue of their inherent usefulness and, over time, leave suppliers no choice but to follow the Aggregators’ dictates if they wish to reach end users.

    A drawing of Aggregators Own Customer Relationships and Suppliers Follow

    The business model follows from these fundamental differences: a platform provider has no room for ads, because the primary function of a platform is to provide a stage for the applications that users actually need to shine. Aggregators, on the other hand, particularly Google and Facebook, deal in information, and ads are simply another type of information. Moreover, because the critical point of differentiation for Aggregators is the number of users on their platform, advertising is the only possible business model; there is no more important feature when it comes to widespread adoption than being “free.”

    Still, that doesn’t make the two philosophies any less real: Google and Facebook have always been predicated on doing things for the user, just as Microsoft and Apple have been built on enabling users and developers to make things completely unforeseen.

    I said this was romantic, but the reality of Apple’s relationship with developers, particularly over the last few years as the growth of the iPhone has slowed, has been considerably more antagonistic. Apple gives lip service to the role developers played in making the iPhone a compelling platform — and in collectively forming a moat for iOS and Android — but its actions suggest that Apple views developers as a commodity: necessary in aggregate, but mostly a pain in the ass individually.

    This is all very unfortunate, because Apple — in conjunction with its developers — is being presented with an incredible opportunity by AI, and it’s one that takes them back to their roots: to be a platform.

    Start with the hardware: while the M3 Ultra is the biggest beast on the block, all of Apple’s M chips are highly capable, particularly if you have plenty of RAM. I happen to have an M2 MacBook Pro with 96GB of memory (I maxed out for this specific use case), which lets me run Mixtral 8x22B, an open-source model from Mistral with 141 billion parameters, at 4-bit quantization; I asked it a few questions:

    You don’t need to actually try and read the screen-clipping; the output is pretty good, albeit not nearly as detailed and compelling as what you might expect from a frontier model. What’s amazing is that it exists at all: that answer was produced on my computer with my M2 chip, not in the cloud on an Nvidia datacenter GPU. I didn’t need to pay a subscription, or worry about rate limits. It’s my model on my device.

    What’s arguably even more impressive is seeing models run on your iPhone:

    This is a much smaller model, and correspondingly less capable, but the fact it is running locally on a phone is amazing!

    Apple is doing the same thing with the models that undergird Apple Intelligence — some models run on your device, and others on Apple’s Private Cloud Compute — but those models aren’t directly accessible by developers; Apple only exposes writing tools, image playground, and Genmoji. And, of course, they ask for your app’s data for Siri, so they can be the AI Aggregator. If a developer wants to do something unique, they need to bring their own model, which is not only very large, but hard to optimize for a specific device.

    What Apple should do instead is make its models — both local and in Private Cloud Compute — fully accessible to developers to make whatever they want. Don’t limit them to cutesy-yet-annoying frameworks like Genmoji or sanitized-yet-buggy image generators, and don’t assume that the only entity that can create something compelling using developer data is the developer of Siri; instead return to the romanticism of platforms: enabling users and developers to make things completely unforeseen. This is something only Apple could do, and, frankly, it’s something the entire AI industry needs.


    When the M1 chip was released I wrote an Article called Apple’s Shifting Differentiation. It explained that while Apple had always been about the integration of hardware and software, the company’s locus of differentiation had shifted over time:

    • When OS X first came out, Apple’s differentiation was software: Apple hardware was stuck on PowerPC chips, woefully behind Intel’s best offerings, but developers in particular were lured by OS X’s beautiful UI and Unix underpinnings.
    • When Apple moved to Intel chips, its hardware was just as fast as Windows hardware, allowing its software differentiation to truly shine.
    • Over time, as more and more applications moved to the web, the software differences came to matter less and less; that’s why the M1 chip was important for the Mac’s future.

    Apple has the opportunity with AI to press its hardware advantage: because Apple controls the entire device, they can guarantee to developers the presence of particular models at a particular level of performance, backed by Private Cloud Compute; this, by extension, would encourage developers to experiment and build new kinds of applications that only run on Apple devices.

    This doesn’t necessarily preclude finally getting new Siri to work; the opportunity Apple is pursuing continues to make sense. At the same time, the implication of the company’s differentiation shifting to hardware is that the most important job for Apple’s software is to get out of the way; to use Apple’s history as analogy, Siri is the PowerPC of Apple’s AI efforts, but this is a self-imposed shortcoming. Apple is uniquely positioned to not do everything itself; instead of seeing developers as the enemy, Apple should deputize them and equip them in a way no one else in technology can.

    I wrote a follow-up to this Article in this Daily Update.



    1. Apple reserves some memory for the CPU at all times, so that the computer can actually run 


    Get notified about new Articles


  • AI Promise and Chip Precariousness

    Listen to this post:

    Yesterday Anthropic released Claude Sonnet 3.7; Dylan Patel had the joke of the day about Anthropic’s seeming aversion to the number “4”, which means “die” in Chinese:

    Jokes aside, the correction on this post by Ethan Mollick suggests that Anthropic did not increment the main version number because Sonnet 3.7 is still in the GPT-4 class of models as far as compute is concerned.

    After publishing this piece, I was contacted by Anthropic who told me that Sonnet 3.7 would not be considered a 10^26 FLOP model and cost a few tens of millions of dollars to train, though future models will be much bigger. I updated the post with that information. The only significant change is that Claude 3 is now referred to as an advanced model but not a Gen3 model.

    I love Mollick’s work, but reject his neutral naming scheme: whoever gets to a generation first deserves the honor of the name. In other words, if Gen2 models are GPT-4 class, then Gen3 models are Grok 3 class.

    And, whereas Sonnet 3.7 is an evolution of Sonnet 3.5’s fascinating mixture of personality and coding prowess, likely a result of some Anthropic special sauce in post-training, Grok 3 feels like a model that is the result of a step-order increase in compute capacity, with a much lighter layer of reinforcement learning with human feedback (RLHF). Its answers are far more in-depth and detailed (model good!), but frequently becomes too verbose (RLHF lacking); it gets math problems right (model good!), but its explanations are harder to follow (RLHF lacking). It is also much more willing to generate forbidden content, from erotica to bomb recipes, while having on the surface the political sensibilities of Tumblr, with something more akin to 4chan under the surface if you prod.1 Grok 3, more than any model yet, feels like the distilled Internet; it’s my favorite so far.

    Grok 3 is also a reminder of how much speed matters, and, by extension, why base models are still important in a world of AI’s that reason. Grok 3 is tangibly faster than the competition, which is a better user experience; more generally, conversation is the realm of quick wits, not deep thinkers. The latter is who I want doing research or other agentic-type tasks; the former makes for a better consumer user experience in a chatbot or voice interface.

    ChatGPT, meanwhile, still has the best product experience — its Mac app in particular is dramatically better than Claude’s2 — and it handles more consumer-y use cases like math homework in a much more user-friendly way. Deep Research, meanwhile, is significantly better than all of its competitors (including Grok’s “Deep Search”), and, for me anyways, the closest experience yet to AGI.

    OpenAI’s biggest asset, however, is the ChatGPT brand and associated mindshare; COO Brad Lightcap just told CNBC that the service had surpassed 400 million weekly active users, a 33% increase in less than 3 months. OpenAI is, as I declared four months after the release of ChatGPT, the accidental consumer tech company. Consumer tech companies are the hardest to build and have the potential to be the most valuable; they also require a completely different culture and value chain than a research organization with an API on the side. That is the fundamental reality that I suspect has driven much of the OpenAI upheaval over the last two-and-a-half years: long-time OpenAI employees didn’t sign up to be the next Google Search or Meta, nor is Microsoft interested in being a mere component supplier to a company that must own the consumer relationship to succeed.

    In fact, though, OpenAI has moved too slowly: the company should absolutely have an ad-supported version by now, no matter how much the very idea might make AI researchers’ skin crawl; one of the takeaways from the DeepSeek phenomenon was how many consumers didn’t understand how good OpenAI’s best models were because they were not paying customers. It is very much in OpenAI’s competitive interest to make it cost-effective to give free users the best models, and that means advertising. More importantly, the only way for a consumer tech company to truly scale to the entire world is by having an ad model, which maximizes the addressable market while still making it possible to continually increase the average revenue per user (this doesn’t foreclose a subscription model of course; indeed, ads + subscriptions is the ultimate destination for a consumer content business).

    DeepSeek, meanwhile, has both been the biggest story of the year, in part because it is the yin to Grok 3’s yang. DeepSeek’s V3 and R1 models are excellent and worthy competitors in the GPT-4 class, and they achieved this excellence through extremely impressive engineering on both the infrastructure and model layers; Grok 3, on the other hand, simply bought the most top-of-the-line Nvidia chips, leveraging the company’s networking to build the biggest computing cluster yet, and came out with a model that is better, but not astronomically so.

    The fact that DeepSeek is Chinese is critically important, for reasons I will get to below, but it is just as important that it is an open lab, regularly publishing papers, full model weights, and underlying source code. DeepSeek’s models — which are both better than Meta’s Llama models and more open (and unencumbered by an “openish” license) — set the bar for “minimum open capability”; any model at or below DeepSeek’s models has no real excuse to not be open. Safety concerns are moot when you can just run DeepSeek, while competitive concerns are dwarfed by the sacrifice in uptake and interest entailed in having a model that is both worse and closed.

    Both DeepSeek and Llama, meanwhile, provide significant pressure on pricing; API costs in both the U.S. and China have come down in response to the Chinese research lab’s releases, and the only way to have a sustainable margin in the long run is to either have a cost advantage in infrastructure (i.e. Google), have a sustainable model capability advantage (potentially Claude and coding), or be an Aggregator (which is what OpenAI ought to pursue with ChatGPT).

    The State of AI Chips

    All of this is — but for those with high p-doom concerns — great news. AI at the moment seems to be a goldilocks position: there is sufficient incentive for the leading research labs to raise money and continue investing in new foundation models (in the hope of building an AI that improves itself), even as competition drives API prices down relentlessly, further incentivizing model makers to come up with differentiated products and capabilities.

    The biggest winner, of course, continues to be Nvidia, whose chips are fabbed by TSMC: DeepSeek’s success is causing Chinese demand for the H20, Nvidia’s reduced-compute-and-reduced-bandwidth-to-abide-by-export-controls version of the H200, to skyrocket, even as xAI just demonstrated that the fastest way to compete is to pay for the best chips. DeepSeek’s innovations will make other models more efficient, but it’s reasonable to argue that those efficiences are downstream from the chip ban, and that it’s understandable why companies who can just buy the best chips haven’t pursued — but will certainly borrow! — similar gains.

    That latter point is a problem for AMD in particular: SemiAnalysis published a brutal breakdown late last year demonstrating just how poor the Nvidia competitor’s software is relative to its hardware; AMD promises to do better, but, frankly, great chips limited by poor software has been the story of AMD for its entire five decades of existence. Some companies, like Meta or Microsoft, might put in the work to write better software, but leading labs don’t have the time nor expertise.

    The story is different for Huawei and its Ascend line of AI chips. Those chips are fabbed on China’s Semiconductor Manufactoring International Corporation’s (SMIC) 7nm process, using western-built deep ultraviolet lighography (DUV) and quad-patterning; that this is possible isn’t a surprise, but it’s reasonable to assume that the fab won’t progress further without a Chinese supplier developing extreme ultraviolet lithography (EUV) (and no, calling an evolution of the 7nm process 5.5nm doesn’t count).

    Still, the primary limitation for AI chips — particularly when it comes to inference — isn’t necessarily chip speed, but rather memory bandwidth, and that can be improved at the current process level. Moreover, one way to (somewhat) overcome the necessity of using less efficient chips is to simply build more data centers with more power, something that China is much better at than the U.S. Most importantly, however, is that China’s tech companies have the motivation — and the software chops — to make the Ascend a viable contender, particularly for inference.

    There is one more player who should be mentioned alongside Nvidia/TSMC and Huawei/SMIC, and that is the hyperscalers who design their own chips, either on their own (AWS with Trainium and Microsoft with Maia) or in collaboration with Broadcom (Google with TPUs and Meta with MTIA). The capabilities and importance of these efforts varies — Google has been investing in TPUs for a decade now, and trains its own models on them, while the next-generation Anthropic model is being trained on Trainium; Meta’s MTIA is about recommendations and not generative AI, while Microsoft’s Maia is a much more nascent effort — but what they all have in common is that their chips are fabbed by TSMC.

    TSMC and Intel

    That TSMC is dominant isn’t necessarily a surprise. Yes, much has been written, including on this site, about Intel’s stumbles and TSMC’s rise, but even if Intel had managed to stay on the leading edge — and 18A is looking promising — there is still the matter of the company needing to transform itself from an integrated device manufacturer (IDM) who designs and makes its own chips, to a foundry that has the customer service, IP library, and experience to make chips for 3rd parties like all of the entities I just discussed.

    Nvidia, to take a pertinent example, was making its chips at TSMC (and Samsung) even when Intel had the leading process; indeed, it was the creation of TSMC and its pure-play foundry model that even made Nvidia possible.3 This also means that TSMC doesn’t just have leading edge capacity, but trailing edge capacity as well. There are a lot of chips in the world — both on AI servers and also in everything from cars to stereos to refrigerators — that don’t need to be on the cutting edge and which benefit from the low costs afforded by the fully depreciated foundries TSMC still maintains, mostly in Taiwan. And TSMC, in turn, can take that cash flow — along with increasing prices for the leading edge — and invest in new fabs on the cutting edge.

    Those leading edge fabs continue to skyrocket in price, which means volume is critical. That is why it was clear to me back when this site started in 2013 that Intel needed to become a foundry; unfortunately the company didn’t follow my advice, preferring to see their stock price soar on the back of cloud server demand. Fast forward to 2021 and Intel — now no longer on the leading edge, and with its cloud server business bleeding share to a resurgent AMD on TSMC’s superior process — tried, under the leadership of Pat Gelsinger, to become a foundry; unfortunately the company’s diminishing cash position is larger than its foundry customer base, which is mostly experimental chips or x86 variants.

    Intel’s core problem goes back to the observation above: becoming a foundry is about more than having the leading edge process; Intel might have been able to develop those skills in conjunction with customers eager to be on the best process in the world, but once Intel didn’t even have that, it had nothing to offer. There simply is no reason for an Apple or AMD or Nvidia to take the massive risk entailed in working with Intel when TSMC is an option.

    China and a Changing World

    TSMC is, of course, headquartered in Taiwan; that is where the company’s R&D and leading edge fabs are located, along with most of its trailing edge capacity. SMIC, obviously, is in China; another foundry is Samsung, in South Korea. I told the story as to why so much of this industry ended up in Asia last fall in A Chance to Build:

    Semiconductors are so integral to the history of Silicon Valley that they give the region its name, and, more importantly, its culture: chips require huge amounts of up-front investment, but they have, relative to most other manufactured goods, minimal marginal costs; this economic reality helped drive the development of the venture capital model, which provided unencumbered startup capital to companies who could earn theoretically unlimited returns at scale. This model worked even better with software, which was perfectly replicable.

    That history starts in 1956, when William Shockley founded the Shockley Semiconductor Laboratory to commercialize the transistor that he had helped invent at Bell Labs; he chose Mountain View to be close to his ailing mother. A year later the so-called “Traitorous Eight”, led by Robert Noyce, left and founded Fairchild Semiconductor down the road. Six years after that Fairchild Semiconductor opened a facility in Hong Kong to assemble and test semiconductors. Assembly required manually attaching wires to a semiconductor chip, a labor-intensive and monotonous task that was difficult to do economically with American wages, which ran about $2.50/hour; Hong Kong wages were a tenth of that. Four years later Texas Instruments opened a facility in Taiwan, where wages were $0.19/hour; two years after that Fairchild Semiconductor opened another facility in Singapore, where wages were $0.11/hour.

    In other words, you can make the case that the classic story of Silicon Valley isn’t completely honest. Chips did have marginal costs, but that marginal cost was, within single digit years of the founding of Silicon Valley, exported to Asia.

    I recounted in that Article about how this outsourcing was an intentional policy of the U.S. government, and launched into a broader discussion about the post-War Pax Americana global order that placed the U.S. consumer market at the center of global trade, denominated by the dollar, and why that led to an inevitable decline in American manufacturing and the rise of a country in China that, in retrospect, was simply too big, and thus too expensive, for America to bear.

    That, anyways, is how one might frame many of the signals coming out of the 2nd Trump administration, including what appears to be a Monroe 2.0 Doctrine approach to North America, an attempt to extricate the U.S. from the Ukraine conflict specifically and Europe broadly, and, well, a perhaps tamer approach to China to start, at least compared to Trump’s rhetoric on the campaign trail.

    One possibility is that Trump is actually following through on the “pivot to Asia” that U.S. Presidents have been talking about but failing to execute on for years; in this view the U.S. is girding itself up to defend Taiwan and other entities in Asia, and hopefully break up the burgeoning China-Russia relationship in the process.

    The other explanation is more depressing, but perhaps more realistic: President Trump may believe that the unipolar U.S.-dominated world that has been the norm since the fall of the Soviet Union is drawing to a close, and it’s better for the U.S. to proactively shift to a new norm than to have it forced upon them.

    The important takeaway that is relevant to this Article is that Taiwan is the flashpoint in both scenarios. A pivot to Asia is about gearing up to defend Taiwan from a potential Chinese invasion or embargo; a retrenchment to the Americas is about potentially granting — or acknowledging — China as the hegemon of Asia, which would inevitably lead to Taiwan’s envelopment by China.

    This is, needless to say, a discussion where I tread gingerly, not least because I have lived in Taipei off and on for over two decades. And, of course, there is the moral component entailed in Taiwan being a vibrant democracy with a population that has no interest in reunification with China. To that end, the status quo has been simultaneously absurd and yet surprisingly sustainable: Taiwan is an independent country in nearly every respect, with its own border, military, currency, passports, and — pertinent to tech — economy, increasingly dominated by TSMC; at the same time, Taiwan has not declared independence, and the official position of the United States is to acknowledge that China believes Taiwan is theirs, without endorsing either that position or Taiwanese independence.

    Chinese and Taiwanese do, in my experience, handle this sort of ambiguity much more easily than do Americans; still, gray zones only go so far. What has been just as important are realist factors like military strength (once in favor of Taiwan, now decidedly in favor of China), economic ties (extremely deep between Taiwan and China, and China and the U.S.), and war-waging credibility. Here the Ukraine conflict and the resultant China-Russia relationship looms large, thanks to the sharing of military technology and overland supply chains for oil and food that have resulted, even as the U.S. has depleted itself. That, by extension, gets at another changing factor: the hollowing out of American manufacturing under Pax Americana has been directly correlated with China’s dominance of the business of making things, the most essential war-fighting capability.

    Still, there is — or rather was — a critical factor that might give China pause: the importance of TSMC. Chips undergird every aspect of the modern economy; the rise of AI, and the promise of the massive gains that might result, only make this need even more pressing. And, as long as China needs TSMC chips, they have a powerful incentive to leave Taiwan alone.

    Trump, Taiwan, and TSMC

    Anyone who has been following the news for the last few years, however, can surely see the problem: the various iterations of the chip ban, going back to the initial action against ZTE in 2018, have the perhaps-unintended effect of making China less dependent on TSMC. I wrote at the time of the ZTE ban:

    What seems likely to happen in the long run is a separation at the hardware layer as well; China is already investing heavily in chips, and this action will certainly spur the country to focus on the sort of relatively low-volume high-precision components that other countries like the U.S., Taiwan, and Japan specialize in (to date it has always made more sense for Chinese companies to focus on higher-volume lower-precision components). To catch up will certainly take time, but if this action harms ZTE as much as it seems it will I suspect the commitment will be even more significant than it already is.

    I added two years later, after President Trump barred Huawei from TSMC chips in 2020:

    I am, needless to say, not going to get into the finer details of the relationship between China and Taiwan (and the United States, which plays a prominent role); it is less that reasonable people may disagree and more that expecting reasonableness is probably naive. It is sufficient to note that should the United States and China ever actually go to war, it would likely be because of Taiwan.

    In this TSMC specifically, and the Taiwan manufacturing base generally, are a significant deterrent: both China and the U.S. need access to the best chip maker in the world, along with a host of other high-precision pieces of the global electronics supply chain. That means that a hot war, which would almost certainly result in some amount of destruction to these capabilities, would be devastating…one of the risks of cutting China off from TSMC is that the deterrent value of TSMC’s operations is diminished.

    Now you can see the fly in Goldilocks’ porridge! China would certainly like the best chips from TSMC, but they are figuring out how to manage with SMIC and the Ascend and surprisingly efficient state-of-the-art models; the entire AI economy in the U.S., on the other hand — the one that is developing so nicely, with private funding pursuing the frontier, and competition and innovation up-and-down the stack — is completely dependent on TSMC and Taiwan. We have created a situation where China is less dependent on Taiwan, even while we are more dependent on the island.

    This is the necessary context for two more will-he-or-won’t-he ideas floated by President Trump; both are summarized in this Foreign Policy article:

    U.S. President Donald Trump has vowed to impose tariffs on Taiwan’s semiconductor industry and has previously accused Taiwan of stealing the U.S. chip industry…The primary strategic goal for the administration is to revitalize advanced semiconductor manufacturing in the United States…As the negotiations between TSMC and the White House unfold, several options are emerging.

    The most discussed option is a deal between TSMC, Intel, the U.S. government, and U.S. chip designers such as Broadcom and Qualcomm. Multiple reports indicate that the White House has proposed a deal that would have TSMC acquire a stake in Intel Foundry Services and take a leading role in its operations after IFS separated from Intel. Other reports suggest a potential joint venture involving TSMC, Intel, the U.S. government, and industry partners, with technology transfer and technical support from TSMC.

    The motivation for such a proposal is clear: Intel’s board, who fired Gelsinger late last year, seems to want out of the foundry business, and Broadcom or Qualcomm are natural landing places for the design division; the U.S., however, is the entity that needs a leading edge foundry in the U.S., and the Trump administration is trying to compel TSMC to make it happen.

    Unfortunately, I don’t think this plan is a good one. It’s simply not possible for one foundry to “take over” another: while the final output is the same — a microprocessor — nearly every step of the process is different in a multitude of ways. Transistors — even ones of the same class — can have different dimensions, with different layouts (TSMC, for example, packs its transistors more densely); production lines can be organized differently, to serve different approaches to lithography; chemicals are tuned to individual processes, and can’t be shared; equipment is tailored to a specific line, and can’t be switched out; materials can differ, throughout the chip, along with how exactly they are prepared and applied. Sure, most of the equipment could be repurposed, but one doesn’t simply layer a TSMC process onto an Intel fab! The best you could hope for is that TSMC could rebuild the fabs using the existing equipment according to their specifications.

    That, though, doesn’t actually solve the Taiwan problem: TSMC is still headquartered in Taiwan, still has its R&D division there, and is still beholden to a Taiwanese government directive to not export its most cutting edge processes (and yes, there is truth to Trump’s complaints that Taiwan sees TSMC as leverage to guarantee that the U.S. defends Taiwan in the event of a Chinese invasion). Moreover, the U.S. chip problem isn’t just about the leading edge, but also the trailing edge. I wrote in Chips and China:

    It’s worth pointing out, though, that this is producing a new kind of liability for the U.S., and potentially more danger for Taiwan…these aren’t difficult chips to make, but that is precisely why it makes little sense to build new trailing edge foundries in the U.S.: Taiwan already has it covered (with the largest marketshare in both categories), and China has the motivation to build more just so it can learn.

    What, though, if TSMC were taken off the board?

    Much of the discussion around a potential invasion of Taiwan — which would destroy TSMC (foundries don’t do well in wars) — centers around TSMC’s lead in high end chips. That lead is real, but Intel, for all of its struggles, is only a few years behind. That is a meaningful difference in terms of the processors used in smartphones, high performance computing, and AI, but the U.S. is still in the game. What would be much more difficult to replace are, paradoxically, trailing node chips, made in fabs that Intel long ago abandoned…

    The more that China builds up its chip capabilities — even if that is only at trailing nodes — the more motivation there is to make TSMC a target, not only to deny the U.S. its advanced capabilities, but also the basic chips that are more integral to everyday life than we ever realized.

    It’s good that the administration is focused on the issue of TSMC and Taiwan: what I’m not sure anyone realizes is just how deep the dependency goes, and just how vulnerable the U.S. — and our future in AI — really is.

    What To Do

    Everything that I’ve written until now has been, in some respects, trivial: it’s easy to identify problems and criticize proposed solutions; it’s much more difficult to come up with solutions of one’s own. The problem is less the need for creative thinking and more the courage to make trade-offs: the fact of the matter is that there are no good solutions to the situation the U.S. has got itself into with regards to Taiwan and chips. That is a long-winded way to say that the following proposal includes several ideas that, in isolation, I find some combination of distasteful, against my principles, and even downright dangerous. So here goes.

    End the China Chip Ban

    The first thing the U.S. should do — and, by all means, make this a negotiating plank in a broader agreement with China — is let Chinese companies, including Huawei, make chips at TSMC, and further, let Chinese companies buy top-of-the-line Nvidia chips.

    The Huawei one is straightforward: Huawei’s founder may have told Chinese President Xi Jinping that Huawei doesn’t need external chip makers, but I think that the reality of having access to cutting edge TSMC fabrication would show that the company’s revealed preference would be for better chips than Huawei can get from SMIC — and the delta is only going to grow. Sure, Huawei would still work with SMIC, but the volume would go down; critically, so would the urgency of having no other choice. This, by extension, would restart China’s dependency on TSMC, thereby increasing the cost of making a move on Taiwan.

    At the same time, giving Huawei access to cutting edge chips would be a significant threat to Nvidia’s dominance; the reason the company is so up-in-arms about the chip ban isn’t simply foregone revenue but the forced development of an alternative to their CUDA ecosystem. The best way to neuter that challenge — and it is in the U.S.’s interest to have Nvidia in control, not Huawei — is to give companies like Bytedance, Alibaba, and DeepSeek the opportunity to buy the best.

    This does, without question, unleash China in terms of AI; preventing that has been the entire point of the various flavors of chip bans that came down from the Biden administration. DeepSeek’s success, however, should force a re-evaluation about just how viable it is to completely cut China off from AI.

    It’s also worth noting that success in stopping China’s AI efforts has its own risks: another reason why China has held off from moving against Taiwan is the knowledge that every year they wait increases their relative advantages in all the real world realities I listed above; that makes it more prudent to wait. The prospect of the U.S. developing the sort of AI that matters in a military context, however, even as China is cut off, changes that calculus: now the prudent course is to move sooner rather than later, particularly if the U.S. is dependent on Taiwan for the chips that make that AI possible.

    Double Down on the Semiconductor Equipment Ban

    While I’ve continually made references to “chip bans”, that’s actually incomplete: the U.S. has also made moves to limit China’s access to semiconductor equipment necessary for making leading edge chips (SMIC’s 7nm process, for example, is almost completely dependent on western semiconductor equipment). Unfortunately, this effort has mostly been a failure, thanks to generous loopholes that are downstream from China being a large market for U.S. semiconductor equipment manufacturers.

    It’s time for those loopholes to go away; remember, the overriding goal is for China to increase its dependence on Taiwan, and that means cutting SMIC and China’s other foundries off at the knees. Yes, this increases the risk that China will develop its own alternatives to western semiconductor manufacturers, leading to long-term competition and diminished money for R&D, but this is a time for hard choices and increasing Taiwan’s importance to China is more important.

    Build Trailing Edge Fabs in the U.S.

    The U.S.’s dependency on TSMC for trailing edge chip capacity remains a massive problem; if you think the COVID chip shortages were bad, then a scenario where the U.S. is stuck with GlobalFoundries and no one else is a disaster so great it is hard to contemplate. However, as long as TSMC exists, there is zero economic rationale for anyone to build more trailing edge fabs.

    This, then, is a textbook example of where government subsidies are the answer: there is a national security need for trailing edge capacity, and no economic incentive to build it. And, as an added bonus, this helps fill in some of the revenue for semiconductor manufacturers who are now fully cut off from China. TSMC takes a blow, of course, but they are also being buttressed by orders from Huawei and other Chinese chip makers.

    Intel and the Leading Edge

    That leaves Intel and the need for native leading edge capacity, and this is in some respects the hardest problem to solve.

    First, the U.S. should engineer a spin-off of Intel’s x86 chip business to Broadcom or Qualcomm at a nominal price; the real cost for the recipient company will be guaranteed orders for not just Intel chips but also a large portion of their existing chips for Intel Foundry. This will provide the foundational customer to get Intel Foundry off the ground.

    Second, the U.S. should offer to subsidize Nvidia chips made at Intel Foundry. Yes, this is an offer worth billions of dollars, but it is the shortest, fastest route to ground the U.S. AI industry in U.S. fabs.

    Third, if Nvidia declines — and they probably will, given the risks entailed in a foundry change — then the U.S. should make a massive order for Intel Gaudi AI accelerators, build data centers to house them, and make them freely available to companies and startups who want to build their own AI models, with the caveat that everything is open source.

    Fourth, the U.S. should heavily subsidize chip startups to build at Intel Foundry, with the caveat that all of the resultant IP that is developed to actually build chips — the basic building blocks, that are separate from the “secret sauce” of the chip itself — is open-sourced.

    Fifth, the U.S. should indemnify every model created on U.S.-manufactured chips against any copyright violations, with the caveat that the data used to train the model must be made freely available.


    Here is the future state the U.S. wants to get to: a strong AI industry running on U.S.-made chips, along with trailing edge capacity that is beyond the reaches of China. Getting there, however, will take significant interventions into the market to undo the overwhelming incentives for U.S. companies to simply rely on TSMC; even then, such a shift will take time, which is why making Taiwan indispensable to China’s technology industry is the price that needs to be paid in the meantime.

    AI is in an exciting place; it’s also a very precarious one. I believe this plan, with all of the risks and sacrifices it entails, is the best way to ensure that all of the trees that are sprouting have time to actually take root and change the world.

    I wrote a follow-up to this Article in this Daily Update.



    1. This suggests a surprising takeaway: it’s possible that while RLHF on ChatGPT and especially Claude block off the 4chan elements, they also tamp down the Tumblr elements, which is to say the politics don’t come from the post-training, but from the dataset — i.e. the Internet. In other words, if I’m right about Grok 3 having a much lighter layer of RLHF, then that explains both the surface politics, and what is available under the surface. 

    2. Grok doesn’t yet have a Mac app, but its iPhone app is very good 

    3. Although Nvidia’s first chip was made by SGS-Thomson Microelectronics 


    Get notified about new Articles


  • Deep Research and Knowledge Value

    Listen to this post:

    “When did you feel the AGI?”

    This is a question that has been floating around AI circles for a while, and it’s a hard one to answer for two reasons. First, what is AGI, and second, “feel” is a bit like obscenity: as Supreme Court Justice Potter Stewart famously said in Jacobellis v. Ohio, “I know it when I see it.”

    I gave my definition of AGI in AI’s Uneven Arrival:

    What o3 and inference-time scaling point to is something different: AI’s that can actually be given tasks and trusted to complete them. This, by extension, looks a lot more like an independent worker than an assistant — ammunition, rather than a rifle sight. That may seem an odd analogy, but it comes from a talk Keith Rabois gave at Stanford…My definition of AGI is that it can be ammunition, i.e. it can be given a task and trusted to complete it at a good-enough rate (my definition of Artificial Super Intelligence (ASI) is the ability to come up with the tasks in the first place).

    The “feel” part of that question is a more recent discovery: DeepResearch from OpenAI feels like AGI; I just got a new employee for the shockingly low price of $200/month.

    Deep Research Bullets

    OpenAI announced Deep Research in a February 2 blog post:

    Today we’re launching deep research in ChatGPT, a new agentic capability that conducts multi-step research on the internet for complex tasks. It accomplishes in tens of minutes what would take a human many hours.

    Deep research is OpenAI’s next agent that can do work for you independently — you give it a prompt, and ChatGPT will find, analyze, and synthesize hundreds of online sources to create a comprehensive report at the level of a research analyst. Powered by a version of the upcoming OpenAI o3 model that’s optimized for web browsing and data analysis, it leverages reasoning to search, interpret, and analyze massive amounts of text, images, and PDFs on the internet, pivoting as needed in reaction to information it encounters.

    The ability to synthesize knowledge is a prerequisite for creating new knowledge. For this reason, deep research marks a significant step toward our broader goal of developing AGI, which we have long envisioned as capable of producing novel scientific research.

    It’s honestly hard to keep track of OpenAI’s AGI definitions these days — CEO Sam Altman, just yesterday, defined it as “a system that can tackle increasingly complex problems, at human level, in many fields” — but in my rather more modest definition Deep Research sits right in the middle of that excerpt: it synthesizes research in an economically valuable way, but doesn’t create new knowledge.

    I already published two examples of Deep Research in last Tuesday’s Stratechery Update. While I suggest reading the whole thing, to summarize:

    • First, I published my (brief) review of Apples recent earnings, including three observations:

      • It was notable that Apple earned record revenue even though iPhone sales were down year-over-year, in the latest datapoint about the company’s transformation into a Services juggernaut.
      • China sales were down again, but this wasn’t a new trend: it actually goes back nearly a decade, but you can only see that if you realize how the Huawei chip ban gave Apple a temporary boost in the country.
      • While Apple executives claimed that Apple Intelligence drove iPhone sales, there really wasn’t any evidence in the geographic sales numbers supporting that assertion.
    • Second, I published a Deep Research report using a generic prompt:

      I am Ben Thompson, the author of Stratechery. This is important information because I want you to understand my previous analysis of Apple, and the voice in which I write on Stratechery. I want a research report about Apple's latest earnings in the style and voice of Stratechery that is in line with my previous analysis.

    • Third, I published a Deep Research report using a prompt that incorporated my takeaways from the earnings:

      I am Ben Thompson, the author of Stratechery. This is important information because I want you to understand my previous analysis of Apple, and the voice in which I write on Stratechery. I want a research report about Apple's latest earnings for fiscal year 2025 q1 (calendar year 2024 q4). There are a couple of angles I am particularly interested in:

      - First, there is the overall trend of services revenue carrying the companies earnings. How has that trend continued, what does it mean for margins, etc.

      - Second, I am interested in the China angle. My theory is that Apple's recent decline in China is not new, but is actually part of a longer trend going back nearly a decade. I believe that trend was arrested by the chip ban on Huawei, but that that was only a temporary bump in terms of a long-term decline. In addition, I would like to marry this to deeper analysis of the Chinese phone market, the distinction between first tier cities and the rest of China, and what that says about Apple's prospects in the country.

      - Third, what takeaways are there about Apple's AI prospects? The company claims that Apple Intelligence is helping sales in markets where it has launched, but isn't this a function of not being available in China?

      Please deliver this report in a format and style that is suitable for Stratechery.

    You can read the Update for the output, but this was my evaluation:

    The first answer was decent given the paucity of instruction; it’s really more of a summary than anything, but there are a few insightful points. The second answer was considerably more impressive. This question relied much more heavily on my previous posts, and weaved points I’ve made in the past into the answer. I don’t, to be honest, think I learned anything new, but I think that anyone encountering this topic for the first time would have. Or, to put it another way, were I looking for a research assistant, I would consider hiring whoever wrote the second answer.

    In other words, Deep Research isn’t a rifle barrel, but for this question at least, it was a pretty decent piece of ammunition.

    DeepResearch Examples

    Still, that ammunition wasn’t that valuable to me; I read the transcript of Apple’s earnings call before my 8am Dithering recording and came up with my three points immediately; that’s the luxury of having thought about and covered Apple for going on twelve years. And, as I noted above, the entire reason that the second Deep Research report was interesting was because I came up with the ideas and Deep Research substantiated them; the substantiation, however, wasn’t nearly to the standard (in my very biased subjective opinion!) of a Stratechery Update.

    I found a much more beneficial use case the next day. Before I conduct a Stratechery Interview I do several hours of research on the person I am interviewing, their professional background, the company they work for, etc.; in this case I was talking to Bill McDermott, the Chairman and CEO of ServiceNow, a company I am somewhat familiar with but not intimately so. So, I asked Deep Research for help:

    I am going to conduct an interview with Bill McDermott, the CEO of ServiceNow, and I need to do research about both McDermott and ServiceNow to prepare my questions.

    First, I want to know more about McDermott and his background. Ideally there are some good profiles of him I can read. I know he used to work at SAP and I would like to know what is relevant about his experience there. Also, how and why did he take the ServiceNow job?

    Then, what is the background of ServiceNow? How did it get started? What was its initial product-market fit, and how has it expanded over time? What kind of companies use ServiceNow?

    What is the ServiceNow business model? What is its go-to-market strategy?

    McDermott wants to talk about ServiceNow's opportunities in AI. What are those opportunities, and how are they meaningfully unique, or different from simple automation?

    What do users think of ServiceNow? Is it very ugly and hard to use? Why is it very sticky? What attracts companies to it?

    What competitors does ServiceNow have? Can it be a platform for other companies? Or is there an opportunity to disrupt ServiceNow?

    What other questions do you have that would be useful for me to ask?

    You can use previous Stratechery Interviews as a resource to understand the kinds of questions I typically ask.

    I found the results eminently useful, although the questions were pretty mid; I did spend some time doing some additional reading of things like earnings reports before conducting the Interview with my own questions. In short, it saved me a fair bit of time and gave me a place to start from, and that alone more than paid for my monthly subscription.

    Another compelling example came in researching a friend’s complicated medical issue; I’m not going to share my prompt and results for obvious reasons. What I will note is that this friend has been struggling with this issue for over a year, and has seen multiple doctors and tried several different remedies. Deep Research identified a possible issue in ten minutes that my friend has only just learned about from a specialist last week; while it is still to be determined if this is the answer he is looking for, it is notable that Deep Research may have accomplished in ten minutes what has taken my friend many hours over many months with many medical professionals.

    It is the final example, however, that is the most interesting, precisely because it is the question on which Deep Research most egregiously failed. I generated a report about another friend’s industry, asking for the major players, supply chain analysis, customer segments, etc. It was by far my most comprehensive and detailed prompt. And, sure enough, Deep Research came back with a fully fleshed out report answering all of my questions.

    It was also completely wrong, but in a really surprising way. The best way to characterize the issue is to go back to that famous Donald Rumsfeld quote:

    There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns — the ones we don’t know we don’t know.

    The issue with the report I generated — and once again, I’m not going to share the results, but this time for reasons that are non-obvious — is that it completely missed a major entity in the industry in question. This particular entity is not a well-known brand, but is a major player in the supply chain. It is a significant enough entity that any report about the industry that did not include them is, if you want to be generous, incomplete.

    It is, in fact, the fourth categorization that Rumsfeld didn’t mention: “the unknown known.” Anyone who read the report that Deep Research generated would be given the illusion of knowledge, but would not know what they think they know.

    Knowledge Value

    One of the most painful lessons of the Internet was the realization by publishers that news was worthless. I’m not speaking about societal value, but rather economic value: something everyone knows is both important and also non-monetizable, which is to say that the act of publishing is economically destructive. I wrote in Publishers and the Pursuit of the Past:

    Too many newspaper advocates utterly and completely fail to understand this; the truth is that newspapers made money in the past not by providing societal value, but by having quasi-monopolistic control of print advertising in their geographic area; the societal value was a bonus. Thus, when Chavern complains that “today’s internet distribution systems distort the flow of economic value derived from good reporting”, he is in fact conflating societal value with economic value; the latter does not exist and has never existed.

    This failure to understand the past leads to a misdiagnosis of the present: Google and Facebook are not profitable because they took newspapers’ reporting, they are profitable because they took their advertising. Moreover, the utility of both platforms is so great that even if all newspaper content were magically removed — which has been tried in Europe — the only thing that would change is that said newspapers would lose even more revenue as they lost traffic.

    This is why this solution is so misplaced: newspapers no longer have a monopoly on advertising, can never compete with the Internet when it comes to bundling content, and news remains both valuable to society and, for the same reasons, worthless economically (reaching lots of people is inversely correlated to extracting value, and facts — both real and fake ones — spread for free).

    It is maybe a bit extreme to say has always been such; in truth it is very hard to draw direct lines from the analog era, defined as it was by friction and scarcity, to the Internet era’s transparency and abundance. It may have technically been the case that those of us old enough to remember newsstands bought the morning paper because a local light manufacturing company owned printing presses, delivery trucks, and an advertising sales team, but we too believed we simply wanted to know what was happening. Now we get that need fulfilled for free, and probably by social media (for better or worse); I sometimes wish I knew less!

    Still, what Deep Research reveals is how much more could be known. I read a lot of things on the Internet, but it’s not as if I will ever come close to reading everything. Moreover, as the amount of slop increases — whether human or AI generated — the difficulty in finding the right stuff to read is only increasing. This is also one problem with Deep Research that is worth pointing out: the worst results are often, paradoxically, for the most popular topics, precisely because those are the topics that are the most likely to be contaminated by slop. The more precise and obscure the topic, the more likely it is that Deep Research will have to find papers and articles that actually cover the topic well:

    This graph, however, is only half complete, as the example of my friend’s industry shows:

    There is a good chance that Deep Research, particularly as it evolves, will become the most effective search engine there has ever been; it will find whatever information there is to find about a particular topic and present it in a relevant way. It is the death, in other words, of security through obscurity. Previously we shifted from a world where you had to pay for the news to the news being fed to you; now we will shift from a world where you had to spend hours researching a topic to having a topic reported to you on command.

    Unless, of course, the information that matters is not on the Internet. This is why I am not sharing the Deep Research report that provoked this insight: I happen to know some things about the industry in question — which is not related to tech, to be clear — because I have a friend who works in it, and it is suddenly clear to me how much future economic value is wrapped up in information not being public. In this case the entity in question is privately held, so there aren’t stock market filings, public reports, barely even a webpage! And so AI is blind.

    There is another example, this time in tech, of just how valuable secrecy can be. Amazon launched S3, the first primitive offered by AWS, in 2006, followed by EC2 later that year, and soon transformed startups and venture capital. What wasn’t clear was to what extent AWS was transforming Amazon; the company slowly transitioned Amazon.com to AWS, and that was reason enough to list AWS’s financials under Amazon.com until 2012, and then under “Other” — along with things like credit card and (then small amounts of) advertising revenue — after that.

    The grand revelation would come in 2015, when Amazon announced in January that it would break AWS out into a separate division for reporting purposes. From a Reuters report at the time:

    After years of giving investors the cold shoulder, Amazon.com Inc is starting to warm up to Wall Street. The No. 1 U.S. online retailer was unusually forthcoming during its fourth-quarter earnings call on Thursday, saying it will break out results this year, for the first time, for its fast-growing cloud computing unit, Amazon Web Services

    The additional information shared during Amazon’s fourth-quarter results as well as its emphasis on becoming more efficient signaled a new willingness by Amazon executives to listen to investors as well. “This quarter, Amazon flexed its muscles and said this is what we can do when we focus on profits,” said Rob Plaza, senior equity analyst for Key Private Bank. “If they could deliver that upper teens, low 20s revenue growth and be able to deliver profits on top of that, the stock is going to respond.” The change is unlikely to be dramatic. When asked whether this quarter marked a permanent shift in Amazon’s relationship with Wall Street, Plaza laughed: “I wouldn’t be chasing the stock here based on that.”

    Still, the shift is a good sign for investors, who have been clamoring for Amazon to disclose more about its fastest-growing and likely most profitable division that some analysts say accounts for 4 percent of total sales.

    In fact, AWS accounted for nearly 7 percent of total sales, and it was dramatically more profitable than anyone expected. The revelation caused such a massive uptick in the stock price that I called it The AWS IPO:

    One of the technology industry’s biggest and most important IPOs occurred late last month, with a valuation of $25.6 billion dollars. That’s more than Google, which IPO’d at a valuation of $24.6 billion, and certainly a lot more than Amazon, which finished its first day on the public markets with a valuation of $438 million. Don’t feel too bad for the latter, though: the “IPO” I’m talking about was Amazon Web Services, and it just so happens to still be owned by the same e-commerce company that went public nearly 20 years ago.

    I’m obviously being facetious; there was no actual IPO for AWS, just an additional line item on Amazon’s financial reports finally breaking out the cloud computing service Amazon pioneered nine years ago. That line item, though, was almost certainly the primary factor in driving an overnight increase in Amazon’s market capitalization from $182 billion on April 23 to $207 billion on April 24. It’s not only that AWS is a strong offering in a growing market with impressive economics, it also may, in the end, be the key to realizing the potential of Amazon.com itself.

    That $25.6 billion increase in market cap, however, came with its own costs: both Microsoft and Google doubled down on their own cloud businesses in response, and while AWS is still the market leader, it faces stiff competition. That’s a win for consumers and customers, but also a reminder that known unknowns have a value all their own.

    Surfacing Data

    I wouldn’t go so far as to say that Amazon was wrong to disclose AWS’s financials. In fact, SEC rules would have required as much once AWS revenue became 10% of the company’s overall business (today it is 15%, which might seem low until you remember that Amazon’s top-line revenue includes first-party e-commerce sales). Moreover, releasing AWS’s financials gave investors renewed confidence in the company, giving management freedom to continue investing heavily in capital expenditures for both AWS and the e-commerce business, fueling Amazon’s transformation into a logistics company. The point, rather, is to note that secrets are valuable.

    What is interesting to consider is what this means for AI tools like Deep Research. Hedge funds have long known the value of proprietary data, paying for everything from satellite images to traffic observers and everything in between in order to get a market edge. My suspicion is that work like this is going to become even more valuable as security by obscurity disappears; it’s going to be more difficult to harvest alpha from reading endless financial filings when an AI can do that research in a fraction of the time.1

    The problem with those hedge fund reports, however, is that they themselves are proprietary; however, they are not a complete secret. After all, the way to monetize that research is through making trades on the open market, which is to say those reports have an impact on prices. Pricing is a signal that is available to everyone, and it’s going to become an increasingly important one.

    That, by extension, is why AIs like Deep Research are one of the most powerful arguments yet for prediction markets. Prediction markets had their moment in the sun last fall during the U.S. presidential election, when they were far more optimistic about a Trump victory than polls. However, the potential — in fact, the necessity — of prediction markets is only going to increase with AI. AI’s capability of knowing everything that is public is going to increase the incentive to keep things secret; prediction markets in everything will provide a profit incentive for knowledge to be disseminated, by price if nothing else.

    It is also interesting that prediction markets have become associated with crypto, another technology that is poised to come into its own in an AI-dominated world; infinite content generation increases the value of digital scarcity and verification, just as infinite transparency increases the value of secrecy. AI is likely to be the key to tying all of this together: a combination of verifiable information and understandable price movements may the only way to derive any meaning from the slop that is slowly drowning the Internet.

    This is the other reality of AI, and why it is inescapable. Just as the Internet’s transparency and freedom to publish has devolved into torrents of information of questionable veracity, requiring ever more heroic efforts to parse, and undeniable opportunities to thrive by building independent brands — like this site — AI will both be the cause of further pollution of the information ecosystem and, simultaneously, the only way out.

    Deep Research Impacts

    Much of this is in the (not-so-distant) future; for now Deep Research is one of the best bargains in technology. Yes, $200/month is a lot, and yes, Deep Research is limited by the quality of information on the Internet and is highly dependent on the quality of the prompt. I can’t say that I’ve encountered any particular sparks of creativity, at least in arenas that I know well, but at the same time, there is a lot of work that isn’t creative in nature, but necessary all the same. I personally feel much more productive, and, truth be told, I was never going to hire a researcher anyways.

    That, though, speaks to the peril in two distinct ways. First, one reason I’ve never hired a researcher is that I see tremendous value in the search for and sifting of information. There is so much you learn on the way to a destination, and I value that learning; will serendipity be an unwelcome casualty to reports on demand? Moreover, what of those who haven’t — to take the above example — been reading Apple earnings reports for 12 years, or thinking and reading about technology for three decades? What will be lost for the next generation of analysts?

    And, of course, there is the job question: lots of other entities employ researchers, in all sorts of fields, and those salaries are going to be increasingly hard to justify. I’ve known intellectually that AI would replace wide swathes of knowledge work; it is another thing to feel it viscerally.

    At the same time, that is why the value of secrecy is worth calling out. Secrecy is its own form of friction, the purposeful imposition of scarcity on valuable knowledge. It speaks to what will be valuable in an AI-denominated future: yes, the real world and human-denominated industries will rise in economic value, but so will the tools and infrastructure that both drive original research and discoveries, and the mechanisms to price it. The power of AI, at least on our current trajectory, comes from knowing everything; the (perhaps doomed) response of many will be to build walls, toll gates, and marketplaces to protect and harvest the fruits of their human expeditions.


    1. I don’t think Deep Research is good at something like this, at least not yet. For example, I generated a report about what happened in 2015 surrounding Amazon’s disclosure, and the results were pretty poor; this is, however, the worst the tool will ever be. 


    Get notified about new Articles


  • DeepSeek FAQ

    Listen to this post:

    It’s Monday, January 27. Why haven’t you written about DeepSeek yet?

    I did! I wrote about R1 last Tuesday.

    I totally forgot about that.

    I take responsibility. I stand by the post, including the two biggest takeaways that I highlighted (emergent chain-of-thought via pure reinforcement learning, and the power of distillation), and I mentioned the low cost (which I expanded on in Sharp Tech) and chip ban implications, but those observations were too localized to the current state of the art in AI. What I totally failed to anticipate were the broader implications this news would have to the overall meta-discussion, particularly in terms of the U.S. and China.

    Is there precedent for such a miss?

    There is. In September 2023 Huawei announced the Mate 60 Pro with a SMIC-manufactured 7nm chip. The existence of this chip wasn’t a surprise for those paying close attention: SMIC had made a 7nm chip a year earlier (the existence of which I had noted even earlier than that), and TSMC had shipped 7nm chips in volume using nothing but DUV lithography (later iterations of 7nm were the first to use EUV). Intel had also made 10nm (TSMC 7nm equivalent) chips years earlier using nothing but DUV, but couldn’t do so with profitable yields; the idea that SMIC could ship 7nm chips using their existing equipment, particularly if they didn’t care about yields, wasn’t remotely surprising — to me, anyways.

    What I totally failed to anticipate was the overwrought reaction in Washington D.C. The dramatic expansion in the chip ban that culminated in the Biden administration transforming chip sales to a permission-based structure was downstream from people not understanding the intricacies of chip production, and being totally blindsided by the Huawei Mate 60 Pro. I get the sense that something similar has happened over the last 72 hours: the details of what DeepSeek has accomplished — and what they have not — are less important than the reaction and what that reaction says about people’s pre-existing assumptions.

    So what did DeepSeek announce?

    The most proximate announcement to this weekend’s meltdown was R1, a reasoning model that is similar to OpenAI’s o1. However, many of the revelations that contributed to the meltdown — including DeepSeek’s training costs — actually accompanied the V3 announcement over Christmas. Moreover, many of the breakthroughs that undergirded V3 were actually revealed with the release of the V2 model last January.

    Is this model naming convention the greatest crime that OpenAI has committed?

    Second greatest; we’ll get to the greatest momentarily.

    Let’s work backwards: what was the V2 model, and why was it important?

    The DeepSeek-V2 model introduced two important breakthroughs: DeepSeekMoE and DeepSeekMLA. The “MoE” in DeepSeekMoE refers to “mixture of experts”. Some models, like GPT-3.5, activate the entire model during both training and inference; it turns out, however, that not every part of the model is necessary for the topic at hand. MoE splits the model into multiple “experts” and only activates the ones that are necessary; GPT-4 was a MoE model that was believed to have 16 experts with approximately 110 billion parameters each.

    DeepSeekMoE, as implemented in V2, introduced important innovations on this concept, including differentiating between more finely-grained specialized experts, and shared experts with more generalized capabilities. Critically, DeepSeekMoE also introduced new approaches to load-balancing and routing during training; traditionally MoE increased communications overhead in training in exchange for efficient inference, but DeepSeek’s approach made training more efficient as well.

    DeepSeekMLA was an even bigger breakthrough. One of the biggest limitations on inference is the sheer amount of memory required: you both need to load the model into memory and also load the entire context window. Context windows are particularly expensive in terms of memory, as every token requires both a key and corresponding value; DeepSeekMLA, or multi-head latent attention, makes it possible to compress the key-value store, dramatically decreasing memory usage during inference.

    I’m not sure I understood any of that.

    The key implications of these breakthroughs — and the part you need to understand — only became apparent with V3, which added a new approach to load balancing (further reducing communications overhead) and multi-token prediction in training (further densifying each training step, again reducing overhead): V3 was shockingly cheap to train. DeepSeek claimed the model training took 2,788 thousand H800 GPU hours, which, at a cost of $2/GPU hour, comes out to a mere $5.576 million.

    That seems impossibly low.

    DeepSeek is clear that these costs are only for the final training run, and exclude all other expenses; from the V3 paper:

    Lastly, we emphasize again the economical training costs of DeepSeek-V3, summarized in Table 1, achieved through our optimized co-design of algorithms, frameworks, and hardware. During the pre-training stage, training DeepSeek-V3 on each trillion tokens requires only 180K H800 GPU hours, i.e., 3.7 days on our cluster with 2048 H800 GPUs. Consequently, our pre- training stage is completed in less than two months and costs 2664K GPU hours. Combined with 119K GPU hours for the context length extension and 5K GPU hours for post-training, DeepSeek-V3 costs only 2.788M GPU hours for its full training. Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.

    So no, you can’t replicate DeepSeek the company for $5.576 million.

    I still don’t believe that number.

    Actually, the burden of proof is on the doubters, at least once you understand the V3 architecture. Remember that bit about DeepSeekMoE: V3 has 671 billion parameters, but only 37 billion parameters in the active expert are computed per token; this equates to 333.3 billion FLOPs of compute per token. Here I should mention another DeepSeek innovation: while parameters were stored with BF16 or FP32 precision, they were reduced to FP8 precision for calculations; 2048 H800 GPUs have a capacity of 3.97 exoflops, i.e. 3.97 billion billion FLOPS. The training set, meanwhile, consisted of 14.8 trillion tokens; once you do all of the math it becomes apparent that 2.8 million H800 hours is sufficient for training V3. Again, this was just the final run, not the total cost, but it’s a plausible number.

    Scale AI CEO Alexandr Wang said they have 50,000 H100s.

    I don’t know where Wang got his information; I’m guessing he’s referring to this November 2024 tweet from Dylan Patel, which says that DeepSeek had “over 50k Hopper GPUs”. H800s, however, are Hopper GPUs, they just have much more constrained memory bandwidth than H100s because of U.S. sanctions.

    Here’s the thing: a huge number of the innovations I explained above are about overcoming the lack of memory bandwidth implied in using H800s instead of H100s. Moreover, if you actually did the math on the previous question, you would realize that DeepSeek actually had an excess of computing; that’s because DeepSeek actually programmed 20 of the 132 processing units on each H800 specifically to manage cross-chip communications. This is actually impossible to do in CUDA. DeepSeek engineers had to drop down to PTX, a low-level instruction set for Nvidia GPUs that is basically like assembly language. This is an insane level of optimization that only makes sense if you are using H800s.

    Meanwhile, DeepSeek also makes their models available for inference: that requires a whole bunch of GPUs above-and-beyond whatever was used for training.

    So was this a violation of the chip ban?

    Nope. H100s were prohibited by the chip ban, but not H800s. Everyone assumed that training leading edge models required more interchip memory bandwidth, but that is exactly what DeepSeek optimized both their model structure and infrastructure around.

    Again, just to emphasize this point, all of the decisions DeepSeek made in the design of this model only make sense if you are constrained to the H800; if DeepSeek had access to H100s, they probably would have used a larger training cluster with much fewer optimizations specifically focused on overcoming the lack of bandwidth.

    So V3 is a leading edge model?

    It’s definitely competitive with OpenAI’s 4o and Anthropic’s Sonnet-3.5, and appears to be better than Llama’s biggest model. What does seem likely is that DeepSeek was able to distill those models to give V3 high quality tokens to train on.

    What is distillation?

    Distillation is a means of extracting understanding from another model; you can send inputs to the teacher model and record the outputs, and use that to train the student model. This is how you get models like GPT-4 Turbo from GPT-4. Distillation is easier for a company to do on its own models, because they have full access, but you can still do distillation in a somewhat more unwieldy way via API, or even, if you get creative, via chat clients.

    Distillation obviously violates the terms of service of various models, but the only way to stop it is to actually cut off access, via IP banning, rate limiting, etc. It’s assumed to be widespread in terms of model training, and is why there are an ever-increasing number of models converging on GPT-4o quality. This doesn’t mean that we know for a fact that DeepSeek distilled 4o or Claude, but frankly, it would be odd if they didn’t.

    Distillation seems terrible for leading edge models.

    It is! On the positive side, OpenAI and Anthropic and Google are almost certainly using distillation to optimize the models they use for inference for their consumer-facing apps; on the negative side, they are effectively bearing the entire cost of training the leading edge, while everyone else is free-riding on their investment.

    Indeed, this is probably the core economic factor undergirding the slow divorce of Microsoft and OpenAI. Microsoft is interested in providing inference to its customers, but much less enthused about funding $100 billion data centers to train leading edge models that are likely to be commoditized long before that $100 billion is depreciated.

    Is this why all of the Big Tech stock prices are down?

    In the long run, model commoditization and cheaper inference — which DeepSeek has also demonstrated — is great for Big Tech. A world where Microsoft gets to provide inference to its customers for a fraction of the cost means that Microsoft has to spend less on data centers and GPUs, or, just as likely, sees dramatically higher usage given that inference is so much cheaper. Another big winner is Amazon: AWS has by-and-large failed to make their own quality model, but that doesn’t matter if there are very high quality open source models that they can serve at far lower costs than expected.

    Apple is also a big winner. Dramatically decreased memory requirements for inference make edge inference much more viable, and Apple has the best hardware for exactly that. Apple Silicon uses unified memory, which means that the CPU, GPU, and NPU (neural processing unit) have access to a shared pool of memory; this means that Apple’s high-end hardware actually has the best consumer chip for inference (Nvidia gaming GPUs max out at 32GB of VRAM, while Apple’s chips go up to 192 GB of RAM).

    Meta, meanwhile, is the biggest winner of all. I already laid out last fall how every aspect of Meta’s business benefits from AI; a big barrier to realizing that vision is the cost of inference, which means that dramatically cheaper inference — and dramatically cheaper training, given the need for Meta to stay on the cutting edge — makes that vision much more achievable.

    Google, meanwhile, is probably in worse shape: a world of decreased hardware requirements lessens the relative advantage they have from TPUs. More importantly, a world of zero-cost inference increases the viability and likelihood of products that displace search; granted, Google gets lower costs as well, but any change from the status quo is probably a net negative.

    I asked why the stock prices are down; you just painted a positive picture!

    My picture is of the long run; today is the short run, and it seems likely the market is working through the shock of R1’s existence.

    Wait, you haven’t even talked about R1 yet.

    R1 is a reasoning model like OpenAI’s o1. It has the ability to think through a problem, producing much higher quality results, particularly in areas like coding, math, and logic (but I repeat myself).

    Is this more impressive than V3?

    Actually, the reason why I spent so much time on V3 is that that was the model that actually demonstrated a lot of the dynamics that seem to be generating so much surprise and controversy. R1 is notable, however, because o1 stood alone as the only reasoning model on the market, and the clearest sign that OpenAI was the market leader.

    R1 undoes the o1 mythology in a couple of important ways. First, there is the fact that it exists. OpenAI does not have some sort of special sauce that can’t be replicated. Second, R1 — like all of DeepSeek’s models — has open weights (the problem with saying “open source” is that we don’t have the data that went into creating it). This means that instead of paying OpenAI to get reasoning, you can run R1 on the server of your choice, or even locally, at dramatically lower cost.

    How did DeepSeek make R1?

    DeepSeek actually made two models: R1 and R1-Zero. I actually think that R1-Zero is the bigger deal; as I noted above, it was my biggest focus in last Tuesday’s Update:

    R1-Zero, though, is the bigger deal in my mind. From the paper:

    In this paper, we take the first step toward improving language model reasoning capabilities using pure reinforcement learning (RL). Our goal is to explore the potential of LLMs to develop reasoning capabilities without any supervised data, focusing on their self-evolution through a pure RL process. Specifically, we use DeepSeek-V3-Base as the base model and employ GRPO as the RL framework to improve model performance in reasoning. During training, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. After thousands of RL steps, DeepSeek-R1-Zero exhibits super performance on reasoning benchmarks. For instance, the pass@1 score on AIME 2024 increases from 15.6% to 71.0%, and with majority voting, the score further improves to 86.7%, matching the performance of OpenAI-o1-0912.

    Reinforcement learning is a technique where a machine learning model is given a bunch of data and a reward function. The classic example is AlphaGo, where DeepMind gave the model the rules of Go with the reward function of winning the game, and then let the model figure everything else on its own. This famously ended up working better than other more human-guided techniques.

    LLMs to date, however, have relied on reinforcement learning with human feedback; humans are in the loop to help guide the model, navigate difficult choices where rewards aren’t obvious, etc. RLHF was the key innovation in transforming GPT-3 into ChatGPT, with well-formed paragraphs, answers that were concise and didn’t trail off into gibberish, etc.

    R1-Zero, however, drops the HF part — it’s just reinforcement learning. DeepSeek gave the model a set of math, code, and logic questions, and set two reward functions: one for the right answer, and one for the right format that utilized a thinking process. Moreover, the technique was a simple one: instead of trying to evaluate step-by-step (process supervision), or doing a search of all possible answers (a la AlphaGo), DeepSeek encouraged the model to try several different answers at a time and then graded them according to the two reward functions.

    What emerged is a model that developed reasoning and chains-of-thought on its own, including what DeepSeek called “Aha Moments”:

    A particularly intriguing phenomenon observed during the training of DeepSeek-R1-Zero is the occurrence of an “aha moment”. This moment, as illustrated in Table 3, occurs in an intermediate version of the model. During this phase, DeepSeek-R1-Zero learns to allocate more thinking time to a problem by reevaluating its initial approach. This behavior is not only a testament to the model’s growing reasoning abilities but also a captivating example of how reinforcement learning can lead to unexpected and sophisticated outcomes.

    This moment is not only an “aha moment” for the model but also for the researchers observing its behavior. It underscores the power and beauty of reinforcement learning: rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies. The “aha moment” serves as a powerful reminder of the potential of RL to unlock new levels of intelligence in artificial systems, paving the way for more autonomous and adaptive models in the future.

    This is one of the most powerful affirmations yet of The Bitter Lesson: you don’t need to teach the AI how to reason, you can just give it enough compute and data and it will teach itself!

    Well, almost: R1-Zero reasons, but in a way that humans have trouble understanding. Back to the introduction:

    However, DeepSeek-R1-Zero encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates a small amount of cold-start data and a multi-stage training pipeline. Specifically, we begin by collecting thousands of cold-start data to fine-tune the DeepSeek-V3-Base model. Following this, we perform reasoning-oriented RL like DeepSeek-R1-Zero. Upon nearing convergence in the RL process, we create new SFT data through rejection sampling on the RL checkpoint, combined with supervised data from DeepSeek-V3 in domains such as writing, factual QA, and self-cognition, and then retrain the DeepSeek-V3-Base model. After fine-tuning with the new data, the checkpoint undergoes an additional RL process, taking into account prompts from all scenarios. After these steps, we obtained a checkpoint referred to as DeepSeek-R1, which achieves performance on par with OpenAI-o1-1217.

    This sounds a lot like what OpenAI did for o1: DeepSeek started the model out with a bunch of examples of chain-of-thought thinking so it could learn the proper format for human consumption, and then did the reinforcement learning to enhance its reasoning, along with a number of editing and refinement steps; the output is a model that appears to be very competitive with o1.

    Here again it seems plausible that DeepSeek benefited from distillation, particularly in terms of training R1. That, though, is itself an important takeaway: we have a situation where AI models are teaching AI models, and where AI models are teaching themselves. We are watching the assembly of an AI takeoff scenario in realtime.

    So are we close to AGI?

    It definitely seems like it. This also explains why Softbank (and whatever investors Masayoshi Son brings together) would provide the funding for OpenAI that Microsoft will not: the belief that we are reaching a takeoff point where there will in fact be real returns towards being first.

    But isn’t R1 now in the lead?

    I don’t think so; this has been overstated. R1 is competitive with o1, although there do seem to be some holes in its capability that point towards some amount of distillation from o1-Pro. OpenAI, meanwhile, has demonstrated o3, a far more powerful reasoning model. DeepSeek is absolutely the leader in efficiency, but that is different than being the leader overall.

    So why is everyone freaking out?

    I think there are multiple factors. First, there is the shock that China has caught up to the leading U.S. labs, despite the widespread assumption that China isn’t as good at software as the U.S.. This is probably the biggest thing I missed in my surprise over the reaction. The reality is that China has an extremely proficient software industry generally, and a very good track record in AI model building specifically.

    Second is the low training cost for V3, and DeepSeek’s low inference costs. This part was a big surprise for me as well, to be sure, but the numbers are plausible. This, by extension, probably has everyone nervous about Nvidia, which obviously has a big impact on the market.

    Third is the fact that DeepSeek pulled this off despite the chip ban. Again, though, while there are big loopholes in the chip ban, it seems likely to me that DeepSeek accomplished this with legal chips.

    I own Nvidia! Am I screwed?

    There are real challenges this news presents to the Nvidia story. Nvidia has two big moats:

    • CUDA is the language of choice for anyone programming these models, and CUDA only works on Nvidia chips.
    • Nvidia has a massive lead in terms of its ability to combine multiple chips together into one large virtual GPU.

    These two moats work together. I noted above that if DeepSeek had access to H100s they probably would have used a larger cluster to train their model, simply because that would have been the easier option; the fact they didn’t, and were bandwidth constrained, drove a lot of their decisions in terms of both model architecture and their training infrastructure. Just look at the U.S. labs: they haven’t spent much time on optimization because Nvidia has been aggressively shipping ever more capable systems that accommodate their needs. The route of least resistance has simply been to pay Nvidia. DeepSeek, however, just demonstrated that another route is available: heavy optimization can produce remarkable results on weaker hardware and with lower memory bandwidth; simply paying Nvidia more isn’t the only way to make better models.

    That noted, there are three factors still in Nvidia’s favor. First, how capable might DeepSeek’s approach be if applied to H100s, or upcoming GB100s? Just because they found a more efficient way to use compute doesn’t mean that more compute wouldn’t be useful. Second, lower inference costs should, in the long run, drive greater usage. Microsoft CEO Satya Nadella, in a late night tweet almost assuredly directed at the market, said exactly that:

    Third, reasoning models like R1 and o1 derive their superior performance from using more compute. To the extent that increasing the power and capabilities of AI depend on more compute is the extent that Nvidia stands to benefit!

    Still, it’s not all rosy. At a minimum DeepSeek’s efficiency and broad availability cast significant doubt on the most optimistic Nvidia growth story, at least in the near term. The payoffs from both model and infrastructure optimization also suggest there are significant gains to be had from exploring alternative approaches to inference in particular. For example, it might be much more plausible to run inference on a standalone AMD GPU, completely sidestepping AMD’s inferior chip-to-chip communications capability. Reasoning models also increase the payoff for inference-only chips that are even more specialized than Nvidia’s GPUs.

    In short, Nvidia isn’t going anywhere; the Nvidia stock, however, is suddenly facing a lot more uncertainty that hasn’t been priced in. And that, by extension, is going to drag everyone down.

    So what about the chip ban?

    The easiest argument to make is that the importance of the chip ban has only been accentuated given the U.S.’s rapidly evaporating lead in software. Software and knowhow can’t be embargoed — we’ve had these debates and realizations before — but chips are physical objects and the U.S. is justified in keeping them away from China.

    At the same time, there should be some humility about the fact that earlier iterations of the chip ban seem to have directly led to DeepSeek’s innovations. Those innovations, moreover, would extend to not just smuggled Nvidia chips or nerfed ones like the H800, but to Huawei’s Ascend chips as well. Indeed, you can very much make the case that the primary outcome of the chip ban is today’s crash in Nvidia’s stock price.

    What concerns me is the mindset undergirding something like the chip ban: instead of competing through innovation in the future the U.S. is competing through the denial of innovation in the past. Yes, this may help in the short term — again, DeepSeek would be even more effective with more computing — but in the long run it simply sews the seeds for competition in an industry — chips and semiconductor equipment — over which the U.S. has a dominant position.

    Like AI models?

    AI models are a great example. I mentioned above I would get to OpenAI’s greatest crime, which I consider to be the 2023 Biden Executive Order on AI. I wrote in Attenuating Innovation:

    The point is this: if you accept the premise that regulation locks in incumbents, then it sure is notable that the early AI winners seem the most invested in generating alarm in Washington, D.C. about AI. This despite the fact that their concern is apparently not sufficiently high to, you know, stop their work. No, they are the responsible ones, the ones who care enough to call for regulation; all the better if concerns about imagined harms kneecap inevitable competitors.

    That paragraph was about OpenAI specifically, and the broader San Francisco AI community generally. For years now we have been subject to hand-wringing about the dangers of AI by the exact same people committed to building it — and controlling it. These alleged dangers were the impetus for OpenAI becoming closed back in 2019 with the release of GPT-2:

    Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code⁠(opens in a new window). We are not releasing the dataset, training code, or GPT-2 model weights…We are aware that some researchers have the technical capacity to reproduce and open source our results. We believe our release strategy limits the initial set of organizations who may choose to do this, and gives the AI community more time to have a discussion about the implications of such systems.

    We also think governments should consider expanding or commencing initiatives to more systematically monitor the societal impact and diffusion of AI technologies, and to measure the progression in the capabilities of such systems. If pursued, these efforts could yield a better evidence base for decisions by AI labs and governments regarding publication decisions and AI policy more broadly.

    The arrogance in this statement is only surpassed by the futility: here we are six years later, and the entire world has access to the weights of a dramatically superior model. OpenAI’s gambit for control — enforced by the U.S. government — has utterly failed. In the meantime, how much innovation has been foregone by virtue of leading edge models not having open weights? More generally, how much time and energy has been spent lobbying for a government-enforced moat that DeepSeek just obliterated, that would have been better devoted to actual innovation?

    So you’re not worried about AI doom scenarios?

    I definitely understand the concern, and just noted above that we are reaching the stage where AIs are training AIs and learning reasoning on their own. I recognize, though, that there is no stopping this train. More than that, this is exactly why openness is so important: we need more AIs in the world, not an unaccountable board ruling all of us.

    Wait, why is China open-sourcing their model?

    Well DeepSeek is, to be clear; CEO Liang Wenfeng said in a must-read interview that open source is key to attracting talent:

    In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed source approach can’t prevent others from catching up. So we anchor our value in our team — our colleagues grow through this process, accumulate know-how, and form an organization and culture capable of innovation. That’s our moat.

    Open source, publishing papers, in fact, do not cost us anything. For technical talent, having others follow your innovation gives a great sense of accomplishment. In fact, open source is more of a cultural behavior than a commercial one, and contributing to it earns us respect. There is also a cultural attraction for a company to do this.

    The interviewer asked if this would change:

    DeepSeek, right now, has a kind of idealistic aura reminiscent of the early days of OpenAI, and it’s open source. Will you change to closed source later on? Both OpenAI and Mistral moved from open-source to closed-source.

    We will not change to closed source. We believe having a strong technical ecosystem first is more important.

    This actually makes sense beyond idealism. If models are commodities — and they are certainly looking that way — then long-term differentiation comes from having a superior cost structure; that is exactly what DeepSeek has delivered, which itself is resonant of how China has come to dominate other industries. This is also contrary to how most U.S. companies think about differentiation, which is through having differentiated products that can sustain larger margins.

    So is OpenAI screwed?

    Not necessarily. ChatGPT made OpenAI the accidental consumer tech company, which is to say a product company; there is a route to building a sustainable consumer business on commoditizable models through some combination of subscriptions and advertisements. And, of course, there is the bet on winning the race to AI take-off.

    Anthropic, on the other hand, is probably the biggest loser of the weekend. DeepSeek made it to number one in the App Store, simply highlighting how Claude, in contrast, hasn’t gotten any traction outside of San Francisco. The API business is doing better, but API businesses in general are the most susceptible to the commoditization trends that seem inevitable (and do note that OpenAI and Anthropic’s inference costs look a lot higher than DeepSeek because they were capturing a lot of margin; that’s going away).

    So this is all pretty depressing, then?

    Actually, no. I think that DeepSeek has provided a massive gift to nearly everyone. The biggest winners are consumers and businesses who can anticipate a future of effectively-free AI products and services. Jevons Paradox will rule the day in the long run, and everyone who uses AI will be the biggest winners.

    Another set of winners are the big consumer tech companies. A world of free AI is a world where product and distribution matters most, and those companies already won that game; The End of the Beginning was right.

    China is also a big winner, in ways that I suspect will only become apparent over time. Not only does the country have access to DeepSeek, but I suspect that DeepSeek’s relative success to America’s leading AI labs will result in a further unleashing of Chinese innovation as they realize they can compete.

    That leaves America, and a choice we have to make. We could, for very logical reasons, double down on defensive measures, like massively expanding the chip ban and imposing a permission-based regulatory regime on chips and semiconductor equipment that mirrors the E.U.’s approach to tech; alternatively, we could realize that we have real competition, and actually give ourself permission to compete. Stop wringing our hands, stop campaigning for regulations — indeed, go the other way, and cut out all of the cruft in our companies that has nothing to do with winning. If we choose to compete we can still win, and, if we do, we will have a Chinese company to thank.

    I wrote a follow-up to this Article in this Daily Update.


    Get notified about new Articles


  • Stratechery Plus + Asianometry

    Listen to this post:

    Back in 2022, I rebranded a Stratechery subscription as Stratechery Plus, a bundle of content that would enhance the value of your subscription; today the bundle includes:

    Today I am excited to announce a new addition: the Asianometry newsletter and podcast, by Jon Yu.

    Asianometry is one of the best tech YouTube channels in existence, with over 768,000 subscribers. Jon produces in-depth videos explaining every aspect of technology, with a particular expertise in semiconductors. To give you an idea of Jon’s depth, he has made 31 videos about TSMC alone. His semiconductor course includes 30 videos covering everything from designing chips to how ASML builds EUV machines to Moore’s Law. His video on the end of Dennard’s Law is a particular standout:

    Jon is about more than semiconductors though: he’s made videos about other tech topics like The Tragedy of Compaq, and non-tech topics like Japanese Whisky and Taiwan convenience stores. In short, Jon is an intensely curious person who does his research, and we are blessed that he puts in the work to share what he learns.

    I am blessed most of all, however. Over the last year Jon has been making Stratechery Articles into video essays and cutting clips for Sharp Tech; he did a great job with one of my favorite articles of 2024:

    And now, starting today, Stratechery Plus subscribers can get exclusive access to Asianometry’s content in newsletter and podcast form. The Asianometry YouTube Channel will remain free and Jon’s primary focus, but from now on all of his content will be simultaneously released as a transcript and podcast. Stratechery Plus subscribers can head over to the new Asianometry Passport site to subscribe to his emails, or to add the podcast feed to your favorite podcast player.

    And, of course, subscribe to Jon’s YouTube channel, along with Stratechery Plus.


    Get notified about new Articles


  • AI’s Uneven Arrival

    Listen to this post:

    Box’s route to its IPO, ten years ago this month, was a difficult one: the company first released an S-1 in March 2014, and potential investors were aghast at the company’s mounting losses; the company took a down round and, eight months later, released an updated S-1 that created the template for money-losing SaaS businesses to explain themselves going forward:

    Our business model focuses on maximizing the lifetime value of a customer relationship. We make significant investments in acquiring new customers and believe that we will be able to achieve a positive return on these investments by retaining customers and expanding the size of our deployments within our customer base over time…

    We experience a range of profitability with our customers depending in large part upon what stage of the customer phase they are in. We generally incur higher sales and marketing expenses for new customers and existing customers who are still in an expanding stage…For typical customers who are renewing their Box subscriptions, our associated sales and marketing expenses are significantly less than the revenue we recognize from those customers.

    SaaS company cohort analysis

    This was the justification for those top-line losses; I wrote in an Update at the time:

    That right there is the SaaS business model: you’re not so much selling a product as you are creating annuities with a lifetime value that far exceeds whatever you paid to acquire them. Moreover, if the model is working — and in retrospect, we know it has for that 2010 cohort — then I as an investor absolutely would want Box to spend even more on customer acquisition, which, of course, Box has done. The 2011 cohort is bigger than 2010, the 2012 cohort bigger than 2011, etc. This, though, has meant that the aggregate losses have been very large, which looks bad, but, counterintuitively, is a good thing.

    Numerous SaaS businesses would include some version of this cohort chart in their S-1’s, each of them manifestations of what I’ve long considered tech’s sixth giant: Apple, Amazon, Google, Meta, Microsoft, and what I call “Silicon Valley Inc.”, the pipeline of SaaS companies that styled themselves as world-changing startups but which were, in fact, color-by-numbers business model disruptions enabled by cloud computing and a dramatically expanded venture capital ecosystem that increasingly accepted relatively low returns in exchange for massively reduced risk profiles.

    This is not, to be clear, an Article about Box, or any one SaaS company in particular; it is, though, an exploration of how an era that opened — at least in terms of IPOs — a decade ago is both doomed in the long run and yet might have more staying power than you expect.

    Digital Advertising Differences

    John Wanamaker, a department store founder and advertising pioneer, famously said, “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” That, though, was the late 19th century; the last two decades have seen the rise of digital advertising, the defining characteristic of which is knowledge about whom is being targeted, and whether or not they converted. The specifics of how this works have shifted over time, particularly with the crackdown on cookies and Apple’s App Tracking Transparency initiative, which made digital advertising less deterministic and more probabilistic; the probabilities at play, though, are a lot closer to 100% than they are to a flip-of-a-coin.

    What is interesting is that this advertising approach hasn’t always worked for everything, most notably some of the most advertising-centric businesses in the world. Back in 2016 Procter & Gamble announced they were scaling back targeted Facebook ads; from the Wall Street Journal:

    Procter & Gamble Co., the biggest advertising spender in the world, will move away from ads on Facebook that target specific consumers, concluding that the practice has limited effectiveness. Facebook Inc. has spent years developing its ability to zero in on consumers based on demographics, shopping habits and life milestones. P&G, the maker of myriad household goods including Tide and Pampers, initially jumped at the opportunity to market directly to subsets of shoppers, from teenage shavers to first-time homeowners.

    Marc Pritchard, P&G’s chief marketing officer, said the company has realized it took the strategy too far. “We targeted too much, and we went too narrow,” he said in an interview, “and now we’re looking at: What is the best way to get the most reach but also the right precision?”…On a broader scale, P&G’s shift highlights the limits of such targeting for big brands, one of the cornerstones of Facebook’s ad business. The social network is able to command higher prices for its targeted marketing; the narrower the targeting the more expensive the ad.

    P&G is a consumer packaged goods (CPG) company, and what mattered most for CPG companies was shelf space. Consumers would become aware of a brand through advertising, motivated to buy through things like coupons, and the payoff came when they were in the store and chose one of the CPG brands off the shelf; of course CPG companies paid for that shelf space, particularly coveted end-caps that made it more likely consumers saw the brands they were familiar with through advertising. There were returns to scale, as well: manufacturing is a big one; the more advertising you bought the less paid per ad; more importantly, the more shelf space you had the more room you had to expand your product lines, and crowd out competitors.

    The advertising component specifically was usually outsourced to ad agencies, for reasons I explained in a 2017 Article:

    Few advertisers actually buy ads, at least not directly. Way back in 1841, Volney B. Palmer, the first ad agency, was opened in Philadelphia. In place of having to take out ads with multiple newspapers, an advertiser could deal directly with the ad agency, vastly simplifying the process of taking out ads. The ad agency, meanwhile, could leverage its relationships with all of those newspapers by serving multiple clients:

    A drawing of The Pre-Internet Ad Agency Structure

    It’s a classic example of how being in the middle can be a really great business opportunity, and the utility of ad agencies only increased as more advertising formats like radio and TV became available. Particularly in the case of TV, advertisers not only needed to place ads, but also needed a lot more help in making ads; ad agencies invested in ad-making expertise because they could scale said expertise across multiple clients.

    At the same time, the advertisers were rapidly expanding their geographic footprints, particularly after the Second World War; naturally, ad agencies increased their footprint at the same time, often through M&A. The overarching business opportunity, though, was the same: give advertisers a one-stop shop for all of their advertising needs.

    The Internet provided two big challenges to this approach. First, the primary conversion point changed from the cash register to the check-out page; the products that benefited the most were either purely digital (like apps) or — at least in the earlier days of e-commerce — spur-of-the-moment purchases without major time pressure. CPG products didn’t really fall in either bucket.

    Second, these types of purchases aligned well with the organizing principle of digital advertising, which is the individual consumer. What Facebook — now Meta — is better at than anyone in the world is understanding consumers not as members of a cohort or demographic group but rather as individuals, and serving them ads that are uniquely interesting to them.

    Notice, though, that nothing in the traditional advertiser model was concerned with the individual: brands are created for cohorts or demographic groups, because they need to be manufactured at scale; then, ad agencies would advertise at scale — making money along the way — and the purchase would be consummated in physical stores at some later point in time, constrained (and propelled by) limited shelf space. Thus P&G’s pullback — and thus the opportunity for an entirely new wave of companies that were built around digital advertising and its deep personalization from the get-go.

    This bifurcation manifested itself most starkly in the summer of 2020, when large advertisers boycotted Facebook over the company’s refusal to censor then-President Trump; Facebook was barely affected. I wrote in Apple and Facebook:

    This is a very different picture from Facebook, where as of Q1 2019 the top 100 advertisers made up less than 20% of the company’s ad revenue; most of the $69.7 billion the company brought in last year came from its long tail of 8 million advertisers…

    This explains why the news about large CPG companies boycotting Facebook is, from a financial perspective, simply not a big deal. Unilever’s $11.8 million in U.S. ad spend, to take one example, is replaced with the same automated efficiency that Facebook’s timeline ensures you never run out of content. Moreover, while Facebook loses some top-line revenue — in an auction-based system, less demand corresponds to lower prices — the companies that are the most likely to take advantage of those lower prices are those that would not exist without Facebook, like the direct-to-consumer companies trying to steal customers from massive conglomerates like Unilever.

    In this way Facebook has a degree of anti-fragility that even Google lacks: so much of its business comes from the long tail of Internet-native companies that are built around Facebook from first principles, that any disruption to traditional advertisers — like the coronavirus crisis or the current boycotts — actually serves to strengthen the Facebook ecosystem at the expense of the TV-centric ecosystem of which these CPG companies are a part.

    It has been nine years since that P&G pullback I referenced above, and one of the big changes that P&G has made in that timeframe is to take most of their ad-buying in-house. This was in the long run inevitable, as the Internet ate everything, including traditional TV viewing, and as the rise of Aggregation platforms meant that the number of places you needed to actually buy an ad to reach everyone decreased even as potential reach increased. Those platforms also got better: programmatic platforms achieve P&G’s goal of mass reach in a way that actually increased efficiency instead of over-spending to over-target; programmatic advertising also covers more platforms now, including TV.

    o3 Ammunition

    Late last month OpenAI announced its o3 model, validating its initial o1 release and the returns that come from test-time scaling; I explained in an Update when o1 was released:

    There has been a lot of talk about the importance of scale in terms of LLM performance; for auto-regressive LLMs that has meant training scale. The more parameters you have, the larger the infrastructure you need, but the payoff is greater accuracy because the model is incorporating that much more information. That certainly still applies to o1, as the chart on the left indicates.

    o1 scales with both training and inference compute

    It’s the chart on the right that is the bigger deal: o1 gets more accurate the more time it spends on compute at inference time. This makes sense intuitively given what I laid out above: the more time spent on compute the more time o1 can spend spinning up multiple chains-of-thought, checking its answers, and iterating through different approaches and solutions.

    It’s also a big departure from how we have thought about LLMs to date: one of the “benefits” of auto-regressive LLMs is that you’re only generating one answer in a serial manner. Yes, you can get that answer faster with beefier hardware, but that is another way of saying that the pay-off from more inference compute is getting the answer faster; the accuracy of the answer is a function of the underlying model, not the amount of compute brought to bear. Another way to think about it is that the more important question for inference is how much memory is available; the more memory there is, the larger the model, and therefore, the greater amount of accuracy.

    In this o1 represents a new inference paradigm: yes, you need memory to load the model, but given the same model, answer quality does improve with more compute. The way that I am thinking about it is that more compute is kind of like having more branch predictors, which mean more registers, which require more cache, etc.; this isn’t a perfect analogy, but it is interesting to think about inference compute as being a sort of dynamic memory architecture for LLMs that lets them explore latent space for the best answer.

    o3 significantly outperforms o1, and the extent of that outperformance is dictated by how much computing is allocated to the problem at hand. One of the most stark examples was o3‘s performance on the ARC prize, a visual puzzle test that is designed to be easy for humans but hard for LLMs:

    OpenAI’s new o3 system – trained on the ARC-AGI-1 Public Training set – has scored a breakthrough 75.7% on the Semi-Private Evaluation set at our stated public leaderboard $10k compute limit. A high-compute (172x) o3 configuration scored 87.5%.

    o3 test results on ARC

    This is a surprising and important step-function increase in AI capabilities, showing novel task adaptation ability never seen before in the GPT-family models. For context, ARC-AGI-1 took 4 years to go from 0% with GPT-3 in 2020 to 5% in 2024 with GPT-4o. All intuition about AI capabilities will need to get updated for o3…

    Despite the significant cost per task, these numbers aren’t just the result of applying brute force compute to the benchmark. OpenAI’s new o3 model represents a significant leap forward in AI’s ability to adapt to novel tasks. This is not merely incremental improvement, but a genuine breakthrough, marking a qualitative shift in AI capabilities compared to the prior limitations of LLMs. o3 is a system capable of adapting to tasks it has never encountered before, arguably approaching human-level performance in the ARC-AGI domain.

    Of course, such generality comes at a steep cost, and wouldn’t quite be economical yet: you could pay a human to solve ARC-AGI tasks for roughly $5 per task (we know, we did that), while consuming mere cents in energy. Meanwhile o3 requires $17-20 per task in the low-compute mode. But cost-performance will likely improve quite dramatically over the next few months and years, so you should plan for these capabilities to become competitive with human work within a fairly short timeline.

    I don’t believe that o3 and inference-time scaling will displace traditional LLMs, which will remain both faster and cheaper; indeed, they will likely make traditional LLMs better through their ability to generate synthetic data for further scaling of pre-training. There remains a large product overhang for traditional LLMS — the technology is far more capable than the products that have been developed to date — but even the current dominant product, chatbots, are better experienced with a traditional LLM.

    That very use case, however, gets at traditional LLM limitations: because they lack the ability to think and decide and verify they are best thought of as a tool for humans to leverage. Indeed, while conventional wisdom about these models is that it allows anyone to generate good enough writing and research, the biggest returns come to those with the most expertise and agency, who are able to use their own knowledge and judgment to reap efficiency gains while managing hallucinations and mistakes.

    What o3 and inference-time scaling point to is something different: AI’s that can actually be given tasks and trusted to complete them. This, by extension, looks a lot more like an independent worker than an assistant — ammunition, rather than a rifle sight. That may seem an odd analogy, but it comes from a talk Keith Rabois gave at Stanford:

    So I like this idea of barrels and ammunition. Most companies, once they get into hiring mode…just hire a lot of people, you expect that when you add more people your horsepower or your velocity of shipping things is going to increase. Turns out it doesn’t work that way. When you hire more engineers you don’t get that much more done. You actually sometimes get less done. You hire more designers, you definitely don’t get more done, you get less done in a day.

    The reason why is because most great people actually are ammunition. But what you need in your company are barrels. And you can only shoot through the number of unique barrels that you have. That’s how the velocity of your company improves is adding barrels. Then you stock them with ammunition, then you can do a lot. You go from one barrel company, which is mostly how you start, to a two barrel company, suddenly you get twice as many things done in a day, per week, per quarter. If you go to three barrels, great. If you go to four barrels, awesome. Barrels are very difficult to find. But when you have them, give them lots of equity. Promote them, take them to dinner every week, because they are virtually irreplaceable. They are also very culturally specific. So a barrel at one company may not be a barrel at another company because one of the ways, the definition of a barrel is, they can take an idea from conception and take it all the way to shipping and bring people with them. And that’s a very cultural skill set.

    The promise of AI generally, and inference-time scaling models in particular, is that they can be ammunition; in this context, the costs — even marginal ones — will in the long run be immaterial compared to the costs of people, particularly once you factor in non-salary costs like coordination and motivation.

    The Uneven AI Arrival

    There is a long way to go to realize this vision technically, although the arrival of first o1 and then o3 signal that the future is arriving more quickly than most people realize. OpenAI CEO Sam Altman wrote on his blog:

    We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies. We continue to believe that iteratively putting great tools in the hands of people leads to great, broadly-distributed outcomes.

    I grant the technical optimism; my definition of AGI is that it can be ammunition, i.e. it can be given a task and trusted to complete it at a good-enough rate (my definition of Artificial Super Intelligence (ASI) is the ability to come up with the tasks in the first place). The reason for the extended digression on advertising, however, is to explain why I’m skeptical about AI “materially chang[ing] the output of companies”, at least in 2025.

    In this analogy CPG companies stand in for the corporate world generally. What will become clear once AI ammunition becomes available is just how unsuited most companies are for high precision agents, just as P&G was unsuited for highly-targeted advertising. No matter how well-documented a company’s processes might be, it will become clear that there are massive gaps that were filled through experience and tacit knowledge by the human ammunition.

    SaaS companies, meanwhile, are the ad agencies. The ad agencies had value by providing a means for advertisers to scale to all sorts of media across geographies; SaaS companies have value by giving human ammunition software to do their job. Ad agencies, meanwhile, made money by charging a commission on the advertising they bought; SaaS companies make money by charging a per-seat licensing fee. Look again at that S-1 excerpt I opened with:

    Our business model focuses on maximizing the lifetime value of a customer relationship. We make significant investments in acquiring new customers and believe that we will be able to achieve a positive return on these investments by retaining customers and expanding the size of our deployments within our customer base over time…

    The positive return on investment comes from retaining and increasing seat licenses; those seats, however, are proxies for actually getting work done, just as advertising was just a proxy for actually selling something. Part of what made direct response digital advertising fundamentally different is that it was tied to actually making a sale, as opposed to lifting brand awareness, which is a proxy for the ultimate goal of increasing revenue. To that end, AI — particularly AI’s like o3 that scale with compute — will be priced according to the value of the task they complete; the amount that companies will pay for inference time compute will be a function of how much the task is worth. This is analogous to digital ads that are priced by conversion, not CPM.

    The companies that actually leveraged that capability, however, were not, at least for a good long while, the companies that dominated the old advertising paradigm. Facebook became a juggernaut by creating its own customer base, not by being the advertising platform of choice for companies like P&G; meanwhile, TV and the economy built on it stayed relevant far longer than anyone expected. And, by the time TV truly collapsed, both the old guard and digital advertising had evolved to the point that they could work together.

    If something similar plays out with AI agents, then the most important AI customers will primarily be new companies, and probably a lot of them will be long tail type entities that take the barrel and ammunition analogy to its logical extreme. Traditional companies, meanwhile, will struggle to incorporate AI (outside of whole-scale job replacement a la the mainframe); the true AI takeover of enterprises that retain real world differentiation will likely take years.

    None of this is to diminish what is coming with AI; rather, as the saying goes, the future may arrive but be unevenly distributed, and, contrary to what you might think, the larger and more successful a company is the less they may benefit in the short term. Everything that makes a company work today is about harnessing people — and the entire SaaS ecosystem is predicated on monetizing this reality; the entities that will truly leverage AI, however, will not be the ones that replace them, but start without them.



    Get notified about new Articles