I watched Apple’s “Apple Intelligence” announcement for a couple of reasons. One is that I’m a shareholder. The other is that Apple’s marketing is pretty good, and I like to see their work.

As the stock shot up after the announcement, I checked some of the analyses and comments. One argued that Nvidia has limited upside because, in part, Apple’s insistence on putting as much AI on device – iPhones – as possible means that demand for Nvidia GPUs will level off in the not too distant future. That wasn’t the complete argument, but it’s a fair summary of the author’s expected outcome.

In the comments a surprising number of people confidently asserted that it isn’t possible to run significant Large Language Models on a phone, and, due to the growth of these models, it won’t be in the future either. That made me chuckle.

I joined Digital Equipment Corporation (DEC) in 1981 and stayed until 1995, when I joined Sun Microsystems. During that time I saw a company in denial about what the PC was going to do to its business, even as DEC’s smaller rivals – Data General, Prime, and Wang – were toppled by the Wintel juggernaut.

The dynamic was simple: CPUs were getting more powerful; storage and networks were getting cheaper and faster; and the combination of those trends reordered the predominant industry model from proprietary vertical integration to a horizontal stack. Intel for CPUs. Microsoft for the OS. TCP/IP for networking. And so on. The massive proliferation of Wintel machines changed the economics of software, enabling immensely profitable de facto monopolies.

For example, Intergraph in the early 80s was a major provider of GIS and CAD systems, customizing hardware from DEC to build powerful-for-the-time workstations. But as PCs became more capable, Intergraph’s custom hardware became economically unsustainable. They exited the hardware business in 2000.

How difficult are neural processors?

AI’s underlying compute hardware consists of fairly simple, massively parallel, low-precision arithmetic units. Apple’s M4 chip fits its 16 Neural Engine cores into about the same die area as its 6 efficiency cores, and a fraction of the area of its 4 performance cores.
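
To see why the silicon is comparatively simple, consider what those cores compute all day: multiply-accumulates over large arrays of weights and activations. Here is a toy sketch – plain C++, with a made-up function name, ignoring the quantization and tiling a real engine does – of the matrix-vector multiply at the heart of LLM inference:

<pre>
// Toy matrix-vector multiply: out = W * x, with W stored row-major.
// A real neural engine runs thousands of these multiply-accumulate
// lanes in parallel, usually at 16- or 8-bit precision, but the
// arithmetic itself is no more exotic than this.
void matvec(const float* W, const float* x, float* out, int rows, int cols) {
    for (int r = 0; r < rows; ++r) {
        float acc = 0.0f;
        for (int c = 0; c < cols; ++c)
            acc += W[r * cols + c] * x[c];   // multiply-accumulate
        out[r] = acc;
    }
}
</pre>

Most of what surrounds those units on the die – caches, schedulers, DMA engines – exists to keep the multiply-accumulate lanes fed from memory.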

The point is that “AI hardware” is not an exotic, bleeding-edge architecture. Multiple companies – Amazon, Google, Meta, Apple, and Microsoft – have the ability and the volume to economically design and deploy their own optimized AI processors.

Nvidia’s big advantage is its CUDA software. It’s a full development environment for building massively parallel applications in a language many developers already know. If you’re a small or medium-sized developer, that’s a big advantage. If you’re a tech giant with thousands of engineers, it’s an advantage now, but one that can be overcome – especially if you’ve designed your own chips optimized for 100,000-server warehouses.
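
For readers who haven’t touched it, a minimal CUDA program is just C++ with a couple of extensions: a kernel marked __global__ and a launch syntax that says how many threads to run it on. The classic “SAXPY” example looks roughly like this:

<pre>
#include <cuda_runtime.h>
#include <cstdio>

// y = a*x + y, one GPU thread per element.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));   // unified CPU/GPU memory
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);   // expect 4.0
    cudaFree(x); cudaFree(y);
    return 0;
}
</pre>

The moat isn’t the syntax. It’s the libraries, profilers, and years of accumulated tooling and community knowledge around it – which matter a great deal to a small shop, and rather less to a company that controls its own stack top to bottom.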

When Amazon first began deploying warehouse-scale computers, they used Cisco network switches. But it didn’t take long for their engineers to realize they didn’t need the overhead and cost of Cisco’s 10 million lines of code for their very focused infrastructure. They built their own lower latency switches – at much lower cost – and waved goodbye to Cisco.

Nvidia’s vulnerability

But major cloud vendors moving to their own hardware isn’t the only vector in the calculus of AI adoption. While training an AI is now, and probably always will be, a major investment, running the AI – inference – doesn’t have to be.

In a recent <a href="https://arxiv.org/pdf/2312.11514" target="_blank">paper</a>, Apple researchers investigated running LLM inference on devices with limited DRAM. In it they detail a number of optimizations that shrink the LLM’s runtime footprint.

From the paper:

<blockquote>

Our method involves constructing an inference cost model that takes into account the characteristics of flash memory, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks. [Using] two principal techniques. First, “windowing” strategically reduces data transfer by reusing previously activated neurons, and second, “row-column bundling”, [increasing] the size of data chunks read from flash memory. These methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches in CPU and GPU, respectively.

</blockquote>
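
To make “windowing” concrete, here’s a rough sketch of the idea – my illustration with invented names, not the paper’s code: keep the weight rows for recently activated neurons resident in DRAM, fetch from flash only what’s missing, and evict whatever falls out of the sliding window.

<pre>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Illustrative only: a DRAM-resident cache of neuron weight rows.
struct WeightCache {
    size_t row_bytes;                                   // bytes per neuron row
    std::unordered_map<uint32_t, std::vector<uint8_t>> resident;

    // Stand-in for a real flash read. The paper's "row-column bundling"
    // makes each such read one large contiguous chunk rather than many
    // small ones.
    std::vector<uint8_t> load_from_flash(uint32_t /*neuron_id*/) {
        return std::vector<uint8_t>(row_bytes, 0);      // dummy data
    }

    // Get a neuron's weights, reusing rows already loaded for recent tokens.
    const std::vector<uint8_t>& get(uint32_t neuron_id) {
        auto it = resident.find(neuron_id);
        if (it == resident.end())                       // miss: go to flash
            it = resident.emplace(neuron_id, load_from_flash(neuron_id)).first;
        return it->second;
    }

    // Drop rows for neurons that haven't fired within the sliding window,
    // keeping the DRAM footprint bounded even when the model doesn't fit.
    void evict_stale(const std::vector<uint32_t>& recently_active) {
        std::unordered_map<uint32_t, std::vector<uint8_t>> kept;
        for (uint32_t id : recently_active) {
            auto it = resident.find(id);
            if (it != resident.end()) kept.emplace(id, std::move(it->second));
        }
        resident.swap(kept);
    }
};
</pre>

The bet, per the paper, is that neurons activated for recent tokens are likely to be activated again, so most lookups hit DRAM instead of flash and the volume of data transferred drops sharply.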

The StorageMojo take

These are early days for AI. I’m confident many more such optimizations will be found. Just as virtual memory techniques were a competitive battleground in the 70s – and most users have no idea they’re using virtual memory today – the pressure to run inference on edge devices will be unrelenting. Nvidia will suffer, much as Intel has, as inference compute cycles bleed out to the edge, and optimized architectures and software stacks make edge inference ever more efficient.

The real question is “how long will this take?” Given the market opportunity and today’s level of investment, I expect Nvidia’s run to last less than a decade.

Ultimately, AI is a feature, not a product. As the specific requirements for its popular use cases become apparent, the optimizations will get more focused and major vendors will tweak their hardware to better run them. Nvidia will be the Intel of the 2030s.

Until then, there may be a lot of money to be made on Nvidia stock. I don’t own any directly, but I expect some of my ETFs do.

Comments welcome; they are moderated.