Why is advanced memory becoming a bottleneck for AI growth?

A few months ago I was talking to a friend who works at a small-to-medium-sized company with a decent compute budget and team. He had spent three months trying to source enough high-bandwidth memory for a mid-scale AI training cluster, and he ran out of time to actually train his model before the end of the year.

Stories like his will keep piling up under different names.

The part that most people skip over

Many people underestimate how much memory a large GPU fleet needs to do useful AI work. Without sufficient memory bandwidth, the GPUs stall waiting on data, and a large compute cluster can grind to a standstill. Unfortunately, HBM is very difficult to manufacture cost-effectively in large quantities, so not every company can get it.

The problem with all of these great AI chips is that the bandwidth of the High-Bandwidth Memory (HBM) packaged alongside them is often not enough for the most memory-intensive workloads. In many cases, moving data between memory and the compute units (as opposed to reading it from disk or over a network) is the slowest part of the computation. HBM is the best type of memory for this, but it is expensive to make, difficult to package, and in short supply worldwide.
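One rough way to see when data movement becomes the slow part is a roofline-style estimate. The sketch below is only an illustration: the peak compute and bandwidth figures are made-up assumptions, not specs for any real accelerator, and the function names are hypothetical.

```python
# Roofline-style estimate: is a workload limited by compute or by memory bandwidth?
# All hardware numbers here are illustrative assumptions, not vendor specs.

def attainable_tflops(peak_tflops: float, bandwidth_tb_s: float,
                      flops_per_byte: float) -> float:
    """Throughput is capped by the lower of peak compute and
    bandwidth multiplied by arithmetic intensity (FLOPs per byte moved)."""
    return min(peak_tflops, bandwidth_tb_s * flops_per_byte)


def is_memory_bound(peak_tflops: float, bandwidth_tb_s: float,
                    flops_per_byte: float) -> bool:
    """A workload is memory-bound when its intensity is below the ridge point,
    the intensity at which the compute and bandwidth limits meet."""
    ridge_point = peak_tflops / bandwidth_tb_s
    return flops_per_byte < ridge_point


PEAK_TFLOPS = 1000.0    # hypothetical accelerator peak compute
BANDWIDTH_TB_S = 3.0    # hypothetical HBM bandwidth

workloads = [
    ("low intensity (few FLOPs per byte)", 2.0),
    ("high intensity (many FLOPs per byte)", 500.0),
]

for name, intensity in workloads:
    tflops = attainable_tflops(PEAK_TFLOPS, BANDWIDTH_TB_S, intensity)
    bound = "memory" if is_memory_bound(PEAK_TFLOPS, BANDWIDTH_TB_S, intensity) else "compute"
    print(f"{name}: {tflops:.1f} TFLOP/s attainable, {bound}-bound")
```

With these assumed numbers, the low-intensity workload can use well under 1% of the chip's peak compute. The rest of the silicon waits on memory, which is the situation described above.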

Only a handful of companies in the world can make HBM at scale, and that cannot change overnight. It takes years of investment, large amounts of fab capacity, advanced packaging capability, and a large, highly technical team. A lot of money is being poured into making more memory, but it will take a while to bear fruit.

Why memory availability matters more than the chip headlines suggest

The biggest oversight in AI infrastructure is treating memory, and high-bandwidth memory in particular, as just another component. It's a constraint, a ceiling. Think of a shelf with a finite amount of space on it. However much compute has been purchased, the GPUs sit idle until there is enough HBM to pair with each of them. The money spent on that compute earns nothing until the memory becomes available. It's frustrating. I know from personal experience.

The HBM shortage is well documented and, depending on how you read the supply projections, it is not resolving quickly. HBM remains a critical component for many AI workloads, and demand from training clusters keeps compounding against packaging capacity that was never designed for the current scale of the AI market. Expanding advanced packaging capacity such as CoWoS is a multi-year effort, not something that can be addressed in a few quarters.

Organizations that committed to high-bandwidth memory early will get the parts they need to finish their projects on time. Everyone else will have to queue. Smaller organizations will wait until additional memory becomes available, and their projects will be delayed and, in many cases, reduced in scope.

What’s actually driving demand so hard

Several major trends in AI are all pointing in the same direction, creating large and long-lasting demand for AI chips. AI workloads are increasing. More companies are entering the market. There are more models, and those models keep growing. Training runs are becoming more frequent, and so is inference. These trends compound. Memory-heavy workloads, particularly those that need high-bandwidth memory, dominate the landscape, because their throughput is limited by how fast data moves between memory and compute.

Model sizes keep growing. Bigger parameter counts mean larger memory footprints, both during training and at inference.
The shift toward running AI inference at scale in production is relatively new. Training gets the attention, but inference is where sustained memory demand lives day after day, grinding away.
Multiple industries are building out AI infrastructure simultaneously, which means demand isn’t arriving from one direction — it’s a flood from every direction at once.
Geopolitical dynamics are reshaping supply chains in ways that add friction rather than reduce it, particularly around the handful of fabs capable of producing advanced memory at volume.

Again, none of these pressures is likely to abate in the short term. They are also not independent: each one amplifies the others, so total demand for AI compute grows non-linearly and is hard to forecast with simple linear extrapolation.
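To make the first pressure in the list above concrete, here is a back-of-the-envelope sketch of how parameter count turns into memory footprint. The 70-billion-parameter model and the 80 GB of HBM per accelerator are hypothetical round numbers. The 16 bytes per parameter for training is a commonly used rule of thumb for mixed-precision training with an Adam-style optimizer (weights, gradients, and optimizer state), and it ignores activations entirely, so treat the result as a floor rather than a forecast.

```python
import math

# Rough memory-footprint arithmetic. All model and hardware sizes are
# hypothetical round numbers chosen for illustration.

def footprint_gb(params_billion: float, bytes_per_param: int) -> float:
    """Memory needed for the given bytes-per-parameter budget, in GB (1e9 bytes)."""
    return params_billion * 1e9 * bytes_per_param / 1e9


def min_accelerators(total_gb: float, hbm_per_accelerator_gb: float) -> int:
    """Fewest accelerators whose combined HBM can hold total_gb.
    Ignores activations, caches, and fragmentation, so it is a lower bound."""
    return math.ceil(total_gb / hbm_per_accelerator_gb)


PARAMS_BILLION = 70          # hypothetical model size
HBM_PER_ACCELERATOR_GB = 80  # hypothetical HBM capacity per accelerator

INFERENCE_BYTES_PER_PARAM = 2   # assumption: 16-bit weights only
TRAINING_BYTES_PER_PARAM = 16   # assumption: weights + gradients + optimizer state

inference_gb = footprint_gb(PARAMS_BILLION, INFERENCE_BYTES_PER_PARAM)
training_gb = footprint_gb(PARAMS_BILLION, TRAINING_BYTES_PER_PARAM)

print(f"inference weights: {inference_gb:.0f} GB, "
      f"at least {min_accelerators(inference_gb, HBM_PER_ACCELERATOR_GB)} accelerators")
print(f"training state:    {training_gb:.0f} GB, "
      f"at least {min_accelerators(training_gb, HBM_PER_ACCELERATOR_GB)} accelerators")
```

Even under these generous assumptions, the training state is eight times the size of the inference weights, and both scale linearly with parameter count. Every jump in model size is paid for directly in HBM.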

A quick look at how the supply picture breaks down

Factor	Current status	Near-term outlook
HBM production capacity	Constrained, concentrated among 2-3 suppliers	Incremental expansion, not step-change
CoWoS packaging capacity	Severely limited relative to demand	Slow ramp through 2025-2026
AI accelerator demand	Growing faster than supply can absorb	No signs of softening
Pricing pressure on HBM	Elevated across the board	Likely to stay high while supply lags

CoWoS packaging, the second row in the table above, is the least covered topic in discussions of the GPUs used in AI infrastructure. And that is exactly the point: without advanced packaging such as CoWoS (or at least adequate packaging capacity), HBM is useless, because it cannot be integrated with the compute dies.
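The dependency between compute, memory, and packaging can be sketched as a simple "weakest link" calculation. The quantities below are invented for illustration, and the function names are hypothetical. The point is only that the number of finished accelerators is set by the scarcest input, not the most abundant one.

```python
# "Weakest link" sketch: finished accelerators are limited by the scarcest input.
# All quantities are invented for illustration.

def per_input_limits(compute_dies: int, hbm_stacks: int, packaging_slots: int,
                     stacks_per_accelerator: int) -> dict:
    """How many accelerators each input could support on its own."""
    return {
        "compute dies": compute_dies,
        "HBM stacks": hbm_stacks // stacks_per_accelerator,
        "packaging slots": packaging_slots,
    }


def deployable_accelerators(compute_dies: int, hbm_stacks: int, packaging_slots: int,
                            stacks_per_accelerator: int) -> int:
    """Finished units are capped by the smallest of the per-input limits."""
    limits = per_input_limits(compute_dies, hbm_stacks, packaging_slots,
                              stacks_per_accelerator)
    return min(limits.values())


def binding_constraint(compute_dies: int, hbm_stacks: int, packaging_slots: int,
                       stacks_per_accelerator: int) -> str:
    """Name of the input that sets the cap."""
    limits = per_input_limits(compute_dies, hbm_stacks, packaging_slots,
                              stacks_per_accelerator)
    return min(limits, key=limits.get)


# Hypothetical supply: plenty of dies, less memory, even less packaging.
supply = dict(compute_dies=10_000, hbm_stacks=48_000,
              packaging_slots=5_000, stacks_per_accelerator=8)

built = deployable_accelerators(**supply)
print(f"deployable accelerators: {built}")
print(f"binding constraint: {binding_constraint(**supply)}")
print(f"compute dies left waiting: {supply['compute_dies'] - built}")
```

In this made-up scenario, half of the compute dies have nowhere to go, and adding more dies would change nothing. Only more packaging capacity, and then more HBM, would raise the output.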

So where does that leave most companies?

In an uncomfortable spot. Not great, honestly.

The companies that started planning for this two to three years ago have established the supplier relationships they need and have been sourcing memory for a long time. The rest are struggling, because the parts they need are already spoken for before they can even place an order. This is not a technical problem. It is a problem of planning and forecasting. It is a failure to realize that compute and memory are two parts of a single whole, and that you cannot get one without the other.

It will take a while for this situation to return to normal. In the meantime, a lot of money is pouring into innovative compute architectures designed to work around the memory bottleneck. The key to making the most of all this is to understand the root cause of the problem: what is actually scarce, and why.

The hype around chips will run for a long time to come. But the quiet side of the story (the HBM shortage, CoWoS capacity for HBM, and 2.3nm capacity constraints through 2027) deserves far more attention than the GPUs do. It will also be much harder to generate good press around a memory shortage than it has been around GPUs.