Around 2007, NVIDIA introduced CUDA, marking a transformative shift.
But CUDA wasn’t just a technical breakthrough; it was a conceptual one. Before CUDA, chip manufacturers focused primarily on making chips work with operating systems, tuning firmware, writing drivers, and implementing the necessary system calls, leaving the rest to downstream application developers.
CUDA didn’t just offer optimized operators and parallel-computation scheduling and then wait for others to adopt them; NVIDIA went directly into the target application domains and built all the intermediate software layers those domains required.
Within NVIDIA’s GPU architecture, the CUDA core is the fundamental schedulable execution unit; CUDA is also the name of a C-like parallel programming language. But building the SIMT hardware scheduling behind CUDA, its compiler, and its runtime were only the basics of what NVIDIA achieved.
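To make the "C-like parallel language" concrete, here is the classic introductory CUDA example, a vector-addition kernel (an illustrative sketch, not code from the article): each thread computes one array element, and the hardware schedules threads in lockstep warps, which is the SIMT model discussed below.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread handles one element. The SIMT hardware groups
// threads into warps that execute the same instruction in lockstep.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified (managed) memory is accessible from both host and device.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(a, b, c, n);  // launch a grid of thread blocks
    cudaDeviceSynchronize();                  // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);  // 1.0 + 2.0 = 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The point of the example is how little separates this from ordinary C: the `__global__` qualifier, the built-in thread indices, and the `<<<blocks, threads>>>` launch syntax. That low barrier to entry was central to CUDA's adoption.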
Had NVIDIA stopped at these basics, CUDA would have smoothed the development chain, but it would never have become the de facto standard in numerous computing fields, nor formed a moat that even AMD, Intel, and Broadcom combined cannot breach.
When NVIDIA launched CUDA, it actively gathered requirements from universities and companies to clarify what the industry needed. It treated "potential" customers, even those who might never pay for or fully adopt CUDA tools, as primary stakeholders, and assembled development teams to build software, libraries, and SDKs for them.
Some of these libraries weren’t even directly related to CUDA, but if they were needed for CUDA to be adopted in a given scenario, NVIDIA would allocate software resources and contribute them to the open-source community.
To encourage adoption, NVIDIA practically gave GPUs away to universities. Users didn’t need to understand CUDA’s intricacies; they simply inserted an NVIDIA card, installed the drivers and a few designated programs, and the GPU slotted seamlessly into their existing workflows.
Afterward, they would see speeds increase tenfold, quickly making CUDA essential.
CUDA ultimately became an ecosystem—not just a hardware unit within GPUs or a C-like programming language, but an entire structural system.
Even with AMD and Intel breathing down NVIDIA’s neck, neither can replace one part of CUDA today and another tomorrow to gradually migrate entrenched users onto their own platforms.
Even if NVIDIA’s CEO allowed it and users cooperated, achieving the same thing while circumventing NVIDIA’s GPUs is incredibly difficult. Even replicating CUDA from scratch would take a very long time; what company could afford such a drawn-out investment cycle just for a copy? Meanwhile, CUDA keeps evolving, so by the time the copy was finished, NVIDIA might have already moved on to new growth areas.
This is why Jim Keller said, “CUDA is not a moat; it’s a swamp.” CUDA has drawn the entire industry in, and even an industry powerhouse can’t easily pull itself out.
By 2024, NVIDIA’s competitors no longer have the timing advantage, either.
When NVIDIA created CUDA, it coincided with the global computing boom after 2007. At that time, only a few GPU companies realized the general-purpose parallel computing potential of GPUs, but no one knew how to apply it. By 2024, that kind of market expansion no longer exists, and no other company can replicate CUDA from scratch.
HIP and OpenCL represent repeated failed attempts to counter CUDA, and I believe the core reason for these failures was a lack of understanding of the real-world applications, leading to unfocused technical approaches.
In contrast, NVIDIA’s long-term planning is evident from their choice of SIMT over VLIW in hardware and their commitment to creating a CUDA language compiler—decisions influenced by future needs rather than current capabilities.
NVIDIA’s CEO, Jensen Huang, is not only the company’s founder but also a skilled engineer, with both technical vision and long-term execution capability, unlike a typical CEO responsible only to investors. This combination has enabled NVIDIA’s ongoing commitment to CUDA. Intel, AMD, and Broadcom lack a similar visionary leader to drive such initiatives.
NVIDIA has made its own mistakes, but by comparison, ATI/AMD’s approach looks more like gambling, while Intel seems too impatient to show off technology. Neither can become the next NVIDIA.
ROCm’s current modest success remains an exception. AMD has, in some respects, followed in NVIDIA’s footsteps, but NVIDIA’s rapid progress means that following this path only widens the gap.
In 2012, AMD finally decided to revamp its GPU architecture. Although VLIW was energy-efficient, its programming complexity limited its GPGPU potential, leading to AMD’s first SIMT-based architecture, GCN. While GCN performed well in general computing, it was underwhelming in graphics rendering.
AMD realized that Jensen’s team could tune a compute-oriented architecture for graphics rendering while it could not, so after several GCN generations, AMD split its lineup into the CDNA (compute) and RDNA (graphics) architectures.
But NVIDIA didn’t stand still. As RDNA closed in on raster performance, Jensen launched DLSS and ray tracing, opening a new competitive front.
This reflects Jensen’s advanced thinking in chip design: DLSS isn’t a simple chip-design feature but an innovation born from software teams using neural networks to reconstruct image detail at higher resolutions. The need to accelerate that algorithm was passed to the chip team, which built the Tensor Core acceleration units; these were then handed back to the software side and packaged as programmable interfaces.
Initially, AMD didn’t take such a long-term view. Built on conventional upscaling algorithms that aren’t bound to dedicated hardware, its FSR iterations fell behind DLSS and eventually needed dedicated hardware acceleration as well. Ray tracing followed a similar pattern.
Jensen once said NVIDIA is not a chip company but a software company, meaning that chips are merely the core of NVIDIA’s software empire, with the entire ecosystem built around GPUs. By early 2024, NVIDIA had around 24,000 engineers, of which 17,000 were software developers.
This model leaves AMD struggling to catch up even as it closely follows NVIDIA’s technical path; the two companies’ team structures are so different that “copying” doesn’t work. This is why an AMD executive publicly stated that they won’t attempt a CUDA clone: not only is the timing off, copying simply isn’t feasible.
Currently, what some perceive as CUDA’s “weakness” is simply a new growth area: large-model training. Here, certain upper-layer frameworks like PyTorch can bypass CUDA and target the GPU through other backends (such as AMD’s ROCm), giving AMD a chance.
But this only shows that in this new field, CUDA is replaceable and AMD has an opportunity; it doesn’t indicate CUDA will miss this wave, let alone be comprehensively replaced.
In reality, DLSS’s bottleneck was tensor computation, so NVIDIA added Tensor Cores to their cards. When GPT models surged in popularity, people realized that large model training required vast amounts of tensor computation, where CUDA’s software stack was already prepared. NVIDIA’s stock soared, surpassing Apple, TSMC, and Microsoft, becoming the world’s most valuable company, far outpacing the combined market value of AMD and Intel. Was this immense success a coincidence or part of NVIDIA’s long-term plan?
Since CDNA’s debut, and especially with the MI300 series, AMD has used advanced packaging to integrate HBM, CPU, and GPU dies. Infinity Fabric enables flexible configurations of multiple CPU/GPU chiplets, supporting UMA and coherent memory access across as many as eight lanes while maintaining high performance. This shows that under Lisa Su, AMD’s chip-design capabilities are formidable.
But even so, in her investor calls, Lisa Su only dared to set a 10% market share target. Compared to Jensen, this goal seems extremely modest.
AMD’s Zen architecture lets it chip away at Intel’s CPU market in both server and PC client sectors, but it has no means to take on Jensen Huang’s CUDA empire.