Computational Storage: The Potential and Challenges of SSD Acceleration
In recent years, an increasing number of companies have begun exploring innovative methods to offload some of the CPU workloads onto SSDs (solid-state drives), a trend that is opening new possibilities for high-performance computing (HPC). While the concept of “computational storage” was originally driven by the need to enhance CPU performance, its practical implementation still faces several technological challenges, particularly in how to transform this idea into a commercially viable solution without losing scalability and flexibility.
1. The Initial Concept of Computational Storage
A few years ago, the concept of computational storage gained widespread attention within the industry, being heralded as a potential breakthrough for addressing CPU efficiency bottlenecks. The core idea was that if storage devices could perform some of the data processing directly, it would reduce the need to transfer data between memory and the CPU. In theory, this approach could help lower power consumption, reduce data transfer demands, and accelerate overall computational performance.
However, despite the appealing nature of this idea, it has become clear that there are no one-size-fits-all solutions for the various use cases. Each scenario has its unique characteristics, and current computational storage architectures often fail to scale across broader applications.
2. Misguided Thinking in Computational Storage
When discussing computational storage, many engineers often propose overly idealistic solutions. For example, there have been suggestions to run Linux on SSDs and equip them with more powerful processors. While this idea may seem creative on the surface, it is overly complex in practice, lacks clear focus, and ultimately fails to address the core issue. The success of computational storage requires focusing on simpler, more practical tasks rather than chasing overly idealistic technological fantasies.
3. Focusing on SSD Accelerators
With the ongoing advancement of NAND storage technologies, we have started to realize that leveraging the onboard bandwidth of SSDs through dedicated accelerators can improve computational efficiency for specific tasks. Instead of routing all data through the CPU for processing, it makes more sense to let the SSD execute tasks suited to storage devices, such as fixed operations over a Logical Block Addressing (LBA) range.
These accelerators don’t require excessive power or complex computation and can efficiently handle operations such as:
- Conditional filtering of large datasets.
- Object-based erasure coding.
- Checksums and data validation.
- Pre-filtering data before it reaches the CPU.
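To make the bandwidth saving concrete, here is a minimal sketch of on-drive conditional filtering and checksumming. The `FilteringSSD` class and its methods are purely hypothetical stand-ins for what a computational storage drive might expose through vendor-specific commands; they are not a real device API.

```python
import hashlib
import struct

class FilteringSSD:
    """Toy model of an SSD with an on-drive filter accelerator (hypothetical)."""

    def __init__(self, blocks):
        self.blocks = blocks  # list[bytes], one entry per LBA

    def read(self, lba_start, lba_count):
        # Conventional path: every block crosses the bus to the host.
        return self.blocks[lba_start:lba_start + lba_count]

    def read_filtered(self, lba_start, lba_count, predicate):
        # Accelerated path: the drive applies the predicate locally,
        # so only matching blocks are transferred to the host.
        window = self.blocks[lba_start:lba_start + lba_count]
        return [blk for blk in window if predicate(blk)]

    def checksum(self, lba_start, lba_count):
        # On-drive validation: only a 32-byte digest crosses the bus.
        h = hashlib.sha256()
        for blk in self.blocks[lba_start:lba_start + lba_count]:
            h.update(blk)
        return h.digest()

# 1,000 four-byte records; the host only wants values above 900.
ssd = FilteringSSD([struct.pack("<I", i) for i in range(1000)])

full = ssd.read(0, 1000)
hot = ssd.read_filtered(0, 1000, lambda b: struct.unpack("<I", b)[0] > 900)

print(len(full), "blocks over the bus unfiltered")   # 1000
print(len(hot), "blocks over the bus pre-filtered")  # 99
```

The point of the sketch is the ratio: the filtered path moves roughly a tenth of the data, and the checksum path moves almost nothing, which is exactly the kind of bus-traffic reduction described above.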
This method is especially effective in high-demand computing environments such as data centers and supercomputing clusters. By processing data at the SSD level, we can reduce the amount of data that needs to be transferred across the PCIe bus or over the network. This reduces congestion, alleviates bandwidth limitations, and significantly boosts overall performance.
By focusing on highly specific tasks, SSD accelerators can provide fast data processing without adding significant power consumption. Importantly, this approach can be scaled across multiple drives, creating a more efficient parallel system that surpasses traditional CPU-bound processing methods.
4. CPU and SSD Collaboration: Efficient Data Flow
While a single CPU can typically perform complex tasks faster than an individual SSD, in practice the CPU's DRAM bandwidth available for non-OS work is limited. Furthermore, simply moving data from SSDs into DRAM consumes roughly half of the available DDR bandwidth.
Considering modern all-flash storage enclosures that can house 30, 60, or even 90 SSDs, this setup offers a considerable offloading capacity. For instance, a chassis with 90 Gen6 SSDs can process data at speeds of up to 2.5 TB/s without impacting any CPU resources. In such an architecture, the SSDs handle pre-processing and computations, while the CPU focuses on more complex tasks.
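The aggregate figure above follows from simple arithmetic. A quick sketch, assuming roughly 28 GB/s of sustained throughput per Gen6 SSD (the exact per-drive number depends on link width and firmware):

```python
# Back-of-the-envelope aggregate throughput for an all-flash enclosure.
# Assumption: ~28 GB/s sustained per PCIe Gen6 x4 SSD.
PER_DRIVE_GBPS = 28  # GB/s, assumed sustained sequential read

for drives in (30, 60, 90):
    aggregate_tbps = drives * PER_DRIVE_GBPS / 1000
    print(f"{drives} drives -> ~{aggregate_tbps:.2f} TB/s aggregate")
```

At 90 drives this lands at about 2.5 TB/s, consistent with the figure quoted above.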
5. The Evolution of Intelligent SSDs
Today, some companies are advancing even further by integrating CPU clusters into storage arrays, taking the concept of computational storage a step beyond dedicated accelerators. The main difference here is that the CPU clusters are not used for raw computation but instead act as hosts for running web services or microservices, appearing as additional CXL (Compute Express Link) devices on the PCIe bus.
For example, in a complex AI project, multiple large language models (LLMs) may work together to perform tasks such as:
- Extracting English audio and converting it into text.
- Translating the text into another language (e.g., Chinese).
- Creating a voice track with a celebrity’s voice.
This process involves numerous small steps typically handled by the CPU or GPU and requires frequent model exchanges. Why not allow the SSDs to execute some of these smaller tasks in the background, leaving the CPU and GPU free for the more advanced operations? In HPC environments, this collaborative approach could significantly improve efficiency and drastically reduce processing time.
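The division of labor described above can be sketched as a simple task router. Everything here is hypothetical: the stage names mirror the pipeline example, the "drive" workers are just threads, and no real product exposes this API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_drive(stage, payload):
    # Stand-in for small, data-adjacent work an SSD accelerator could do.
    return f"{stage}({payload})"

def run_on_host(stage, payload):
    # Stand-in for heavy CPU/GPU work such as LLM inference.
    return f"{stage}[{payload}]"

# Hypothetical policy: which pipeline stages are cheap enough to push
# down to the drives rather than occupy the host.
DRIVE_STAGES = {"extract_audio", "checksum_models"}

def dispatch(stage, payload, pool):
    worker = run_on_drive if stage in DRIVE_STAGES else run_on_host
    return pool.submit(worker, stage, payload)

with ThreadPoolExecutor(max_workers=4) as pool:
    audio = dispatch("extract_audio", "talk.mp4", pool).result()
    text = dispatch("transcribe", audio, pool).result()
    translated = dispatch("translate_zh", text, pool).result()
    print(translated)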
6. SSDs for Accelerating Encryption and Enhancing Security
Beyond high-performance computing, security is another critical area where SSDs can play a significant role, particularly for cryptographic operations. In this context, each SSD can perform hundreds of operations per second, such as signing and verification. When a server is equipped with 30 to 90 SSDs, the processing power scales proportionally, and each SSD can act as an independent hardware-based agent, linking directly to the Hardware Security Module (HSM) server.
This not only boosts the speed of encryption but also enhances the overall security of the system. Compared to traditional CPUs, SSDs are well suited to executing digital signature algorithms such as DSA, providing a substantial advantage in performing secure operations.
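A minimal sketch of fanning signing requests out across many drive-resident agents follows. Each "drive" is modeled as a thread holding its own key; in a real deployment the keys would be provisioned by the HSM and the operation executed in drive hardware. HMAC-SHA256 stands in for a true digital-signature algorithm purely to keep the example standard-library-only.

```python
import hashlib
import hmac
from concurrent.futures import ThreadPoolExecutor

class DriveSigner:
    """Hypothetical per-drive signing agent (thread-based stand-in)."""

    def __init__(self, drive_id, key):
        self.drive_id = drive_id
        self.key = key  # in reality: HSM-provisioned, never leaves the drive

    def sign(self, message):
        return hmac.new(self.key, message, hashlib.sha256).digest()

    def verify(self, message, tag):
        return hmac.compare_digest(self.sign(message), tag)

drives = [DriveSigner(i, f"key-{i}".encode()) for i in range(30)]
messages = [f"record-{i}".encode() for i in range(300)]

with ThreadPoolExecutor(max_workers=len(drives)) as pool:
    # Round-robin the requests over the drives, as a host driver might.
    tags = list(pool.map(
        lambda im: drives[im[0] % len(drives)].sign(im[1]),
        enumerate(messages)))

ok = all(drives[i % len(drives)].verify(m, t)
         for i, (m, t) in enumerate(zip(messages, tags)))
print(f"{len(tags)} signatures, all verified: {ok}")
```

The throughput argument is the same as before: thirty independent agents sign in parallel, so aggregate operations per second scale with drive count rather than with a single CPU's crypto budget.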
7. Conclusion
While the traditional notion of computational storage may not evolve as expected, we are witnessing the rise of a more targeted approach using accelerators that simplify operations and enhance efficiency. With the increasing onboard bandwidth of SSDs and the potential to leverage this capability in new ways, we may see some exciting applications in the near future, particularly in high-performance computing and security.
Source: allaboutcircuits