As in any electronic system, errors in the memory subsystem can arise from design faults or defects in any component, or from electrical noise. These errors are classified as hard errors (caused by design failures) or soft errors (caused by system noise or by bit flips in the memory array due to alpha particles and similar events).
To handle these memory errors at runtime, the memory subsystem needs advanced RAS (Reliability, Availability, and Serviceability) features that keep the overall system operating normally in the presence of memory errors. Without RAS features, a memory error is likely to crash the system. With them, the system can keep running when errors are correctable, and it records detailed information about uncorrectable errors for later debugging.
One of the most widely used RAS schemes in the memory subsystem is Error-Correcting Code (ECC) memory. By generating SECDED (Single Error Correction, Double Error Detection) ECC codes for the actual data and storing them in additional DRAM storage, the DDR controller can correct single-bit errors and detect double-bit errors in the data it receives.
The ECC code is generated by the controller based on the actual WR (WRITE) data. The memory stores both the WR data and ECC codes. During RD (READ) operations, the controller reads data and the corresponding ECC code from memory. The controller regenerates ECC codes from the received data and compares them with the received ECC codes.
If the two codes match, no error has been detected. If they do not match, the ECC SECDED mechanism allows the controller to correct any single-bit error and detect any double-bit error. This ECC scheme provides end-to-end protection against single-bit errors that occur anywhere in the memory subsystem between the controller and the memory.
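The write and read flow described above can be sketched with an extended Hamming code, which is one common way to build a SECDED code. This is a minimal illustration, not the code any particular DDR controller uses: the bit layout (overall parity at index 0, Hamming parity bits at power-of-two positions) and the function names `secded_encode` and `secded_decode` are assumptions made for this example.

```python
def secded_encode(data_bits):
    """Encode data bits into an extended Hamming (SECDED) codeword.

    Layout (an assumption of this sketch): index 0 holds the overall parity,
    power-of-two indices hold Hamming parity bits, the rest hold data.
    """
    k = len(data_bits)
    r = 0
    while (1 << r) < k + r + 1:      # smallest r that can address every position
        r += 1
    n = k + r
    code = [0] * (n + 1)
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity             # makes each parity group XOR to zero
    overall = 0
    for bit in code[1:]:
        overall ^= bit
    code[0] = overall                # makes the whole codeword XOR to zero
    return code


def secded_decode(code):
    """Return (status, data_bits); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    overall = 0
    for pos, bit in enumerate(code):
        overall ^= bit
        if bit and pos:
            syndrome ^= pos          # XOR of the positions of all set bits
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        code[syndrome] ^= 1          # single-bit error: the syndrome is its position
        status = "corrected"
    else:
        status = "uncorrectable"     # e.g. a double-bit error: detected, not fixed
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data


# 64 data bits produce a 72-bit codeword (64 data + 8 check bits).
data = [(i * 7 + 3) % 2 for i in range(64)]
word = secded_encode(data)

one_flip = list(word)
one_flip[10] ^= 1                    # inject a single-bit error
two_flips = list(one_flip)
two_flips[33] ^= 1                   # inject a second error

print(len(word))
print(secded_decode(one_flip)[0])
print(secded_decode(two_flips)[0])
```

With one flipped bit the decoder reports `corrected` and returns the original data. With two flipped bits it reports `uncorrectable`, which corresponds to the detect-only case described above.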
Based on where the ECC codes are actually stored, ECC schemes fall into two types: sideband ECC and inline ECC. In sideband ECC, the ECC codes are stored in separate DRAMs, while in inline ECC, the codes are stored alongside the actual data in the same DRAMs.
Since DDR5 and LPDDR5 support data rates much higher than their predecessors, they support additional ECC features to enhance the robustness of the memory subsystem. On-die ECC in DDR5 and Link-ECC in LPDDR5 are two such RAS schemes that can further enhance the RAS functionality of the memory subsystem.
01
Sideband ECC #
Sideband ECC schemes are typically implemented in applications using standard DDR memory (such as DDR4 and DDR5). As the name suggests, ECC codes are sent to memory as sideband data along with the actual data. For example, for a 64-bit data width, 8 additional bits are used for ECC storage. Therefore, DDR4 ECC DIMMs commonly used in today’s enterprise servers and data centers have a width of 72 bits. These DIMMs have two additional x4 DRAMs or one x8 DRAM for an extra 8-bit ECC storage. In sideband ECC, the controller writes and reads ECC codes along with the actual data. This ECC scheme does not require additional WR or RD overhead commands.
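The 8 extra bits for a 64-bit data width follow from the standard Hamming bound: a single-error-correcting code over k data bits needs the smallest r with 2^r >= k + r + 1, and SECDED adds one overall parity bit. The short sketch below works through that arithmetic; the function names are hypothetical and chosen only for this example.

```python
def sec_check_bits(k):
    """Smallest r such that 2**r >= k + r + 1 (single-error correction)."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r


def secded_check_bits(k):
    """SEC check bits plus one overall parity bit for double-error detection."""
    return sec_check_bits(k) + 1


data_width = 64
ecc_width = secded_check_bits(data_width)
print(ecc_width)                 # 8
print(data_width + ecc_width)    # 72
```

For 64 data bits this gives 7 bits for correction plus 1 for double-error detection, which matches the 72-bit width of the ECC DIMMs mentioned above.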
02
Inline ECC #
Inline ECC schemes are typically implemented in applications using LPDDR memory. Because LPDDR DRAMs have a fixed channel width (16 bits per channel for LPDDR5/4/4X), sideband ECC becomes an expensive solution with these memories. For instance, for a 16-bit data width, sideband ECC would need an extra 16-bit LPDDR channel to carry a 7-bit or 8-bit ECC code word. The 7-bit or 8-bit ECC code word only partially fills that extra 16-bit channel, which wastes storage and adds load to the address and command channels, potentially limiting performance. Inline ECC is therefore a better fit for LPDDR memory.
In inline ECC, the controller does not need an additional channel for ECC storage; instead, ECC codes are stored in the same DRAM channel that stores the actual data. The total data width of the memory channel therefore stays the same as the actual data width. The memory behind the 16-bit channel is partitioned so that a dedicated portion is set aside for ECC code storage. Because the ECC codes are not sent along with the WR and RD data, the controller generates separate overhead WR and RD commands for them. As a result, each WR and RD command for actual data is accompanied by an overhead WR or RD command for ECC data. High-performance controllers reduce the cost of these overhead commands by packing the ECC data for several consecutive addresses into a single overhead ECC WR command. Similarly, the controller reads the ECC data for several consecutive addresses in a single overhead ECC RD command and applies it to the actual data read from those addresses. The more sequential the traffic pattern, the smaller the latency penalty from these ECC overhead commands.
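The effect of packing ECC data for consecutive addresses can be illustrated with a simple command count. The sketch below is a toy model under stated assumptions, not a description of any real controller: it assumes 64-byte data bursts, 8 bytes of ECC per data burst, and that one ECC burst therefore covers 8 consecutive data bursts. The constants and the function names `ecc_burst_index` and `count_ecc_commands` are hypothetical.

```python
BURST_BYTES = 64             # assumed size of one data burst
ECC_BYTES_PER_BURST = 8      # assumed ECC bytes needed per data burst
BURSTS_PER_ECC_BURST = BURST_BYTES // ECC_BYTES_PER_BURST   # 8 data bursts share one ECC burst


def ecc_burst_index(data_addr):
    """Index of the ECC burst that covers the data burst at data_addr."""
    return (data_addr // BURST_BYTES) // BURSTS_PER_ECC_BURST


def count_ecc_commands(data_addrs):
    """Count overhead ECC commands when back-to-back accesses that
    share an ECC burst are served by a single ECC command."""
    count = 0
    last = None
    for addr in data_addrs:
        idx = ecc_burst_index(addr)
        if idx != last:          # a different ECC burst is needed
            count += 1
            last = idx
    return count


sequential = [i * BURST_BYTES for i in range(32)]        # 32 consecutive bursts
scattered = [i * 8 * BURST_BYTES for i in range(32)]     # every access lands in a new ECC burst

print(count_ecc_commands(sequential))   # 4
print(count_ecc_commands(scattered))    # 32
```

In this model, 32 sequential data accesses need only 4 overhead ECC commands, while 32 scattered accesses need 32, which is why the overhead shrinks as traffic becomes more sequential.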
03
On-Die ECC #
With each generation of DDR, DRAM capacity typically increases. DRAM vendors also shrink their process technology to achieve higher speeds and better economies of scale. With higher capacities, higher speeds, and smaller process geometries, the likelihood of single-bit errors in the DRAM storage array increases. To further strengthen the memory channel, DDR5 DRAMs include additional storage dedicated to ECC. On-die ECC is an advanced RAS feature that DDR5 systems can enable to support higher speeds. For every 128 bits of data, a DDR5 DRAM has 8 additional bits reserved for ECC storage.
Inside the DRAM, ECC is computed on the WR data, and the ECC codes are stored in the additional storage. During read operations, the DRAM reads both the actual data and the ECC codes, and it can correct any single-bit error in the read data bits. On-die ECC thus provides further protection against single-bit errors within the DDR5 memory array. Because this scheme offers no protection against errors that occur on the DDR channel, on-die ECC is combined with sideband ECC to strengthen end-to-end RAS in the memory subsystem.
04
Link-ECC #
The Link-ECC scheme is an LPDDR5 feature that protects against single-bit errors on the LPDDR5 link, or channel. The memory controller computes ECC for the WR data and sends it along with the data on specific bits. The DRAM generates ECC from the received data, compares it with the received ECC, and corrects any single-bit error. For read operations, the roles of the controller and the DRAM are reversed. Note that Link-ECC provides no protection against single-bit errors in the memory array. However, combining inline ECC with Link-ECC provides end-to-end protection against single-bit errors, enhancing the robustness of the LPDDR5 channel.
One widely used memory RAS feature is the Error-Correcting Code (ECC) scheme. Applications using standard DDR memory often implement sideband ECC, while applications using LPDDR memory typically implement inline ECC. At their higher speeds, DDR5 and LPDDR5 channels are significantly more affected by signal integrity (SI) issues. For this reason, ECC is now also supported on the DRAMs themselves, in the form of on-die ECC on DDR5 and Link-ECC on LPDDR5. Synopsys’ DesignWare DDR5/4 and LPDDR5/4 IP solutions offer advanced RAS features, including all the ECC schemes highlighted in this article.
(Author bio: Vadhiraj Sankaranarayanan is a Senior Technical Marketing Manager at Synopsys. Prior to joining Synopsys, Sankaranarayanan worked as an engineer at Dell, Rambus, Apple, and Kawasaki Microelectronics.)