S.M.A.R.T. and SSD
S.M.A.R.T., or SMART, stands for Self-Monitoring, Analysis, and Reporting Technology. SMART is a monitoring system built into hard drives. It collects data about the drive’s condition and performance and reports it to the host. By flagging signs of degradation early, SMART helps you keep a drive running reliably and back up your data before a failure. Discover the history of SMART and how it has adapted to solid-state drives.
The History of SMART
Originally, SMART was a means by which disk drive manufacturers reported the status of their hard disk drives (HDDs) to computers. Although some parameters were defined early on, each manufacturer was free to choose which parameters to include and what their thresholds should be. The Small Form Factor Committee, an electronics industry standards group, worked to establish what later became known as the SMART standard. The initial standard specified the communication protocol that ATA hosts use for monitoring and analysis. It did not define specific metrics or analysis methods.
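When solid-state drives (SSDs) emerged, they adopted a similar approach to reporting their status. However, not every hard drive attribute applies to solid-state drives. An SSD has no spindle motor, platters, or read/write heads, so many mechanical attributes are meaningless for it, and new attributes were needed to track flash wear.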
SMART Attributes
The SMART items you see vary with the drive and the monitoring software, but some key items relate to a drive’s “health” on every drive. Many tools first present an overall health assessment. They then list more detailed attribute readings that highlight areas requiring attention. In general, you can expect to find the following items:
- 01 (001) Raw_Read_Error_Rate: Indicates the rate of low-level data read errors.
- 04 (004) Start_Stop_Count: Keeps track of the number of times the drive’s spindle motor has been started or stopped.
- 05 (005) Reallocated_Sector_Ct: Reflects the count of sectors that have been remapped due to errors.
- 09 (009) Power_On_Hours: Accumulates the total number of hours the drive has been powered on since it left the factory. Conventional hard drives are often quoted as lasting around 30,000 powered-on hours, although this varies widely by model.
- 0A (010) Spin_Retry_Count: Counts the retries the spindle motor has needed to spin up to full speed.
- 0B (011) Calibration_Retry_Count: Records the count of calibration retries the drive undergoes.
- 0C (012) Power_Cycle_Count: Tracks the number of times the drive has been powered on.
- C2 (194) Temperature_Celsius: Provides the temperature of the drive in Celsius.
- C7 (199) UDMA_CRC_Error_Count: Counts CRC (Cyclic Redundancy Check) errors detected during Ultra DMA (UDMA) transfers between the drive and the host.
- C8 (200) Write_Error_Rate: Reflects the rate of write errors.
- F1 (241) Total_LBAs_Written: Represents the total amount of data written to the drive since it left the factory, measured in logical block addresses (LBAs). Each LBA is usually 512 bytes, although some vendors count in larger units.
- F2 (242) Total_LBAs_Read: Represents the total amount of data read from the drive since it left the factory, measured in the same LBA units as Total_LBAs_Written.
Together, these attributes provide insight into the condition and performance of a drive. They help users and system administrators monitor and address potential issues.
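To read these attributes in practice, a common tool is smartmontools’ `smartctl`. Running `smartctl -A` on a drive prints an attribute table with the columns ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, and RAW_VALUE. The sketch below parses that table from a string and converts Total_LBAs_Written to bytes. It is only an illustration under stated assumptions: the sample rows are made up, `parse_attributes` and `lbas_to_bytes` are hypothetical helpers, and the 512-byte LBA size is assumed. Real output varies by drive and smartctl version.

```python
from dataclasses import dataclass

# Hypothetical sample in the column layout printed by `smartctl -A`.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       12345
194 Temperature_Celsius     0x0022   064   052   000    Old_age   Always       -       36 (Min/Max 20/48)
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       2000000000
"""


@dataclass
class Attribute:
    attr_id: int
    name: str
    value: int   # normalized current value
    worst: int   # lowest normalized value recorded
    thresh: int  # failure threshold
    raw: str     # raw data value, kept as text because its format is vendor-specific


def parse_attributes(text: str) -> dict[int, Attribute]:
    """Parse the attribute rows of `smartctl -A` output, keyed by attribute ID."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID; skip the header and any other lines.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs[int(fields[0])] = Attribute(
            attr_id=int(fields[0]),
            name=fields[1],
            value=int(fields[3]),
            worst=int(fields[4]),
            thresh=int(fields[5]),
            raw=" ".join(fields[9:]),  # raw values may contain spaces
        )
    return attrs


def lbas_to_bytes(lba_count: int, lba_size: int = 512) -> int:
    """Convert an LBA count to bytes, assuming 512-byte LBAs unless told otherwise."""
    return lba_count * lba_size


attrs = parse_attributes(SAMPLE)
written = lbas_to_bytes(int(attrs[241].raw))
print(f"{attrs[241].name}: {written / 1e12:.3f} TB")  # Total_LBAs_Written: 1.024 TB
```

On a real system you would pass in the text captured from `smartctl -A` instead of `SAMPLE`. The raw value is kept as a string because many attributes encode it in vendor-specific ways. Convert it to a number only when you know its unit.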
Description of SMART Parameters
In general, you only need to compare the current value, the worst value, and the threshold (critical value), and pay attention to any status messages, to get a rough picture of a drive’s health. The current and worst values are normalized scores where higher is usually better. An attribute is considered failing once its current value drops to or below the threshold. Below is a brief introduction to the meaning of each parameter.
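The comparison of current, worst, and threshold values can be sketched as a small classification function. It assumes the convention used, as we understand it, by smartmontools’ `smartctl`:

- an attribute is failing when its normalized value is at or below a non-zero threshold;
- a threshold of 0 means the attribute never signals a failure on its own.

The function name, labels, and sample readings are hypothetical.

```python
def assess(value: int, worst: int, thresh: int) -> str:
    """Classify one attribute from its normalized VALUE, WORST, and THRESH.

    Assumed convention: failing when the value is at or below a non-zero
    threshold; a threshold of 0 marks a purely informational attribute.
    """
    if thresh == 0:
        return "informational"
    if value <= thresh:
        return "failing now"
    if worst <= thresh:
        return "failed in the past"
    return "ok"


# Hypothetical readings: (name, current value, worst value, threshold)
readings = [
    ("Reallocated_Sector_Ct", 100, 100, 10),
    ("Seek_Error_Rate", 60, 30, 30),
    ("Spin_Retry_Count", 97, 97, 97),
    ("Power_On_Hours", 97, 97, 0),
]
for name, value, worst, thresh in readings:
    print(f"{name}: {assess(value, worst, thresh)}")
```

Checking the worst value separately catches an attribute that crossed its threshold once and has since recovered. The current value alone would hide that event.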
In flash-based solid-state drives, storage cells come in several types. The two discussed here are SLC (Single-Level Cell) and MLC (Multi-Level Cell). SLC is expensive and offers low capacity. However, it has fast read/write speeds and high reliability, and each cell can be erased and rewritten up to about 100,000 times, roughly 10 times more than MLC. MLC offers larger capacity at lower cost, but its performance and endurance lag well behind SLC. To extend the life of MLC, the controller needs intelligent wear-leveling algorithms that spread writes evenly across all storage cells. This lets drives reach a mean time between failures (MTBF) of around 1 million hours. As a result, solid-state drives report many SMART parameters that mechanical hard drives lack, such as storage-cell erase counts and spare-block statistics. Most of these are defined by individual manufacturers. Some lack detailed documentation, and some of the explanations below may not be entirely accurate, so treat them as a reference only. SSD-specific items that are not tagged with a manufacturer name are those defined for the SandForce controller; other manufacturers define their own.
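Wear leveling and the wear-related SSD attributes in the list below (such as Wear Range Delta, Wear Leveling Count, and the lifetime-used percentage) can be illustrated with a toy model. This is a sketch, not how any real controller works. It assumes dynamic wear leveling that always writes to the least-erased block and ignores garbage collection, static wear leveling, and spare pools. It also assumes a rated endurance of 10,000 P/E cycles, the MLC figure implied by the 100,000-cycle SLC rating above. Expressing the wear delta as a percentage of that rating is our assumption, and all function names are hypothetical.

```python
import heapq

RATED_PE_CYCLES = 10_000  # assumption: MLC rating implied by "SLC ~100,000, about 10x MLC"


def wear_leveled_writes(num_blocks: int, num_writes: int) -> list[int]:
    """Toy dynamic wear leveling: every write goes to the least-erased block."""
    erase_counts = [0] * num_blocks
    heap = [(0, b) for b in range(num_blocks)]  # (erase_count, block); sorted, so a valid heap
    for _ in range(num_writes):
        count, block = heapq.heappop(heap)
        erase_counts[block] = count + 1
        heapq.heappush(heap, (count + 1, block))
    return erase_counts


def naive_writes(num_blocks: int, num_writes: int) -> list[int]:
    """No wear leveling: every write lands on block 0."""
    erase_counts = [0] * num_blocks
    erase_counts[0] = num_writes
    return erase_counts


def wear_range_delta(erase_counts: list[int]) -> float:
    """Gap between the most- and least-worn blocks, as a % of rated cycles."""
    return 100.0 * (max(erase_counts) - min(erase_counts)) / RATED_PE_CYCLES


def percent_life_used(erase_counts: list[int]) -> float:
    """Average erase count as a % of rated cycles."""
    return 100.0 * sum(erase_counts) / len(erase_counts) / RATED_PE_CYCLES


for label, counts in [("leveled", wear_leveled_writes(8, 1000)),
                      ("naive", naive_writes(8, 1000))]:
    print(f"{label}: delta={wear_range_delta(counts):.2f}%, "
          f"life used={percent_life_used(counts):.2f}%, "
          f"most-worn block={max(counts) / RATED_PE_CYCLES:.0%} of rating")
```

Both runs report the same average life used (1.25%). However, the naive run concentrates all 1,000 erases on one block, so its wear delta is 10% and that block would wear out eight times sooner. This is why a large Wear Range Delta is a warning sign even when the average looks healthy.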
- 01 (001) Raw Read Error Rate: This attribute measures the rate of low-level data read errors. Ideally the data value is 0, although on some drives any value is normal; what matters is that the current value stays well above the threshold. A data value greater than 0 may indicate problems with the disk surface or the read/write heads, such as media damage, head contamination, or head resonance. However, some Seagate hard drives report large data values here that do not necessarily indicate a problem. The key is to monitor whether the current value decreases over time. In solid-state drives (SSDs), this attribute counts both uncorrectable ECC errors and uncorrectable RAISE errors (UECC + URAISE).
- 02 (002) Throughput Performance: This parameter reflects the disk’s read/write throughput performance, with a higher data value being better. If the current value is significantly lower or approaching the threshold, it suggests serious issues with the hard drive. However, modern hard drives often display data values as 0 or may not display this attribute at all. Typically, data values become available only after manual offline SMART testing.
- 03 (003) Spin-Up Time: Spin-Up Time measures the time it takes for the spindle motor to reach its rated speed from a standstill. Smaller values are better, but this value is generally only a reference, because healthy hard drives can have varying startup times. On some hard drives this parameter always shows 0. In that case, assess it by comparing the current value with the worst value.
- 04 (004) Start/Stop Count: This attribute represents the cumulative number of times the hard drive’s spindle motor has been started or stopped. New hard drives typically have a low count, which gradually increases over time. Certain system functions, such as spinning down the hard drive during idle periods, can significantly increase this count. If the start/stop count is excessively high (much greater than the Power Cycle Count, 0C), it may suggest issues with the hard drive’s motor or its driving circuitry. Some hard drives use a formula to calculate the current value from this count.
- 05 (005) Reallocated Sectors Count / Retired Block Count: This attribute tracks the number of sectors or blocks that have been reallocated due to persistent read, write, or verify errors. When a sector consistently exhibits errors, the drive’s firmware remaps that sector’s physical address to a reserved spare sector and transfers the data. This process is known as sector reallocation or block retirement.

  The data value for this attribute should ideally be 0, and the current value should be well above the threshold. In hard drives, this parameter is crucial because it directly affects the drive’s lifespan and performance. Reallocation preserves data integrity by masking the presence of bad sectors. However, excessive reallocations or a current value approaching the threshold may indicate that the drive has exhausted its pool of spare sectors and can no longer remap, which can lead to data loss.

  In solid-state drives (SSDs), this attribute remains important. Each flash memory block endures only a limited number of program/erase cycles. When a block wears out or fails, the drive retires it and substitutes a spare block. Monitoring this attribute helps assess an SSD’s health and estimate its remaining lifespan.
- 06 (006) Read Channel Margin: This attribute is uncommon and is not reported by modern hard drives. Its exact function is not well documented.
- 07 (007) Seek Error Rate: This attribute measures the error rate when the read/write heads seek (move) to the correct track on the hard drive platters. A value of 0 is generally expected, and the current value should be significantly higher than the threshold. An increasing Seek Error Rate can be indicative of issues with the mechanical components of the head assembly, servo circuits, or problems with the disk’s surface or temperature. However, for some Seagate drives, even new drives may have non-zero values in this attribute. It’s essential to monitor the trend of the current value.
- 08 (008) Seek Time Performance: This attribute represents the average performance of the drive’s seek operations and is typically related to the Seek Error Rate. A continuously decreasing current value may signal issues with the head assembly, seek motor, or servo circuits. However, many modern hard drives do not report this attribute.
- 09 (009) Power-On Time Count (POH): This attribute represents the cumulative power-on time of the drive; the data value accumulates the time during which the device has been powered on. For new hard drives, this value should be relatively low. However, manufacturers may count in different units, such as hours, minutes, seconds, or even 30-second intervals.

  The threshold for this attribute is typically set to 0, and the current value gradually decreases as power-on time increases. When the current value approaches the threshold, the drive is nearing its expected design lifespan, although this does not mean it will fail immediately. You can estimate the remaining lifespan or failure probability from the MTBF (Mean Time Between Failures) value the manufacturer specifies for that drive model.

  For solid-state drives (SSDs), consider the Device Initiated Power Management (DIPM) feature, which affects this statistic:
  - If DIPM is enabled, the cumulative power-on time excludes sleep periods.
  - If DIPM is disabled, time spent in all three states (active, idle, and sleep) is counted.
- 0A (010) Spin-up Retry Count: The data value for this attribute should ideally be 0, and the current value should be greater than the threshold. This attribute tracks the count of retry attempts made by the spindle motor to spin up the hard drive to its rated speed within a specified time. An increasing count indicates issues with the motor’s drive circuitry, mechanical subsystems, or inadequate power supply to the drive.
- 0B (011) Calibration Retry Count: Similar to the Spin-up Retry Count, the Calibration Retry Count tracks the number of times calibration operations were retried. Hard drives may perform head calibration to compensate for changes in temperature, which can cause mechanical components to expand and contract. Some drives also have timed head calibration functions. An increasing count suggests problems with the motor drive circuitry, mechanical subsystems, or calibration failures. However, some new drives may have a certain amount of data in this attribute, which doesn’t necessarily indicate a problem. It’s important to monitor the current value and compare it to the threshold.
- 0C (012) Power Cycle Count: This attribute reflects the cumulative number of power-on/power-off cycles, that is, the number of times the drive has been powered on or off. New hard drives typically have only a few cycles. This differs from the Start/Stop Count (04), which counts how many times the spindle motor was started or stopped. That includes stops during operation, for example when the system enters sleep mode or the drive spins down after a period of inactivity. Because every power-on also starts the spindle, the start/stop count is usually equal to or higher than the power cycle count. Hard drives are generally designed for a large number of power cycles, often 5,000 or more. This count therefore serves as a reference for lifespan estimation but has no inherent diagnostic significance.
- 0D (013) Soft Read Error Rate: This attribute tracks soft (correctable) read errors reported to the operating system. A lower data value is preferable, as a high value could indicate problems with the magnetic media on the disk platters.
- AA (170) Grown Failing Block Count (Micron): This attribute tracks the number of blocks that have developed read or write failures during use, as opposed to blocks that were already bad at the factory. A growing count indicates that more blocks on the drive are becoming unreliable.
- AB (171) Program Fail Block Count: This attribute represents the number of blocks on the flash memory that have experienced programming (write) failures. A higher count suggests issues with the flash memory’s ability to write data reliably.
- AC (172) Erase Fail Block Count: Similar to the Program Fail Block Count, this attribute tracks the number of blocks on the flash memory that have experienced erasing failures. Erasing failures can indicate problems with the flash memory’s ability to clear and rewrite data in specific blocks.
- AD (173) Wear Leveling Count (Micron): This attribute provides information about the average number of times all good blocks on the flash memory have been erased and written (wear cycles). Flash memory has a limited number of write cycles before it degrades, and frequent writes to specific areas can lead to uneven wear. Wear leveling aims to distribute write cycles evenly across the memory to prolong its lifespan.
- AE (174) Unexpected Power Loss Count: This attribute counts the unexpected power loss events that have occurred over the drive’s lifetime. Unexpected power losses can lead to data corruption or other issues.
- B1 (177) Wear Range Delta: This attribute measures the difference in wear percentages between the most worn-out block and the least worn-out block on the flash memory. A large delta indicates that wear leveling is not distributing write cycles evenly across the blocks, potentially reducing the drive’s lifespan.
- B4 (180) Unused Reserved Block Count Total (HP): Solid-state drives (SSDs) reserve some storage capacity for replacing defective storage cells. This attribute’s current value represents the number of reserved storage blocks that have not been used yet. It’s essential to have available reserved blocks for maintaining drive reliability.
- B5 (181) Program Fail Count: This attribute displays the number of programming (write) failures, similar to attribute AB. It indicates how many times the drive has failed to write data correctly.
- B5 (181) Non-4k Aligned Access (Micron): This attribute relates to non-4KB aligned accesses, but further details might be specific to the drive manufacturer’s implementation.
- B6 (182) Erase Fail Count: Similar to attribute AC, this attribute counts the block-erase failures that have occurred over the drive’s lifetime.
- B7 (183) SATA Downshift Error Count: This attribute counts the number of times the SATA link has dropped to a lower transfer speed. Compatibility issues between the drive and the motherboard can cause such downgrades.
- B8 (184) I/O Error Detection and Correction (HP): This attribute is part of HP’s SMART IV technology. It records the number of data transmission errors (parity errors) detected and corrected when data is transferred from the drive’s internal cache RAM to the host system.
- B8 (184) End-to-End Error Detection Count (Intel – 34nm SSDs): This attribute is specific to Intel’s second-generation 34nm solid-state drives. It tracks the number of errors related to logical block addressing (LBA) mapping between internal logical block addresses and their actual physical addresses within the SSD.
- B8 (184) Init Bad Block Count (Indilinx Chip): This attribute represents the number of bad blocks that were already present in the drive’s flash memory when it left the factory. Bad blocks are storage blocks that are defective and cannot be used to store data.
- B9 (185) Head Stability (Western Digital): The specific meaning of this attribute is not clear, and it may be a manufacturer-specific parameter with limited publicly available information.
- BA (186) Induced Op-Vibration Detection (Western Digital): Similar to B9, this attribute’s exact meaning is not well-documented and may be specific to Western Digital drives.
- BB (187) Reported Uncorrectable Errors (Seagate): This attribute tracks the number of uncorrectable errors reported to the operating system that cannot be corrected using hardware error correction codes (ECC). If the data value is not zero, it indicates potential data integrity issues, and it is advisable to back up data on the hard drive.
- BC (188) Command Timeout: This attribute counts the number of operations that were aborted because the drive timed out. The data value should normally be zero. A value significantly higher than zero could point to power supply problems, oxidized data cable contacts, or potentially severe problems with the drive itself.
- BD (189) High Fly Writes: High Fly Writes monitoring is a feature that enhances the reliability of read and write operations. It constantly monitors the head’s flying height to ensure it stays within a normal range for reliable data writing. If the head’s flying height deviates, write operations are halted and may be retried or attempted at a different location. This continuous monitoring process increases data write reliability and reduces read error rates. The data value for this attribute represents the number of times deviations in head flying height were detected during writes.
- BD (189) Factory Bad Block Count (Micron): This attribute is related to the count of bad blocks that were present on the flash memory of the drive when it left the factory.
- BE (190) Airflow Temperature: This attribute represents the temperature of the air flowing over the surface of the hard drive’s platters. On some Seagate drives, the current value is calculated as (100 - current temperature), so a higher airflow temperature results in a lower current value. The worst value is the lowest the current value has ever been, and the threshold corresponds to the highest temperature the manufacturer allows. The data value itself may have no direct practical meaning, and many hard drives do not report this parameter.
- BF (191) G-sense Error Rate: This attribute’s data value records the frequency of errors caused by mechanical shocks or vibrations affecting the hard drive.
- C0 (192) Power-Off Retract Count: This attribute counts the number of times the drive has retracted its read/write heads to the parking position to prevent damage during power-off or unexpected power loss. High values may indicate frequent power-offs or power losses.
- C1 (193) Load/Unload Cycle Count: Older hard drives used contact start/stop. In this design, the read/write heads landed on a parking zone near the inner edge of the platters whenever the drive spun down, causing wear each time the drive spun up or down. Modern hard drives instead use “ramp load” technology, where the heads are parked on a ramp off the platters and never touch the media. This attribute counts the number of times the heads have been loaded onto the data area and unloaded back to the parking position. Ramp loading greatly reduces wear per cycle, but the mechanism is still rated for a limited number of cycles. A count that climbs very quickly, for example because of aggressive power-saving head parking, is therefore worth watching.
- C2 (194) Temperature: This attribute reports the drive’s current internal temperature. Monitoring it is important because excessively high temperatures increase mechanical wear and can reduce drive performance. Generally, you want to keep the drive below 45°C. The manufacturer typically specifies a maximum operating temperature, usually below 60°C.

  Manufacturers represent this attribute differently:
  - For some Seagate drives, the current value is the actual temperature in degrees Celsius, the worst value is the highest temperature ever recorded, and the threshold may not be meaningful.
  - For some Western Digital drives, the worst value is a time-based function of how long the temperature has stayed above a certain threshold, and the current value is inversely proportional to the current temperature.
- C3 (195) Hardware ECC Recovered: This attribute represents the number of times the hard drive’s Error Correcting Code (ECC) mechanism has successfully corrected errors during read/write operations. ECC is a technology that allows the drive to detect and correct errors, ensuring that data operations can continue without interruption. The data value indicates how many errors were corrected using ECC. However, the interpretation of this value may vary between drive manufacturers.
- C4 (196) Reallocated Events Count: This attribute should have a data value of 0, and the current value should be far greater than the threshold value. It counts the number of attempts to relocate (remap) data from a problematic sector to a spare sector. This occurs when a sector on the drive develops issues, and the drive tries to move the data to a reserved spare sector. Both successful and unsuccessful remapping attempts are counted. This attribute is related to bad sectors on the drive.
- C5 (197) Current Pending Sector Count: This attribute tracks the number of unstable sectors awaiting remapping, that is, sectors with read errors that have not yet been reallocated. If the drive later reads such a sector successfully, it is removed from the pending list; if a write to the sector fails, the sector is remapped. The data value is the number of sectors currently pending.
- C6 (198) Offline Uncorrectable Sector Count: This attribute should have a data value of 0, and the current value should be far greater than the threshold value. It accumulates the number of uncorrectable sector errors found during read/write operations, including offline scans. An increase suggests problems with the drive’s surface or mechanical components. Sectors counted here will typically be remapped later.
- C7 (199) Ultra ATA CRC Error Rate: This attribute’s data value accumulates the number of CRC (Cyclic Redundancy Check) errors detected during data transfers over the Ultra ATA interface. These errors may indicate problems with data cables, connectors, or the drive’s interface. If the value is not 0 and continues to increase rapidly, it suggests connectivity or interface issues that should be addressed.
- C8 (200) Write Error Rate / Multi-Zone Error Rate (Western Digital): This attribute should have a data value of 0, and the current value should be far greater than the threshold value. It tracks the number of errors encountered while writing data to sectors. An increasing value may indicate potential issues with the drive’s platters, heads, or other components.
- C9 (201) Off-Track Error Rate / Soft Read Error Rate: This attribute’s data value accumulates the number of off-track errors encountered during reads. If the value is not 0, it’s advisable to back up the data on the drive.
- CA (202) Data Address Mark Errors: A lower value for this attribute is better (or as defined by the manufacturer). It counts the number of errors related to data address marks on the drive.
- CA (202) Percentage Of The Rated Lifetime Used (Micron): This attribute reflects how much of the drive’s rated lifetime has been consumed. Its current value starts at 100% and decreases toward 0% as the drive approaches the end of its expected lifespan. The calculation is based on the erase cycles used relative to the rated endurance of the MLC or SLC NAND flash. When the value reaches 0%, the drive has reached its rated lifespan.
- CA (202) Total Count of Error Bits from Flash (Indilinx Chip): This attribute represents the total number of error bits encountered in the flash memory of the drive, particularly relevant for drives using the Indilinx chip.
- CB (203) Soft ECC Correction: This attribute counts the number of errors corrected by software-based ECC (Error Correcting Code).
- CC (204) Bad Block Full Flag (Indilinx Chip): This attribute, specific to drives with the Indilinx chip, relates to the drive’s bad-block management. The data value likely serves as a flag indicating whether the space reserved for replacing bad blocks is full or nearly full.
- CD (205) Thermal Asperity Rate (TAR): This attribute should ideally have a data value of 0. It counts thermal asperity events, in which the read head is briefly heated and its signal disturbed, for example by contact with a particle or surface irregularity on the platter. Nonzero values suggest contamination, surface defects, or other temperature-related issues.
- CE (206) Flying Height: This attribute represents the vertical distance between the read/write heads and the surface of the platters. A too-low or too-high flying height can impact the drive’s performance and reliability. However, this value is typically inferred rather than directly measured, and it’s based on the strength of signals from the read heads.
- CF (207) Spin High Current: This attribute records instances of high current surges in the spindle motor. An increase in this count may indicate issues with the motor or power supply.
- D0 (208) Spin Buzz: This attribute counts the number of times the spindle motor attempted to start but encountered issues, often due to inadequate power supply.
- D1 (209) Offline Seek Performance: This attribute represents the drive’s seek performance measured during offline testing and is mainly used for factory testing.
- D2 (210) Ramp Load Value: This attribute appears only on some older drives and often has a data value of 0. Its exact significance is unclear.
- D3 (211) Vibration During Write: It records instances of external vibrations experienced during write operations.
- D4 (212) Shock During Write: This attribute records instances of mechanical shocks experienced during write operations.
- DC (220) Disk Shift: This attribute indicates how far the platters have shifted relative to the spindle. Smaller values are better; shifts are usually caused by mechanical shock or temperature changes.
- DD (221) G-Sense Error Rate: Similar to BF, this attribute tracks the frequency of errors caused by external mechanical shocks or vibrations.
- DE (222) Loaded Hours: This attribute accumulates the total operating hours of the read/write head assembly, reflecting the time the seek motor has been in operation.
- DF (223) Load/Unload Retry Count: Similar to attribute C1, this attribute counts the number of times the drive has attempted to retry loading or unloading the heads.
- E0 (224) Load Friction: This attribute measures the mechanical resistance experienced by the read/write heads during operation.
- E1 (225) Host Writes: This attribute is specific to solid-state drives (SSDs). It tracks the total amount of data the host has written to the drive, and the counting unit varies by vendor. It matters because flash memory endures only a limited number of write cycles.
- E2 (226) Load ‘In’-time: This attribute accumulates the running time of the read/write head assembly when it’s not in the parked position, similar to attribute DE.
- E3 (227) Torque Amplification Count: This attribute counts the number of times the spindle motor has attempted to increase torque to compensate for variations in platter rotation speed. It may indicate issues with the spindle motor or bearings.
- E4 (228) Power-Off Retract Cycle: Similar to attribute C0, this attribute accumulates the count of times the heads have automatically retracted due to unexpected power-off events.
- E6 (230) GMR Head Amplitude: This attribute measures the “jitter” or oscillation distance of the heads during operation.
- E7 (231) Temperature: This attribute represents the internal temperature of the drive, similar to attribute C2.
- E7 (231) SSD Life Left: For SSDs, this attribute indicates the remaining life of the drive based on P/E (Program/Erase) cycles and available spare blocks. Higher values indicate a healthier SSD.
- E8 (232) Endurance Remaining: Similar to attribute CA, this attribute represents the percentage of the remaining lifespan of the SSD based on write cycles.
- E8 (232) Available Reserved Space (Intel Chip): Specific to Intel SSDs, it indicates the remaining capacity reserved for replacing damaged storage units. As it decreases, the SSD’s lifespan may be impacted.
- E9 (233) Power-On Hours: For regular hard drives, this attribute is similar to attribute 09.
- E9 (233) Media Wearout Indicator (Intel Chip): It tracks the wear and tear of the NAND flash memory in Intel SSDs. A decreasing value indicates the drive is nearing its designed lifespan.
- F0 (240) Head Flying Hours / Transfer Error Rate (Fujitsu): This attribute represents the time the read/write heads have spent in the working position. For Fujitsu drives, it also indicates the number of times a connection was reset during data transfer.
- F1 (241) Total LBAs Written: This attribute accumulates the total count of Logical Block Addresses (LBAs) written to the drive.
- F1 (241) Lifetime Writes from Host: It reflects the total amount of data written by the host to the drive over its lifetime.
- F2 (242) Total LBAs Read: This attribute accumulates the total count of Logical Block Addresses (LBAs) read from the drive. Some SMART tools may display a negative value because they interpret the 48-bit raw counter as a signed 32-bit number.
- F2 (242) Lifetime Reads from Host: It reflects the total amount of data read by the host from the drive over its lifetime.
- FA (250) Read Error Retry Rate: This attribute counts the number of times read operations encountered errors and required retries.
- FE (254) Free Fall Protection: Some laptop hard drives have free fall protection that detects movement and quickly parks the heads to prevent physical damage. This attribute counts the number of times this protection mechanism has been activated.
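Several entries above note that raw values need interpretation:

- Power-On Time (09) may be counted in hours, minutes, seconds, or 30-second intervals.
- Some Seagate drives store Airflow Temperature (BE) as 100 minus the temperature.
- Total LBAs Read (F2) can appear negative in tools that treat the raw counter as a signed 32-bit number.

The helpers below sketch those three conversions. The unit names and function names are hypothetical. Which rule applies to a given drive must come from the vendor’s documentation.

```python
# Hypothetical unit labels for the Power-On Time raw value; check the vendor's docs.
SECONDS_PER_UNIT = {
    "hours": 3600,
    "minutes": 60,
    "seconds": 1,
    "30s": 30,  # 30-second intervals, as used by some drives
}


def power_on_hours(raw: int, unit: str = "hours") -> float:
    """Convert a Power-On Time (09) raw value to hours, given its unit."""
    return raw * SECONDS_PER_UNIT[unit] / 3600


def airflow_temperature(current_value: int) -> int:
    """Decode attribute BE (190) on drives that store (100 - temperature)."""
    return 100 - current_value


def unsigned_from_signed32(displayed: int) -> int:
    """Undo a tool's signed 32-bit display of a counter such as F2.

    This only recovers the true value if it fits in 32 bits; the 48-bit raw
    field can hold more, and any higher bits are already lost in the display.
    """
    if not -2**31 <= displayed < 2**31:
        raise ValueError("not a signed 32-bit value")
    return displayed & 0xFFFF_FFFF


print(power_on_hours(1_440_000, "30s"))        # 12000.0
print(airflow_temperature(64))                 # 36
print(unsigned_from_signed32(-1_294_967_296))  # 3000000000
```

Masking with `0xFFFF_FFFF` reinterprets the displayed number as its unsigned 32-bit equivalent, so -1 becomes 4,294,967,295. Reading the 48-bit raw value directly, when your tool allows it, avoids the problem entirely.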
In Conclusion
SMART tools let you check the health of your SSD. Manufacturers define and encode many attributes differently, so the most precise information usually comes from the SSD manufacturer’s own software. It knows how to interpret that vendor’s attributes and gives the most accurate picture of your drive’s status and performance. That way you can monitor its condition and address potential issues.