NAS Features of X67: RAID Technology's Concepts, Principles and Applications

2025-01-07

RAID (Redundant Array of Independent Disks), originally Redundant Array of Inexpensive Disks, was first proposed in 1988 by Professor D. A. Patterson and colleagues at the University of California, Berkeley, in the paper "A Case for Redundant Arrays of Inexpensive Disks (RAID)". At that time, large-capacity disks were expensive, so the basic idea of RAID was to combine multiple small, relatively inexpensive disks to obtain capacity, performance and reliability comparable to an expensive large-capacity disk at a lower cost. As disk prices continued to fall, the term "inexpensive" lost its meaning, and the RAID Advisory Board (RAB) decided to replace it with "independent".

The industry quickly adopted this design, and RAID has been widely applied as a high-performance, highly reliable storage technology. RAID mainly uses three techniques, data striping, mirroring and data parity, to achieve high performance, reliability, fault tolerance and scalability. Depending on how these three techniques are used or combined, RAID is divided into different levels to meet the needs of different data applications. The original levels RAID1 to RAID5 were defined in the paper by D. A. Patterson et al.; RAID0 and RAID6 were added after 1988. In recent years, storage vendors have introduced further levels such as RAID7, RAID10/01, RAID50, RAID53 and RAID100, but there is no unified standard for them. At present, the industry-recognized levels are RAID0 to RAID5, of which all except RAID2 have been adopted as industry standards. The levels most commonly used in practice are RAID0, RAID1, RAID3, RAID5, RAID6 and RAID10.

From the implementation perspective, RAID falls into three types: software RAID, hardware RAID and hybrid RAID. In software RAID, all functions are performed by the operating system and the host CPU; there is no dedicated RAID control/processing chip or I/O processing chip, so its efficiency is the lowest of the three. Hardware RAID has a dedicated RAID control/processing chip, an I/O processing chip and an array buffer, so it does not consume host CPU resources, but its cost is high. Hybrid RAID has a RAID control/processing chip but no I/O processing chip, so it relies on the CPU and driver software to complete I/O; its performance and cost lie between those of software RAID and hardware RAID.

Each RAID level represents a particular implementation method and technique; a higher level number does not mean a better level. In practice, the RAID level and implementation should be chosen according to the characteristics of the user's data and applications, weighing availability, performance and cost together.

Basic Principles

 

RAID, or Redundant Array of Independent Disks, is often simply called a disk array. Briefly, RAID is a disk subsystem composed of multiple independent, high-performance disk drives that provides higher storage performance than a single disk along with data redundancy. It is a multi-disk management technology that offers the host environment cost-effective storage with high data reliability and high performance. SNIA defines RAID as a disk array in which part of the physical storage space is used to record redundant information about the user data stored in the remaining space; when a disk or access path fails, the redundant information can be used to reconstruct the user data. Although disk striping alone does not satisfy this definition, it is usually also called RAID (namely RAID0).

RAID was originally intended to provide high-end storage functions and redundant data protection for large servers. Within the system, a RAID is treated as one storage space made up of two or more disks, and reading and writing on multiple disks concurrently improves the I/O performance of the storage system. Most RAID levels include data verification and correction measures, or even mirroring, which greatly enhance system reliability; this is where "Redundant" comes from.

JBOD (Just a Bunch of Disks) also deserves a mention here. Originally, JBOD denoted a collection of disks with no control software to coordinate them, which is the main factor distinguishing JBOD from RAID. Today, JBOD often refers simply to a disk enclosure, whether or not it provides RAID functionality.

The two key objectives of RAID are to improve data reliability and I/O performance. In a disk array, data is spread across multiple disks, but to the computer system the array looks like a single disk. Redundancy is achieved either by writing the same data to multiple disks (mirroring) or by writing calculated parity data into the array, so that the failure of a single disk does not cause data loss. Some RAID levels tolerate more simultaneous failures; RAID6, for example, survives the failure of two disks at the same time. With such redundancy, a failed disk can be replaced with a new one, and RAID automatically reconstructs the lost data from the data and parity on the remaining disks, preserving data consistency and integrity. Because data is spread across multiple disks, concurrent reads and writes perform much better than on a single disk, yielding higher aggregate I/O bandwidth. The trade-off is that redundancy reduces the usable storage space: capacity is sacrificed in exchange for higher reliability and performance. For example, the space utilization of RAID1 is only 50%, while RAID5 gives up the capacity of one disk, for a space utilization of (n-1)/n.
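
The space utilization figures above can be sketched in a few lines of Python. This is an illustration only: the function name usable_fraction and the minimum disk counts are assumptions made for this example, all disks are assumed to be the same size, and the RAID6 formula (n-2)/n is inferred from the fact that RAID6 keeps two disks' worth of redundancy rather than being stated in the article.

```python
def usable_fraction(level: str, n: int) -> float:
    """Fraction of raw capacity left for user data with n equal-sized disks.

    Hypothetical helper for illustration; real products may add further
    overhead (metadata, hot spares) that is ignored here.
    """
    minimum_disks = {"RAID1": 2, "RAID5": 3, "RAID6": 4}  # assumed minimums
    if level not in minimum_disks:
        raise ValueError(f"unsupported level: {level}")
    if n < minimum_disks[level]:
        raise ValueError(f"{level} needs at least {minimum_disks[level]} disks")
    if level == "RAID1":
        # Every disk holds a full copy; the 50% in the text is the n = 2 case.
        return 1 / n
    if level == "RAID5":
        # One disk's worth of capacity holds parity.
        return (n - 1) / n
    # RAID6: two disks' worth of capacity holds redundancy.
    return (n - 2) / n


if __name__ == "__main__":
    for level, n in [("RAID1", 2), ("RAID5", 4), ("RAID6", 6)]:
        print(f"{level} with {n} disks: {usable_fraction(level, n):.0%} usable")
```

Multiplying the returned fraction by the raw capacity (n times the size of one disk) gives the usable capacity of the array under these assumptions.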

 

A disk array can keep the system running without interruption when some disks (one or several, depending on the implementation) fail. While the data of the failed disk is being rebuilt onto a new disk, the system continues to operate normally, though with somewhat reduced performance. Some disk arrays must be shut down to add or remove disks, while others support hot swapping, which allows drives to be replaced without a shutdown. Such high-end arrays are mainly used in applications with high reliability requirements, where the system cannot be shut down or downtime must be kept as short as possible. RAID is not a substitute for data backup. It is powerless against data loss from causes other than disk failure, such as viruses, sabotage or accidental deletion. In those cases, the data is lost from the point of view of the operating system, file system, volume manager or application, while from the point of view of the RAID system itself the data is intact and nothing has been lost. Data backup, disaster recovery and other protection measures are therefore still necessary; they complement RAID and protect data at different levels.

RAID rests on three key concepts and technologies: mirroring, data striping and data parity. Mirroring copies data to multiple disks; this improves reliability and also allows data to be read from two or more copies concurrently, improving read performance. Its write performance is slightly lower, because it takes more time to ensure that the data has been written correctly to every copy. Data striping stores slices of the data on different disks, and the slices together form one complete copy; unlike mirroring, which keeps multiple copies, striping is used mainly for performance. Striping allows finer-grained concurrency: data on different disks can be read and written at the same time, which yields a significant improvement in I/O performance. Data parity uses redundant data to detect and repair errors; the redundant data is usually calculated with algorithms such as Hamming codes or XOR. Parity can greatly improve the reliability, robustness and fault tolerance of a disk array, but it requires reading data from multiple places and performing calculations and comparisons, which costs performance. Different RAID levels adopt one or more of these three technologies to obtain different levels of data reliability, availability and I/O performance. Deciding which RAID level or mode to adopt (or whether to design a new one) requires a thorough understanding of the system requirements and a balanced trade-off among reliability, performance and cost.
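
Striping and XOR parity can be illustrated together with a small in-memory sketch. It is not how any particular RAID product lays out data: the function names (xor_blocks, make_stripe, rebuild), the block size and the choice of a single parity block per stripe are all assumptions made for this example, and each "disk" is just a bytes object.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data: bytes, data_disks: int, block_size: int):
    """Slice one stripe of data across data_disks blocks and append a parity block."""
    if len(data) != data_disks * block_size:
        raise ValueError("data must fill exactly one stripe")
    blocks = [data[i * block_size:(i + 1) * block_size] for i in range(data_disks)]
    return blocks + [xor_blocks(blocks)]  # last entry is the parity block


def rebuild(stripe, failed: int) -> bytes:
    """Recompute the block at index `failed` from all surviving blocks.

    XOR-ing every surviving block (data and parity) yields the missing one,
    which is why a single failed disk causes no data loss.
    """
    survivors = [block for i, block in enumerate(stripe) if i != failed]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe = make_stripe(b"ABCDEFGHIJKL", data_disks=3, block_size=4)
    lost = 1  # pretend the second disk failed
    print(rebuild(stripe, lost) == stripe[lost])
```

The rebuild step has to read every surviving block in the stripe, which is the extra reading and computation that the paragraph above describes as the performance cost of parity.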

 

Advantages of RAID

 

  • Large Capacity: This is the most obvious advantage of RAID. A RAID system composed of multiple disks offers far more storage space than any single disk. A single disk can now exceed 1TB, so the capacity of a RAID system can reach the PB level and meet most storage requirements. The available capacity of RAID is generally less than the total capacity of its member disks, because each RAID level requires some redundancy overhead, and the size of that overhead depends on the algorithm used. If the RAID algorithm and the disk capacities are known, the available capacity can be calculated. Capacity utilization is usually between 50% and 90%.

  • High Performance: The high performance of RAID comes from data striping. The I/O performance of a single disk is limited by factors such as its interface and bandwidth, and is often the bottleneck of system performance. Through striping, RAID distributes data I/O across all member disks, obtaining aggregate I/O performance several times higher than that of a single disk.

  • Reliability: Availability and reliability are another important feature of RAID. In theory, a system composed of multiple disks should be less reliable than a single disk, but this rests on an implicit assumption: that a single disk failure makes the entire array unavailable. RAID breaks this assumption with data redundancy technologies such as mirroring and data parity. Mirroring is the most basic redundancy technology; it copies all the data on one group of disk drives to another group, so that a copy of the data is always available. Compared with the 50% redundancy overhead of mirroring, the overhead of data parity is much smaller; it uses redundant parity information to verify and correct the data. RAID's redundancy greatly improves data availability and reliability, and ensures that when disks fail, within the number the RAID level tolerates, no data is lost and the system keeps running.

  • Manageability: RAID is in effect a virtualization technology that presents multiple physical disk drives as one large-capacity logical drive. To the external host system, RAID appears as a single, fast and reliable large-capacity disk drive, on which users can organize and store application data. From the application's point of view, this makes the storage system simple to use and manage. Because RAID handles a large amount of storage management work internally, the administrator only needs to manage a single virtual drive. RAID can also add or remove disk drives dynamically and perform data verification and reconstruction automatically, which further simplifies management.
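
The Reliability point above, that an array without redundancy is less reliable than one disk while redundancy reverses this, can be made concrete with a simplified probability sketch. The model is an assumption made for this example, not something from the article: each disk fails independently with the same probability p within some fixed period, and rebuild time, correlated failures and controller faults are ignored. The function names are hypothetical.

```python
def p_loss_no_redundancy(p: float, n: int) -> float:
    """Pure striping: data is lost if any one of the n disks fails."""
    return 1 - (1 - p) ** n


def p_loss_mirror(p: float, copies: int = 2) -> float:
    """Mirroring: data is lost only if every copy fails."""
    return p ** copies


def p_loss_single_parity(p: float, n: int) -> float:
    """Single parity across n disks: data is lost if two or more disks fail."""
    none_fail = (1 - p) ** n
    exactly_one_fails = n * p * (1 - p) ** (n - 1)
    return 1 - none_fail - exactly_one_fails


if __name__ == "__main__":
    p, n = 0.05, 4  # illustrative numbers only, not measured failure rates
    print(f"single disk:       {p:.4f}")
    print(f"4 disks, striped:  {p_loss_no_redundancy(p, n):.4f}")
    print(f"4 disks, parity:   {p_loss_single_parity(p, n):.4f}")
    print(f"2 disks, mirrored: {p_loss_mirror(p):.4f}")
```

Under these assumptions the striped array is more likely to lose data than a single disk, while the parity and mirrored arrays are less likely to, which is the effect the bullet describes.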
