As Linux has become a mainstream operating system, the need for reliable, highly available data storage has grown. Most operating systems meet this need with RAID, and Linux is no exception. RAID uses a set of hard drives working together to provide data redundancy and faster access to the data (see the sidebar "What is RAID?").

RAID is implemented in one of two ways: hardware or software. Hardware RAID solutions require dedicated drive arrays managed by a RAID controller. RAID controllers provide RAID levels 0 through 5, but usually focus on RAID 0, RAID 1, and RAID 5. Almost all RAID arrays are SCSI based, although several IDE-based RAID systems are now on the market, aimed directly at lower-priced Linux (and UNIX) workstations. SCSI RAID controllers used to be expensive, often costing several thousand dollars, but more reasonably priced controllers (several hundred dollars) have started to appear on the SCSI market, led by companies such as Adaptec.

What is RAID?

Redundant Array of Inexpensive Disks -- RAID -- was first proposed in papers written at the University of California, Berkeley, although the term wasn't defined as well then as it is today. RAID was originally defined as a subsystem of two or more disk drives treated by the operating system as a single logical drive. The purpose was to take advantage of the data redundancy inherent in a multiple-drive design.

Originally there were six levels of RAID (RAID 0 through RAID 5); more have been added since. RAID 0 is the base level, which stripes data across drives. No parity control is used, and there is no data redundancy with RAID 0. However, striping data across two or more drives does provide an increase in performance through load balancing across the array's drives.
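
To make the striping idea concrete, here is a minimal sketch in Python (an illustration only, with a hypothetical four-drive geometry; it is not how any real RAID driver is coded) of how consecutive logical blocks map onto the drives of a RAID 0 array:

    # RAID 0 sketch: consecutive blocks go round-robin across the drives,
    # so large sequential transfers keep every drive busy at once.
    NUM_DRIVES = 4

    def raid0_location(block_number):
        """Return (drive index, block offset on that drive) for a logical block."""
        drive = block_number % NUM_DRIVES       # which drive holds the block
        offset = block_number // NUM_DRIVES     # where it sits on that drive
        return drive, offset

    # Logical blocks 0..7 land on drives 0, 1, 2, 3, 0, 1, 2, 3.
    # The load is balanced, but losing any one drive destroys the array.
    for block in range(8):
        print(block, raid0_location(block))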

RAID 1 is called disk mirroring or disk shadowing. Each hard drive has a duplicate drive containing an exact copy. Since every bit on each drive is duplicated on another drive, there is full data redundancy. When one drive develops a problem or fails, the mirror can maintain system operations while the faulty drive is repaired. RAID 1 offers an increase in read performance (since two drives can be read at the same time), but write operations are no faster, since every write must go to both drives. The net effect is no noticeable change in overall system performance.
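
The following minimal sketch (with hypothetical in-memory "drives" standing in for real devices) illustrates why mirroring leaves writes unchanged but lets reads be balanced across two copies:

    # RAID 1 sketch: every write goes to both members of the mirror, so a
    # read can be served by either drive and the array survives losing one.
    primary = {}
    mirror = {}

    def raid1_write(block, data):
        primary[block] = data    # both copies must be written,
        mirror[block] = data     # so writes are no faster than a single drive

    def raid1_read(block, prefer_mirror=False):
        # Reads can be spread across the two copies for better throughput.
        return (mirror if prefer_mirror else primary)[block]

    raid1_write(0, b"payroll record")
    assert raid1_read(0) == raid1_read(0, prefer_mirror=True)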

RAID 2 tries to overcome the limitations of RAID 1 through the use of Hamming codes. In 1950, Richard Hamming showed that if data is organized so that an error affects only one bit in each group, the error can be detected and corrected. For example, nine drives can be used in a RAID 2 array: the first bit of every byte is written to the first drive, the second bit of the byte to the second drive, and so on, with the ninth drive holding error correction code. A fault on one drive is detected by the correction code and fixed by a real-time algorithm. Since all drives in a RAID 2 array can seek and write in parallel, throughput is faster. The problem with RAID 2 is that it requires large disk arrays, which are seldom practical for small systems.
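
As a rough illustration of the underlying idea (a textbook Hamming(7,4) code, not any particular controller's implementation), the sketch below encodes four data bits with three check bits, then locates and repairs a single flipped bit -- the same principle RAID 2 applies with one bit per drive:

    # Hamming(7,4) sketch: four data bits plus three check bits let a single
    # flipped bit be located and corrected.
    def hamming74_encode(d):
        d1, d2, d3, d4 = d
        p1 = d1 ^ d2 ^ d4
        p2 = d1 ^ d3 ^ d4
        p3 = d2 ^ d3 ^ d4
        # Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4
        return [p1, p2, d1, p3, d2, d3, d4]

    def hamming74_correct(c):
        # Each syndrome bit re-checks one parity group; together they spell
        # out the 1-based position of a single-bit error (0 means no error).
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
        s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
        pos = s1 + 2 * s2 + 4 * s3
        if pos:
            c[pos - 1] ^= 1                   # flip the faulty bit back
        return [c[2], c[4], c[5], c[6]]       # recovered data bits

    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1                              # simulate one drive returning a bad bit
    assert hamming74_correct(word) == [1, 0, 1, 1]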

RAID 3 uses a different architecture from the previous levels. In a RAID 3 array, two or more drives hold data while an extra drive holds the error correction code. Data is interleaved across the data drives so that the first byte is on the first drive, the second byte on the second, and so forth. The counting wraps back to the first data drive when all the drives have been used (so in a two-drive array, drive one holds the odd-numbered bytes and drive two holds the even-numbered bytes). The error correction drive holds a bitwise exclusive-OR (parity) value for each sector of drive space on the data drives. RAID 3 shows some performance increase because of simultaneous read and write operations. RAID 3 is used with larger systems where huge files must be read sequentially and the performance advantage is noticeable. For small systems with many small files, it does not offer a significant performance increase.
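
The sketch below (illustration only, with short byte strings standing in for drives) shows the heart of the scheme: the parity drive holds the bytewise exclusive-OR of the data drives, so any one lost drive can be rebuilt by XORing the survivors:

    # RAID 3 sketch: bytes are interleaved across the data drives, one extra
    # drive holds their XOR parity, and a failed drive is rebuilt from the rest.
    from functools import reduce

    def xor_stripes(stripes):
        return bytearray(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), stripes))

    def split_with_parity(data, num_data_drives):
        drives = [bytearray() for _ in range(num_data_drives)]
        for i, byte in enumerate(data):
            drives[i % num_data_drives].append(byte)   # interleave the bytes
        width = max(len(d) for d in drives)
        for d in drives:
            d.extend(b"\x00" * (width - len(d)))       # pad to equal length
        return drives + [xor_stripes(drives)]          # last element is parity

    def rebuild(drives, lost):
        return xor_stripes([d for i, d in enumerate(drives) if i != lost])

    stripes = split_with_parity(b"important payroll data", 2)
    assert rebuild(stripes, lost=0) == stripes[0]      # lost data drive recovered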

RAID 4 tries to solve the primary problem with RAID 3: its reliance on sequential disk I/O and large block transfers. With RAID 4, large blocks of data (such as a sector) are written to the first drive, the next block to the second drive, and so on, with a single dedicated error correction drive. If a data disk fails, the missing data can be reconstructed from the error correction codes. While reads are usually faster, write performance does not improve significantly with RAID 4, because every write must also update the error correction drive.
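
A minimal sketch of the block layout (illustrative only, using an arbitrary three-data-drive geometry) shows why the dedicated error correction drive becomes a write bottleneck: every stripe's parity lives on the same drive:

    # RAID 4 sketch: whole blocks go round-robin onto the data drives, while
    # one fixed drive holds the parity for every stripe. Any write must also
    # update that drive, so it is the bottleneck under heavy write loads.
    NUM_DATA_DRIVES = 3
    PARITY_DRIVE = NUM_DATA_DRIVES        # the last drive is always parity

    def raid4_location(block_number):
        """Map a logical block to (data drive index, stripe number)."""
        drive = block_number % NUM_DATA_DRIVES
        stripe = block_number // NUM_DATA_DRIVES
        return drive, stripe

    # Writes to blocks 0, 1, and 5 touch three different data drives
    # but the very same parity drive:
    for block in (0, 1, 5):
        drive, stripe = raid4_location(block)
        print("block", block, "-> data drive", drive,
              ", stripe", stripe, ", parity on drive", PARITY_DRIVE)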

RAID 5 addresses a significant flaw in RAID 4. Because RAID 4 uses a dedicated error correction drive, writes are inherently limited by the need to update that drive. Although not a problem for most systems, for those with high data throughput (such as transaction processing) this can become a bottleneck. RAID 5 addresses the problem by removing the dedicated error correction drive and spreading the error correction data across all the drives in the array. Each drive in the array holds both data and error correction information for the other drives. RAID 5 doesn't improve read performance over RAID 4, but writes are faster because they can be performed in parallel. In theory, RAID 5 allows all drives to read and write in parallel, so as the number of drives in the array increases, transfer times drop in inverse proportion (with four drives, read and write operations take roughly a quarter of the time they would on a single drive). RAID 5 is one of the most widely used RAID levels.
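
The sketch below (one possible rotation scheme, chosen only for illustration; real controllers and drivers use their own layouts) shows how RAID 5 moves the parity block from drive to drive with each stripe, so no single drive absorbs all the parity writes:

    # RAID 5 sketch: same block striping as RAID 4, but the drive holding the
    # parity changes with each stripe, spreading parity writes over the array.
    NUM_DRIVES = 4    # each stripe holds NUM_DRIVES - 1 data blocks plus parity

    def raid5_layout(stripe):
        parity_drive = (NUM_DRIVES - 1 - stripe) % NUM_DRIVES   # rotate the parity
        data_drives = [d for d in range(NUM_DRIVES) if d != parity_drive]
        return stripe, parity_drive, data_drives

    # Parity lands on drive 3, then 2, then 1, then 0, then wraps around,
    # so no single drive is a dedicated write bottleneck.
    for stripe in range(5):
        print(raid5_layout(stripe))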

A few newer RAID systems have appeared, but none has gained a significant following. ECCS introduced RAID 10, which offers mirrored pairs of drives with block striping between the drives. Essentially, RAID 10 is a combination of RAID 1 and RAID 0. Although this approach offers scalability and good redundancy, RAID 10 is not a new product but a combination of older RAID implementations, and RAID 10 subsystems tend to be expensive. A newer design called RAID 53 was introduced by Hi-DATA; it is intended to combine the reliability of RAID 3 with the lower cost of RAID 5 by using a RAID 3 array in place of each drive in a RAID 5 model. Again, the cost becomes higher than is reasonable for smaller systems, so RAID 53 is usually encountered only on high-end mainframes and multiple-processor minicomputers.
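
As a rough sketch of the RAID 10 layout described above (a hypothetical four-drive array, illustration only), blocks are striped across mirrored pairs of drives:

    # RAID 10 sketch: drives are grouped into mirrored pairs (RAID 1) and
    # blocks are striped across the pairs (RAID 0).
    MIRRORED_PAIRS = 2    # four physical drives in all

    def raid10_location(block_number):
        pair = block_number % MIRRORED_PAIRS        # RAID 0: stripe across pairs
        offset = block_number // MIRRORED_PAIRS
        primary = 2 * pair                          # RAID 1: both drives in the
        mirror = 2 * pair + 1                       # pair receive identical copies
        return pair, primary, mirror, offset

    for block in range(4):
        print(block, raid10_location(block))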