Pros and cons of caching controllers in RAID arrays

Except for corner-case applications such as video broadcasting and other multimedia, caching can dramatically increase I/O performance. But how much? And at what risk? This article examines the question of data reliability.
Mission-critical arrays are distinguished by their ability to perform in adverse conditions, including errors and component failures anywhere along the I/O path: hubs, switches, controllers, drives and cables. You should be able to trust that a cached write will make it to disk safely, and that cached reads are current.
What happens if the I/O path fails? Or worse, what if your applications don’t notice the problem? The answers depend on the strength of your array’s error-recovery mechanisms, which can determine whether or not your database ends up corrupted.
How to cache
RAID controllers typically offer two types of caching: read caching and write caching.
Read caching. The locality-of-reference principle is the basis of read caching: applications refer to data stored close to previously referenced data more often than to data stored far away. In practice, the next block an application reads is likely to live in the same stripe as the previously read block. Although read-caching algorithms can be complex, the fundamental idea is simple: after each read operation, grab the requested blocks along with all the others in the stripe, which is why this is called “read ahead” caching. The requested block is returned to the application and the rest are saved in the read cache. When a later read I/O requests a block already stored in the cache, that is a “cache hit”, and it avoids a long trip to the disk pool.
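To make the mechanism concrete, here is a minimal sketch of read-ahead caching in Python. The stripe size, the Disk stand-in and the block numbering are illustrative assumptions, not any particular controller’s design:

    STRIPE_BLOCKS = 8  # hypothetical: blocks fetched per read-ahead

    class Disk:
        """Stand-in for the disk pool: a slow block store."""
        def __init__(self, nblocks):
            self.blocks = {n: f"data-{n}" for n in range(nblocks)}

        def read(self, block):
            return self.blocks[block]  # imagine ~5 ms of seek/rotate here

    class ReadAheadCache:
        def __init__(self, disk):
            self.disk = disk
            self.cache = {}
            self.hits = self.misses = 0

        def read(self, block):
            if block in self.cache:  # cache hit: no trip to the disk pool
                self.hits += 1
                return self.cache[block]
            self.misses += 1
            # Miss: fetch the whole stripe containing this block, keep it all.
            start = (block // STRIPE_BLOCKS) * STRIPE_BLOCKS
            for b in range(start, start + STRIPE_BLOCKS):
                self.cache[b] = self.disk.read(b)
            return self.cache[block]

    cache = ReadAheadCache(Disk(64))
    for b in range(16):  # a sequential scan: only 2 trips to disk
        cache.read(b)
    print(f"hits={cache.hits} misses={cache.misses}")  # hits=14 misses=2

A sequential workload like this shows why read-ahead pays off: after the first block of each stripe, every subsequent read is a cache hit.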
Write caching. Another simple principle underpins write caching: it takes only microseconds to store data in a controller’s cache, but several milliseconds to store it on disk. Writing to or reading from cache is more than 1,000 times faster than going to disk. Write caching comes in two types: write-back and write-through.
With write-back caching, a write is stored in the cache, and the I/O is immediately acknowledged as complete to the server that issued it. A short time later, the cached write is flushed to disk. The application assumes the data is stored permanently on disk the moment it receives the I/O-complete acknowledgment. Write-through caching, sometimes called conservative cache mode, stores writes in both the cache and on disk before they are acknowledged as complete. For applications that frequently read recently written data, write-through caching can still improve I/O performance, because those reads are served from the cache.
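The difference between the two policies comes down to when the acknowledgment is sent. The following sketch is an assumption-laden illustration, with a deliberately simplified Controller class and no real timing, rather than real firmware:

    class Controller:
        def __init__(self, write_back=True):
            self.write_back = write_back
            self.cache = {}  # pending writes (block -> data)
            self.disk = {}   # "permanent" storage

        def write(self, block, data):
            self.cache[block] = data
            if not self.write_back:
                # Write-through ("conservative cache mode"): data reaches
                # disk before the I/O is acknowledged.
                self.disk[block] = data
            return "ACK"  # write-back ACKs here, before any disk I/O

        def flush(self):
            # Background task: make cached writes permanent when cycles allow.
            self.disk.update(self.cache)
            self.cache.clear()

    wb = Controller(write_back=True)
    wt = Controller(write_back=False)
    wb.write(0, "fast ACK, not yet on disk")
    wt.write(0, "slower ACK, already on disk")
    print(0 in wb.disk, 0 in wt.disk)  # False True
    wb.flush()
    print(0 in wb.disk)                # True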
Caching can be a cost-effective way to increase I/O performance. But unless RAID controllers can be configured in dual-active pairs and equipped with robust recovery mechanisms and cache coherency, a failure along the I/O path can cause corrupt data to be delivered and databases to become corrupted.
Cache mirroring
The RAID controller itself is one element of the I/O path that can cause data-integrity problems if it fails. Data stored in a write-back cache is vulnerable until it’s made permanent on disk, which happens as a background task when there are spare cycles. If a controller with write-back caching enabled fails, the writes in its cache can be lost. Because the controller already acknowledged the I/Os as complete, the application will never know about the data loss. In database parlance, this type of corruption is known as the “lost-write” problem: the application believes the writes were saved to disk, but the disk never saw the data.
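Here is a toy demonstration of that window, under the same simplified single-controller model as above (the WriteBackController class and its fail() method are hypothetical):

    class WriteBackController:
        def __init__(self):
            self.cache = {}  # volatile: lost if the controller dies
            self.disk = {}

        def write(self, block, data):
            self.cache[block] = data
            return "ACK"  # acknowledged before the data reaches disk

        def flush(self):
            self.disk.update(self.cache)
            self.cache.clear()

        def fail(self):
            self.cache.clear()  # cache contents die with the controller

    ctrl = WriteBackController()
    ctrl.write(7, "commit record")  # application sees "ACK" and moves on
    ctrl.fail()                     # controller dies before the flush
    print(ctrl.disk.get(7))         # None: the acknowledged write is gone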
Battery backup units (BBUs), provided by RAID array vendors, preserve cache contents in the event of power outages. But BBUs cannot protect data from controller failures unless: 1) the battery-backed memory can be transported to a replacement controller; 2) the battery circuit or cache memory is not itself the cause of the failure; and 3) the failure didn’t propagate to the memory and corrupt it before the controller shut down. Even when all these conditions are met, transportable battery-backed memory falls short in mission-critical environments, which cannot wait for a field engineer to arrive with a replacement controller.
Although they perform much the same function as BBUs, UPSs can keep an array alive for at least a few hours, giving the controller time to flush its write cache. While running on UPS battery power, RAID controllers automatically switch to conservative cache mode, ensuring that all writes are saved to disk before the I/Os are acknowledged.
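A rough sketch of that mode switch follows. The on_power_event() hook and the rest of the interface are invented for illustration; real controllers receive outage notification through vendor-specific means:

    class Controller:
        def __init__(self):
            self.write_back = True  # normal, fast mode
            self.cache = {}
            self.disk = {}

        def write(self, block, data):
            self.cache[block] = data
            if not self.write_back:
                self.disk[block] = data  # ACK only after data is on disk
            return "ACK"

        def flush(self):
            self.disk.update(self.cache)
            self.cache.clear()

        def on_power_event(self, on_battery):
            if on_battery:
                self.flush()             # drain pending writes while power holds
                self.write_back = False  # conservative (write-through) mode
            else:
                self.write_back = True   # utility power restored

    ctrl = Controller()
    ctrl.write(1, "cached, fast ACK")
    ctrl.on_power_event(on_battery=True)  # UPS reports a power outage
    ctrl.write(2, "on disk before ACK")
    print(sorted(ctrl.disk))              # [1, 2]: nothing left at risk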
Single-controller arrays have no reliable cache-recovery mechanism to protect cached writes against controller failure. External storage arrays equipped with dual-active RAID controllers, however, can provide a reliable cache-recovery mechanism called cache mirroring. During normal operation, the two dual-active controllers share the I/O workload; if one controller fails, the other takes over the entire workload.
Dual-active RAID configurations with cache mirroring write each incoming write to both controllers before acknowledging it as complete. Some controller designs reserve a write buffer specifically for mirrored writes: while both controllers are operational, each write is mirrored into the partner controller’s write buffer. If a controller fails, the surviving controller completes the outstanding write operations by flushing the mirrored writes in its buffer to disk, leaving the database in a consistent state. The surviving controller then transparently fails over the host address of the failed controller, updates its configuration, and takes on that controller’s workload.
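The following sketch models the mirrored write buffers and failover described above. The Controller class, partner linkage and take_over() steps are simplified assumptions; real failover also involves host-address takeover, which is elided here:

    class Controller:
        def __init__(self, name, disk):
            self.name = name
            self.disk = disk  # shared disk pool
            self.cache = {}   # this controller's pending writes
            self.mirror = {}  # partner's writes, mirrored here
            self.partner = None

        def write(self, block, data):
            self.cache[block] = data
            self.partner.mirror[block] = data  # mirror before the ACK
            return "ACK"

        def fail(self):
            self.cache.clear()  # this controller's cache is gone...
            self.partner.take_over()

        def take_over(self):
            # ...but the survivor flushes the failed partner's mirrored
            # writes to disk, then assumes its workload.
            self.disk.update(self.mirror)
            self.mirror.clear()

    disk = {}
    a = Controller("A", disk)
    b = Controller("B", disk)
    a.partner, b.partner = b, a
    a.write(5, "acknowledged write")  # also lands in B's mirror buffer
    a.fail()                          # B flushes A's mirrored writes
    print(disk[5])                    # "acknowledged write": nothing lost

Because every write lands in the partner’s buffer before the acknowledgment goes out, a single controller failure can no longer turn an acknowledged write into a lost write.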