Physical Storage: SAN, RAID, etc.
The choice of where and how to store the data for the data warehousing system is a critical architectural question. Part of the issue is data warehousing “style”:
- Enterprise data warehouse
- Basic data warehouse system
- Specialized data warehouse / data marts
- Federated data warehouse
- Virtual data warehouse
- ROLAP vs MOLAP
The basic data warehouse system calls for the creation of the following types of databases:
- Data source staging (Kimball’s “Landing Area”)
- Data warehouse (Kimball’s “Backroom”)
- Data marts (Kimball’s “Frontroom”)
Data Model Patterns for Data Warehousing
A data model is a graphical view of data created for analysis and design purposes. While architecture does not include designing the data warehouse database in detail, it does include defining principles and patterns for modeling specialized parts of the data warehouse system.
Areas that require specialized patterns are:
- Staging / Landing Area – looks like source system
- Data warehouse / Backroom – uses normalized ERD
- Data Mart / Frontroom – uses dimension modeling – the ROLAP star schema or the MOLAP cube
In addition to these specialized patterns, the architecture should include other pattern descriptions for:
- Naming of tables and columns
- Assignment of keys
- Referential Integrity (RI)
- Audit history
The data warehouse contains the physical storage areas for configuration information, normalized data, and product data.
The data warehouse can run in other environments but only the following environments are recommended:
|Item||Requirements and guidelines|
The factors to consider for a data warehouse are:
Configure I/O for Bandwidth not Capacity
Storage configurations for a data warehouse should be chosen based on the I/O bandwidth that they can provide, and not necessarily on their overall storage capacity. Buying storage based solely on capacity can be a mistake, especially for systems less than 500GB in total size. The capacity of individual disk drives is growing faster than the I/O throughput rates provided by those disks, leading to a situation in which a small number of disks can store a large volume of data, but cannot provide the same I/O throughput as a larger number of small disks.
As an example, consider a 200GB data mart. Using 72GB drives, this data mart could be built with as few as six drives in a fully-mirrored environment. However, six drives might not provide enough I/O bandwidth to handle a medium number of concurrent users on a 4-CPU server. Thus, even though six drives provide sufficient storage, a larger number of drives may be required to provide acceptable performance for this system.
While it may not be practical to estimate the I/O bandwidth that will be required by a data warehouse before a system is built, it is generally practical, with the guidance of the hardware manufacturer, to estimate how much I/O bandwidth a given server can potentially utilize, and to ensure that the selected I/O configuration will be able to successfully feed the server. There are many variables in sizing the I/O systems, but one basic rule of thumb is that your data warehouse system should have multiple disks for each CPU (at least two disks for each CPU at a bare minimum) in order to achieve optimal performance.
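The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the capacity calculation assumes full mirroring, and the bandwidth side uses nothing more than the "at least two disks for each CPU" rule of thumb.

```python
import math


def drives_for_capacity(data_gb, drive_gb, mirrored=True):
    """Minimum number of drives needed just to hold the data."""
    data_drives = math.ceil(data_gb / drive_gb)
    # Full mirroring doubles the number of drives.
    return data_drives * 2 if mirrored else data_drives


def drives_for_cpus(cpu_count, disks_per_cpu=2):
    """Minimum number of drives by the disks-per-CPU rule of thumb."""
    return cpu_count * disks_per_cpu


def recommended_drives(data_gb, drive_gb, cpu_count, disks_per_cpu=2, mirrored=True):
    """Take the larger of the capacity-driven and bandwidth-driven counts."""
    return max(
        drives_for_capacity(data_gb, drive_gb, mirrored),
        drives_for_cpus(cpu_count, disks_per_cpu),
    )


if __name__ == "__main__":
    # The 200GB data mart on 72GB drives with a 4-CPU server.
    print(drives_for_capacity(200, 72))    # 6 drives satisfy capacity
    print(drives_for_cpus(4))              # 8 drives at two disks per CPU
    print(recommended_drives(200, 72, 4))  # 8, so bandwidth drives the choice
```

For the 200GB example, capacity alone suggests six drives, while the bare-minimum rule of thumb already asks for eight. A real configuration should still be checked against the hardware manufacturer's throughput figures.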
Stripe Far and Wide
The guiding principle in configuring an I/O system for a data warehouse is to maximize I/O bandwidth by having multiple disks and channels access each database object. You can do this by striping the datafiles of the Oracle Database. A striped file is a file distributed across multiple disks. This striping can be managed by software (such as a logical volume manager), or within the storage hardware. The goal is to ensure that each tablespace is striped across a large number of disks (ideally, all of the disks) so that any database object can be accessed with the highest possible I/O bandwidth.
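As a rough illustration of how a striped file is distributed, the sketch below maps a logical block number to a disk and an offset on that disk using simple round-robin striping. The function names and the `blocks_per_stripe_unit` parameter are assumptions made for this example; real volume managers and storage arrays use their own layouts.

```python
def locate_block(logical_block, num_disks, blocks_per_stripe_unit=1):
    """Return (disk index, block offset on that disk) for a logical block."""
    stripe_unit = logical_block // blocks_per_stripe_unit
    disk = stripe_unit % num_disks   # stripe units rotate across the disks
    row = stripe_unit // num_disks   # number of full passes over all disks
    offset = row * blocks_per_stripe_unit + logical_block % blocks_per_stripe_unit
    return disk, offset


def disks_touched(first_block, block_count, num_disks, blocks_per_stripe_unit=1):
    """Set of disks that serve a sequential read of block_count blocks."""
    return {
        locate_block(b, num_disks, blocks_per_stripe_unit)[0]
        for b in range(first_block, first_block + block_count)
    }


if __name__ == "__main__":
    # Eight consecutive blocks striped over four disks use every disk.
    print(sorted(disks_touched(0, 8, num_disks=4)))  # [0, 1, 2, 3]
    print(locate_block(5, num_disks=4))              # (1, 1)
```

Because consecutive stripe units land on different disks, a large sequential scan is served by all disks in parallel, which is the bandwidth benefit the principle is aiming for.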
Because data warehouses are often the largest database systems in a company, they have the most disks and thus are also the most susceptible to the failure of a single disk. Therefore, disk redundancy is a requirement for data warehouses to protect against a hardware failure. Like disk-striping, redundancy can be achieved in many ways using software or hardware.
A key consideration is that occasionally a balance must be made between redundancy and performance. For example, a storage system in a RAID-5 configuration may be less expensive than a RAID-0+1 configuration, but it may not perform as well, either. Redundancy is necessary for any data warehouse, but the approach to redundancy may vary depending upon the performance and cost constraints of each data warehouse.
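The cost side of this trade-off comes largely from usable capacity. The sketch below compares the two configurations for the same set of drives; the function name and the level labels are hypothetical, and the formulas are the standard idealized ones (one drive's worth of parity for RAID-5, half the drives for RAID-0+1). It says nothing about performance, which has to be measured.

```python
def usable_capacity_gb(level, num_drives, drive_gb):
    """Idealized usable capacity for a RAID-5 or RAID-0+1 group."""
    if level == "raid5":
        if num_drives < 3:
            raise ValueError("RAID-5 needs at least three drives")
        # One drive's worth of space holds parity.
        return (num_drives - 1) * drive_gb
    if level == "raid0+1":
        if num_drives < 4 or num_drives % 2:
            raise ValueError("RAID-0+1 needs an even number of drives, at least four")
        # Half of the drives hold the mirror copy.
        return (num_drives // 2) * drive_gb
    raise ValueError(f"unsupported level: {level}")


if __name__ == "__main__":
    # Eight 72GB drives in each configuration.
    print(usable_capacity_gb("raid5", 8, 72))    # 504
    print(usable_capacity_gb("raid0+1", 8, 72))  # 288
```

With the same eight drives, RAID-5 exposes considerably more usable space, which is why it tends to be cheaper per gigabyte even though it may not perform as well.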
Test the I/O System Before Building the Database
The most important time to examine and tune the I/O system is before the database is even created. Once the database files are created, it is more difficult to reconfigure the files. Some logical volume managers may support dynamic reconfiguration of files, while other storage configurations may require that files be entirely rebuilt in order to reconfigure their I/O layout. In both cases, considerable system resources must be devoted to this reconfiguration.
When creating a data warehouse on a new system, the I/O bandwidth should be tested before creating all of the database datafiles to validate that the expected I/O levels are being achieved. On most operating systems, this can be done with simple scripts to measure the performance of reading and writing large test files.
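A simple script of this kind might look like the following Python sketch. It is an assumption-laden illustration, not a benchmark tool: the function name, file size, and chunk size are arbitrary choices, and the read figure can be inflated by the operating system's file cache unless the test file is much larger than memory.

```python
import os
import tempfile
import time

MB = 1024 * 1024


def measure_throughput(path, size_mb=64, chunk_mb=1):
    """Write then read a test file; return (write MB/s, read MB/s)."""
    chunk = os.urandom(chunk_mb * MB)
    chunk_count = size_mb // chunk_mb

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(chunk_count):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # force the data to storage before stopping the clock
    write_seconds = max(time.perf_counter() - start, 1e-9)

    start = time.perf_counter()
    bytes_read = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_mb * MB)
            if not data:
                break
            bytes_read += len(data)
    read_seconds = max(time.perf_counter() - start, 1e-9)

    total_mb = bytes_read / MB
    return total_mb / write_seconds, total_mb / read_seconds


if __name__ == "__main__":
    # Point the temporary file at the volume under test in a real run.
    fd, test_path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_rate, read_rate = measure_throughput(test_path)
        print(f"write: {write_rate:.1f} MB/s, read: {read_rate:.1f} MB/s")
    finally:
        os.remove(test_path)
```

Running several copies concurrently against different files gives a closer approximation of a multi-user workload than a single sequential stream.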
Plan for Growth
A data warehouse designer should plan for future growth of a data warehouse. There are many approaches to handling the growth in a system, and the key consideration is to be able to grow the I/O system without compromising on the I/O bandwidth. You should not, for example, add four disks to an existing system of 20 disks and grow the database by adding a new tablespace striped across only the four new disks, because objects in that tablespace would be limited to the bandwidth of four disks. A better solution would be to add new tablespaces striped across all 24 disks, and over time also convert the existing tablespaces striped across 20 disks to be striped across all 24 disks.
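The effect of the two growth strategies can be seen with a deliberately idealized model in which a tablespace's bandwidth is the number of disks it is striped across multiplied by a per-disk rate. The per-disk figure of 50 MB/s below is a hypothetical placeholder, and the model ignores controller and channel limits.

```python
# Hypothetical per-disk throughput, used only for illustration.
PER_DISK_MB_S = 50


def stripe_bandwidth_mb_s(num_disks, per_disk_mb_s=PER_DISK_MB_S):
    """Idealized aggregate bandwidth of a tablespace striped over num_disks."""
    return num_disks * per_disk_mb_s


if __name__ == "__main__":
    # New tablespace striped over only the four added disks.
    print(stripe_bandwidth_mb_s(4))   # 200
    # New tablespace striped over all 24 disks.
    print(stripe_bandwidth_mb_s(24))  # 1200
```

Under this model the tablespace on the four new disks gets one sixth of the bandwidth available to a tablespace striped across all 24 disks.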
Two features to consider for managing disks are Oracle Managed Files and Automatic Storage Management. Without these features, a database administrator must manage the database files, which, in a data warehouse, can be hundreds or even thousands of files. Oracle Managed Files simplifies the administration of a database by providing functionality to automatically create and manage files, so the database administrator no longer needs to manage each database file. Automatic Storage Management provides additional functionality for managing not only files but also the disks. With Automatic Storage Management, the database administrator would administer a small number of disk groups. Automatic Storage Management handles the tasks of striping and providing disk redundancy, including rebalancing the database files when new disks are added to the system.
A storage area network (SAN) is a dedicated network that provides access to consolidated, block level data storage. SANs are primarily used to make storage devices, such as disk arrays, tape libraries, and optical jukeboxes, accessible to servers so that the devices appear like locally attached devices to the operating system. A SAN typically has its own network of storage devices that are generally not accessible through the local area network by other devices. The cost and complexity of SANs dropped in the early 2000s to levels allowing wider adoption across both enterprise and small to medium sized business environments.
A SAN does not provide file abstraction, only block-level operations. However, file systems built on top of SANs do provide file-level access, and are known as SAN filesystems or shared disk file systems.
A storage area network (SAN) is a high-speed special-purpose network (or subnetwork) that interconnects different kinds of data storage devices with associated data servers on behalf of a larger network of users.
Typically, a storage area network is part of the overall network of computing resources for an enterprise. A storage area network is usually clustered in close proximity to other computing resources such as IBM z990 mainframes but may also extend to remote locations for backup and archival storage, using wide area network carrier technologies such as ATM or SONET.
A storage area network can use existing communication technology such as IBM’s optical fiber ESCON or use Fibre Channel technology. Some SAN system integrators liken it to the common storage bus (flow of data) in a personal computer that is shared by different kinds of storage devices such as a hard disk or a CD-ROM player.
SANs support disk mirroring, backup and restore, archival and retrieval of archived data, data migration from one storage device to another and the sharing of data among different servers in a network. SANs can incorporate subnetworks with network-attached storage (NAS) systems.
Historically, data centers first created “islands” of SCSI disk arrays as direct-attached storage (DAS), each dedicated to an application, and visible as a number of “virtual hard drives” (i.e. LUNs). Essentially, a SAN consolidates such storage islands together using a high-speed network.
Operating systems maintain their own file systems on their own dedicated, non-shared LUNs, as though they were local to themselves. If multiple systems were simply to attempt to share a LUN, these would interfere with each other and quickly corrupt the data. Any planned sharing of data on different computers within a LUN requires advanced solutions, such as SAN file systems or clustered computing.
Despite such issues, SANs help to increase storage capacity utilization, since multiple servers consolidate their private storage space onto the disk arrays.
Common uses of a SAN include provision of transactionally accessed data that require high-speed block-level access to the hard drives such as email servers, databases, and high usage file servers.
SAN and NAS
Network-attached storage (NAS), in contrast to SAN, uses file-based protocols such as NFS or SMB/CIFS where it is clear that the storage is remote, and computers request a portion of an abstract file rather than a disk block.
Despite the differences between SAN and NAS, it is possible to create solutions that include both technologies.
Sharing storage usually simplifies storage administration and adds flexibility since cables and storage devices do not have to be physically moved to shift storage from one server to another.
Other benefits include the ability to allow servers to boot from the SAN itself. This allows for a quick and easy replacement of faulty servers since the SAN can be reconfigured so that a replacement server can use the LUN of the faulty server. While this area of technology is still new, many view it as being the future of the enterprise datacenter.
SANs also tend to enable more effective disaster recovery processes. A SAN could span a distant location containing a secondary storage array. This enables storage replication either implemented by disk array controllers, by server software, or by specialized SAN devices. Since IP WANs are often the least costly method of long-distance transport, the Fibre Channel over IP (FCIP) and iSCSI protocols have been developed to allow SAN extension over IP networks. The traditional physical SCSI layer could only support a few meters of distance – not nearly enough to ensure business continuance in a disaster.
Most storage networks use the SCSI protocol for communication between servers and disk drive devices. A mapping layer to other protocols is used to form a network:
- ATA over Ethernet (AoE), mapping of ATA over Ethernet
- Fibre Channel Protocol (FCP), the most prominent one, is a mapping of SCSI over Fibre Channel
- Fibre Channel over Ethernet (FCoE)
- ESCON over Fibre Channel (FICON), used by mainframe computers
- HyperSCSI, mapping of SCSI over Ethernet
- iFCP or SANoIP, mapping of FCP over IP
- iSCSI, mapping of SCSI over TCP/IP
- iSCSI Extensions for RDMA (iSER), mapping of iSCSI over InfiniBand
Storage networks may also be built using SAS and SATA technologies. SAS evolved from SCSI direct-attached storage. SATA evolved from IDE direct-attached storage. SAS and SATA devices can be networked using SAS Expanders.
SANs often utilize a Fibre Channel fabric topology – an infrastructure specially designed to handle storage communications. It provides faster and more reliable access than higher-level protocols used in NAS. A fabric is similar in concept to a network segment in a local area network. A typical Fibre Channel SAN fabric is made up of a number of Fibre Channel switches.
Today, all major SAN equipment vendors also offer some form of Fibre Channel routing solution, and these bring substantial scalability benefits to the SAN architecture by allowing data to cross between different fabrics without merging them. These offerings use proprietary protocol elements, and the top-level architectures being promoted are radically different. They often enable mapping Fibre Channel traffic over IP or over SONET/SDH.
One of the early problems with Fibre Channel SANs was that the switches and other hardware from different manufacturers were not compatible. Although the basic storage protocol, FCP, was always quite standard, some of the higher-level functions did not interoperate well. Similarly, many host operating systems would react badly to other operating systems sharing the same fabric. Many solutions were pushed to the market before standards were finalized, and vendors have since innovated around the standards.
SANs in media and entertainment
Video editing workgroups require very high data transfer rates and very low latency. Outside of the enterprise market, this is one area that greatly benefits from SANs.
SANs in media and entertainment are often referred to as serverless SANs because the configuration places the video workflow clients (ingest, editing, playout) directly on the SAN rather than attaching them to servers. Control of data flow is managed by a distributed file system such as StorNext by Quantum.
Per-node bandwidth usage control, sometimes referred to as Quality of Service (QoS), is especially important in video workgroups because it ensures fair and prioritized bandwidth usage across the network when there is insufficient open bandwidth available. Active Storage’s ActiveSAN, Avid Unity and ISIS, Apple’s Xsan and Tiger Technology MetaSAN are specifically designed for video networks and offer this functionality.
Storage virtualization is the process of completely abstracting logical storage from physical storage. The physical storage resources are aggregated into storage pools, from which the logical storage is created. It presents to the user a logical space for data storage and transparently handles the process of mapping it to the physical location. This is implemented in modern disk arrays, using vendor proprietary solutions. However, the goal is to virtualize multiple disk arrays from different vendors, scattered over the network, into a single monolithic storage device, which can be managed uniformly.
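The pooling and mapping described above can be sketched with a toy model. Everything in it is an assumption for illustration: the `StoragePool` class, the device names, and the fixed 4MB extent size are hypothetical and do not correspond to any vendor's implementation.

```python
class StoragePool:
    """Toy storage pool that maps logical volumes onto physical extents."""

    def __init__(self, devices, extent_mb=4):
        # devices: mapping of device name -> size in MB
        self.extent_mb = extent_mb
        self.free_extents = [
            (name, index)
            for name, size_mb in devices.items()
            for index in range(size_mb // extent_mb)
        ]
        self.volumes = {}

    def create_volume(self, name, size_mb):
        """Allocate enough free extents to back a logical volume."""
        needed = -(-size_mb // self.extent_mb)  # ceiling division
        if needed > len(self.free_extents):
            raise ValueError("pool exhausted")
        self.volumes[name] = [self.free_extents.pop(0) for _ in range(needed)]

    def resolve(self, name, offset_mb):
        """Translate a logical offset into (device, physical offset in MB)."""
        extent_index, within = divmod(offset_mb, self.extent_mb)
        device, physical_extent = self.volumes[name][extent_index]
        return device, physical_extent * self.extent_mb + within


if __name__ == "__main__":
    pool = StoragePool({"arrayA": 8, "arrayB": 8})
    pool.create_volume("vol1", 12)  # spans both arrays
    print(pool.resolve("vol1", 5))  # ('arrayA', 5)
    print(pool.resolve("vol1", 9))  # ('arrayB', 1)
```

The user of `vol1` sees one contiguous 12MB space, while the pool decides which physical device actually holds each extent.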
RAID (redundant array of independent disks, originally redundant array of inexpensive disks) is a storage technology that combines multiple disk drive components into a logical unit. Data is distributed across the drives in one of several ways called “RAID levels”, depending on what level of redundancy and performance (via parallel communication) is required.
In October 1986, IBM announced “checksum” for the S/38. Checksum was an implementation of RAID-5. It was implemented in the operating system, in software only, and had a minimum overhead of 10%. The S/38 “scatter loaded” all data for performance; the downside was that the loss of any single disk required a total system restore for all disks. Under checksum, when a disk failed, the system halted and was then shut down. Under maintenance, the bad disk was replaced and then a parity-bit disk recovery was run. The system was restarted using a recovery procedure similar to the one run after a power failure. While difficult, the recovery from a drive failure was much shorter and easier than without checksum.
RAID is an example of storage virtualization and was first defined by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987. Marketers representing industry RAID manufacturers later attempted to reinvent the term to describe a redundant array of independent disks as a means of disassociating a low-cost expectation from RAID technology.
RAID is now used as an umbrella term for computer data storage schemes that can divide and replicate data among multiple physical drives. The physical drives are said to be “in a RAID”; however, the more common, incorrect parlance is to say that they are “in a RAID array”. The array can then be accessed by the operating system as one single drive. The different schemes or architectures are named by the word RAID followed by a number (e.g., RAID 0, RAID 1). Each scheme provides a different balance between three key goals: resiliency, performance, and capacity.
A number of standard schemes have evolved which are referred to as levels. There were five RAID levels originally conceived, but many more variations have evolved, notably several nested levels and many non-standard levels (mostly proprietary). RAID levels and their associated data formats are standardised by the Storage Networking Industry Association (SNIA) in the Common RAID Disk Drive Format (DDF) standard.
Following is a brief textual summary of the most commonly used RAID levels.
- RAID 0 (block-level striping without parity or mirroring) has no (or zero) redundancy. It provides improved performance and additional storage but no fault tolerance. Hence simple stripe sets are normally referred to as RAID 0. Any drive failure destroys the array, and the likelihood of failure increases with more drives in the array (at a minimum, potential for catastrophic data loss is double that of isolated drives without RAID). A single drive failure destroys the entire array because when data is written to a RAID 0 volume, the data is broken into fragments called blocks. The number of blocks is dictated by the stripe size, which is a configuration parameter of the array. The blocks are written to their respective drives simultaneously on the same sector. This allows smaller sections of the entire chunk of data to be read off each drive in parallel, increasing bandwidth. RAID 0 does not implement error checking, so any error is uncorrectable. More drives in the array means higher bandwidth, but greater risk of data loss.
- In RAID 1 (mirroring without parity or striping), data is written identically to two drives, thereby producing a “mirrored set”; the read request is serviced by either of the two drives containing the requested data, whichever one involves least seek time plus rotational latency. Similarly, a write request updates the strips of both drives. The write performance depends on the slower of the two writes (i.e., the one that involves larger seek time and rotational latency); at least two drives are required to constitute such an array. While more constituent drives may be employed, many implementations deal with a maximum of only two; of course, it might be possible to use such a limited level 1 RAID itself as a constituent of a level 1 RAID, effectively masking the limitation. The array continues to operate as long as at least one drive is functioning. With appropriate operating system support, there can be increased read performance, and only a minimal write performance reduction; implementing RAID 1 with a separate controller for each drive in order to perform simultaneous reads (and writes) is sometimes called “multiplexing” (or “duplexing” when there are only two drives).
- In RAID 10 (mirroring and striping), data is written in stripes across the primary disks and then mirrored to the secondary disks. A typical RAID 10 configuration consists of four drives: two for striping and two for mirroring. RAID 10 combines the concepts of RAID 0 and RAID 1 to provide better performance along with redundancy, without the parity used by RAID 5 and RAID 6. RAID 10 is often referred to as RAID 1+0 (mirrored+striped).
- In RAID 3 (byte-level striping with dedicated parity), all disk spindle rotation is synchronized, and data is striped so each sequential byte is on a different drive. Parity is calculated across corresponding bytes and stored on a dedicated parity drive.
- RAID 4 (block-level striping with dedicated parity) is similar to RAID 5 (see below) but confines all parity data to a single drive. In this setup, files may be distributed between multiple drives. Each drive operates independently, allowing I/O requests to be performed in parallel. However, the use of a dedicated parity drive could create a performance bottleneck; because the parity data must be written to a single, dedicated parity drive for each block of non-parity data, the overall write performance may depend a great deal on the performance of this parity drive.
- RAID 5 (block-level striping with distributed parity) distributes parity along with the data and requires all drives but one to be present to operate; the array is not destroyed by a single drive failure. Upon drive failure, any subsequent reads can be calculated from the distributed parity such that the drive failure is masked from the end user. However, a single drive failure results in reduced performance of the entire array until the failed drive has been replaced and the associated data rebuilt. Additionally, there is the potentially disastrous RAID 5 write hole. RAID 5 requires at least three disks.
- RAID 6 (block-level striping with double distributed parity) provides fault tolerance of two drive failures; the array continues to operate with up to two failed drives. This makes larger RAID groups more practical, especially for high-availability systems. This becomes increasingly important as large-capacity drives lengthen the time needed to recover from the failure of a single drive. After a drive failure, single-parity RAID levels are as vulnerable to data loss as a RAID 0 array until the failed drive is replaced and its data rebuilt; the larger the drive, the longer the rebuild takes. Double parity gives additional time to rebuild the array without the data being at risk if a single additional drive fails before the rebuild is complete. As with RAID 5, a single drive failure results in reduced performance of the entire array until the failed drive has been replaced and the associated data rebuilt.
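The way single parity masks a drive failure in the parity-based levels above can be illustrated with XOR. The sketch below is a minimal model under stated assumptions: the function names are hypothetical, each "drive" is a short byte string, and parity rotation across drives is not modeled. The second parity of RAID 6 uses a different calculation and is not shown.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks):
    """Parity block for one stripe: the XOR of all data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity_block):
    """Recover the single missing data block from the survivors and parity."""
    return xor_blocks(list(surviving_blocks) + [parity_block])


if __name__ == "__main__":
    stripe = [b"ware", b"hous", b"e123"]  # data blocks on three drives
    parity = compute_parity(stripe)       # stored on a fourth drive

    # Simulate losing the second drive and rebuilding its block.
    recovered = rebuild_missing([stripe[0], stripe[2]], parity)
    print(recovered == stripe[1])  # True
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields exactly the missing block, which is why reads can continue, at reduced performance, while the failed drive is being replaced.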
Where RAID Can Be Implemented
There are three places to implement RAID: software, RAID controllers and storage arrays.
1. Software RAID
RAID implemented on a server by software uses internal drives or external JBOD (just a bunch of disks). The software, usually a logical volume manager, manages all of the mirroring of data or parity calculations.
2. RAID Controller
These are cards that can be added to a server to offload the overhead of RAID from the CPUs. RAID controllers are a better solution for a single server than software RAID because server CPUs spend no processing power calculating parity or managing the mirrored data. RAID controllers use either internal drives or JBOD. However, a server-based RAID controller can itself fail and become a single point of failure.
3. Storage Array
A storage array usually consists of two high-performance, redundant RAID controllers and trays of disks. All pieces of the array are redundant and built to withstand the rigors of a production environment with many servers accessing the storage at the same time. They support multiple RAID levels and different drive types and speeds. Storage arrays also usually have snapshots, volume copy and the ability to replicate from one array to another.