It is easy to dismiss data warehouse security. After all, it is probably more tempting to focus first and foremost on the customer-facing applications in an enterprise. But attackers care about corporate data, making any place it is stored a potential target. Fortunately, many of the basic strategies employed by enterprises to protect their databases can be ported over to the data warehouse. Strong access controls based on roles is a key element of securing enterprise data warehouses.
For those concerned about internal security, it is recommended that those access controls go beyond the BI layer of the reporting tool and down to the data warehouse itself. But any sound security strategy will have to start with identifying the data that will be loaded into the target data warehouse and classifying it according to its sensitivity.
It is important to remember, too, that—as always—security is not a static state but an ongoing process. It does not stop when your data warehouse project is completed. Organizations need to continually audit and monitor security policies and data usage to ensure that both the security and business requirements of the organization are being met. With proper planning and configuration, organizations can strike the right balance between security and the ultimate goal of data warehousing projects—sharing information so that it can be leveraged to provide value to the organization.
rganizations that have a strong security umbrella in their operational mainframe environment are more likely to pay attention to security measures for their data warehouse on a multi-tier platform. Organizations that have a very lax security policy for their operational environment are usually prone to treat security casually for their data warehouse as well. These organizations may unwittingly be exposing themselves to security breaches, especially if the plan is to deliver information from the data warehouse databases over the Web.
The following is an example of a security requirement that may need to be imposed on a data warehouse. Suppose an organization wanted to give its distributors the ability to analyze their orders and shipments via a multi-dimensional BI application. To prevent one distributor from searching through other distributors’ data, there would have to be a mechanism for restricting the order and shipment information to only those pertaining to that particular distributor. In other words, some security lock is required to prevent access to the sales records of the distributor’s competitors. This is not as straightforward as it sounds, because:
There are no off-the-shelf umbrella security solutions to impose this kind of security. This security requirement would have to be implemented through the various security features of the RDBMS and of the BI tools used by the BI application.
The solution of imposing security at a table level may not be granular enough. Although, one possible way to achieve this type of security is to partition the tables either physically or logically (through VIEWs). Partitioning will restrict access solely to the appropriate distributor as long as both the fact tables and the dimension tables are partitioned. Therefore, this method could become too cumbersome.
An alternative may be to enhance the meta data with definitions of data parameters, which could control access to the data. This form of security would be implemented with appropriate program logic to tell the meta data repository “who the distributor is,” allowing the application to return the appropriate data for that distributor only. This type of security measure will only be as good as the program controlling it.
This example illustrates that the required security measures must be well thought through, and that the security features of the RDBMS and of the BI tools must be well understood and cross-tested. Complete reliance on one comprehensive security package that has the capability to implement any and all types of security measures is not a security solution, because such a security package does not exist.
If you are installing purchased security packages – as you should, be sure to minimize the number of security packages you implement because one of two things may happen:
Business people will be logging in through multiple security packages, using multiple logon IDs, and multiple passwords that expire at different times. They will get frustrated very quickly if they have to go through different logon procedures and remember different IDs and passwords for each procedure. Complaints will run high.
Business people will stop using the data warehouse entirely because it is too cumbersome. You don’t want this to happen either.
A number of organizations are avoiding this problem by adopting a single-sign-on scheme, which keeps the frustration level to a minimum but still allows tracking any security breaches, albeit in a less sophisticated way.
Centralized vs. Decentralized Security
The goal of centralized security is “one entry point — one guard.” It is much easier to guard a single door than multiple doors. In a centralized environment, all security measures can be implemented in one location because all the data is in one place. However, keeping all the data in one central place is not always feasible, or desirable.
If data needs to be stored in a distributed fashion, implementing security measures becomes much more complicated. The steps involved are:
Identify the endpoints in your network architecture and the paths connecting the endpoints.
Determine the connectivity paths (from the entry points) to get to the data. Link and label the connectivity paths.
Compare the paths with the existing security you have in place. You may already have some security packages installed, and some of them may be sufficient to guard a subset of the data. It may be useful to draw a matrix for security gap analysis purposes.
The security gap analysis matrix will help to identify where security is still needed and what type of security is needed. Keep in mind that:
Password security may be the least expensive to implement, but it can be easily violated.
RDBMS security is the most important component of the security solution and should override all other security measures that may contradict the authority granted to the data in the RDBMS.
Encryption is not that prevalent in data warehouses because of the complicated encryption and decryption algorithms. Encryption and decryption processes also degrade performance considerably. However, with the frequent use of the Internet as an access and delivery mechanism, encryption should be seriously considered to protect the organization from costly security breaches.
Security for Internet Access
The Internet enables distribution of information worldwide, and the data warehouse provides easy access to organizational data. Combining these two capabilities appears to be a giant leap forward for engaging in e-commerce. However, consider the implications of combining these technologies carefully before you decide to take the risk of potentially exposing sensitive organizational data.
Many product vendors are enabling Web access to databases in general and some vendors to data warehouse databases in particular. This complicates the concern for:
Security of the data warehouse in general.
Security issues associated with allowing Web access to the organization’s data.
If you opt to display the data over the Internet, spend extra time and money on authorization and authentication of internal staff and external customers. And if you are transmitting sensitive data to and from external customers, consider investing in encryption and decryption software.
Authentication — is the process of identifying a person, usually based on a logon ID and password. This process is meant to ensure that the person is who he or she claims to be.
Authorization — is the process of granting or denying a person access to a resource, such as an application or a Web page. In security software, authentication is distinct from authorization; and most security packages implement a two-step authentication and authorization process.
Encryption — is the “translation” of data into a secret code. It is the most effective way to achieve data security. To read an encrypted file, you must have access to a secret key or password that enables you to decrypt it.
The bottom line on security is that you need to define your security requirements early in order to have time to consider and weigh all factors.
Data warehousing poses its own set of challenges for security: enterprise data warehouses are often very large systems, serving many user communities with varying security needs, and while data warehouses require a flexible and powerful
security infrastructure, the security capabilities must seamlessly operate in an environment which has stringent performance and scalability requirements. Security must be built into the core of a data warehouse.
WHY IS SECURITY IMPORTANT FOR A DATA WAREHOUSE?
Many of the basic requirements for security are well-known, and apply equally to a data warehouse as to any other system: The application must prevent unauthorized users from accessing or modifying data, the applications and underlying data must
not be susceptible to data-theft by hackers, the data must be available to the right users at the right time, and the system must keep a record of activities performed by its users. These requirements are perhaps even more important in a data
warehouse because by definition a data warehouse contains data consolidated from multiple sources, and thus from the perspective of a malicious individual trying to steal information a data warehouse can be one of the most lucrative targets in an enterprise.
However, beyond these fundamental and obligatory requirements, there are additional scenarios in which a robust security infrastructure can vastly improve the effectiveness or reduce the costs of a data warehouse environment.
Some typical customer scenarios for data warehousing security include:
• A company is managing an enterprise data warehouse that will be widely used by many divisions and subsidiaries. That company needs a security infrastructure that ensures that the employees of each division to only be able to view only the data that is relevant to their own division, while also allowing for employees in its corporate offices to view the overall picture.
• A company's data warehouses stores personal information. Privacy laws may govern the use of such personal information. The adherence to these privacy laws must be implemented in the data warehouse.
• A company sells data from a data warehouse to its clients. Those clients may only view the data to which they have purchased or subscribed; they Security and the Data Warehouse should never be permitted to see the data of other clients (especially since those other clients may be competitors).
STRATEGY FOR DATA WAREHOUSE SECURITY
How should the data warehouse team approach security? There are several key considerations when implementing a data warehouse.
The first consideration, which is hardly surprising, is that the data warehouse team must consider end-to-end security. A data warehouse environment consists of much more than just a database. The enti re environment ranges from the extraction
of data from operational system, transportation of this data to the data warehouse, the possible distribution of this data to data marts and other analytic servers, and finally the dissemination of this data to end-users. The environment spans multiple servers and multiple software products ... and of course every component needs to be secure.
There are undoubtedly many data warehouses today in which the database itself has little risk of a security breech, but at the same time the flat files which are used to populate the data warehouse are stored in an unsecured location. This is an
example of the security loopholes that can emerge when the entire data-warehouse process has not been designed with security in mind. The second consideration is related to the interaction of security and the data warehouse architecture. A consolidated data warehouse is much simpler to secure than dozens of heterogeneous data mart s. Indeed, many industry analysts and customers agree that an enterprise data warehouse is the preferred implementation model, and among that model's many virtues is the fact that a centralized data warehouse's security is simpler and less expensive to manage, while providing higher levels of security.
The final consideration is the recognition the core of a data warehouse is the data. Although end-to-end security is crucial, the ability to provide a flexible multi-layer security model on the data in the data warehouse is nevertheless the primary
requirement for data warehouse security.
DATA WAREHOUSE SECURITY
Among the best ways to mitigate security risk is to provide multiple layers of security mechanisms, so that failure of a single mechanism does not result in compromise of critical information.
A data warehouse must ensure that sensitive data does not fall into the wrong hands, and this is especially important when the data is consolidated into one large data warehouse. Upon creating a database user and granting him or her the rights to
connect to the data warehouse, the administrator who manages the data warehouse must control access to data, and they often must limit a particular user’s access to the level of individual records in a databa se table based on the identity and privilege of that user.
Organizations struggle with the implementation of this type of strict, granular access control by building application code in each of the front-end applications. Maintaining this type of complex access control code is not only costly, but also
risk-prone. If access control is built into an application, but a user has access to the database itself, which is common in warehouse environments—through SQL*Plus or a reporting tool—then the application logic and access control can by bypassed.
For these principal reasons, organizations that understand this hard-to-solve problem build security as a whole, and access control in particular, onto the data itself, inside the data warehouse.
Role-based Access Control
Database privileges and roles ensure that a user can only perform an operation on a database object if that he or she has been authorized to perform that operation. A privilege is an authorization to perform a particular operation; without explicitly granted privileges, a user cannot access any information in the database. System privileges authorize a user to perform a specific operation, such as the CREATE TABLE privilege, which allows a user to create a database table. Object
privileges authorize a user to perform a specific operation on a particular object. An example of object privileges is SELECT ON SALES_HISTORY, to allow a particular user to query to the fact table, but not query other database objects, nor
modify any of them. Granting a user SELECT, UPDATE, INSERT ON EMEA_SALES allow a user to read and write to this view.
By providing these types of privileges, the database enables you to ensure that database users are only authorized to perform those specific operations required by their job functions. In addition, other features, such as roles and stored
procedures, not only allow you to control which privileges a user has, but under what conditions he can use those privileges.
While privileges let you restrict the types of operations a user can perform, managing these privileges may be complex. To address the complexity of privilege management, database roles encapsulate one or more privileges that can be granted to and revoked from users. For example, you can create the PROMO_ANALYST role, grant it all privileges necessary for marketing promotion analysts to perform their jobs, and then simply grant this single role to all marketing analysts. In addition, you can create the PROMO_MANAGER role, grant it the PROMO_ANALYST role and any other necessa ry privileges, and then grant it to
all marketing managers. To later grant or revoke an additional privilege to all of these users, you need only grant or revoke that privilege to the PROMO_ANALYST role.
Roles and privileges, or Role-based Access Control (RBAC), enforce security on the data itself, and their use is essentia l to a data warehouse because users access data via a number of applications and tools. Roles enforce object-level security,
which can be enhanced with the enforcement of security at the level of individual rows within a database object.
Virtual Private Database
Virtual Private Database (VPD) is server-enforced, fine-grained access control, together with secure application context. By dynamically appending SQL statements with a predicate, VPD limits access to data at the row level and ties a security policy to the database object itself (specifically, a table, view, or synonym). It enables multiple users to have secure direct access to mission-critical data within a single database server, with the assurance of complete data separation. Virtual Private Database can ensure that banking customers see only their own account history, and that a company serving multiple companies’ data (who may be competitors to one another) can do so from the same data warehouse, and allow each company to see only its own data.
Virtual Private Database is application transparent. Security is enforced at the database layer and takes into account app lication-specific logic used to limit data access within the database. Both commercial off-the-shelf applications and custom-built applications can take advantage of its granular access control, without the need to change a single line of application code. Within an enterprise, the Virtual Private Database results in lower cost of ownership in deploying applications. Security can be built once, in the warehouse, rather than in every application that accesses data. Security is stronger, because it is enforced by the database, no matter how a user accesses data. Security cannot be bypassed by a user accessing data via an ad hoc query tool or new report writer. In an enterprise data warehouse, which often supports dozens of different applications as well as many end-user tools, Virtual Private Database is key enabling technology.
ADDITIONAL PROTECTION WITH ENCRYPTION
Data traveling over a network is not protected by the multiple layers of security that safeguard it inside the RDBMS. To th is end, database supports encryption of network traffic. Industry-standard algorithms, such as DES,
Triple-DES and AES, protect network traffic, both between clients and servers and between database servers.
Among other database security measures, protect data stored in warehouses via encryption within the database. Although encryption should never be used as a substitute for effective access control, one can obtain an additional
measure of security by selectively encrypting sensitive information, such as credit card numbers, before it is stored in th e database. The Database supports data encryption using industry-standard algorithms:
• DES encryption: The toolkit includes procedures to encrypt and decrypt using the Data Encryption Standard (DES) algorithm.
• Triple-DES encryption: The toolkit su pports Triple DES (3DES) in two- and three-key modes.
• MD5 cryptographic hash: This one-way cryptographic hash algorithm ensures data integrity.
It also provides a random number generator, because random numbers are required to generate secure encryption keys.
Encrypting stored data can provide the a ssurance of securing data regardless of access method—that is, even if a person accesses the operating system files where other mechanisms for database-enforced security cannot protect it. However,
encrypting data inside a data warehouse can be complex, and it usually adds overhead to the system. Additionally, the encryption keys must be stored somewhere—in an application, in a file, or in a table—and managing keys is widely
recognized as a very difficult security problem. Many risks that appear to be managed with stored data encryption are better solved with the combination of other proven security solutions, such as strong user authentication, network
encryption, granular access controls, and auditing.
Data warehouses hold massive amounts of financial information, company secrets, medical diagnoses, credit card numbers, and other personal information. Because the data warehouse is a hotbed of critical information, it makes a lucrative target for
legitimate users who need data access to do their jobs and for malicious users who desire access to its valuable data. Developers and administrators who build and manage warehouses need to maintain a record of system activity, both to be able to roll back where an error has occurred and to ensure that users are held accountable for their actions. Auditing selected sensitive information and user actions helps to keep users accountable and data protected. Further, auditing helps deter
unauthorized user behavior that may not otherwise be prevented.
Auditing the Data Warehouse
The standard audit facility in databse allows the auditing of database activity by statement, by use of system privilege, by object, or by user. For example, one can audit activity as general as all user connections to the database, and as specific as a user updating a table. One can also audit only successful operations, or unsuccessful operations. Auditing unsuccessful select statements may catch users testing their access boundaries or snooping for data they are not privileged to see.
Performance cannot be sacrificed in a data warehouse, and, as such, SQL statements are parsed once for both execution and auditing, not separately. The granularity and scope of audit options allow you to record and monitor specific
database activity without incurring the performance overhead that more general auditing entails. And, by setting just the options of interest, you can avoid catch-all audit methods which intercept and log all statements, and then filter them to
retrieve the ones of interest. Because queries against a data warehouse generally take longer to process than queries in OLTP systems, any performance impact of auditing is negligible. On the other hand, warehouses contain massive amounts of
data, so it is crucial to narrow the scope of auditing to the most important data.