Certified Data Mining and Warehousing Professional Implementation


Implementation is the realization of an application, or execution of a plan, idea, model, design, specification, standard, algorithm, or policy.

An implementation is a realization of a technical specification or algorithm as a program, software component, or other computer system through programming and deployment. Many implementations may exist for a given specification or standard. For example, web browsers contain implementations of World Wide Web Consortium-recommended specifications, and software development tools contain implementations of programming languages.

When planning a data warehouse, there are some initial steps to think through.

  1. Do you really need one?
  2. How long will it take you to implement it? Is there enough time, money, resources, etc
  3. What is the goal? What do you want the data warehouse to do, solve, provide?

After the decision has been made to implement a data warehouse, some initial steps are:

  1. Identify and extract data from all data sources.
  2. clean data 
  3. Set up tables

Problems can occur when the extraction of data takes longer than initially thought, data warehouses can fail even after the implementation has been completed successfully, make sure it is being maintained.

To avoid the failure of the dataware house impementation, Our design should consider the following points:

  1. How many data sources are used to load the warehouse?
  2. How often is the data loaded into the warehouse?
  3. What is the load time window?
  4. What is the tool used for loading data?

Managing the design, development, implementation, and operation of even a single corporate data warehouse can be a difficult and time consuming task. In this article, we present the primary steps to ensure a successful data warehouse development effort. Here, at Horsburgh.com, we have used this approach successfully on our client's data warehouse and data mart development projects.

Problem Definition
The definition and recording of the problem to be solved is one of the most often overlooked step of any development effort. A problem needs to be solved, so the tendency is to jump right in a solve it. For small, negligible cost efforts this is fine. For Data Warehousing design, ignoring this step can lead to disaster. Write down and widely publish the answers to the following questions and all other questions that are appropriate for your specific effort. Remember to keep the questions targeted to DEFINING the problem NOT solving it.

Need of a Data Warehouse
This is an obvious question, but should be taken seriously. For some businesses the answer is an easy yes, but for others, there may be better solutions. It is wise to seek professional advice when answering this question. Having an outside professional examine the question may cost some money up front, but they are far less costly early on in the development.

Specific Problems to solve
Write down the four, five, ten, whatever, number of problems that having a data warehouse will solve. The problems should be clearly stated, be very specific, and have testable criteria for success. Make sure you publicize these problems and get user and management feedback.

Available resources (time, money, and personnel)
Knowing what your actual resources are at the beginning is critical for defining the development path. If your budget is low, consider down-scaling the effort. If time is short, consider using off-the-shelf products extensively. If your personnel resources are thin, consider outsourcing. Being realistic about your actual resources will help you prevent overruns and project disappointments. Promising a full-blown data warehouse when you only have resources for a database, will always doom a project. Also, don't be afraid to tell upper management that the resources are too small for solving the problem. Believe me, they would rather know up front than get a surprise during deployment.

Criteria to measure success
This is an often overlooked step in the problem definition. For every problem stated, you must define a means for determining the success of the solution. If you can't think of a success criteria, then the problem is not defined specifically enough. Stay away from problem statements such as "The data warehouse must hold all our accounting data." Restate the problem in quantifiable terms, like "The data warehouse must handle the current 20GB of accounting data including all metadata and replicated data with an expected 20% growth per year."

Outsource all, some, or none of the development and operation
If you have in-house personnel that are under-utilized or have time to be assigned to the development process, then keeping most of the development in-house makes sense. If not, then you can either hire additional staff or outsource some or most of the development. I recommend that you do not outsource all of the development. You must have some in-house expertise available or at least strong upper management support. Otherwise you may end up with a very nice system that does not solve your problems. Strategic outsourcing makes sense in most medium to large data warehouse projects. The outsourcing contractor can supply the needed expertise and personnel at the various development phases. And when a particular phase is finished, you are not left with a staff member looking for something to do. You will probably find the up-front costs of an outsourcing firm to be higher than hiring in-house personnel. But the long-term savings will be far greater with a professional outsourcing firm than by retaining in-house personnel. Remember to make sure you feel comfortable with the outsourcer's style and abilities. You will working with them very closely. Don't just choose the largest or best-known source. How you and your outsourcer "mesh" is far more important than their list of clients.

Upgrade an existing system, converting from a legacy system, or developing from scratch
Developing a data warehouse from scratch, as strange as it sounds, is by far the easiest. If you are in this situation, count your blessings. If not, upgrading existing databases or converting from one or more legacy systems will be your lot. Fortunately, you will have a long list of "things that don't work right" to begin with. Make sure that you fully understand what systems will still be in place after the migration and how they will be integrated into your intranet. If your budget is low, then consider using middleware and "web-like" products to layer on top of the existing system. With a more moderate budget, you can replace inefficient systems with newer and more powerful ones. Remember that computer hardware, memory, and disk space are cheap. It's the software and operations that are expensive. Powerful hardware can make even today's bloated software work faster. With a higher budget, consider migrating legacy databases to a single powerful commercial database. If you are not sure what the "latest and greatest" data warehousing products are, hire a professional data warehousing consultant. Their fee will be well worth it.
Requirements Analysis
Performing a requirements analysis is critical to the success of any project. Without a clear goal in mind, success is dubious. There are a number of different philosophies about requirements analysis: top down, bottom up, inside out, etc. The method I have found to work the best is as follows:

  1. Clearly state the problem(s) you wish to solve.
  2. Identify all data sources and formats.
  3. Identify the users of the completed system.
  4. Formulate a specific budget - time, money, personnel.
  5. Ask identified users to specifically state what they expect the system to do.
  6. Ask management to specifically state their success criteria
  7. Separate their requirements from their "desirements." Only design to requirements. The enhancement phase is where you address the "desirements."
  8. Group and "bubble-up" requirements.
  9. Generate a prioritized requirements table listing the requirement, where it came from, the success criteria, and priority. Keep this table high-level. A table with a dozen requirements will be much easier to manage than one with hundreds.
  10. Produce a detailed development schedule including hardware, software, personnel, documentation, and reviews. Include outsourcing requirements and long lead-time items.
  11. Get a sign-off of the requirements, resource allocation, and schedule from top management before you go any further.

It's always best to ask very specific questions. Don't worry if their responses change each time you ask the question. It will happen, so plan for it.
Information/Data Modeling
Information and Data modeling, along with the definition of the metadata, is the single most important activity in the design of a data warehouse. If this step is done correctly, success is almost ensured. If not, then areas such as flexibility, scalability, and usability will suffer. Information modeling is best left to the experts. I recommend that you hire a professional information modeling firm or consultant. The consultant should be brought in at the tail end of the requirements analysis and be kept though the design and prototyping and then be called in as necessary during the development and documentation phases. Don't skimp on this phase or you will regret it.
Design & Prototyping
There are many design methodologies. The ones I've used most successfully are 1. Rapid Prototyping (for small to medium projects) and 2. Structured Development (for large or very complex projects).

Rapid Prototyping
The rapid prototyping method is illustrated in figure X. There are 5 keys to a successful rapid prototyping methodology:

  • Assemble a small, very bright team of database programmers, hardware technicians, designers, quality assurance technicians, documentation and decision support specialists, and a single manager.
  • Define and involve a small "focus group" consisting of users (both novice and experienced) and managers (both line and upper). These are the people who will provide the feedback necessary to drive the prototyping cycle. Listen to them.
  • Generate a user's manual and user interface first. You will be amazed at what you will find out by producing a user's manual first!
  • Use tools specifically designed for rapid prototyping. Stay away from C, C++, COBOL, SQL, etc. Instead use the visual development tools included with the database.
  • Remember a prototype is NOT the final application. Prototypes are meant to be copied into production models. Once the prototypes are successful, then begin the development processing using development tools, such as C, C++, Java, SQL, etc.

Structured Development
When a project has more than 10 people involved or when multiple companies are performing the development, a more structured development management approach is required. Note that rapid prototyping can be a subset of the structured development approach. This approach applies a more disciplined approach to the data warehouse development. Documentation requirements are larger, quality control is critical, and the number of reviews increases. While some parts may seem like overkill at the time, they can save a project from problems, especially late in the development cycle. For more information about how the structured development approach works and for detailed technical and management information, you can contact me via e-mail. (I've written a handbook on the topic :-)
Development & Documentation
Once the requirements analysis is well underway, the prototypes are working, and the focus groups are becoming happy, it's time to begin the development. Coordinating hardware and software purchases and upgrades, server and hardware installation, software and database development, documentation guides and manuals, reviews, and testing can become a full-time job. The key to keeping a handle on all of this to maintain a good written schedule that everyone can view and to have periodic "all-hands" reviews. Remember that working with vendors can be a frustrating experience. Hardware incompatibilities, data format incompatibilities, software bugs, late deliveries, etc. are more the norm than the exception. Outsourcing can help, but you must be continually involved to ensure success.
Test & Review
Testing and Reviews take place throughout the development cycle, including prototyping, development, deployment, operations, and enhancements. It never ends. It's wise to place a single individual in charge of testing and reviews. This is not a popular job, but it is critical for developing a system that works and meets each of the requirements. Be sure to empower this person (usually a quality assurance engineer) with the appropriate authority. Also, provide them with an appropriately sized staff. Testing is time consuming, tedious work and preparing for reviews and analyzing results can take much longer than you might think. Fortunately this person can save you from being surprised at budget review time and usually catches most problems before they become too big. If you outsource this task, make sure that you make it clear to the others on the team what the outsourcer's role is and what level of authority they have.

Deployment & Training
OK, the development is complete, quality assurance is satisfied, the documentation is ready, and all the "off-the-shelf" products have arrived. Now it's time to put everything together. This can be a highly disruptive time. Make sure that you have full management support and that they understand the nature and effect of the installation and deployment disruption. Scheduling training sessions concurrently with the installation can be an effective use of time. Don't skimp on the training. Make sure you have training in the budget from the beginning and don't dip into it. The best way to ensure success is to effectively train the users so that they will actually use the system and possibly sing its praises. Also remember that training is ongoing. New employees or employees being moved or promoted will need to be trained. Each time enhancements are added, new training sessions must be scheduled.
Data warehouses usually contain two or more servers. Tasks such as backups, bug fixes, software updates, hardware maintenance and upgrades, media services, account maintenance, security patches, and other similar tasks must be performed regularly. Operation and maintenance of such services requires an operations staff. It is not enough to "let the users take care of it." If you are providing these services in-house then you will need on-site support from either an outsourcing agency or in-house staff. The current trend is to outsource most of these services. More and more companies are outsourcing the entire data warehouse and access it via the Internet or private network. Outsourcing can result in a substantial savings. Just make sure that your provider can supply the services you require and is available when you need them. Also, be sure to discuss security requirements with them before you hire them.
There is always one thing you can count on: "Requirements Creep." The more successful the data warehouse, the faster requirements creep will occur. As your users become more sophisticated they will want more and more capabilities. If you can respond quickly and efficiently, your users will again sing your praises (and upper management will definitely take notice :-). Make sure that you have designed in the ability to add features from the very beginning. Remember to design in scalability and flexibility at all phases of development.

It includes the following topics -

 For Support