Certified Data Mining and Warehousing Professional Tools Collection


Data Warehouse Tools

Several important tools and techniques are associated with data warehouses, and one of these is data aggregation. A data warehouse can be designed to store information at a chosen level of detail.

For example, you can store a record for each transaction, or you can store only a summary, such as daily totals. Both choices are levels of data aggregation. When data is summarized, queries run much faster. However, the detail behind each summary is discarded when the data is aggregated, and that detail may be important for solving a particular problem.
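The trade-off can be illustrated with a minimal sketch. The rows, field layout, and the `summarize_by_day` helper below are hypothetical and exist only for illustration; they are not taken from any particular warehouse product.

```python
from collections import defaultdict

# Hypothetical transaction-level rows: (date, product, amount).
transactions = [
    ("2024-01-05", "widget", 120.0),
    ("2024-01-05", "gadget", 80.0),
    ("2024-01-06", "widget", 200.0),
    ("2024-01-06", "widget", 50.0),
]


def summarize_by_day(rows):
    """Aggregate transaction rows into one total amount per day."""
    totals = defaultdict(float)
    for date, _product, amount in rows:
        totals[date] += amount
    return dict(totals)


daily_summary = summarize_by_day(transactions)
print(daily_summary)
```

The summary holds two rows instead of four, so it is cheaper to scan, but it no longer records which product was sold. A question such as "how many widgets were sold on January 5?" can only be answered from the transaction-level rows.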

Before you decide which level to use, weigh your options carefully. Once data has been aggregated, the only way to undo the operation is to rebuild the warehouse. The safest approach is to construct the data warehouse with a large amount of detail. However, the cost of doing so can be high, depending on the storage options you choose. Once you have filled your data warehouse with important information, you will want to use this data to help you make smart investment decisions. The tools that allow you to do this fall under a topic called business intelligence.

Business intelligence is a very diverse field. It comprises Executive Information Systems, Decision Support Systems, and On-Line Analytical Processing (OLAP). It also includes multi-dimensional analysis tools, which allow a user to view data from a wide variety of angles. A query tool allows a user to run SQL queries against a warehouse and inspect the results. Data mining also falls under business intelligence, and allows you to look for patterns and relationships within a data warehouse.
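As a rough sketch of what viewing data "from a variety of angles" means in practice, the example below uses Python's built-in `sqlite3` module with an in-memory database standing in for a warehouse. The `sales` table, its columns, and the `total_by` helper are assumptions made for this illustration.

```python
import sqlite3

# In-memory database standing in for a (much larger) warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, quarter TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("north", "widget", "Q1", 100.0),
        ("north", "gadget", "Q1", 150.0),
        ("south", "widget", "Q1", 200.0),
        ("south", "widget", "Q2", 50.0),
    ],
)

ALLOWED_DIMENSIONS = {"region", "product", "quarter"}


def total_by(dimension):
    """Return (value, total amount) pairs for one dimension of the sales data."""
    # Column names cannot be bound as SQL parameters, so check against a whitelist.
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    query = (
        f"SELECT {dimension}, SUM(amount) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(query).fetchall()


by_region = total_by("region")
by_product = total_by("product")
by_quarter = total_by("quarter")
print(by_region, by_product, by_quarter)
```

The same four rows produce three different summaries depending on the dimension chosen. Dedicated multi-dimensional analysis tools offer this kind of regrouping interactively, without the user writing SQL.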

Another tool that is connected to data warehouses is data visualization. Data visualization tools present visual models of data, which can take the form of intricate 3D images. The goal is to let the user see trends in a form that is easier to understand than complicated statistical models. One tool that is helping this field advance is VRML, the Virtual Reality Modeling Language.

For data warehouses to function properly, it is also important to place an emphasis on metadata management. Metadata can be described as "information about information."

Metadata must be managed whenever data is acquired or analyzed. It is held in a repository, and can give you important information about many of the data warehouse tools. Properly managing metadata has become a science in itself, and a company that does it well can benefit greatly. It is important because it allows organizations to analyze the changes that occur within database tables. Metadata management therefore plays an important part in the construction of a data warehouse.
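One small, concrete case of analyzing changes within database tables is comparing a table's column metadata before and after a schema change. The sketch below assumes SQLite, whose `PRAGMA table_info` statement reports a table's columns; the `customers` table and the `column_metadata` helper are hypothetical, and a real metadata repository would track far more than column names and types.

```python
import sqlite3


def column_metadata(conn, table):
    """Return {column_name: declared_type} for a table, read from SQLite's catalog."""
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so it must come from trusted code.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {name: col_type for _cid, name, col_type, *_rest in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

before = column_metadata(conn, "customers")   # snapshot of the metadata
conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
after = column_metadata(conn, "customers")    # snapshot after the change

added_columns = sorted(set(after) - set(before))
print(added_columns)
```

Storing such snapshots over time is one simple way to see when a table's structure changed and what was added.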

Data warehousing is a somewhat complicated field. Many vendors advertise data warehousing tools, but the cost and complexity of the products have kept them from being adopted by a large number of companies. Any company that is thinking of using a data warehouse must take the time to review and understand the technology, because it is only useful if you know how to use it. Once you understand and acquire the technology, you can gain a powerful advantage over your competitors. This has made data warehouses attractive to many companies.



Organizations that wish to use data mining tools can purchase mining programs designed for existing software and hardware platforms, which can be integrated into new products and systems as they are brought online, or they can build their own custom mining solution. For instance, feeding the output of a data mining exercise into another computer system, such as a neural network, is quite common and can give the mined data more value. This is because the data mining tool gathers the data, while the second program (e.g., the neural network) makes decisions based on the data collected.

Different types of data mining tools are available in the marketplace, each with its own strengths and weaknesses. Internal auditors need to be aware of the different kinds of data mining tools available and recommend the purchase of a tool that matches the organization's current detective needs. This should be considered as early as possible in the project's lifecycle, perhaps even in the feasibility study.

Most data mining tools can be classified into one of three categories: traditional data mining tools, dashboards, and text-mining tools. Below is a description of each.

  • Traditional Data Mining Tools. Traditional data mining programs help companies establish data patterns and trends by using a number of complex algorithms and techniques. Some of these tools are installed on the desktop to monitor the data and highlight trends and others capture information residing outside a database. The majority are available in both Windows and UNIX versions, although some specialize in one operating system only. In addition, while some may concentrate on one database type, most will be able to handle any data using online analytical processing or a similar technology.
  • Dashboards. Installed in computers to monitor information in a database, dashboards reflect data changes and updates onscreen — often in the form of a chart or table — enabling the user to see how the business is performing. Historical data also can be referenced, enabling the user to see where things have changed (e.g., increase in sales from the same period last year). This functionality makes dashboards easy to use and particularly appealing to managers who wish to have an overview of the company's performance.
  • Text-mining Tools. The third type of data mining tool sometimes is called a text-mining tool because of its ability to mine data from different kinds of text, from Microsoft Word and Acrobat PDF documents to simple text files. These tools scan content and convert the selected data into a format that is compatible with the tool's database, thus providing users with an easy and convenient way of accessing data without the need to open different applications. Scanned content can be unstructured (i.e., information is scattered almost randomly across the source, as in e-mails, Internet pages, and audio and video data) or structured (i.e., the data's form and purpose are known, such as content found in a database). Capturing these inputs can provide organizations with a wealth of information that can be mined to discover trends, concepts, and attitudes.
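To give a feel for the text-mining category, here is a deliberately simplified sketch that counts recurring terms across unstructured text. The document contents, file names, and stop-word list are invented for the example, and commercial text-mining tools do far more (format conversion, language analysis, concept extraction).

```python
import re
from collections import Counter

# Hypothetical unstructured inputs, keyed by a made-up source name.
documents = {
    "email_1.txt": "The delivery was late and the support team was slow.",
    "review_2.txt": "Great product, but delivery was late again.",
}

# A tiny, illustrative stop-word list; real tools use much larger ones.
STOP_WORDS = {"the", "and", "was", "but", "a"}


def term_counts(text):
    """Count lowercase words in a text, ignoring stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS)


totals = Counter()
for text in documents.values():
    totals.update(term_counts(text))

# Terms that appear more than once across all documents.
recurring = {term for term, count in totals.items() if count > 1}
print(recurring)
```

Here the recurring terms point to a shared complaint about late delivery, which is the kind of trend or attitude such tools are meant to surface.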

Besides these tools, other applications and programs may be used for data mining purposes. For instance, audit interrogation tools can be used to highlight fraud, data anomalies, and patterns. An example of this has been published by the United Kingdom's Treasury office in the 2002–2003 Fraud Report: Anti-fraud Advice and Guidance, which discusses how to discover fraud using an audit interrogation tool. Additional examples of using audit interrogation tools to identify fraud are found in David G. Coderre’s 1999 book, Fraud Detection.
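As an illustration of the kind of test an audit interrogation tool might run, the sketch below flags payments that share the same vendor, invoice number, and amount. This is a generic duplicate-payment check with made-up records; it is not the method described in the Treasury report or in Coderre's book.

```python
from collections import defaultdict

# Hypothetical payment records; field names are assumptions for this example.
payments = [
    {"id": 1, "vendor": "Acme", "invoice": "INV-100", "amount": 500.0},
    {"id": 2, "vendor": "Acme", "invoice": "INV-100", "amount": 500.0},
    {"id": 3, "vendor": "Bolt", "invoice": "INV-7", "amount": 75.0},
]


def find_duplicate_payments(rows):
    """Group payments by (vendor, invoice, amount) and keep groups with repeats."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["vendor"], row["invoice"], row["amount"])
        groups[key].append(row["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


suspects = find_duplicate_payments(payments)
print(suspects)
```

A flagged group is only a lead for the auditor to follow up; a repeated invoice may be an innocent error rather than fraud.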

In addition, internal auditors can use spreadsheets to undertake simple data mining exercises or to produce summary tables. Data from desktop, notebook, and server computers that run operating systems such as Windows, Linux, and Macintosh can be imported directly into Microsoft Excel. Using pivot tables in the spreadsheet, auditors can review complex data in a simplified format and drill down where necessary to find the underlying assumptions or information.
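The pivot-and-drill-down idea can be sketched outside a spreadsheet as well. The example below is an approximation in plain Python, not a description of how Excel works internally; the rows and the `pivot` and `drill_down` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical imported rows: (region, quarter, amount).
rows = [
    ("north", "Q1", 100.0),
    ("north", "Q2", 120.0),
    ("south", "Q1", 90.0),
    ("south", "Q1", 10.0),
]


def pivot(records):
    """Build a region-by-quarter table of summed amounts."""
    table = defaultdict(lambda: defaultdict(float))
    for region, quarter, amount in records:
        table[region][quarter] += amount
    return {region: dict(quarters) for region, quarters in table.items()}


def drill_down(records, region, quarter):
    """Return the underlying rows behind one cell of the pivot table."""
    return [r for r in records if r[0] == region and r[1] == quarter]


summary = pivot(rows)
detail = drill_down(rows, "south", "Q1")
print(summary)
print(detail)
```

The summary shows one figure per region and quarter, while the drill-down returns the individual rows that make up a single cell, which is what an auditor would inspect when a total looks unusual.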

When evaluating data mining strategies, companies may decide to acquire several tools for specific purposes, rather than purchasing one tool that meets all needs. Although acquiring several tools is not a mainstream approach, a company may choose to do so if, for example, it installs a dashboard to keep managers informed on business matters, a full data-mining suite to capture and build data for its marketing and sales arms, and an interrogation tool so auditors can identify fraud activity.

