Certify and Increase Opportunity.
Govt. Certified Data Mining and Warehousing


Data mining is primarily used today by companies with a strong consumer focus – retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among “internal” factors such as price, product positioning, or staff skills, and “external” factors such as economic indicators, competition, and customer demographics. It also enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to “drill down” into summary information to view detailed transactional data.

With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual’s purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.

For example, Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers. American Express can suggest products to its cardholders based on analysis of their monthly expenditures.

WalMart is pioneering massive data mining to transform its supplier relationships. WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its massive 7.5 terabyte Teradata data warehouse. WalMart allows more than 3,500 suppliers to access data on their products and perform data analyses. These suppliers use this data to identify customer buying patterns at the store display level. They use this information to manage local store inventory and identify new merchandising opportunities. In 1995, WalMart computers processed over 1 million complex data queries.

The National Basketball Association (NBA) is exploring a data mining application that can be used in conjunction with image recordings of basketball games. The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies. For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard position, John Williams attempted four jump shots and made each one! Advanced Scout not only finds this pattern, but explains that it is interesting because it differs considerably from the average shooting percentage of 49.30% for the Cavaliers during that game.

By using the NBA universal clock, a coach can automatically bring up the video clips showing each of the jump shots attempted by Williams with Price on the floor, without needing to comb through hours of video footage. Those clips show a very successful pick-and-roll play in which Price draws the Knicks’ defense and then finds Williams for an open jump shot.

Various applications are as follows.

Games
Since the early 1960s, the availability of oracles for certain combinatorial games – also called tablebases (e.g. for 3×3 chess with any beginning configuration, small-board dots-and-boxes, small-board hex, and certain endgames in chess, dots-and-boxes, and hex) – has opened a new area for data mining: the extraction of human-usable strategies from these oracles. Current pattern recognition approaches do not seem to fully acquire the high level of abstraction required to be applied successfully. Instead, extensive experimentation with the tablebases – combined with an intensive study of tablebase answers to well-designed problems, and with knowledge of prior art (i.e. pre-tablebase knowledge) – is used to yield insightful patterns. Elwyn Berlekamp (in dots-and-boxes, etc.) and John Nunn (in chess endgames) are notable examples of researchers doing this work, though they were not – and are not – involved in tablebase generation.


Data mining in customer relationship management applications can contribute significantly to the bottom line. Rather than randomly contacting a prospect or customer through a call center or sending mail, a company can concentrate its efforts on prospects that are predicted to have a high likelihood of responding to an offer. More sophisticated methods may be used to optimize resources across campaigns so that one may predict to which channel and to which offer an individual is most likely to respond (across all potential offers). Additionally, sophisticated applications could be used to automate mailing. Once the results from data mining (potential prospect/customer and channel/offer) are determined, this “sophisticated application” can either automatically send an e-mail or a regular mail. Finally, in cases where many people will take an action without an offer, “uplift modeling” can be used to determine which people have the greatest increase in response if given an offer. Data clustering can also be used to automatically discover the segments or groups within a customer data set.

Businesses employing data mining may see a return on investment, but they also recognize that the number of predictive models can quickly become very large. Rather than using one model to predict how many customers will churn, a business could build a separate model for each region and customer type. Then, instead of sending an offer to all people that are likely to churn, it may only want to send offers to loyal customers. Finally, the business may want to determine which customers are going to be profitable over a certain window in time, and only send the offers to those that are likely to be profitable. In order to maintain this quantity of models, it needs to manage model versions and move toward automated data mining.
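As a minimal sketch of the channel/offer selection described above, the following hypothetical Python fragment picks the channel and offer with the best historical response rate for a customer segment. The segments, channels, offers, and response records are all invented for illustration:

```python
# Hypothetical sketch: choose the best (channel, offer) per customer segment
# from historical response rates. All data and segment names are invented.
from collections import defaultdict

history = [
    # (segment, channel, offer, responded)
    ("young-urban", "email", "discount", True),
    ("young-urban", "email", "discount", False),
    ("young-urban", "mail", "discount", False),
    ("young-urban", "email", "bundle", True),
    ("retired", "mail", "discount", True),
    ("retired", "email", "discount", False),
    ("retired", "mail", "bundle", True),
    ("retired", "mail", "bundle", False),
]

counts = defaultdict(lambda: [0, 0])  # (segment, channel, offer) -> [responses, contacts]
for segment, channel, offer, responded in history:
    counts[(segment, channel, offer)][1] += 1
    if responded:
        counts[(segment, channel, offer)][0] += 1

def best_action(segment):
    """Return the (channel, offer) with the highest observed response rate."""
    rates = {key[1:]: r / n for key, (r, n) in counts.items() if key[0] == segment}
    return max(rates, key=rates.get)

print(best_action("young-urban"))  # ('email', 'bundle')
```

In practice a business would fit a predictive model (or an uplift model) per segment rather than rank raw historical rates, but the selection logic is the same.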

Data mining can also be helpful to human resources (HR) departments in identifying the characteristics of their most successful employees. Information obtained – such as universities attended by highly successful employees – can help HR focus recruiting efforts accordingly. Additionally, Strategic Enterprise Management applications help a company translate corporate-level goals, such as profit and margin share targets, into operational decisions, such as production plans and workforce levels.

Another example of data mining, often called market basket analysis, relates to its use in retail sales. If a clothing store records the purchases of customers, a data mining system could identify those customers who favor silk shirts over cotton ones. Although explaining such relationships may be difficult, taking advantage of them is easier. The example deals with association rules within transaction-based data. Not all data are transaction based, and logical or inexact rules may also be present within a database.

Market basket analysis has also been used to identify the purchase patterns of the Alpha Consumer. Alpha Consumers are people that play a key role in connecting with the concept behind a product, then adopting that product, and finally validating it for the rest of society. Analyzing the data collected on this type of user has allowed companies to predict future buying trends and forecast supply demands.
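The purchase-pattern idea above can be sketched with a toy frequent-pair count, the simplest building block of market basket analysis. The transactions are invented for illustration:

```python
# Minimal market-basket sketch: count item pairs that co-occur in
# transactions and keep those above a support threshold. Invented data.
from itertools import combinations
from collections import Counter

transactions = [
    {"silk shirt", "tie", "belt"},
    {"silk shirt", "tie"},
    {"cotton shirt", "jeans"},
    {"silk shirt", "tie", "jeans"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

min_support = 3  # a pair must appear in at least 3 of the 4 baskets
frequent = {p: c for p, c in pair_counts.items() if c >= min_support}
print(frequent)  # {('silk shirt', 'tie'): 3}
```

Real systems use algorithms such as Apriori or FP-growth to avoid enumerating all pairs over millions of baskets, but the support-counting idea is the same.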

Data mining is a highly effective tool in the catalog marketing industry. Catalogers have a rich database of history of their customer transactions for millions of customers dating back a number of years. Data mining tools can identify patterns among customers and help identify the most likely customers to respond to upcoming mailing campaigns.

Data mining for business applications is a component which needs to be integrated into a complex modeling and decision making process. Reactive business intelligence (RBI) advocates a “holistic” approach that integrates data mining, modeling, and interactive visualization into an end-to-end discovery and continuous innovation process powered by human and automated learning.

In the area of decision making, the RBI approach has been used to mine knowledge that is progressively acquired from the decision maker, and then self-tune the decision method accordingly.

An example of data mining related to an integrated-circuit production line is described in the paper “Mining IC Test Data to Optimize VLSI Testing.” In this paper, the application of data mining and decision analysis to the problem of die-level functional testing is described. The experiments mentioned demonstrate the ability of a system that mines historical die-test data to create a probabilistic model of patterns of die failure. These patterns are then utilized to decide, in real time, which die to test next and when to stop testing. This system has been shown, based on experiments with historical test data, to have the potential to improve profits on mature IC products.

Science and engineering

In recent years, data mining has been used widely in the areas of science and engineering, such as bioinformatics, genetics, medicine, education and electrical power engineering.

In the study of human genetics, sequence mining helps address the important goal of understanding the mapping relationship between the inter-individual variations in human DNA sequence and the variability in disease susceptibility. In simple terms, it aims to find out how changes in an individual’s DNA sequence affect the risk of developing common diseases such as cancer, which is of great importance to improving methods of diagnosing, preventing, and treating these diseases. The data mining method that is used to perform this task is known as multifactor dimensionality reduction.

In the area of electrical power engineering, data mining methods have been widely used for condition monitoring of high voltage electrical equipment. The purpose of condition monitoring is to obtain valuable information on, for example, the status of the insulation (or other important safety-related parameters). Data clustering techniques – such as the self-organizing map (SOM) – have been applied to vibration monitoring and analysis of transformer on-load tap-changers (OLTCs). Using vibration monitoring, it can be observed that each tap change operation generates a signal that contains information about the condition of the tap changer contacts and the drive mechanisms. Obviously, different tap positions will generate different signals. However, there was considerable variability amongst normal condition signals for exactly the same tap position. SOM has been applied to detect abnormal conditions and to hypothesize about the nature of the abnormalities.

Data mining methods have also been applied to dissolved gas analysis (DGA) in power transformers. DGA, as a diagnostic for power transformers, has been available for many years. Methods such as SOM have been applied to analyze generated data and to determine trends which are not obvious to the standard DGA ratio methods (such as the Duval Triangle).

Another example of data mining in science and engineering is found in educational research, where data mining has been used to study the factors leading students to choose to engage in behaviors which reduce their learning, and to understand factors influencing university student retention. A similar example of social application of data mining is its use in expertise finding systems, whereby descriptors of human expertise are extracted, normalized, and classified so as to facilitate the finding of experts, particularly in scientific and technical fields. In this way, data mining can facilitate institutional memory.

Other examples of applying data mining methods include the mining of biomedical data facilitated by domain ontologies, mining clinical trial data, and traffic analysis using SOM.

In adverse drug reaction surveillance, the Uppsala Monitoring Centre has, since 1998, used data mining methods to routinely screen for reporting patterns indicative of emerging drug safety issues in the WHO global database of 4.6 million suspected adverse drug reaction incidents. Recently, similar methodology has been developed to mine large collections of electronic health records for temporal patterns associating drug prescriptions to medical diagnoses.

Data mining has been applied to software artifacts within the realm of software engineering: Mining Software Repositories.

Human rights

Data mining of government records – particularly records of the justice system (i.e. courts, prisons) – enables the discovery of systemic human rights violations in connection with the generation and publication of invalid or fraudulent legal records by various government agencies.

Medical data mining

In 2011, in the case of Sorrell v. IMS Health, Inc., the Supreme Court of the United States ruled that pharmacies may share information with outside companies. The practice was held to be protected under the First Amendment of the Constitution, which guarantees “freedom of speech.”

Spatial data mining

Spatial data mining is the application of data mining methods to spatial data. The end objective of spatial data mining is to find patterns in data with respect to geography. So far, data mining and Geographic Information Systems (GIS) have existed as two separate technologies, each with its own methods, traditions, and approaches to visualization and data analysis. Particularly, most contemporary GIS have only very basic spatial analysis functionality. The immense explosion in geographically referenced data occasioned by developments in IT, digital mapping, remote sensing, and the global diffusion of GIS emphasizes the importance of developing data-driven inductive approaches to geographical analysis and modeling.

Data mining offers great potential benefits for GIS-based applied decision-making. Recently, the task of integrating these two technologies has become of critical importance, especially as various public and private sector organizations possessing huge databases with thematic and geographically referenced data begin to realize the huge potential of the information contained therein. Among those organizations are:

  • offices requiring analysis or dissemination of geo-referenced statistical data
  • public health services searching for explanations of disease clustering
  • environmental agencies assessing the impact of changing land-use patterns on climate change
  • geo-marketing companies doing customer segmentation based on spatial location.


Geospatial data repositories tend to be very large. Moreover, existing GIS datasets are often splintered into feature and attribute components that are conventionally archived in hybrid data management systems. Algorithmic requirements differ substantially for relational (attribute) data management and for topological (feature) data management. Related to this is the range and diversity of geographic data formats, which present unique challenges. The digital geographic data revolution is creating new types of data formats beyond the traditional “vector” and “raster” formats. Geographic data repositories increasingly include ill-structured data, such as imagery and geo-referenced multi-media.

There are several critical research challenges in geographic knowledge discovery and data mining. Miller and Han offer the following list of emerging research topics in the field:

  • Developing and supporting geographic data warehouses (GDWs): Spatial properties are often reduced to simple aspatial attributes in mainstream data warehouses. Creating an integrated GDW requires solving issues of spatial and temporal data interoperability – including differences in semantics, referencing systems, geometry, accuracy, and position.
  • Better spatio-temporal representations in geographic knowledge discovery: Current geographic knowledge discovery (GKD) methods generally use very simple representations of geographic objects and spatial relationships. Geographic data mining methods should recognize more complex geographic objects (i.e. lines and polygons) and relationships (i.e. non-Euclidean distances, direction, connectivity, and interaction through attributed geographic space such as terrain). Furthermore, the time dimension needs to be more fully integrated into these geographic representations and relationships.
  • Geographic knowledge discovery using diverse data types: GKD methods should be developed that can handle diverse data types beyond the traditional raster and vector models, including imagery and geo-referenced multimedia, as well as dynamic data types (video streams, animation).

Sensor data mining

Wireless sensor networks can be used to facilitate the collection of data for spatial data mining in a variety of applications, such as air pollution monitoring. A characteristic of such networks is that nearby sensor nodes monitoring the same environmental feature typically register similar values. This kind of data redundancy, due to the spatial correlation between sensor observations, motivates techniques for in-network data aggregation and mining. By measuring the spatial correlation between data sampled by different sensors, a wide class of specialized algorithms can be developed to yield more efficient spatial data mining.
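As a sketch of the idea, one simple way to measure the spatial correlation between two nearby nodes is the Pearson coefficient of their recent readings; highly correlated neighbours can then report a single aggregate instead of raw data. The readings and the threshold below are invented for illustration:

```python
# Sketch: estimate correlation between readings from two nearby sensor
# nodes; strongly correlated neighbours are candidates for in-network
# aggregation. Readings and threshold are invented.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

node_a = [41.0, 42.5, 43.1, 44.0, 45.2]   # e.g. pollutant level at node A
node_b = [40.8, 42.7, 43.0, 44.3, 45.0]   # neighbouring node B

r = pearson(node_a, node_b)
if r > 0.95:  # aggregation threshold, arbitrary for this sketch
    print("aggregate: report one summary for both nodes")
```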

Visual data mining

In the process of converting from analog to digital, large data sets have been generated, collected, and stored, and statistical patterns, trends, and information hidden in the data are discovered in order to build predictive models. Studies suggest visual data mining is faster and much more intuitive than traditional data mining.

Music data mining

Data mining techniques, and in particular co-occurrence analysis, have been used to discover relevant similarities among music corpora (radio lists, CD databases) for the purpose of classifying music into genres in a more objective manner.


Data mining has been used by the U.S. government in programs intended to stop terrorists, including the Total Information Awareness (TIA) program, Secure Flight (formerly known as the Computer-Assisted Passenger Prescreening System, CAPPS II), Analysis, Dissemination, Visualization, Insight, Semantic Enhancement (ADVISE), and the Multi-state Anti-Terrorism Information Exchange (MATRIX). These programs have been discontinued due to controversy over whether they violate the 4th Amendment to the United States Constitution, although many programs that were formed under them continue to be funded by different organizations or under different names.

In the context of combating terrorism, two particularly plausible methods of data mining are “pattern mining” and “subject-based data mining”.

Pattern mining

“Pattern mining” is a data mining method that involves finding existing patterns in data. In this context, “patterns” often means association rules. The original motivation for searching for association rules came from the desire to analyze supermarket transaction data, that is, to examine customer behavior in terms of the purchased products. For example, an association rule “beer ⇒ potato chips (80%)” states that four out of five customers that bought beer also bought potato chips.
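The 80% figure in the rule is its confidence: among baskets containing beer, the fraction that also contain potato chips. A minimal computation over invented transactions that reproduces the “four out of five” example:

```python
# Confidence of the rule "beer => potato chips" over invented baskets.
transactions = [
    {"beer", "potato chips"},
    {"beer", "potato chips"},
    {"beer", "potato chips"},
    {"beer", "potato chips", "bread"},
    {"beer", "milk"},
    {"milk", "bread"},
]

with_beer = [t for t in transactions if "beer" in t]
with_both = [t for t in with_beer if "potato chips" in t]
confidence = len(with_both) / len(with_beer)  # 4 of 5 beer baskets
print(f"beer => potato chips ({confidence:.0%})")  # beer => potato chips (80%)
```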

In the context of pattern mining as a tool to identify terrorist activity, the National Research Council provides the following definition: “Pattern-based data mining looks for patterns (including anomalous data patterns) that might be associated with terrorist activity — these patterns might be regarded as small signals in a large ocean of noise.” Pattern mining also includes new areas such as Music Information Retrieval (MIR), where patterns observed in both the temporal and non-temporal domains are imported into classical knowledge discovery search methods.

Subject-based data mining

“Subject-based data mining” is a data mining method involving the search for associations between individuals in data. In the context of combating terrorism, the National Research Council provides the following definition: “Subject-based data mining uses an initiating individual or other datum that is considered, based on other information, to be of high interest, and the goal is to determine what other persons or financial transactions or movements, etc., are related to that initiating datum.”

Knowledge grid

Knowledge discovery “On the Grid” generally refers to conducting knowledge discovery in an open environment using grid computing concepts, allowing users to integrate data from various online data sources, as well as make use of remote resources, for executing their data mining tasks. The earliest example was Discovery Net, developed at Imperial College London, which won the “Most Innovative Data-Intensive Application Award” at the ACM SC02 (Supercomputing 2002) conference and exhibition, based on a demonstration of a fully interactive distributed knowledge discovery application for bioinformatics. Other examples include work conducted by researchers at the University of Calabria, who developed a Knowledge Grid architecture for distributed knowledge discovery, based on grid computing.

Reliability / Validity

Data mining can be misused, and can also unintentionally produce results which appear significant but which do not actually predict future behavior and cannot be reproduced on a new sample of data.
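A small simulation illustrates this risk: scanning many random “features” against a target will always turn up one that fits the training sample well by chance, yet it carries no real predictive power on fresh data. Everything here is synthetic:

```python
# Sketch of the pitfall above: with enough candidate features, one will
# match the training labels by chance ("data dredging"). Synthetic data.
import random

random.seed(0)
target = [random.choice([0, 1]) for _ in range(20)]   # training labels
fresh = [random.choice([0, 1]) for _ in range(20)]    # labels from a new sample

best_acc, best_feature = 0.0, None
for _ in range(1000):                                 # "mine" 1000 random candidate features
    feature = [random.choice([0, 1]) for _ in range(20)]
    acc = sum(f == t for f, t in zip(feature, target)) / 20
    if acc > best_acc:
        best_acc, best_feature = acc, feature

# The best feature looks predictive on the training data purely by chance;
# its accuracy on the fresh sample is typically near the 50% baseline.
fresh_acc = sum(f == t for f, t in zip(best_feature, fresh)) / 20
print(f"training accuracy {best_acc:.2f}, fresh-sample accuracy {fresh_acc:.2f}")
```

This is why results discovered by data mining should be validated on a held-out sample before being acted upon.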

Data mining is used in the following applications:

a) Balanced Scorecard

1. The Balanced Scorecard (BSC) is a framework for managing business performance.

2. The BSC provides a framework for designing a set of measures for the business activities that are the key drivers of the business, or Key Performance Indicators (KPIs).

3. KPIs are collected from CRM, ERP, accounting, personnel, and inventory systems.

4. BSC provides executives and managers with a method for reporting and analyzing key performance indicators to determine if operational activities are aligned with the company’s overall strategy and vision.

5. The BSC methodology is a management technique for structuring these scorecards; it displays financial, internal process, customer, and learning/growth data.

6. The balanced scorecard takes four perspectives:

  1. Financial: satisfying the stakeholders in the company – owners, employees, suppliers. For example, the objectives of this perspective would be to achieve a certain level of profitability or growth.
  2. Customer or Market: satisfying customers so that they buy products and services, supporting the Financial perspective, e.g. increase customer satisfaction, introduce a new product.
  3. Internal Business Processes: supporting the Financial and Customer perspectives through appropriate and well-operated processes or procedures, e.g. the sales process, the product implementation process.
  4. Learning, Innovation and Growth: supporting the Financial, Customer and Internal Business Process perspectives through the ability to change, improve and innovate via the acquisition of new knowledge, skills and technology.

7. Example of BSC:

  1. Knowledge-enhanced Predictive Reports (KPRs) can improve business visibility by harnessing the BSC with predictive modeling and business logic using expert systems.
  2. KPRs can analyze changes in business drivers and correlate them automatically to detect hidden patterns beneath complex figures.
  3. KPRs incorporate predictive modeling and rule-based expert systems into report writing and charting systems:
     1. Predictive analytics can be used to detect patterns and trends in business drivers automatically and to predict future directions.
     2. Rule-based expert systems can be used to handle the complexity of the various business drivers and indicators. Expert systems based on business logic can take on this task as an expert would, making balanced scorecards friendlier and easier to understand.
  4. Web-based reporting and charting engines are essential for generating balanced scorecards in a timely, real-time fashion, so that executives and business users can recognize developing situations as they unfold.
  5. Incorporating predictive analytics and rule-based expert systems into the BSC provides a number of advantages:
     1. It makes potential problems and successes much easier to comprehend.
     2. Developing trends can be detected early, so that action can be taken quickly.
     3. Complexity in interpreting KPIs is removed in real time by the embedded knowledge of business experts.
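The rule-based layer described above can be sketched as a few business-logic checks over KPI values, one per scorecard perspective. The KPIs, thresholds, and alert messages are all invented for illustration:

```python
# Hypothetical sketch of a rule-based expert layer over BSC KPIs.
# KPI names, values, and thresholds are invented.
kpis = {
    "revenue_growth_pct": -2.5,
    "customer_satisfaction": 0.91,
    "employee_turnover_pct": 18.0,
}

rules = [
    # (kpi name, breach condition, alert for the scorecard)
    ("revenue_growth_pct", lambda v: v < 0, "Financial: revenue is declining"),
    ("customer_satisfaction", lambda v: v < 0.80, "Customer: satisfaction below target"),
    ("employee_turnover_pct", lambda v: v > 15, "Learning/Growth: turnover above target"),
]

alerts = [msg for name, breached, msg in rules if breached(kpis[name])]
for msg in alerts:
    print(msg)
```

A production system would attach such rules to predictive forecasts of each KPI rather than to raw current values, but the evaluation logic is the same.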

b) Fraud Detection

Fraud Detection for Telecommunications Industry

1. The telecommunications industry has expanded dramatically in the last few years with the development of affordable mobile phone technology.

2. With the increasing number of mobile phone users, global mobile phone fraud is also set to rise.

3. There are many different types of telecom fraud and these can occur at various levels.

4. The two most prevalent types are subscription fraud and superimposed (or “surfing”) fraud.

5. Subscription fraud occurs when the fraudster obtains a subscription to a service, often with false identity details, with no intention of paying. This is thus at the level of a phone number – all transactions from this number will be fraudulent.

6. Superimposed fraud is the use of a service without having the necessary authority and is usually detected by the appearance of phantom calls on a bill.

7. There are several ways to carry out superimposed fraud, including mobile phone cloning and obtaining calling card authorization details.

8. Superimposed fraud will generally occur at the level of individual calls – the fraudulent calls will be mixed in with the legitimate ones.

9. Subscription fraud will generally be detected at some point through the billing process – although the aim is to detect it well before that, since large costs can quickly be run up.

10. Superimposed fraud can remain undetected for a long time.

11. Telecommunications networks generate vast quantities of data, sometimes on the order of several GBs per day, so that data mining techniques are of particular importance.

12. At a low level, simple rule-based detection systems use rules such as the apparent use of the same phone in two very distant geographical locations in quick succession, calls which appear to overlap in time, and very high value and long calls.

13. At a higher level, statistical summaries of call distributions are compared with thresholds determined either by experts or by application of supervised learning methods to known fraud/non-fraud cases.
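One of the low-level rule checks from point 12, flagging calls from the same phone that overlap in time, can be sketched as follows. The call records are invented for illustration:

```python
# Sketch of a simple rule-based fraud check: the same phone cannot make
# two calls that overlap in time. Call records are invented.
calls = [
    # (phone, start_minute, end_minute)
    ("555-0100", 0, 10),
    ("555-0100", 8, 20),    # starts before the previous call ended -> suspicious
    ("555-0101", 0, 5),
    ("555-0101", 6, 9),
]

def overlapping_calls(calls):
    """Return pairs of calls from the same phone that overlap in time."""
    suspicious = []
    last_by_phone = {}
    for phone, start, end in sorted(calls, key=lambda c: (c[0], c[1])):
        prev = last_by_phone.get(phone)
        if prev and start < prev[2]:          # overlaps the previous call
            suspicious.append((prev, (phone, start, end)))
        last_by_phone[phone] = (phone, start, end)
    return suspicious

print(overlapping_calls(calls))  # one suspicious pair for 555-0100
```

Real deployments combine many such rules with the statistical summaries described in point 13.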

c) Clickstream Mining

1. The approaches used by most people when browsing for information on Websites are difficult to analyze and understand.

2. Quantitative data can lack information about what a user actually intends to do, while qualitative data tends to be localized and is impractical to gather for large samples.

3. Once a website is made public, the user is in ultimate control of their own navigation, often employing a variety of different strategies for browsing.

4. These strategies also vary over time depending not only on the user’s goals, but also on factors such as expertise, familiarity with the site, time pressures and the perceived cost of information.

5. Given this continually shifting nature of browsing strategies, the question arises of how these strategies can be identified from the use made of an existing Website.

6. One solution is to use clickstream logs, which contain the address of each page visited, the date and time of the visit, and the referring page; they are a potentially rich source of data on Internet user activity.

7. Clickstream logs can be generated either by software hosted by the client application or directly from the server logs.

8. Collection and Restoration of Clickstream Data:

  1. A common tool for collecting data on the pages visited by Website users is server-side clickstream data.
     1. This identifies the pages delivered by a server in response to a client’s request. However, these clickstream data logs are often large and unwieldy and present an incomplete picture of activity.
     2. For example, server-side logs do not record activities that involve browser caching, network caching, or the navigation of pages that are internal to the site but held on another server.
     3. Despite these server-side limitations, some aspects of user behavior, such as use of the back button or the opening of new/additional windows within the same Website, can be captured by techniques such as the Pattern Restore Method (PRM) algorithm.

9. Visualization and Categorization of Clickstream Data:

  1. Once the clickstream data have been processed, a technique for analyzing and categorizing these data into usage patterns is required.
  2. Visualization techniques facilitate this by producing ‘footstep’ graphs.
     1. These are based on a 2-D x-y plot, where the x-axis represents the browsing time between two Web pages and the y-axis the Web page in the user’s browsing route.
     2. Thus, the distance travelled along the x-axis represents the time the user has spent browsing, and a change on the y-axis represents a transition from one Web page to another.
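The footstep-graph coordinates described above can be computed directly from a clickstream session: cumulative dwell time gives the x-axis, and the page’s position in the route gives the y-axis. The session below is invented for illustration:

```python
# Sketch: build 'footstep' graph points (x = cumulative browsing time,
# y = position of the page in the route) from an invented session.
log = [
    # (page, seconds spent before moving on)
    ("/home", 5),
    ("/products", 30),
    ("/products/42", 12),
    ("/checkout", 8),
]

points, elapsed = [], 0
for step, (page, dwell) in enumerate(log):
    points.append((elapsed, step, page))  # (x, y, label)
    elapsed += dwell

for x, y, page in points:
    print(x, y, page)
```

Plotting these points as a step line gives the footstep graph: long horizontal runs show pages where the user lingered, and vertical jumps show page transitions.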

d) Market Segmentation

1. Market Segmentation is a process that segments a market into smaller sub-markets, called segments.

2. Segments should be homogeneous, i.e. their members should have similar attributes.

3. Purchasing patterns and trends can appear prominently in certain segments.

4. The aim of good market segmentation is to create segments in which prominent patterns can emerge.

5. Market segmentation may be used to analyze the following:

Market responsiveness analysis: useful in direct marketing, since the market responsiveness of product offerings is readily measurable segment by segment.

Market trend analysis: analyzing segment-by-segment changes in sales revenues can reveal market trends. Trend information is vital in preparing for ever-changing markets.

Market segmentation may use one of the following attributes to generate segments:

  1. Geographical regions: regions, countries, states, counties, zip codes, etc.
  2. Demographics: gender, age, income, education, etc.
  3. Psychographics: lifestyle classification
  4. Sales channels, branches and departments
  5. Sales representatives
  6. Product and service types (or product categories)
  7. Products
  8. Offer types

6. Segmentation provides opportunities for trend analysis. Trends and patterns embedded in changes of sales revenues can be useful indicators of market shifts. Trend analysis may examine the following types of segment trend information:

  1. What are the projected sales revenues for the next three months?
  2. Which segments show the highest growth, and which the steepest revenue decline?
  3. Which segments show the highest growth rates in percentage terms?

7. Sales Trend Analysis:

Timely identification of newly emerging trends is very important to businesses.

Sales patterns of customer segments indicate market trends. Upward and downward trends in sales signify new market trends. Time-series predictive modeling can be used to identify trends embedded in changes of sales revenues. Understanding sales trends is important for marketing as well as for customer retention. Typical sales trend analysis includes:

  1. Which customer segments show the highest growth and the steepest revenue decline?
  2. Which customer segments show the highest growth rates in percentage terms?

8. Trends may be categorized as:

  1. Short-term trends capture rapidly emerging developments.
  2. Mid-term trends capture developments over an intermediate horizon.
  3. Long-term trends capture developments over long periods.
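The segment trend questions above reduce to simple per-segment growth computations: absolute change answers “steepest revenue decline”, and percentage change answers “highest growth rate”. The revenue figures and segment names below are invented:

```python
# Sketch: segment-by-segment revenue change in absolute and percentage
# terms, answering the trend questions above. Figures are invented.
revenue = {
    # segment: (last quarter, this quarter)
    "urban-retail": (120_000, 150_000),
    "rural-retail": (80_000, 72_000),
    "online": (200_000, 260_000),
}

growth = {s: (cur - prev, (cur - prev) / prev * 100)
          for s, (prev, cur) in revenue.items()}

fastest = max(growth, key=lambda s: growth[s][1])           # highest % growth
steepest_decline = min(growth, key=lambda s: growth[s][0])  # largest absolute drop
print(fastest, steepest_decline)  # online rural-retail
```

Projecting the next quarter (the first question in point 6 above) would add a time-series model on top of the same per-segment series.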

e) Retail Industry

1. The retail industry is a major application area for data mining, since it collects huge amounts of data on sales, customer shopping history, goods transportation, consumption, and service. The quantity of data collected continues to expand rapidly.

2. Retail data mining can help:

  1. identify customer buying behaviors,
  2. discover customer shopping patterns and trends,
  3. improve the quality of customer service,
  4. achieve better customer retention and satisfaction,
  5. enhance goods consumption ratios,
  6. design more effective goods transportation and distribution policies,
  7. reduce the cost of business.

3. Design and construction of data warehouses based on the benefits of data mining:

  1. There can be many ways to design a data warehouse for this industry.
  2. The levels of detail to include may also vary substantially.
  3. The outcome of preliminary data mining exercises can be used to help guide the design and development of data warehouse structures.
  4. This involves deciding which dimensions and levels to include and what preprocessing to perform in order to facilitate effective data mining.

4. Multidimensional analysis of sales, customers, products, time, and region:

  1. The retail industry requires timely information regarding customer needs, product sales, trends and fashions, as well as the quality, cost, profit, and service of commodities.
  2. It is therefore important to provide powerful multidimensional analysis and visualization tools, including the construction of sophisticated data cubes according to the needs of data analysis.
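The multidimensional view described above can be sketched as a simple data cube: sales facts keyed by several dimensions, rolled up to coarser cuboids on demand. The regions, products, and revenue figures below are all hypothetical:

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, quarter, revenue)
sales = [
    ("east", "shoes", "Q1", 100), ("east", "shoes", "Q2", 120),
    ("east", "hats",  "Q1", 40),  ("west", "shoes", "Q1", 90),
    ("west", "hats",  "Q2", 60),
]

# Base cuboid: total revenue by (region, product, quarter).
cube = defaultdict(int)
for region, product, quarter, rev in sales:
    cube[(region, product, quarter)] += rev

def roll_up(cube, dims):
    """Aggregate the base cuboid along a subset of dimension positions."""
    out = defaultdict(int)
    for key, value in cube.items():
        out[tuple(key[d] for d in dims)] += value
    return out

by_product = roll_up(cube, dims=[1])            # revenue per product
by_region_quarter = roll_up(cube, dims=[0, 2])  # revenue per (region, quarter)
```

Production systems build these cuboids with OLAP engines rather than dictionaries, but the roll-up operation is the same.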

5. Analysis of the effectiveness of sales campaigns:

  1. The retail industry conducts sales campaigns using advertisements, coupons, and various kinds of discounts and bonuses to promote products and attract customers.
  2. Careful analysis of the effectiveness of sales campaigns can help improve company profits.
  3. Association analysis may disclose which items are likely to be purchased together with the items on sale, especially in comparison with the sales before or after the campaign.
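A minimal sketch of this kind of association analysis counts which items co-occur in baskets containing the item on sale. The baskets and item names below are invented for illustration:

```python
from collections import Counter

# Hypothetical point-of-sale baskets; "chips" is the item on sale during the campaign.
baskets_during = [
    {"chips", "salsa"}, {"chips", "soda"}, {"chips", "salsa", "soda"},
    {"bread", "milk"}, {"chips", "salsa"},
]

def co_occurrence(baskets, item):
    """Count how often other items appear in baskets that contain `item`."""
    counts = Counter()
    for basket in baskets:
        if item in basket:
            counts.update(basket - {item})
    return counts

paired = co_occurrence(baskets_during, "chips")
top_partner, top_count = paired.most_common(1)[0]
```

Running the same count on pre-campaign baskets and comparing the two tables shows whether the campaign shifted which items are bought together with the promoted item.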

6. Customer retention—analysis of customer loyalty:

  1. Customer loyalty and purchase trends can be analyzed systematically. Goods purchased at different periods by the same customers can be grouped into sequences.
  2. Sequential pattern mining can then be used to investigate changes in customer consumption or loyalty and suggest adjustments to the pricing and variety of goods in order to help retain customers and attract new ones.
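The two steps above, grouping purchases into per-customer sequences and counting the support of an ordered pattern, can be sketched as follows. The transaction log and pattern are hypothetical:

```python
from collections import defaultdict

# Hypothetical transaction log: (customer_id, period, item)
transactions = [
    ("c1", 1, "camera"), ("c1", 2, "memory_card"), ("c1", 3, "tripod"),
    ("c2", 1, "camera"), ("c2", 2, "memory_card"),
    ("c3", 1, "laptop"), ("c3", 2, "mouse"),
]

# Build each customer's purchase sequence, ordered by period.
sequences = defaultdict(list)
for cust, period, item in sorted(transactions, key=lambda t: (t[0], t[1])):
    sequences[cust].append(item)

def supports(seq, pattern):
    """True if `pattern` occurs in `seq` as an ordered (not necessarily contiguous) subsequence."""
    it = iter(seq)
    return all(p in it for p in pattern)

pattern = ("camera", "memory_card")
support = sum(supports(seq, pattern) for seq in sequences.values())
```

Real sequential pattern miners (e.g. GSP or PrefixSpan) enumerate all frequent patterns rather than checking one, but the per-customer sequence construction is the same.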

7. Product recommendation and cross-referencing of items:

  1. By mining associations from sales records, one may discover that a customer who buys a digital camera is likely to buy another set of items.
  2. Such information can be used to form product recommendations.
  3. Product recommendations can also be advertised on sales receipts, in weekly flyers, or on the Web to help improve customer service, aid customers in selecting items, and increase sales.
  4. Information such as "hot items this week" or attractive deals can be displayed together with the associative information in order to promote sales.
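A "customers who bought X also bought Y" recommender is one minimal form of this idea: rank the items most often found on the same receipts as the purchased item. The receipts and item names below are illustrative:

```python
from collections import Counter

# Hypothetical sales records: one set of items per receipt.
receipts = [
    {"digital_camera", "memory_card", "camera_bag"},
    {"digital_camera", "memory_card"},
    {"digital_camera", "tripod"},
    {"laptop", "mouse"},
]

def recommend(receipts, bought, top_n=2):
    """Rank items by how often they co-occur with `bought` on past receipts."""
    with_item = [r for r in receipts if bought in r]
    counts = Counter(i for r in with_item for i in r if i != bought)
    return [item for item, _ in counts.most_common(top_n)]

suggestions = recommend(receipts, "digital_camera")
```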

f) Telecommunications industry

1. The telecommunication market is rapidly expanding and highly competitive.

2. This creates a great demand for data mining in order to help understand the business involved, identify telecommunication patterns, catch fraudulent activities, make better use of resources, and improve the quality of service.

3. The following are a few scenarios in which data mining may improve telecommunication services:


4. Multidimensional analysis of telecommunication data:

  1. Telecommunication data are intrinsically multidimensional, with dimensions such as calling-time, duration, location of caller, location of callee, and type of call.
  2.  The multidimensional analysis of such data can be used to identify and compare the data traffic, system workload, resource usage, user group behavior, and profit.
  3.  Therefore, it is often useful to consolidate telecommunication data into large data warehouses and routinely perform multidimensional analysis using OLAP and visualization tools.
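Aggregating call detail records into a small cube and then slicing it by one dimension is the core of this kind of analysis. The call records below are hypothetical:

```python
from collections import defaultdict

# Hypothetical call detail records: (caller_location, call_type, duration_minutes)
calls = [
    ("NY", "local", 5), ("NY", "intl", 20), ("NY", "local", 3),
    ("LA", "local", 7), ("LA", "intl", 15),
]

# Roll the records up into a two-dimensional cube of total minutes.
cube = defaultdict(int)
for loc, ctype, minutes in calls:
    cube[(loc, ctype)] += minutes

# Slice the cube: total international traffic across all locations.
intl_total = sum(v for (loc, ctype), v in cube.items() if ctype == "intl")
```

An OLAP tool performs the same slice interactively, over many more dimensions (duration, time of day, callee location, and so on).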

5. Fraudulent pattern analysis and the identification of unusual patterns:

  1. It is important to:
     1. identify potentially fraudulent users and their atypical usage patterns;
     2. detect attempts to gain fraudulent entry to customer accounts; and
     3. discover unusual patterns that may need special attention, such as busy-hour frustrated call attempts and switch-and-route congestion patterns.
  2. Many of these patterns can be discovered by multidimensional analysis, cluster analysis, and outlier analysis.
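The simplest outlier-analysis approach flags subscribers whose usage deviates far from the population mean. The usage figures below are invented; the 1.5-standard-deviation threshold is an arbitrary illustrative choice:

```python
import statistics

# Hypothetical daily call minutes per subscriber; one value is atypical.
usage = {"u1": 30, "u2": 35, "u3": 28, "u4": 32, "u5": 310}

mean = statistics.mean(usage.values())
stdev = statistics.stdev(usage.values())

# Flag subscribers whose usage deviates by more than 1.5 standard deviations.
outliers = [u for u, minutes in usage.items() if abs(minutes - mean) > 1.5 * stdev]
```

Production fraud detection compares each subscriber against their own history and against peer groups, but the deviation-based flagging principle is the same.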

6. Multidimensional association and sequential pattern analysis:

  1. Multidimensional association and sequential pattern analysis can be used to promote telecommunication services.
  2. For example, suppose you would like to find usage patterns for a set of communication services by customer group, by month, and by time of day.

7. Mobile telecommunication services:

  1. Mobile telecommunication, Web and information services, and mobile computing are becoming increasingly integrated and common in our work and life.
  2. One important feature of mobile telecommunication data is its association with spatiotemporal information. Spatiotemporal data mining may become essential for finding certain patterns.
  3. Data mining will likely play a major role in the design of adaptive solutions enabling users to obtain useful information with relatively few keystrokes.

8. Use of visualization tools in telecommunication data analysis:

  1. Tools for OLAP visualization, linkage visualization, association visualization, clustering, and outlier visualization have been shown to be very useful for telecommunication data analysis.

g) Banking  & Finance

1. Most banks and financial institutions offer a wide variety of banking services (such as checking and savings accounts), credit (such as business, mortgage, and automobile loans), and investment services.

2. Financial data collected in the banking and financial industry are often relatively complete, reliable, and of high quality, which facilitates systematic data analysis and data mining.

3. Here we present a few typical cases:

 Design and construction of data warehouses for multidimensional data analysis and data mining:

  1. Data warehouses need to be constructed for banking and financial data.
  2. Multidimensional data analysis methods should be used to analyze the general properties of such data. For example, one may like to view the debt and revenue changes by month, by region.
  3. Data warehouses, data cubes, multi feature and discovery-driven data cubes, characterization and class comparisons, and outlier analysis all play important roles in financial data analysis and mining.

 Loan payment prediction and customer credit policy analysis:

  1. This analysis is critical to the business of a bank.
  2. Data mining methods, such as attribute selection and attribute relevance ranking, may help identify important factors and eliminate irrelevant ones.
  3. For example, factors related to the risk of loan payments include loan-to-value ratio, term of the loan, debt ratio, payment to-income ratio, customer income level, education level, residence region, and credit history. Analysis of the customer payment history may find that, say, payment-to income ratio is a dominant factor, while education level and debt ratio are not.
  4. The bank may then decide to adjust its loan-granting policy so as to grant loans to those customers whose applications were previously denied but whose profiles show relatively low risks according to the critical factor analysis.
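Attribute relevance ranking can be sketched by correlating each candidate factor with the loan outcome and sorting by the absolute correlation. The loan records below are fabricated so that payment-to-income ratio tracks default while education does not, mirroring the example in the text:

```python
# Hypothetical loan records: attribute values and default outcome (1 = defaulted).
records = [
    {"payment_to_income": 0.45, "education_years": 16, "defaulted": 1},
    {"payment_to_income": 0.40, "education_years": 12, "defaulted": 1},
    {"payment_to_income": 0.15, "education_years": 16, "defaulted": 0},
    {"payment_to_income": 0.20, "education_years": 12, "defaulted": 0},
    {"payment_to_income": 0.50, "education_years": 14, "defaulted": 1},
    {"payment_to_income": 0.10, "education_years": 14, "defaulted": 0},
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def relevance(attr):
    """Absolute correlation between an attribute and the default flag."""
    return abs(pearson([r[attr] for r in records], [r["defaulted"] for r in records]))

ranking = sorted(["payment_to_income", "education_years"], key=relevance, reverse=True)
```

Real relevance analysis uses measures such as information gain over many attributes, but the "rank factors, drop the irrelevant ones" workflow is the same.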

 Classification and clustering of customers for targeted marketing:

  1. Classification and clustering methods can be used for customer group identification and targeted marketing.
  2. For example, we can use classification to identify the most crucial factors that may influence a customer’s decision regarding banking.
  3. Customers with similar behaviors regarding loan payments may be identified by multidimensional clustering techniques. These can help identify customer groups, associate a new customer with an appropriate customer group, and facilitate targeted marketing.
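A tiny k-means routine illustrates how customers with similar payment behavior fall into groups. The customer features (tenure in months, number of late payments) and the initial centers are hypothetical:

```python
# Hypothetical customers described by (months_as_customer, late_payments).
customers = {
    "a": (24, 0), "b": (30, 1), "c": (28, 0),   # reliable payers
    "d": (6, 5),  "e": (8, 6),  "f": (5, 4),    # risky payers
}

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def kmeans(points, centers, rounds=10):
    """Minimal k-means: assign each point to its nearest center, then recompute centers."""
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[min(range(len(centers)), key=lambda i: dist(p, centers[i]))].append(p)
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else ctr
            for c, ctr in zip(clusters, centers)
        ]
    return centers

centers = kmeans(list(customers.values()), centers=[(0, 0), (30, 0)])
```

A new customer can then be associated with the nearest center, which is exactly the "associate a new customer with an appropriate customer group" step above.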

 Detection of money laundering and other financial crimes:

  1. To detect money laundering and other financial crimes, it is important to integrate information from multiple databases, as long as they are potentially related to the study.
  2. Multiple data analysis tools can then be used to detect unusual patterns, such as large amounts of cash flow at certain periods, by certain groups of customers.
  3. Useful tools include:
     1. data visualization tools (to display transaction activities using graphs by time and by groups of customers),
     2. linkage analysis tools (to identify links among different customers and activities),
     3. classification tools (to filter unrelated attributes and rank the highly related ones),
     4. clustering tools (to group different cases),
     5. outlier analysis tools (to detect unusual amounts of fund transfers or other activities),
     6. sequential pattern analysis tools (to characterize unusual access sequences).
  4. These tools may identify important relationships and patterns of activities and help investigators focus on suspicious cases for further detailed examination.
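Detecting "large amounts of cash flow at certain periods" can be sketched by aggregating transfers per customer per period and flagging periods far above that customer's typical flow. The transfers and the 5x-median threshold are purely illustrative:

```python
import statistics
from collections import defaultdict

# Hypothetical cash transfers: (customer, period, amount)
transfers = [
    ("acme", 1, 1_000), ("acme", 2, 1_200), ("acme", 3, 48_000),
    ("bob",  1, 500),   ("bob",  2, 600),   ("bob",  3, 550),
]

# Total cash flow per customer per period.
flow = defaultdict(int)
for cust, period, amount in transfers:
    flow[(cust, period)] += amount

# Each customer's per-period amounts, for a per-customer baseline.
by_customer = defaultdict(list)
for (cust, _), amount in flow.items():
    by_customer[cust].append(amount)

# Flag periods where a customer's flow exceeds 5x their own median flow.
suspicious = [
    (cust, period)
    for (cust, period), amount in flow.items()
    if amount > 5 * statistics.median(by_customer[cust])
]
```

Flagged (customer, period) pairs are candidates for the detailed linkage and sequence analysis described above, not conclusions in themselves.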

h) CRM

  1. Customer Relationship Management (CRM) emerged in the last decade to reflect the central role of the customer in the strategic positioning of a company.
  2. It encompasses all measures for understanding customers and for exploiting this knowledge to design and implement marketing activities, align production, and coordinate the supply chain.
  3. CRM puts emphasis on the coordination of such measures, also implying the integration of customer-related data, metadata, and knowledge, as well as the centralized planning and evaluation of measures to increase customer lifetime value.
  4. CRM gains in importance for companies that serve multiple groups of customers and exploit different interaction channels for them.
  5. CRM is a broadly used term that covers a wide variety of functions, including:
     1. marketing automation (e.g., campaign management, cross- and up-selling, customer segmentation, customer retention),
     2. sales force automation (e.g., contact management, lead generation, sales analytics, generation of quotes, product configuration), and
     3. contact centre management (e.g., call management, integration of multiple contact channels, problem escalation and resolution, metrics and monitoring, logging interactions and auditing).
  6. Data mining helps marketing professionals improve their understanding of customer behavior.
  7. In turn, this better understanding allows them to target marketing campaigns more accurately and to align campaigns more closely with the needs, wants, and attitudes of customers and prospects.
