Certified Data Mining and Warehousing Professional Web enabled OLAP

Web enabled OLAP

Web OLAP (WOLAP) is the operation of an OLAP tool from a Web browser; a solution that provides a merging of two very dynamic technologies: the World Wide Web and OLAP tools. A great deal of the growth and interest in data warehousing is due in part to the increased richness, power and flexibility of the current set of OLAP tools. And the explosion of the World Wide Web is due partly to the ease-of-use, fast development and zero deployment effort provided by the Hyper Text Markup Language (HTML) Web browsers. It is just natural that these two technologies would merge.

But technologies should not exist just for technologies' sake. Highway 101 in California is littered with the remnants of companies that provided great technology for which there was no demand. A technology must solve a business problem. So the question becomes - what business problem does Web OLAP solve? The answer lies in the shortcomings of traditional client-server.

Traditional Client-Server

By providing a flexible, visually appealing GUI, traditional client-server has helped spur the growth of the personal computer as an application platform. Character-based applications simply cannot compete with the flexibility and user-friendliness of a well designed GUI application.

But the growth of the rich GUI and client-server has come with a price. A strong emphasis on user-friendliness combined with a software business model that needs to sell updates for increased functionality has led to an increasingly large client footprint (the size of the application). Along with this trend, the complexity of the installation and configuration process has grown dramatically. The process of deploying a large, complex traditional client-server application to 5 machines may be manageable, but deploying such an application to 50 or 100 machines could be a painful process. And unfortunately that is just what is required in a typical OLAP tool deployment effort.

This problem of installation, configuration and deployment is handily managed with Web-based applications. A Web based application requires no deployment on the client machine. All that is required is a Web browser and a network connection to the intranet or Internet. This is a deployment strategy that should be especially appealing to warehouse administrators whose user community could be a far-flung remote sales force or a set of remote district offices where deployment of traditional client-server applications would be especially tricky.

But before jumping into the world of Web OLAP, the warehouse architect or administrator should understand the implications of using a Web solution for OLAP. As always, a discussion of a solution should begin with a discussion of the architecture.

Architecture of Web-OLAP

The most common method for delivering Web applications is a client browser communicating with a HTTP server which delivers HTML (Hyper Text Markup Language) pages, the pages used to display the contents of the World Wide Web. In almost all instances, the client and the server programs run on separate machines connected via a network.

Most Web OLAP applications use a variation of this architecture which provides a middleware application that resides on a server. (Middleware is a application that operates between a client application and some other application such as a database server.) This middleware component may connect directly to the Web browser or may communicate with the HTTP server which would then return data or HTML pages with additional data supplied by the middleware to the browser.

This Web-OLAP middleware performs a number of functions that HTML pages alone cannot provide. It communicates with the database where the warehouse data resides. It maintains state (remembers previous data base transactions) and potentially performs calculations on and buffers the data that is to be returned to the client.

The database is the final component of this architecture. The Web browser does not communicate directly with the database; it is the responsibility of the middleware component. Security restrictions on the connections the client machine can make, as well as the limitations of HTML, make this the optimal solution for retrieving data from a relational database.

This architecture is also referred to as a three-tiered architecture. The traditional client-server architecture is considered a two-tiered architecture and contains two components: the client and the server. The Web architecture however contains three components: the client (browser), the middleware and the database server.

Installing, Configuring and Using Web OLAP

Installation and operation of Web applications requires little or no work on client machines. All that is required is a browser that in some cases must be Java-enabled (able to run Java language applets). Given the ubiquitous nature of Web browsers today, this is not a stringent requirement.

Installation on the server is more involved. Not only must a HTTP server be installed and configured if one is not already present, but a middleware component, a separate application component, may be required. In most cases this middleware will reside on the same server as the HTTP server, but this is not a requirement. This middleware component must also be able to communicate with the database that stores the data warehouse data. User permissions and network access must be provided so that the middleware can retrieve the data required by the OLAP application.

Tuning/Scaling Issues

Middleware represents a potential bottleneck in the Web OLAP architecture. If configured correctly, the middleware should improve performance rather than hinder it. But if not configured correctly, users may wait for query processing while a saturated middleware program instance struggles to do its work.

Any evaluation of a Web OLAP solution with middleware should consider the scalability of the middleware. Generally, scaling of middleware will allow the instantiation or spawning of multiple instances of the middleware program, thus providing more processing power for the OLAP users. (Note that this implies that there is more server CPU available to provide this processing power, so server CPU resource is a consideration with a Web OLAP solution.)

Advantages of Middleware with OLAP

There are some advantages to using middleware to perform portions of OLAP processing. By placing middleware between the OLAP front-end and the database, processing is effectively being done closer to the data. This approach is distinctly different than other types of OLAP where large amounts of data are moved to the client machine to create a cube (Desktop OLAP or DOLAP). Using middleware or some other approach that performs processing close to the data reduces the overhead of moving the data and thus has the potential to improve performance.

This approach also contrasts with a desktop OLAP approach in preventing the proliferation of warehouse data. By keeping data in a central database visible to all users through the OLAP tool, there is no risk of OLAP data on the user's desktop being out-of-date; data the users will see will always be current with what is in the central data warehouse.

Available Solutions

A number of viable Web OLAP solutions are available today, including pure Java solutions, HTML solutions and solutions that are a hybrid of the two. First, it would be useful to define the types of solutions that are available today. They are as follows.

1. HTML solutions
2. HTML w/extensions - CGI, Middleware
3. HTML w/Java applets
4. Java applets - middleware

HTML solutions

These Web OLAP solutions use only HTML to deliver some OLAP functionality to the Web browser. One example of this type of solution is an OLAP tool that allows the user to execute some specific OLAP queries or reports from a browser; no other functionality would be available.

Most HTML-only solutions offer a limited amount of functionality. HTML cannot specify a precise X,Y coordinate where a report element will appear, so there is a possibility that reports will appear differently depending on the browser or platform where the HTML is rendered. And the full set of OLAP functionality (pivots, drill-up, drill-down) that users have come to expect will probably not be available.

HTML w/extensions - CGI, Middleware

One weakness of HTML is that it cannot retain state. Each time an HTML page is retrieved, a connection is made using the TCP/IP ftp facility, the HTML page is retrieved and then the connection is closed. This is a useful approach to serving pages for the World Wide Web, where thousands of users may visit a site in a given day and maintaining a connection for those users would be prohibitive. But for a an application that needs to return 300 rows 50 rows at a time, this connection-less approach to retrieving data is problematic. After retrieving 50 rows and closing the connection, database state will be lost and the application will not know how to retrieve the next 50 rows.

One solution to this HTML shortcoming has been to use CGI (common gateway interface) applications. These CGI programs reside on the server and provide some means of retaining state. Additionally, they can provide the capability to buffer rows to be returned to the client and potentially perform calculations on the data being returned. Other solutions referred to as middleware are similar to CGI and provide the same functionality. These applications usually sit between the HTTP server and a database server and perform data retrieval and buffering services for the client application.

But using pure HTML to provide the interface still has its limitations. HTML is designed to be completely platform independent; it does not specify particularly how a display element will be displayed, but instead indicates what properties should be used to display an element (for instance bold, level 1 heading, etc.). For this reason, particular 'X,Y' coordinates are not allowed for display elements. This makes consistent display of graphs and reports somewhat limited with HTML, since a report element could be displayed in the correct position on one display device, but on another display the HTML browser could choose to display the report element in a different and potentially incorrect position. This limitation of HTML is generally addressed with Java applets extensions that provide more direct control over the display.

HTML w/Java applets

Using HTML intermixed with Java applets provides an excellent solution for Web OLAP. With this solution, HTML can be used to display menus and execute simple application interface facilities. Java applets can be used to provide more complex application interface components and to display charts, graphs and tables in a consistent manner.

This addresses approach. The OLAP reports output can be tightly controlled providing consistent output over multiple display devices.

These solutions can provide more OLAP functionality than pure HTML solutions. They generally provide the ability to drill-up and drill-down within a dimensional hierarchy as well as the ability to pivot data results. And since Java provides significantly more graphic capabilities than HTML, these applications offer the ability to graph data and interactively alter the graph.

Java applets with Middleware

Using a 100 % pure Java solution provides the highest level of functionality and application control. Java applets provide control over the positioning of elements on the display, thus allowing the addressing x,y coordinates and alleviating the problem of reports and graphs displaying differently on multiple display devices. Additionally, Java applets are more inclined to make use of middleware, an application that resides between the applet and the database engine where the OLAP data would be stored. This middleware can buffer the data and perform calculations and aggregations before returning the data to the client, thereby performing these actions on a server potentially improving the performance of the Web OLAP application as a whole.

This solution provides the most diverse set of OLAP functions. Drill-up and drill-down and data pivot capabilities are provided in addition to strong chart display capabilities.

 For Support