Certified Business Intelligence Professional Reporting and Querying

Reporting and Querying
 


Reporting refer to any activity that leads to reports.

Business Reporting or Enterprise Reporting is a fundamental part of the larger movement towards improved business intelligence and knowledge management. Often implementation involves extract, transform, and load (ETL) procedures in coordination with a data warehouse and then using one or more reporting tools. While reports can be distributed in print form or via email, they are typically accessed via a corporate intranet.

With the dramatic expansion of information technology, and the desire for increased competitiveness in corporations, there has been an increase in the use of computing power to produce unified reports which join different views of the enterprise in one place. This reporting process involves querying data sources with different logical models to produce a human readable report—for example, a computer user has to query the Human Resources databases and the Capital Improvements databases to show how efficiently space is being used across an entire corporation.

Enterprise reporting (or management reporting) as the regular provision of information to decision-makers within an organisation to support them in their work. These reports can take the form of graphs, text and tables and, typically, are disseminated through an intranet as a set of regularly updated web pages (or "enterprise portal"). Alternatively, they may be emailed directly to users or simply printed out and handed around, in the time-honoured fashion.


Types of Enterprise Reports

  • Metric Management - In many organisation, business performance is managed through outcome-oriented metrics. For external groups, these are Service Level Agreements (SLAs). For internal management, they are Key Performance Indicators (KPIs). Typically, there are agreed targets to be tracked against over a period of time. They may be used as part of other management strategies such as Six Sigma or Total Quality Management (TQM).
  • Dashboards - A popular idea is to present a range of different indicators on the one page, like a dashboard in a car. Typically, vendors will sell you "canned reports" (pre-defined reports with static elements and fixed structure). However, this approach should allow users to customise their dashboard view, and set targets for various metrics. It's common to have traffic-lights defined for performance (red, orange, green) to draw management attention to particular areas.
  • Balanced Scorecards - A method developed by Kaplan and Norton that attempts to present an integrated view of success in an organisation. In addition to financial performance, they also include customer, business process and learning and growth perspectives. (You should read about this if you're not sure what kinds of things to report on.)
  • Ad Hoc Analyses - Typically undertaken once to deal with a specific initiative, and then never revisited. They often involve building a model in a spreadsheet to allow exploration of "what-if" scenarios. Alternatively, they may take the form of a written brief or one-off report for management.
  • Interactive Querying - Best exemplified by OLAP, this refers to specific technology that allows an analyst (or savvy manager) to manipulate directly the presentation of data. The analyst can select dimensions (eg. time, location, department, employee etc) and "drill-down" (expand) and "roll-up" (collapse) the data.
  • Data Mining (and Advanced Statistics) - Here, techniques such as neural networks and machine learning are used to discover novel, interesting and useful patterns in the data. This is best suited for analyses such as classification, segmentation, clustering and prediction.


Components of Reporting System
The enterprise reporting components are described below:

  • Instrumentation - A device that measures some aspect of the real-world as events and records them.
  • Examples: Cash register, web server, handheld GPS, thermometer, card reader.
  • Data Supply - A system that takes recorded events and delivers them reliably to another system. The data supply can be "push" or "pull", depending on whether or not it is responsible for initiating delivery. It can also be "polled" (or batched) if the data are transferred periodically, or "triggered" (or online) if data are transferred in case of a specific event.
  • Examples: Logfile FTP script, SQL process, EDI, web service.
  • ETL - Extract, Transform and Load. The step where these recorded events are checked for quality, put into the appropriate format and inserted into the data store.
  • Examples: Most datawarehouse and Enterprise Application Integration (EAI) vendors sell this as part of their suite.
  • Data Store - The repository for the data and metadata. Could be a flat file or spreadsheet, but usually a relational database management system (RDBMS) setup as a datamart, datawarehouse, operational datastore (ODS), sometimes employing cubes (OLAP).
  • Examples: MySQL, MS SQL, Oracle, Lotus Notes.
  • Business Logic - The explicit steps for how the recorded events are to be converted into metrics, often implemented in a script (eg Perl) or query (eg. SQL).
  • Examples: Minute-by-minute temperature readings yield the "monthly average daily maximum" by adding and dividing in the correct sequence.
  • Publication - The system that builds the various reports and hosts them (for users) or disseminates them (to users). Users may also require notification, annotation, collaboration and other services.
  • Examples: PHP, Crystal Decisions, Lotus Domino.
  • Assurance - Any enterprise reporting system must offer a quality service to its userbase. This includes determining if and when the right information is delivered to the right people in the right way.
  • Examples: Service monitoring and alarming, user surveys, audits, focus groups, change requests and fault management.

Interfaces
Note that usually most of these systems are already in place (in some form or other) and controlled by other parts of the organisation. For example, Enterprise Resource Planning (ERP) and Customer Relationship Management (CRM) could be source systems responsible for instrumentation, data supply and ETL. Also, the data store is likely used for transaction processing too by Finance, Sales and Marketing and HR. Similarly, whoever is responsible for IT Governance may also take a strong interest in the assurance aspects of enterprise reporting.

The extent to which these established components are a help or a hinderance will be a key determinant in the success or otherwise of your project.

Effective Reporting

Regulatory reporting requires companies to consolidate, calculate, and present their financial data in the most timely and accurate manner possible.  But, with revenue and expense data residing in various disparate enterprise sources such as accounting applications, order entry systems, third-party payroll databases, and budgeting and forecasting solutions – how can an organization efficiently and effectively comply with these complex regulations?

Know the Guidelines
In many organizations, few stakeholders have full knowledge of all the guidelines and rules that govern the business.  Additionally, legislation changes frequently, and the various federal, state, and local agencies are issuing new and revised laws all the time.  Therefore, it is critical that you identify those who are most accountable for overseeing compliance with regulatory reporting guidelines, and put the appropriate measures in place to ensure that they are well-informed about existing rules, and keep on top of any new or revised ones.

It is also important to note that some regulatory bodies require companies to designate a senior representative to certify report contents, holding them personally responsible for the information’s accuracy and integrity.  Choose this person wisely.

Assess Your Risk
Are your global operations, and such factors as accounting systems in multiple languages or currency conversions, hindering your ability to generate fast and consistent financial reports?  Are consolidations and calculations manual, increasing the number of errors and negatively impacting data integrity?  Or, do you lack an effective audit trail, so you can track how financial data has been entered, altered, or accessed?

Questions like these are critical to understanding where exposure for non-compliance exists, so you can take swift corrective action.  Have a third-party “expert”, such as a consultant, come in and conduct an unbiased assessment of your current reporting processes and systems, and make recommendations for new procedures and tools to help ensure effective compliance.

Leverage Your Master Data and Metadata
Master data and metadata have both proven to be effective ways to enable improved regulatory reporting.  For example, a master data management strategy allows companies to develop enterprise definitions, such as what “open” account status means in the case of a mortgage, a checking account, a line of credit, etc.  This gives financial managers access to a common view of accurate and timely financial information from systems across the enterprise.

Additionally, metadata, particularly that generated by business intelligence systems, can help ensure the appropriate storage, retention, and disposal of mission-critical financial information – including that contained in unstructured formats such as emails and images.  It also provides a virtual “audit trail” of financial information, such as what systems key accounting data resides in, how is it aggregated from various databases, or how expense, revenue, and profit numbers are calculated.

Corporate Governance is Key
Simply creating a report, and refreshing the data on a regular basis is not enough to guarantee ongoing compliance.  If the data you are plugging into your report templates in inconsistent or inaccurate, then your reports will not meet the needed standards, and severe sanctions could result.  Since financial data exists in many disparate systems across an organization, strict policies and procedures must be developed, documented, and enforced – and a governing body must be assigned to oversee them – in order to maintain the integrity of all relevant information across the enterprise.


Typical data warehousing queries
Many kinds of commonly asked business questions can be readily expressed as SQL queries. For example, anyone familiar with SQL can write a query that returns the quarterly sales of a given product in a given year.

However, many other commonly asked questions cannot be expressed so easily. Questions that require comparisons often challenge both the query writers and SQL itself. For example, a question requesting a comparison of weekly, monthly, quarterly, and yearly values is one of the simplest questions posed during a sales analysis, but expressing this question as a query represents a formidable challenge to the query writer, the query language, and the database server.

Business questions that request sequential processing are very difficult to express as SQL queries. To derive a simple running total, for example, data analysts typically run several queries with a client tool, then paste the results together using another tool. This approach is awkward because it requires a sophisticated user, floods the network with data, and takes place on a client that is typically much slower than a database server.

The standard SQL OLAP functions and RISQL extensions to SQL provide a better solution because they are easy to use, reduce network traffic, and perform a broad range of calculations that execute quickly on the server.

 

Difference between a database and a file is the way you can access them. You can search a file sequentially, looking for particular values at particular physical locations in each line or record. That is, you might ask, “What records have the number 1013 in the first field?”

In contrast, when you query a database, you use the terms that the model defines. You can query the database with questions such as, “What orders have been placed for products made by the Shimara Corporation, by customers in New Jersey, with ship dates in the third quarter?”

In other words, when you access data that is stored in a file, you must state your question in terms of the physical layout of the file. When you query a database, you can ignore the arcane details of computer storage and state your query in terms that reflect the real world, at least to the extent that the data model reflects the real world.

Writing Own Queries
When writing your own query, you define the set of choices and conditions that you want to use to retrieve data stored in the Warehouse. You determine such attributes as data elements to be returned (e.g., last name, city, age), conditions for selecting records (e.g., equal to or less than), and sort criteria (the order and priority in which results are to be sorted).

You may want to keep in mind some general questions to guide you in composing a query:

What information are you looking for? In what collection does this data reside, and which Business Object universe would be best to use?
Bear in mind the level of detail data you need, the time periods concerned, and which source system you would use to verify the data retrieved.
Once you have a basic idea of the results you need, consider how the query should be contrained -- by time period? account segment(s)? employee or organization names/codes?

What will you do with your results? If you are presenting them to others, you may want to include segment descriptions for those unfamiliar with codes. Also, if you plan to export the data, you may want to include objects which you have used to constrain your query, to better identify the data in the exported file. (For example, although you may have used Accounting_Period as a constraint, it might help to have the period appear as a column in your exported file, so you know what period that data represents.)

General Guidelines for Executing Queries
Queries often require complex conditions to return appropriate results. WHERE clauses (also known as query filters or conditions) reduce the amount of data to be processed in a SELECT statement (comprised of desired result objects or columns) by specifying that only those rows meeting the criteria in the WHERE clause are displayed. Depending upon which tool you use to query the Data Warehouse, your degree of control over the query language and operators will vary. For example, BusinessObjects and Microsoft Access employ graphical interfaces which construct appropriate SQL behind the scenes based on the tables and joins the user indicates. Oracle SQL*Plus, on the other hand, requires the user to write his or her own SQL statments. The Oracle database system, which forms the foundation of the Data Warehouse, permits the use of powerful SQL operators, some of which may also be available as post-query operators in your desktop tool.

The following guidelines are for all users of the Data Warehouse, regardless of query tool, and provide good practices for efficient querying:

  • Refer to the help documentation. The main difficulty most people have with writing queries is knowing which table to use. If you are unsure about which table to use, refer to the table help that is available for the data collection. Pay particular attention to the sections "Common Uses" and "Cautions." These sections may help you decide if you have selected the appropriate table for your query. If you are unsure about a data element, look up its definition. The Office of Data Administration has included data element definitions and table help as part of the data collection documentation on the web.
  • Take advantage of indexes. If possible, include an indexed data element in your condition statement. A query with a record selection condition using an indexed data element tells the system to go directly to the rows in the table that contain the value indicated and to stop retrieving data when the value is no longer found. If a query does not select records based on an indexed data element in its record selection condition, the system starts searching at the first row in the table and works through every row until it reaches the last row in the table. Indexed columns are noted in each collection's documentation.
  • Certain operators or query segments are processed by the system without the use of indexes, even if the column in the condition is indexed. It may, of course, be necessary for you to construct your query in this manner to retrieve correct results, but in considering alternatives in query construction you may wish to keep in mind the following situations where indexes may not be used:
  • Negative comparisons such as Not Equal (represented by =! or <> in SQL), Different From, or Not In. Avoid negative phrasing of condition statements as much as possible. In general, it is easier (both for the system and for you) to interpret a positive phrase than a negative phrase. For example, instead of the condition statement "If term is not greater than 1998A," rephrase the statement to "If term is less than or equal to 1998A." Or, if practical, eliminate the condition from the query and filter your results on the desktop.
  • Nulls such as Is Null or Is Not Null.
  • Wildcards at the beginning of a string. Avoid matching patterns beginning with a wildcard (Like %...). A wildcard at the end of a pattern is definitely appropriate and can be very efficient (e.g., Where Fund Like 5% will retrieve all Funds 500000 - 599999).
  • If your access to data is restricted, do not force the security system to select records for you. For example, if you are authorized to access data only from a particular department, one of your record selection conditions should state "If Organization='My Organization'," where organization is the code for your department.
  • Review your query before executing it. Check to make sure that your query is as precise as possible. This includes selecting the tables that will give the best results, reviewing selection conditions and sort criteria, and if it makes sense to do so, including at least one indexed data element in the conditions statement. For example, if you want to find all undergraduate freshmen and their names, choose the Person table rather than the ADDRESS table. This is because a student can have multiple addresses, and choosing the ADDRESS table would return a name for each address the student has listed.
  • Be aware of data that is subject to change and its effect on your results. For example, a grade change can affect a student's grade point average (GPA). A query executed before and after the grade change may or may not result in a changed GPA. In addition, keep in mind that there is a "data delay" between Warehouse collections and their respective source systems. Refresh schedules are noted in each collection's documentation.
  • Give the query time to execute. Queries can take many minutes to execute; complex queries can take longer. It is not uncommon for a query to take 5 to 10 minutes to complete. In general, let the query run until it finishes. If the query takes longer than 1 hour to complete, contact Data Administration.
  •  
  • Verifying Query Results
  • Although query tools make accessing data easier, remember that University data is complex. You can easily compose a "bad query." That is, you may have selected incompatible data elements, the wrong data elements, or structured the conditions in your query such that it returns improper results. Be aware that it is possible to get an incorrect answer or misinterpret the data. For example, when mailing information to students, it is more preferable to select "current address" rather than "temporary address" as the address type. This is because the system looks at the expiration dates for the temporary and local addresses and computes which address is the most current.
  • Compare your results with known reports that are up-to-date and accurate.
  • Perform "sanity checking." Ask yourself if the information you have is a reasonable answer. For example, few classes at Penn enroll more than 100 students per class so a query returning a class enrollment of 250 students is questionable; grants are usually less than Penn's overall budget thus a grant amount that is listed as twice its Penn budget is unlikely; and Penn's student population is approximately 50% men and 50% women so a query returning a class enrollment of 98% males is probably incorrect.
  • Consider estimating likely parameters for a reasonable response before executing the query.
  • Try sequential queries that break large sorts down to components, then add them to be sure the whole is accounted for. For example, if searching for Wharton undergraduate Asian female students, look at: all Asian students, male Asian students, etc. to check the reliability of your query.
  • Use one or two conditions and zero in on what you want. When developing your query to answer a question, start by retrieving detail information, with just one or two record selection conditions. Zero in on what you want, looking at how the results change as you add each condition. If your goal is a summarized report, see how the detail report compares to the summarized one. Make sure the query includes what you want, all of what you want, and nothing but what you want.
  • Ask someone who knows the data to check your results. Ask someone in your department or school, or contact Data Administration. Also, ask the person requesting the report what results they expect to get. Or ask if the person can suggest another existing report that you can use to check to see if your results are at least in the ballpark. If your query gets results that are nowhere near what the requester expected, your query may be in error.
  • Continue to become familiar with and knowledgeable about University data by participating in the data collection listservs.
  • Just look at the report. Sometimes, just looking at the query results can help you spot glaring errors. (Why am I getting so many rows returned? Whoops--I forgot to screen on accounting period!)
  • Run a similiar query. Try running another query that approaches the question differently, and see if both queries get the same results. For example, if you want figures on telephone charges, in one query, you could set a condition on a list of object codes, and in another query, you could set a condition on the parent object code. If the queries do not get the same results, find out why not.

 

 For Support