Tuesday, March 13, 2012

star schemas and business intelligence

I saw Jeffrey Palermo, the President/COO of Headspring, speak on business intelligence at the Dallas Day of Dot Net convention this past weekend. He asserted that anyone who has to make informed managerial decisions uses business intelligence to do so, and that anyone who produces a tabular report has produced business intelligence. The majority of his talk was on how to do business intelligence well, and, to that end, Jeffrey spoke of making star schemas to easily craft tabular data.

A star schema has fact tables at its "center" and helper tables to assist fact tables outward at the "arms" of the starfish. A dimension table, containing string values, such as the names of projects, is an example of an off-center table. The visual of a star is not as important as the central fact tables themselves. They have columns which only hold numerical and datetime values. The data points in the rows of a fact table are the facts.

A typical schema is normalized with tables corresponding to domain-driven design objects which one may interact with by way of an object relational mapper. Writing objects to tables and doing basic CRUD (create, read, update, delete) is really easy in a normalized schema. A star schema offers denormalized data. Doing CRUD maneuvers would be painful, but reading metrics in the name of creating tabular data is easier than it would otherwise be. In Jeffrey's talk he showed an enormous swath of SQL that had to be crafted, joining numerous tables and holding conditional logic, in the name of building a metrics report from a normalized schema. By using a star schema, the SQL could be reduced to the twelve easy lines of SQL seen in the first image at the top of this posting. The minimized SQL is obviously going to run faster, but more importantly it is going to be easy to maintain, update, and make sibling scripts from. Jeffrey asked the audience if anyone had ever been asked to add a new column to an existing tabular report of columns only to find that the SQL that rendered the tabular report was a spaghetti mess of hard to navigate code. Imagine that pain point as it is the reason to build a star schema. You say you'd be convinced if you didn't have to pick between a normalized and denormalized schema? You don't have to pick. Build an application as you normally would with a normalized schema. Then, create a separate star schema and create a process to update the star schema's data from the normalized schema. A star schema, not unlike a SOLR table, is denormalized data existing beyond the bounds of one's primary schema that is updated (in terms of data) by a regular, scheduled process, and which makes some vital queries hurt less. The talk moved from the general (What is Business Intelligence?) to the specific (What is a Star Schema?) to the really specific (What may be bad about comparing one time period to another?). So, what may be bad about comparing one time period to another? February and March have a different number of days and thus if one sees March revenue for an enterprise reported slightly higher than that of February one may get the misimpression that company is increasing profits when it in fact may not be. Jeffrey recommended trying to break months up into predictable periods that each have the same number of work days to account for this issue. One of the helper tables he showed on a slide was an aid to that end.


I don't know who the woman with her hands extended above is. Her name was Christine Something-er-other and she had done significant business intelligence work for Sonic. She spoke of her own experiences with business intelligence at a few moments during Jeffrey's talk and in doing so contributed significantly to the session. A couple of things she said were worth Googling:

  • Star schemas may have measures in addition to dimensions. This suggests that where a dimension encapsulates character-type data being tracked, a measure is a calculation (sum, minimum, maximum, average) off of facts.
  • There is such a thing as a snowflake schema in contrast to a star schema. This suggests that a snowflake schema is a star schema in which some dimensions are appended to fact tables in lieu of being columns in a dimension table. Appended dimensions have the shape of joined tables containing normalized data for the dimension. Perhaps a list of U.S. cities that have a Sonic might merit its own table in lieu of being jammed into the regular dimension table supporting a fact table. One gets into snowflaking by breaking a dimension out of the typical dimension table supporting a fact table and making it its own delicate little snowflake.

The image below shows a dimension table. I suppose this table could be joined to the fact table in the second image in this posting by way of adding a column for TaskId at the fact table. (I'm guessing. I don't know how the association is made.) This is an example of a helper table that could be one of the arms of a star schema supporting the central fact tables.

Cool tools mentioned in this talk:

  1. RedGate SQL Data Compare for comparing the data in identical schemas and finding the SQL to bridge the delta.
  2. Highcharts JS for making cool and easy-to-read-at-a-glance charts from star schema facts.

No comments:

Post a Comment