Monday, 11 May 2020

Data Cubes


Take Aways:
1. What are data cubes? and how they are useful for business and analytics?
2. What kind of structures do the cubes take?
3. Why do we analyze data cubes?
4. How analysis are expanded with different Schemes?

Storage of Data as Cube or Cube Formation from in sourcing data.

Data Cube is just like Rubiks Cubes and different type of multidimensional Puzzling Cubes. Analysis of OLTP data via OLAP happens predominantly with snapshot versions of data. Just like in a Goods warehouse or post office or library, we would have made different racks for storing oldest to the latest and based on region segregation, we will have multiple dimension. It is easy to imagine 3 Dimension of course like region in one dimension, quarter/months in another dimension.

What is the advantage of such Cubes for Business?

1. Easy Navigation to particular month and particular region or customer / product.
2. Analysis and Dash boarding for the business becomes easy and meaningful.

What is the advantage of such Cube for Miner?

A Miner look for different patterns like linear, sequential, non linear (tree, graph or other forms). Miner likes to link a particular variable to some distant variable to gain insight. To bring about interesting findings.

It will be easy to imagine and perform image spectroscopy (i.e., pattern identification)


Twist and Turns on Data Cube:

Unlike a Rubik Cube where we could do a lot of rotation based on the freedom of rotation, on a data cube we could do few operations.

Four types of analytical operations that can be performed on Cube are as follows.

1. Roll-up (also called as Dimensional Reduction, Compression, Aggregation and grouping by)
2. Drill-down (also called as Dimensional Expansion, Decompression, detailing, index to pagination to particular page)
3. Slice and dice (filter and filter)
4. Pivot (rotate)

How are the Parts of Cube Defined?

Just like any Rubik cube will have colored pixelated squares and faces of cube.

Dimensions with numerical values (like color value on rubiks cube)
It is placed on one of the faces called facts
Values in RDBMS are nothing but Measures.

Facts/Measurement (like faces of the cube)
It denotes all different colors of the cubes together.
Attribute / column names of a table in RDBMS mostly become Dimensions. Mostly dimension only have all the primary keys of various fact tables.
Facts are the axis upon which the data/measures/dimension could rotate.
Rotation of data/measures/dimension is constrained by facts.

Sometimes, Dimension looks like 1D, 2D, 3D like Axis, and facts look like actual values placed on 1D, 2D and 3D. It really does not matter if it has been named differently except for communication without fellow beings. We can say business or miner wants to measure some thing to derive facts, so chooses what to measure and how to measure, such new measurement with units are called Facts and Dimensions are nothing but the measured actual value.

Is Normalization of Entity Relationship in RDBMS equivalent to Cube?

No. Not at all.
ER is mainly to reduce storage volume of data by removing redundancy in similar attributed grouped together into Entities. As far as Cube are concerned they are answering the future predictions of the business and it is not performed on all the data collected via OLTP system, a subjective question is raised by business, based on the question different data are collected from various OLTP system and file system and analyzed together to come up with an answer. It is a very subjective analysis. ER is made for performance and for system design. Cubes are made for analysis and analysis design.

What are the different Structures of Data Cubes that can be formed?

Cubes are also called as Schemes, In general they fall into following categories,
1. Star Schema
2. Snow flake schema
3. Constellation or Galaxy

Please check the two belong link for the types of schema and differences. For me they are very deep and advanced topic and of less use while doing a bird eye view.

https://www.guru99.com/star-snowflake-data-warehousing.html
https://www.vertabelo.com/blog/data-warehouse-modeling-star-schema-vs-snowflake-schema/

The Dimension & Facts combined in different patterns form Schema for pattern analysis each with their pros and cons.

What are the different types of OLAP system Available?
Systems are of least importance for both business and miner but they are infrastructure where the subjective analysis take place. The subjective analysis can take place in RDBMS or Desktop, Web, Mobile, Spatial (GPS based) or in Dataware house system. They have separate names coined, but that is not of focus for now from my point of view.


References:
https://www.guru99.com/online-analytical-processing.html
https://www.guru99.com/fact-table-vs-dimension-table.html
https://www.tutorialspoint.com/dwh/dwh_data_warehousing.htm
https://www.guru99.com/star-snowflake-data-warehousing.html
https://www.vertabelo.com/blog/data-warehouse-modeling-star-schema-vs-snowflake-schema/

No comments:

Post a Comment

Skill, Knowledge and Talent

I kept overwhelming with data, information, knowledge and wisdom over a period of time. And I really wanted to lean towards skilling on few ...