
About this module

This module shows you how to work with data sources, data models, data formats (transformations) and pipelines (orchestration).


The worked example uses an Excel spreadsheet of sales records by employee. Because the spreadsheet repeats the same information across many rows, the tutorial shows how to identify unique records by using external keys to define uniqueness. The end result is a fully normalised data set in SQL Server or another RDBMS, and you should come away more comfortable with several data topics in Universal Platform.

Wikipedia Definition:

Database normalization is the process of organizing the attributes and tables of a relational database to minimize data redundancy.

Normalization involves refactoring a table into less redundant (and smaller) tables but without losing information; defining foreign keys in the old table referencing the primary keys of the new ones. The objective is to isolate data so that additions, deletions, and modifications of an attribute can be made in just one table and then propagated through the rest of the database using the defined foreign keys.
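As an illustration of the definition above (not part of the Wikipedia text), the following minimal sketch uses Python's built-in sqlite3 module rather than SQL Server. The table and column names (employee, sale, employee_id and so on) and the sample values are assumptions made for the example. It shows that once a name is stored in only one table, a single update is reflected in every row that references it through the foreign key.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: each sale references an employee by primary key.
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE sale (
    sale_id     INTEGER PRIMARY KEY,
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    amount      REAL NOT NULL
);
INSERT INTO employee VALUES (1, 'A. Smith');
INSERT INTO sale VALUES (10, 1, 100.0), (11, 1, 250.0);
""")

# The name lives in exactly one row, so one UPDATE is enough.
conn.execute("UPDATE employee SET name = 'A. Jones' WHERE employee_id = 1")

# Joining through the foreign key shows the new name on every sale.
rows = conn.execute("""
    SELECT s.sale_id, e.name, s.amount
    FROM sale AS s
    JOIN employee AS e ON e.employee_id = s.employee_id
    ORDER BY s.sale_id
""").fetchall()

for row in rows:
    print(row)
```

In a flat spreadsheet the same change would have to be made on every row that mentions the employee.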

So what does this module cover? The scenario is as follows:

Sales records in a spreadsheet will normally repeat certain information on every row, such as:

  • Salesperson (employee)
  • Product
  • Customer

Normalising this data ensures that each employee, customer and product is stored exactly once. Each sales record then refers to them by employeeID, customerID and productID instead of repeating their details.
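The idea can be sketched in a few lines of plain Python. This is only an illustration of the principle, not how Universal Platform implements it: the column names (employee, customer, product, quantity), the sample rows and the normalise helper are all hypothetical, and the name in each column is assumed to act as the external key that defines uniqueness.

```python
def normalise(flat_rows):
    """Split flat sales rows into three lookup tables plus a sales table.

    Each lookup table maps an external key (here, the name) to a generated ID.
    """
    employees, customers, products = {}, {}, {}
    sales = []

    def get_id(table, key):
        # A new ID is assigned only the first time a key is seen;
        # later occurrences of the same key reuse the existing ID.
        return table.setdefault(key, len(table) + 1)

    for row in flat_rows:
        sales.append({
            "employeeID": get_id(employees, row["employee"]),
            "customerID": get_id(customers, row["customer"]),
            "productID": get_id(products, row["product"]),
            "quantity": row["quantity"],
        })
    return employees, customers, products, sales


# Hypothetical spreadsheet rows with repeated employee, customer and product values.
flat_rows = [
    {"employee": "Alice", "customer": "Acme", "product": "Widget", "quantity": 3},
    {"employee": "Alice", "customer": "Globex", "product": "Widget", "quantity": 1},
    {"employee": "Bob", "customer": "Acme", "product": "Gadget", "quantity": 2},
]

employees, customers, products, sales = normalise(flat_rows)
print(employees)
print(sales)
```

Although "Alice" appears on two rows, she ends up with a single entry in the employee table, and both of her sales carry the same employeeID.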

The video tutorial explores several Universal Platform features such as:

  1. Storage and File Systems
  2. Data Modeling
  3. Master Data Management
  4. Extract, Transform and Load
  5. Pipelines