
About this module

This module shows you how to manage data updates in a .csv file stored on Dropbox and synchronise the changes to a SQL Server database, using Universal Platform constraints.
Module 1 of this tutorial explained how to configure the methods used to extract data in a .csv file from Dropbox to a SQL database, and successfully import the required data.
So what does this module cover? Well, the scenario is this:

In Module 1, Harmony, a member of the HR team, was required to capture all company staff records in a SQL database and ensure that all changes to staff information were subsequently updated in the database.
 
Harmony does not have any SQL database experience but does know how to work with a .csv file. So, Harmony was happy to add staff information to a .csv file but asked Bob from the IT Department to assist her with importing the data to the database. As Harmony works from a number of different global locations, the solution was to enable Harmony to add staff records to the HR.csv file on Dropbox, and Bob would then use Universal Platform to import the data to the database.
 
At the end of Module 1, Harmony and Bob had successfully imported the data in HR.csv to the SQL database, using Universal Platform.
 
Harmony and Bob now need to manage any changes to staff records in HR.csv and ensure only the updated information is imported to the database. Bob recommends using Universal Platform's constraints to facilitate this.
 
Your task is to help Bob by following Module 2 to ensure that only changes to the data in HR.csv are imported to the database.


Module 2 of this tutorial explains how to configure the constraints used when extracting data in a .csv file from Dropbox to a SQL database, to ensure only new records and updated data are imported. So in this module:

  • You will start by querying the database table you created in Module 1 to view the data in the database.
  • You will then run the pipeline to import data from the HR.csv file to your database table.
  • At this stage you will see that the records are duplicated in your database table. This is because there are no constraints in place and so data is re-imported as new instances.
  • Once you have analysed the imported data, you will clean up your database table and configure a constraint to restrict the data imported, based on specific entity attributes in your data model.
  • You will run the pipeline again and analyse the data imported to your database table with the constraint applied.
  • There should now be no duplicate records when you re-import HR.csv to your database table.
  • You will then make changes to information in HR.csv and import the updated data to your database.
  • You will complete the module by querying the database table to ensure the data is in the table.

Now that you are familiar with what we are going to do in this module, let's begin! 

 Viewing the data in the database table

Your first task is to view the data in your database table. The current data in the database is what you imported from HR.csv in Module 1 of this tutorial. 

To see the data that is already in your database:

  1. Go to the browser tab that has Data Manager open.
  2. Select Queries. The Queries screen is displayed.
  3. Click the CheckTable_DataHub SQL query you created in Module 1 to check that the data in the .csv file imported to your database table correctly. The Query Details screen is displayed.
  4. Select the Editor tab and ensure the following query is displayed: Select * from DataHub
  5. Click Run to execute your query and view the eight records imported from HR.csv in your database table.
  6. Select the Log tab to view the number of records imported to your database table. There should be eight rows in your table as there are eight records in HR.csv.
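The check that CheckTable_DataHub performs can be sketched locally with SQLite standing in for the SQL Server database. Table and column names follow the tutorial; the two rows are illustrative only, not the actual HR.csv data:

```python
import sqlite3

# In-memory stand-in for the SQL Server database used in the tutorial.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DataHub (EmployeeNo INTEGER, FirstName TEXT, LastName TEXT)")

# Illustrative rows standing in for the records imported from HR.csv.
rows = [(1, "Harmony", "Cunningham"), (2, "Bob", "Smith")]
conn.executemany("INSERT INTO DataHub VALUES (?, ?, ?)", rows)

# The CheckTable_DataHub query from Module 1.
records = conn.execute("SELECT * FROM DataHub").fetchall()
print(len(records))  # 2: the row count reported on the Log tab
```

In your environment the same query returns eight rows, one per record in HR.csv.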


 Importing data from the .csv file to the SQL database through the pipeline

Now, let's say that the data in HR.csv is updated and so you need to import the data to your database again.
To populate your database table with the data in HR.csv, you must run the pipeline you created in Module 1. 
To do this:

  1. Go to the browser tab that has Platform Manager open.
  2. Select Pipelines. The Pipelines screen is displayed.
  3. Click the Pipeline_CSV pipeline you created in Module 1 of this tutorial. The Pipeline Details screen is displayed.
  4. Select the Editor tab. The pipeline editor may take some time to load, so don't run away!
  5. Click Run to execute your program. The results are displayed in the Output pane for you to review and should complete with no errors. If you receive errors, please check your pipeline configuration.



Next you must check that the data in the .csv file imported to your database table.

 Viewing the updated data imported to the database table

You can check that the data imported to your database table by going back to Queries and running your table query again. 

To do this:

  1. Go to the browser tab that has Data Manager open and ensure the following query is displayed: Select * from DataHub
  2. Click Run to execute your query and view the eight records imported from HR.csv in your database table.
  3. Select the Log tab to view the number of records imported to your database table. You will see that there are now sixteen rows in your table. This is because you have re-imported the data from HR.csv and the data is duplicated.



If you run the pipeline and query again, you will see that the eight records continue to duplicate: to 24, then 32, and so on.
To avoid duplicate data and ensure that only changes to the data in the .csv file are imported to the database, you need to create constraints. Constraints specify what data to import and what to ignore, based on specific entity attributes in your data model.
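The duplication you just observed can be reproduced in miniature. Without a constraint, every pipeline run simply inserts each .csv row again as a new instance. This sketch uses SQLite and two illustrative rows in place of the real HR.csv and SQL Server table; the platform's internals may differ:

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DataHub (EmployeeNo INTEGER, LastName TEXT)")

# Two illustrative rows standing in for HR.csv on Dropbox.
hr_csv = "Employee_Number,Last_Name\n1,Cunningham\n2,Smith\n"

def run_pipeline():
    # With no constraint, every row is inserted as a new instance.
    for row in csv.DictReader(io.StringIO(hr_csv)):
        conn.execute("INSERT INTO DataHub VALUES (?, ?)",
                     (row["Employee_Number"], row["Last_Name"]))

run_pipeline()
run_pipeline()  # re-importing the same file duplicates the data
count = conn.execute("SELECT COUNT(*) FROM DataHub").fetchone()[0]
print(count)  # 4: each re-import doubles the table
```

This is exactly why your eight-record table grew to sixteen rows, and would grow to 24 and 32 on further runs.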

 Deleting records in the database table

Before you continue, let's clean up the duplicate data in your database table.

  1. Go to the browser tab that has Data Manager open.
  2. Select Queries. The Queries screen is displayed.
  3. Click Add.
  4. On the General tab, enter a name for the query in the format: ClearTable_DataHub.
  5. Click Save.
  6. Select the Editor tab and enter the following SQL (Structured Query Language) query: delete dbo.DataHub
  7. Click Run to execute your query and delete the records in your database table.
  8. Click Save & Close.
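The clean-up statement above uses T-SQL's shorthand, where the FROM keyword is optional. In standard SQL the equivalent is DELETE FROM, as this SQLite sketch with illustrative duplicated rows shows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DataHub (EmployeeNo INTEGER, LastName TEXT)")
# Illustrative duplicated rows, standing in for a re-imported table.
conn.executemany("INSERT INTO DataHub VALUES (?, ?)",
                 [(1, "Cunningham"), (1, "Cunningham")])

# Standard-SQL form of the T-SQL statement `delete dbo.DataHub`.
conn.execute("DELETE FROM DataHub")
count = conn.execute("SELECT COUNT(*) FROM DataHub").fetchone()[0]
print(count)  # 0: all rows removed, ready for a clean re-import
```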




To check that the data in your database is deleted:

  1. Click the CheckTable_DataHub SQL query you created in Module 1 to check that the data in the .csv file imported to your database table correctly. The Query Details screen is displayed.
  2. Select the Editor tab and ensure the following query is displayed: Select * from DataHub
  3. Click Run to execute your query.
  4. Select the Log tab to view the number of records in your database table. There should now be zero records in the table.

 What you've done so far

 Now that all data in the database table is removed, let's have a quick look at what you have done so far, before you continue and create an import constraint.

 Creating a data import constraint

As you have seen, importing data from a .csv file to a database table creates duplicate records when you re-import the file, rather than updating new or changed information and ignoring information that is the same. To avoid this, you must configure a constraint that specifies what data in the .csv file to import and what to ignore. Constraints restrict the data imported based on specific entity attributes in your data model.

In this example you are going to constrain data imported from HR.csv, based on the EmployeeNo field in your database table. According to the data model you set up in Module 1 of this tutorial, the EmployeeNo field in your database table is mapped to the Employee_Number field in HR.csv. You will now set EmployeeNo as unique so that only one record is imported for each Employee_Number in HR.csv. This will ensure that no duplicate employee records are imported. 
To create a constraint:

  1. Go to the browser tab that has Data Manager open.
  2. Select Data Models. The Data Models screen is displayed.
  3. Click the DataModel_DataHub data model you created in Module 1 of this tutorial. The Data Model Details screen is displayed.
  4. Select the Model tab.
  5. Select the DataHub database schema.
  6. Select the Constraints tab and add the following:
    • Employee Number
  7. Configure the constraint as follows:
    • Type: External Key
    • Is Unique?: On
  8. Select EmployeeNo from the list in the Constraint Attributes section.
  9. Click Add.
  10. Click Save & Close.
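The effect of marking EmployeeNo as unique can be sketched the same way. In database terms, a unique constraint lets an import update an existing record instead of inserting a duplicate, an "upsert". This SQLite stand-in illustrates the idea; how Universal Platform implements it internally may differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EmployeeNo is unique, mirroring the constraint configured above.
conn.execute("CREATE TABLE DataHub (EmployeeNo INTEGER PRIMARY KEY, LastName TEXT)")

def import_row(employee_no, last_name):
    # Unique records are inserted; existing ones are updated in place.
    conn.execute(
        "INSERT INTO DataHub VALUES (?, ?) "
        "ON CONFLICT(EmployeeNo) DO UPDATE SET LastName = excluded.LastName",
        (employee_no, last_name))

import_row(1, "Cunningham")
import_row(1, "Cunningham")  # re-import: no duplicate row is created
count = conn.execute("SELECT COUNT(*) FROM DataHub").fetchone()[0]
print(count)  # 1
```

With the constraint in place, re-running the pipeline can never grow the table beyond one row per Employee_Number.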




Now that the constraint is configured, let's run the pipeline and populate your database table with the data in HR.csv.

 Importing data from the .csv file to the SQL database with a constraint applied

You are now ready to run the pipeline and import the data in the HR.csv file to your SQL database with a constraint applied. 
To do this:

  1. Go to the browser tab that has Platform Manager open and ensure the Pipeline_CSV pipeline is displayed in the Editor tab.
  2. Click Run to execute your program. The results are displayed in the Output pane for you to review and should complete with no errors.


Next you must query your database table to check that the data in the .csv file imported correctly.

  1. Go to the browser tab that has Data Manager open.
  2. Select Queries. The Queries screen is displayed.
  3. Click the CheckTable_DataHub SQL query you created in Module 1 to check that the data in the .csv file imported to your database table correctly. The Query Details screen is displayed.
  4. Select the Editor tab and ensure the following query is displayed: Select * from DataHub
  5. Click Run to execute your query and view the eight records imported from HR.csv in your database table.
  6. Select the Log tab to view the number of records imported to your database table. There should be eight rows in your table as there are eight records in HR.csv. If you run the pipeline and query again, you will see that the eight records do not duplicate: the constraint checks Employee_Number in HR.csv against EmployeeNo in the database table, and imports only unique records and updated data for existing records.



Now that you have resolved the issue of importing duplicate records to your database table, let's see how updates to information in the .csv file work.
In this example, Harmony Cunningham is now married and her last name has changed. This is updated in HR.csv. If you are sitting in instructor-led training, your presenter will now make changes to employee information in HR.csv and update the file on Dropbox. If you are working through this module on your own, please skip this section and go to the Updating the .csv file on Dropbox section to import the updated HRNew.csv file.
Now that HR.csv is updated, you must run the pipeline again and import the data in the file to your SQL database, with the constraint applied. 
To do this:

  1. Go to the browser tab that has Platform Manager open and ensure the Pipeline_CSV pipeline is displayed in the Editor tab.
  2. Click Run to execute your program. The results are displayed in the Output pane for you to review and should complete with no errors.
  3. Go to the browser tab that has Data Manager open and ensure the following query is displayed: select * from DataHub
  4. Click Run to execute your query and view the results displayed in the Output pane. There should be eight rows in your table and Harmony's last name should now be changed to the new value.
  5. If you run the pipeline and query again, you will note that only unique records and updated data for existing records is imported.
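The name change propagates through the same mechanism: re-importing a row whose EmployeeNo already exists updates that record rather than adding a new one. A SQLite sketch, reusing the upsert idea from the constraint section (the employee number and the platform's internal behaviour are assumptions; the names follow the tutorial):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DataHub (EmployeeNo INTEGER PRIMARY KEY, LastName TEXT)")

upsert = ("INSERT INTO DataHub VALUES (?, ?) "
          "ON CONFLICT(EmployeeNo) DO UPDATE SET LastName = excluded.LastName")

conn.execute(upsert, (1, "Cunningham"))  # original import from HR.csv
conn.execute(upsert, (1, "Osullivan"))   # updated row from the edited file
name = conn.execute(
    "SELECT LastName FROM DataHub WHERE EmployeeNo = 1").fetchone()[0]
print(name)  # Osullivan: the existing record is updated, not duplicated
```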

 Updating the .csv file on Dropbox

If you are working through this module on your own, we have created a variation of HR.csv called HRNew.csv, with updated employee information. Using HRNew.csv enables you to simulate the information update, so that you do not need to edit HR.csv yourself, and can simply proceed by importing the new file.

In this example, Harmony Cunningham is now married and her last name has changed to Osullivan. To use the new .csv file, you must edit the Get File function in the pipeline to call HRNew.csv. Next you must run the pipeline and import the data in the file to your SQL database, with the constraint applied. 
To do this:

  1. Go to the browser tab that has Platform Manager open and ensure the Pipeline_CSV pipeline is displayed in the Editor tab.
  2. Select the Get File function block on the editor to view the function Details and Properties. If the Properties pane is not displayed, click Open Properties.
  3. Enter /HRNew.csv in the Get File Properties Path field. This is the relative path to where the .csv file is stored on Dropbox.
  4. Click Save.
  5. Click Run to execute the program. The results are displayed in the Output pane for you to review and should complete with no errors.
  6. Go to the browser tab that has Data Manager open and ensure the following query is displayed: select * from DataHub
  7. Click Run to execute your query and view the results displayed in the Output pane. There should be eight rows in your table and Harmony's last name should now be changed to the new value, Osullivan.

 And that's a wrap

Congratulations on completing this module! Before we finish, let's have a quick look at what you have done: 


The next part of this tutorial, Module 3, explains how to create a job that automatically synchronises any updates to data in the .csv file to the database, at set intervals.
See you next time!
