Lab 1: Using the Bluemix Analytics for Hadoop Service to Analyse Data
Hands-On Lab
Lab Objectives: This lab shows you how to use the Analytics for Hadoop service in Bluemix to analyse large volumes of medical data collected by heart monitors. You will work in BigSheets, a spreadsheet-style tool accessible from the console of the Analytics for Hadoop service.

Lab Duration: 45 minutes

1. Creating a Bluemix Application with a Hadoop Service Instance

In this section you'll create a Bluemix application with a Hadoop service instance that will be used throughout the rest of the labs.

1. In your browser, go to the Bluemix URL http://bluemix.net and log in if necessary.
2. Make sure you're in the Dashboard tab (if not, click the Dashboard link at the top of the page to go there).
3. Scroll down to the Applications section and click CREATE AN APP.
4. For the template, choose Web.
5. For the starter, click Browse samples.
6. Select the Internet of Things boilerplate.
7. Choose a name and click Create. Your application should restart; wait a few minutes.
8. Let's add a new service to the app. From the app overview, click ADD A SERVICE.
9. Scroll down to the Big Data category and click the icon for the IBM Analytics for Hadoop service. Then click Create; the application must restage again.

2. Uploading the Medical Data to the Analytics for Hadoop Service Instance

In this section you'll upload medical data to the Analytics for Hadoop service instance that you just created.
You'll upload the files and then create a new working environment: a spreadsheet-like environment with several tabs.

1. Click the Hadoop service of your app, then click the Launch icon to open the console of the service instance you just created in another tab.
Figure 3 Analytics for Hadoop console icon
2. From the console, click the Files tab.
3. In the DFS navigator, expand the user directory and select the directory biblumix.
4. Click the Upload icon.
Figure 4 Upload icon
5. Click Browse and select the file \BDADaysLabs\Lab1-BigInsight\Historical_Personal_Data.txt, where \BDADaysLabs is the root folder of the files provided to you by the instructor. Click OK.
6. Repeat to upload the file \BDADaysLabs\Lab1\Historical_Health_Data.txt.

3. Importing the Data into BigSheets

Now that you have the sample data uploaded to HDFS, you can import the data into BigSheets and create workbooks that contain that data.

1. From the console, click the BigSheets tab and click New Workbook.
Figure 5 Create BigSheets Workbook
2. Name the workbook PersonalData. In the distributed file system browser, select the file Historical_Personal_Data.csv that you uploaded in the previous step.
3. In the preview pane on the right, select a new reader to map the data into a spreadsheet format. (Currently the reader is Line Reader.) Click the edit icon next to 'Line Reader' and select Comma Separated Value (CSV) Data from the drop-down list.
Figure 6 Changing the reader
4. Click the green check mark to change the reader. Click Fit Columns in the preview pane to make the tabular data appear more compact.
Figure 7 Fit Columns to width
5. Click the green check mark at the bottom to save the workbook PersonalData.
6. Click the Workbooks link as shown below.
Figure 8 Workbooks link
7. Click Build new workbook.
8. Name the workbook HealthData. In the distributed file system browser, go to user/biblumix/ and select the file Historical_Health_Data that you uploaded in the previous step.
9. Change the reader: in the preview pane on the right, select a new reader to map the data into a spreadsheet format. (Currently the reader is Line Reader.) Click the edit icon next to 'Line Reader' and select Comma Separated Value (CSV) Data from the drop-down list.
10. Click the green check mark to change the reader. Click Fit Columns in the preview pane to make the tabular data appear more compact.
11. Click the green check mark at the bottom to save the workbook HealthData.
12. You now have two workbooks: HealthData and PersonalData.

4. Combining the Data from Multiple Workbooks and Creating Charts

Now that we have two workbooks with a common field (PatientID), we can join them as the basis for exploring the medical study data by heart failure.

Figure 9 Workbooks link
1. Click the Workbooks link as shown above.
2. Click the PersonalData workbook link.
3. Click Build New Workbook.
4. Rename it PatientData by clicking the pen icon, then click Save. At this stage, the bottom of the page shows the workbook's sheet tabs.
5. In the top left-hand side, you should see a link called Add sheets. This allows you to perform additional analysis on your data within the current workbook. Click Add sheets.
Figure 10 Load additional sheet
6. The Load option allows you to load data into the current workbook from another workbook. Click the Load icon and select the HealthData workbook link.
7. Set the sheet name to HealthData1 and click the green arrow icon to load the new workbook into your current workbook.
8. Verify that you see two tabs at the bottom of your current workbook. Hover over the second one, and a tooltip shows the action and the name you provided for this sheet (tab) within your current workbook.
9. Next, add another sheet (think of a sheet as a new tab in a workbook) in order to perform the Join function.
Figure 12 Add sheet using JOIN
10. Name the sheet PatientByHeartFailure.
11. Select Inner as the Join type.
12. Click the arrow next to "Add sheets (at least 2) to join" and select PersonalData.
13. Click the green plus next to "Add sheets (at least 2) to join".
14. Click the arrow next to "Add sheets (at least 2) to join" and select HealthData1.
15. Click the green plus next to "Add sheets (at least 2) to join".
16. Select the PatientID column as the Join column.
Figure 14 Join column
17. Click the green check mark icon to complete the operation.
18. Click the arrow in any column header and select Organize Columns.
19. Scroll down to the end and remove the column PatientID1 (there are two patient ID columns because of the join).
20. Click the green check mark icon to complete the operation.
21. Select Save and Exit.
22. Click the Run button to run the workbook (this takes a while).
23. We will now export this workbook to a table so that the data is accessible to analytics tools such as SPSS and Cognos. In the next labs we will use this dataset in SPSS to build a predictive heart failure model and in Cognos to build a simple dashboard. Click the Create Table button and click Confirm.
24. Let's now check that the table has been created. Go to the Files tab, then select the Catalog Tables tab. You should see the schema sheets and, under it, the table patientdata. We will use this table in the next labs with SPSS and Cognos.
25. Let's go back to the BigSheets tab and use the analytical functions embedded in the tool. Select the workbook PatientData.
26. To easily visualize the data by heart failure, you can create a chart. Click Add chart, then click Chart, and then click Pie.
27. Complete the following information to produce a pie chart.
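To see what the Join sheet built earlier in this section computes, here is a minimal Python sketch of an inner join on PatientID. The sample rows and every column name other than PatientID (Age, HeartFailure) are invented for illustration, since this lab does not list the columns of the real files. The sketch also keeps a single PatientID column, which is the effect of removing PatientID1 in BigSheets.

```python
import csv
import io

# Hypothetical sample data; only the PatientID column name comes from the lab.
PERSONAL_CSV = """PatientID,Age
1,54
2,61
3,47
"""

HEALTH_CSV = """PatientID,HeartFailure
1,Y
3,N
4,Y
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left, right, key):
    """Return merged rows for key values present in both inputs."""
    # Index the right side by key; a key may map to several rows.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            merged = dict(lrow)
            # Copy every right-hand column except the duplicate join key.
            merged.update({k: v for k, v in rrow.items() if k != key})
            joined.append(merged)
    return joined


patient_by_heart_failure = inner_join(
    read_rows(PERSONAL_CSV), read_rows(HEALTH_CSV), "PatientID"
)
for row in patient_by_heart_failure:
    print(row)
```

Patients 2 and 4 are dropped because each appears in only one of the two inputs, which is what distinguishes an inner join from an outer join.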
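A pie chart by heart failure is built from a count of rows per category. The short Python sketch below shows that aggregation under stated assumptions: the HeartFailure column name and its Y/N values are hypothetical, because the lab does not give the actual column name or values.

```python
from collections import Counter

# Hypothetical joined rows; HeartFailure and its Y/N values are assumptions.
rows = [
    {"PatientID": "1", "HeartFailure": "Y"},
    {"PatientID": "2", "HeartFailure": "N"},
    {"PatientID": "3", "HeartFailure": "N"},
    {"PatientID": "4", "HeartFailure": "Y"},
    {"PatientID": "5", "HeartFailure": "N"},
]

# Count rows per category; each count becomes one slice of the pie.
counts = Counter(row["HeartFailure"] for row in rows)
total = sum(counts.values())

# Convert counts to percentage shares of the whole.
shares = {label: 100.0 * n / total for label, n in counts.items()}

for label in sorted(shares):
    print(f"{label}: {counts[label]} patients ({shares[label]:.1f}%)")
```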
By clicking the small icon above, you get a view of the workflow used to create the PatientData workbook.

Congratulations! You've successfully completed Lab 1, in which you explored the Analytics for Hadoop service in Bluemix, powered by IBM BigInsights.