Advanced Analytics & IoT Architectures
Presented by: Tom Marek and Orion Gebremedhin
Use Case: ETL Offloading
Have you outgrown your data delivery SLAs? Get the right data at the right time.
ETL Processing Yesterday
SLA windows are slowly filling up and are sometimes missed.
Diagram: File Processing Backlog, ETL Server, Database Server, Analyze
ETL Processing Today
SLA windows are easy to hit with massively parallel processing (MPP).
Diagram: File Processing Backlog, Apache Hadoop, Database Server or Spark Server, Analyze
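The MPP idea on this slide is that the file backlog is split into partitions, each partition is processed independently and in parallel, and the partial results are merged. The Python sketch below illustrates that split, aggregate, and merge pattern under stated assumptions: the record shape (meter ID, kWh), the function names, and the use of a thread pool as a stand-in for the nodes of a Hadoop or Spark cluster are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable

# Hypothetical record shape: (meter_id, kwh). Each input file is one list of records.
Record = tuple[str, float]


def map_partition(records: Iterable[Record]) -> Counter:
    """Aggregate one partition (e.g. one file from the backlog) independently."""
    totals: Counter = Counter()
    for meter_id, kwh in records:
        totals[meter_id] += kwh
    return totals


def reduce_partials(partials: Iterable[Counter]) -> dict[str, float]:
    """Merge the per-partition totals into a single result."""
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values key by key
    return dict(merged)


def process_backlog(files: list[list[Record]], workers: int = 4) -> dict[str, float]:
    # Threads only stand in for cluster nodes here (CPython threads do not give
    # CPU parallelism); Hadoop or Spark would run map_partition on separate
    # machines, close to where each partition of the data is stored.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_partition, files))
    return reduce_partials(partials)
```

Because each partition is aggregated on its own, the work can be spread across more nodes as the backlog grows, instead of scaling up a single ETL server.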
ETL Processing Today
SLAs shrink from nightly batches to within minutes.
Diagram: Service Bus, Hadoop / Spark Streaming, Spark SQL, Analyze
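Moving from nightly to minute-level SLAs means processing messages in small batches as they arrive and keeping the results current after every batch. Spark Streaming does this with time-based micro-batches. The Python sketch below simplifies that to count-based batches and an in-memory running total, with a hypothetical message shape of {"meter_id", "kwh"}. It illustrates the micro-batch pattern only and is not Spark's API.

```python
from itertools import islice
from typing import Iterable, Iterator


def micro_batches(messages: Iterable[dict], batch_size: int) -> Iterator[list[dict]]:
    """Cut a (possibly unbounded) message stream into small batches.

    Assumption: a size-based trigger; Spark Streaming uses a time-based batch interval.
    """
    it = iter(messages)
    while batch := list(islice(it, batch_size)):
        yield batch


def run_stream(messages: Iterable[dict], batch_size: int = 100) -> Iterator[dict[str, float]]:
    """Keep a running kWh total per meter and publish a snapshot after each batch."""
    state: dict[str, float] = {}
    for batch in micro_batches(messages, batch_size):
        for msg in batch:
            state[msg["meter_id"]] = state.get(msg["meter_id"], 0.0) + msg["kwh"]
        # Consumers see fresh results after every batch instead of once a night.
        yield dict(state)
```

For example, three messages processed with batch_size=2 produce two snapshots: one after the first two messages and one after the third.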
Hybrid Big Data for Smart Meter Analytics: Azure Blob Storage + HDInsight
Diagram: Smart meters feed the Meter Data Management System (MDMS) data store. Meter reads are extracted from the data mart and uploaded to Blob Storage as input files (81.6M rows, 3.5GB). HDInsight processes the base facts (input: 2.6B rows, 160GB) into derived facts (output: 188K rows, 20MB), and the derived facts are downloaded to BI staging. Other components: base facts and confidential data, customer data (CRM), and the SQL Server BI suite for reporting and analytics.
Neudesic Hybrid Big Data Processing Framework for SMAS: ELT and Processing Architecture
Cluster storage container: srphadoop (ClusterName=srphadoopcluster). Upload: AzCopy from C:\FactTable Extract. Download: AzCopy to C:\MDMSPoCDailyDownloads\FactIntervalReads (daily file extracts). Other components: SSIS, EDWDEV SQL Database.
Container: Staging_Dailyfactintervalreads. Content: one file per day. Partitioned by: none. External table: factintervalreadsstaging. Purpose: stages the incremental (daily) interval read data. Usually a single file is uploaded to this container; an external table is then used to extract the data and load it into the partitioned target table.
Container: PartitionedFactIntervalReads. Content: historical data. Partitioned by: DateSK. External table: FactIntervalReads. Purpose: this table is partitioned by DateSK and serves as the main source for the gap analytics and any future interval read analytics.
Container: StagingreadsWithGaps. Content: 30 days. Partitioned by: DateSK; clustered by: ChannelSK. External table: stagingreadswithgaps. Purpose: stores all the readings from meters with gaps in their readings. The table also contains a count of reads in the duration, as well as additional flags.
Container: edwfactintervalreadshadoop. Content: 30 days' worth of gaps. Partitioned by: none. External table: edwfactintervalreadgaphadoop. Purpose: the final table, whose files are downloaded and inserted into a SQL Server table.
Container: archivefactintervalreads. Content: all historical meter data. Partitioned by: none. External table: FactIntervalReadArchive. Purpose: full load to FactIntervalRead, and archive of the daily flat files for historical record.
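The gap-analysis step can be sketched in Python as follows. The sketch assumes 15-minute interval reads (96 per channel per day) and input tuples of (ChannelSK, DateSK, interval index). The GapRow fields and the function names are hypothetical and only loosely mirror the stagingreadswithgaps table, which holds a read count plus flags. In the architecture above, this logic would run as a query over the partitioned external table on the cluster, not in a single Python process.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable

# Assumption: 15-minute interval reads, so a complete day has 96 reads per channel.
INTERVALS_PER_DAY = 96


@dataclass(frozen=True)
class GapRow:
    """One output row, loosely modelled on stagingreadswithgaps (fields hypothetical)."""
    channel_sk: int
    date_sk: int
    read_count: int                     # distinct intervals actually received
    missing_intervals: tuple[int, ...]
    longest_gap: int                    # longest run of consecutive missing intervals


def longest_run(indices: tuple[int, ...]) -> int:
    """Length of the longest run of consecutive integers in a sorted tuple."""
    best = run = 0
    prev = None
    for i in indices:
        run = run + 1 if prev is not None and i == prev + 1 else 1
        best = max(best, run)
        prev = i
    return best


def find_gaps(reads: Iterable[tuple[int, int, int]],
              intervals_per_day: int = INTERVALS_PER_DAY) -> list[GapRow]:
    """reads: (channel_sk, date_sk, interval_index) tuples; returns only channel-days with gaps."""
    seen: dict[tuple[int, int], set[int]] = defaultdict(set)
    for channel_sk, date_sk, interval in reads:
        seen[(channel_sk, date_sk)].add(interval)  # duplicate reads of an interval count once
    rows = []
    for (channel_sk, date_sk), intervals in sorted(seen.items()):
        missing = tuple(i for i in range(intervals_per_day) if i not in intervals)
        if missing:
            rows.append(GapRow(channel_sk, date_sk, len(intervals), missing, longest_run(missing)))
    return rows
```

Channel-days with no reads at all never appear in the input, so detecting fully silent meters would need a separate list of expected channels.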
Use Case: Iterative Exploration
What can we do with all of this data? Mine for answers, one question at a time.
Iterative Exploration
Build expert systems, move to supervised learning, and evolve to reinforcement learning.
Diagram: Web Service used for orchestration, Apache Hadoop, Spark SQL, Analyze, Machine Learning API endpoint
Iterative Exploration
Build expert systems, move to supervised learning, and evolve to reinforcement learning.
Diagram: Web Service used for orchestration, HDInsight, Azure Machine Learning API endpoint, Azure Data Warehouse, Power BI
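One way to picture the progression from expert systems to supervised learning is to start with a hand-coded rule and then learn that rule's parameter from labelled history. The sketch below uses a single-feature decision stump. The 50.0 threshold, the reading and label inputs, and the function names are hypothetical, and the reinforcement-learning stage is not covered.

```python
from typing import Optional, Sequence


def expert_rule(reading: float, threshold: float = 50.0) -> bool:
    """Stage 1 (expert system): a hand-written rule with an expert-chosen threshold (hypothetical)."""
    return reading > threshold


def learn_threshold(readings: Sequence[float], labels: Sequence[bool]) -> Optional[float]:
    """Stage 2 (supervised learning): fit the same rule's threshold from labelled examples.

    This is a one-feature decision stump: each observed reading is tried as the cut
    point for the rule `reading > t`, and the cut with the fewest misclassifications
    is kept (ties go to the smaller cut). Returns None for empty input.
    """
    best_t: Optional[float] = None
    best_errors = len(readings) + 1
    for t in sorted(set(readings)):
        errors = sum((r > t) != y for r, y in zip(readings, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t
```

The learned threshold can replace the expert's value in the same rule, and the fitted model is what would sit behind a Machine Learning API endpoint.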
Iterative Exploration
Monitor and remove noise from textual data.
Diagram: Web Service used for orchestration, Azure SQL DB, Keyword Analytics, Power BI dataset (statistical), Media Services, Power BI dashboards, Machine Learning API endpoint, Event Hubs, Stream Analytics, Power BI dataset (temporal)
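Noise removal for keyword analytics often starts with normalization and filtering before anything is counted. This minimal Python sketch lowercases the text, strips URLs and @mentions, drops short tokens and a small hypothetical stop-word list, and counts the remaining keywords. A production pipeline would likely use a curated stop-word list and richer tokenization.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = frozenset({"a", "an", "and", "the", "is", "to", "of", "rt", "in", "on", "for", "it"})
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9#]+")


def clean_tokens(text: str) -> list[str]:
    """Lowercase, remove URLs and @mentions, then drop stop words and tokens of 1-2 characters."""
    text = MENTION_RE.sub(" ", URL_RE.sub(" ", text.lower()))
    return [t for t in TOKEN_RE.findall(text) if t not in STOP_WORDS and len(t) > 2]


def keyword_counts(messages: Iterable[str], top_n: int = 10) -> list[tuple[str, int]]:
    """Count cleaned keywords across all messages and return the most common ones."""
    counts: Counter = Counter()
    for msg in messages:
        counts.update(clean_tokens(msg))
    return counts.most_common(top_n)
```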
Use Case: Self Service
Are your self-service reports only telling half the story? Quickly deliver large datasets for ad hoc analysis.
Self Service
Allowing the business to fulfill its own analytics needs.
Diagram: Semi-structured Files, Apache Hadoop, Spark SQL, Analyze, Service Bus, SQL Server
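Semi-structured files usually have to be shaped into columns before business users can query them with SQL or BI tools. The sketch below is an assumption about what such a step might look like, not the presenters' code. It flattens nested JSON Lines records into dotted column names and fills missing fields with None. This is similar in spirit to the way Spark SQL lets you address nested fields as device.type.

```python
import json
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested objects into dotted column names, e.g. {"a": {"b": 1}} -> {"a.b": 1}.

    Lists are left as single values; exploding them into rows is out of scope here.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def load_json_lines(text: str) -> tuple[list[str], list[dict[str, Any]]]:
    """Parse JSON Lines text and return (sorted column names, rows); missing fields become None."""
    rows = [flatten(json.loads(line)) for line in text.splitlines() if line.strip()]
    columns = sorted({col for row in rows for col in row})
    return columns, [{col: row.get(col) for col in columns} for row in rows]
```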
Hybrid Self-Service
Use Case: IoT
What action does your IoT device drive? Help guide the buyers of the device to the action they are looking to take.
IoT Using Azure
Standard real-time data processing.
Diagram: HDInsight, Spark SQL, Analyze, Device API, Azure Event Hub, Azure Stream Analytics, Power BI dataset (temporal), Power BI dashboards
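Temporal Power BI datasets are typically fed with windowed aggregates rather than raw device events. Azure Stream Analytics expresses this with a TumblingWindow in the GROUP BY clause of its query. The Python sketch below only illustrates the same tumbling-window semantics (fixed-size, non-overlapping windows, with each event in exactly one window) over hypothetical (device ID, epoch seconds, value) events.

```python
from collections import defaultdict
from typing import Iterable


def tumbling_window_avg(events: Iterable[tuple[str, int, float]],
                        window_s: int = 60) -> dict[tuple[str, int], float]:
    """events: (device_id, epoch_seconds, value) tuples.

    Returns {(device_id, window_start): average value}, keyed by each window's start time.
    """
    sums: dict[tuple[str, int], float] = defaultdict(float)
    counts: dict[tuple[str, int], int] = defaultdict(int)
    for device_id, ts, value in events:
        window_start = ts - ts % window_s  # each event falls into exactly one window
        key = (device_id, window_start)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}
```

Each output row (device, window start, average) is the kind of record that could be pushed to a temporal dataset for a live dashboard tile.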
Next Step: Become the BI Superhero
Information management, big data storage, Apache Hadoop, real-time intelligence, machine learning, IoT, dashboards and visualizations, and more!
Ideate, chart your quick wins, ask questions, and get answers to your real big data challenges. It's insightful, it's easy, and it can be done from the comfort of your conference room.
www.neudesic.com/meetneat
Questions?
Tom Marek, Tom.Marek@Neudesic.com, Twitter: @twmarek
Orion Gebremedhin, Orion.Gebremedhin@Neudesic.com, Twitter: @oriongm