1 Texas Digital Government Summit Data Analysis Structured vs. Unstructured Data Presented By: Dave Larson
2 Speaker Bio: Dave Larson, Solutions Architect with Freeit Data Solutions. In the IT industry for over 20 years, specializing in data and storage technologies. Experience includes IT management, SAN technology, ERP applications, database administration, UNIX administration, enterprise architecture, and data warehousing.
3 Data & Information. What is Data? Raw, unorganized facts that need to be processed. What is Information? Processed, organized, structured data that is useful. Data is plain facts that are processed, organized, structured, or presented to become useful information.
4 Facts about Data: Data is growing at an incredible rate. Gartner and IDC state that data is doubling every 18 months. The current estimate is that there are over 4 zettabytes of data in the world. If the trend continues, by 2020 there will be over 40 zettabytes of data.
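The doubling claim can be checked with a little arithmetic. The sketch below assumes a constant 18-month doubling period and uses hypothetical helper names (`years_to_grow`, `projected_volume`) that are not from the talk. Under that assumption, growing from 4 to 40 zettabytes takes roughly five years.

```python
import math


def years_to_grow(start_zb: float, target_zb: float,
                  doubling_months: float = 18.0) -> float:
    """Years needed to grow from start_zb to target_zb at a fixed doubling period."""
    doublings = math.log2(target_zb / start_zb)
    return doublings * doubling_months / 12.0


def projected_volume(start_zb: float, years: float,
                     doubling_months: float = 18.0) -> float:
    """Volume in zettabytes after `years`, assuming constant doubling."""
    return start_zb * 2 ** (years * 12.0 / doubling_months)


if __name__ == "__main__":
    print(f"4 ZB -> 40 ZB takes about {years_to_grow(4, 40):.2f} years")
    print(f"4 ZB after 3 years: {projected_volume(4, 3):.1f} ZB")
```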
5 What is a Zettabyte? 1 zettabyte = 1 billion terabytes = 1,000,000,000,000,000,000,000 bytes. 4 zettabytes is equivalent to: 2 quintillion JPG images; 456 billion hours of digitally recorded music; 1 trillion HD digital movies; 166 billion 32GB iPads.
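A minimal sketch of the unit conversion, assuming decimal (SI) units where 1 TB is 10^12 bytes and 1 ZB is 10^21 bytes. The helper names are illustrative only. Equivalences expressed in images, songs, or movies depend on an assumed per-item size, so the helper takes that size as an explicit parameter.

```python
BYTES_PER_TB = 10**12  # decimal terabyte (assumption: SI units, not binary)
BYTES_PER_ZB = 10**21  # decimal zettabyte


def zb_to_tb(zettabytes: int) -> int:
    """Convert zettabytes to terabytes."""
    return zettabytes * BYTES_PER_ZB // BYTES_PER_TB


def devices_needed(total_bytes: int, device_bytes: int) -> int:
    """Number of devices of a given capacity needed to hold total_bytes (rounded up)."""
    return -(-total_bytes // device_bytes)


if __name__ == "__main__":
    print(f"1 ZB = {zb_to_tb(1):,} TB")
    drives = devices_needed(4 * BYTES_PER_ZB, 4 * BYTES_PER_TB)
    print(f"4 ZB fills {drives:,} drives of 4 TB each")
```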
6 4 Zettabytes visualized: 1 billion 4TB hard drives. 250 billion DVDs stacked on top of one another would reach the moon 3 times. All data printed on 8 x 10 paper and laid end to end is 210 trillion miles, or 35.8 light years. All data printed would require 16.4 trillion trees; NASA estimates there are 400 billion trees on Earth.
7 Imagine what 40 Zettabytes would look like
8 What is causing the Data Explosion? Internet: connecting everything to everyone, billions of people to billions of devices. Online shopping (Amazon, Wal-Mart, eBay, Best Buy). File sharing (Dropbox, Google Drive, iCloud, SkyDrive). Social media: Facebook, Google+, Twitter, YouTube. Store everything, delete nothing, multiple copies of it all.
9 Structured vs. Unstructured. Structured: information with a degree of organization that is readily searchable and quickly consolidated into facts. Examples: RDBMS, spreadsheet. Unstructured: information with a lack of structure that is time and energy consuming to search, find, and consolidate into facts. Examples: documents, images, reports.
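To illustrate the difference, here is a small sketch with made-up sample data (the invoices and messages are hypothetical). The structured rows can be consolidated with a direct field lookup, while the free-text documents need pattern matching that depends on how each author happened to write the amount.

```python
import re

# Structured: every record has the same named fields.
structured = [
    {"invoice": 1001, "customer": "Acme", "total": 250},
    {"invoice": 1002, "customer": "Acme", "total": 90},
    {"invoice": 1003, "customer": "Globex", "total": 40},
]

# Unstructured: the same facts buried in free text.
unstructured = [
    "Invoice 1001 for Acme, total due $250. Thanks!",
    "Hi, attaching photos from the site visit.",
    "Acme called about invoice #1002 ($90).",
]


def total_structured(rows, customer):
    """Sum totals by reading the fields directly."""
    return sum(r["total"] for r in rows if r["customer"] == customer)


def total_unstructured(docs, customer):
    """Sum totals by scanning text; relies on a fragile '$<digits>' pattern."""
    total = 0
    for doc in docs:
        if customer in doc:
            match = re.search(r"\$(\d+)", doc)
            if match:
                total += int(match.group(1))
    return total


if __name__ == "__main__":
    print(total_structured(structured, "Acme"))
    print(total_unstructured(unstructured, "Acme"))
```

Both functions agree on this sample, but the text version breaks as soon as an amount is written differently, which is the search cost the slide describes.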
10 Expansion of data? Structured Data (databases): Production DB, Test DB, Dev DB, Reporting DB. Multiple backups of data. Generations of DB backups. Replicated copies of DB. Every production database has between 3 and 12 copies. Unstructured Data (files, media, images): Desktop, network share, mobile device, cloud. Copies sent to other people. Backup copies.
11 Current controls of data expansion: Data Compression, Data Deduplication, Data Cloning, Data Archiving
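Two of these controls, deduplication and compression, can be sketched in a few lines. This is an illustration under simplifying assumptions, not how any particular storage product works: it deduplicates whole blobs by SHA-256 digest (real systems commonly work on smaller chunks) and compresses each unique blob with zlib. The function names are hypothetical.

```python
import hashlib
import zlib


def dedup_and_compress(blobs):
    """Store each distinct blob once, compressed; return (store, refs).

    store maps a SHA-256 digest to the compressed bytes.
    refs lists one digest per input blob, in input order.
    """
    store = {}
    refs = []
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        if digest not in store:  # only new content consumes space
            store[digest] = zlib.compress(blob)
        refs.append(digest)
    return store, refs


def restore(store, refs):
    """Rebuild the original list of blobs from the store."""
    return [zlib.decompress(store[digest]) for digest in refs]


if __name__ == "__main__":
    blobs = [b"a" * 1000, b"b" * 1000, b"a" * 1000]
    store, refs = dedup_and_compress(blobs)
    raw = sum(len(b) for b in blobs)
    kept = sum(len(c) for c in store.values())
    print(f"raw bytes: {raw}, stored bytes: {kept}, unique blobs: {len(store)}")
```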
12 How to control data growth? Change data management policies. Create data retention procedures. Store data more efficiently. Purge data that is no longer needed. Back up data less often. Archive data. Develop more efficient backup policies.
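A retention procedure boils down to a rule that decides, per item, whether it is still inside its retention window. The sketch below assumes each item carries a last-modified timestamp and a retention period in days; the `Record` type and `classify` function are hypothetical names, and the sketch only selects candidates rather than deleting anything.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    name: str
    last_modified: datetime
    retention_days: int  # how long the item must be kept after last change


def classify(records, now):
    """Split record names into (keep, purge) lists based on retention."""
    keep, purge = [], []
    for record in records:
        age = now - record.last_modified
        expired = age > timedelta(days=record.retention_days)
        (purge if expired else keep).append(record.name)
    return keep, purge


if __name__ == "__main__":
    now = datetime(2015, 1, 1)
    records = [
        Record("budget.xlsx", datetime(2014, 12, 1), 90),
        Record("old_scan.tif", datetime(2010, 1, 1), 365),
        Record("contract.pdf", datetime(2012, 6, 1), 3650),
    ]
    keep, purge = classify(records, now)
    print("keep:", keep)
    print("purge:", purge)
```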
13 Analyzing Structured Data (RDBMS). Challenges: DB growth impacts data analysis. Too much data to analyze. Analyze only relevant data (current). Improvements: Purge data that is no longer relevant. Historical data should be summarized. Compress data to store less on disk. Improve DB performance with caching technologies and flash storage.
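Summarizing historical data and then purging the detail can be done in one transaction. The sketch below uses Python's built-in sqlite3 module with a hypothetical `sales` table and a `sales_monthly` summary table; the schema, table names, and cutoff date are assumptions made for illustration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few sample rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sold_on TEXT, amount INTEGER)")
    conn.execute(
        "CREATE TABLE sales_monthly (month TEXT PRIMARY KEY, total INTEGER, n INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("2013-01-05", 100), ("2013-01-20", 50),
         ("2013-02-02", 70), ("2014-03-01", 30)],
    )
    conn.commit()
    return conn


def summarize_and_purge(conn: sqlite3.Connection, cutoff: str) -> None:
    """Roll rows older than cutoff (ISO date) into monthly totals, then delete them."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """INSERT INTO sales_monthly (month, total, n)
               SELECT substr(sold_on, 1, 7), SUM(amount), COUNT(*)
               FROM sales WHERE sold_on < ?
               GROUP BY substr(sold_on, 1, 7)""",
            (cutoff,),
        )
        conn.execute("DELETE FROM sales WHERE sold_on < ?", (cutoff,))


if __name__ == "__main__":
    conn = make_demo_db()
    summarize_and_purge(conn, "2014-01-01")
    print(conn.execute("SELECT * FROM sales_monthly ORDER BY month").fetchall())
    print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0], "detail rows left")
```

Analysis of current data then runs against a much smaller detail table, while the monthly totals remain available for trend reporting.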
14 Improved Analysis of Structured Data: Normalize databases to minimize redundancy and dependency. Divide large tables into smaller tables. Partition data. Move data into third normal form (3NF), generally used in a data warehouse. Utilize and leverage Business Intelligence applications on normalized data. Remove source data once normalized.
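As a rough sketch of the normalization step, the example below splits a flat table that repeats customer details on every order into separate `customers` and `orders` tables, then drops the source table. The schema is hypothetical, and it assumes a customer name always maps to the same city; it uses Python's built-in sqlite3 module.

```python
import sqlite3


def make_flat_db() -> sqlite3.Connection:
    """Create an in-memory database with a denormalized table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flat_orders "
        "(id INTEGER PRIMARY KEY, customer_name TEXT, customer_city TEXT, item TEXT)"
    )
    conn.executemany(
        "INSERT INTO flat_orders VALUES (?, ?, ?, ?)",
        [(1, "Acme", "Austin", "disk"),
         (2, "Acme", "Austin", "tape"),
         (3, "Globex", "Dallas", "ssd")],
    )
    conn.commit()
    return conn


def normalize(conn: sqlite3.Connection) -> None:
    """Move customer attributes into their own table and remove the flat source."""
    with conn:
        conn.execute(
            "CREATE TABLE customers "
            "(id INTEGER PRIMARY KEY, name TEXT UNIQUE, city TEXT)"
        )
        conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
            "customer_id INTEGER REFERENCES customers(id), item TEXT)"
        )
        # Each customer is stored once instead of once per order.
        conn.execute(
            "INSERT INTO customers (name, city) "
            "SELECT DISTINCT customer_name, customer_city FROM flat_orders"
        )
        conn.execute(
            "INSERT INTO orders (id, customer_id, item) "
            "SELECT f.id, c.id, f.item FROM flat_orders f "
            "JOIN customers c ON c.name = f.customer_name"
        )
        # Remove the source data once it has been normalized.
        conn.execute("DROP TABLE flat_orders")


if __name__ == "__main__":
    conn = make_flat_db()
    normalize(conn)
    print(conn.execute("SELECT * FROM customers ORDER BY id").fetchall())
    print(conn.execute("SELECT * FROM orders ORDER BY id").fetchall())
```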
15 Trends in Structured Data: Structured data is getting too big for traditional RDBMS, requiring BIG DATA solutions. Big Data is handled with applications like Hadoop. Big Data is leveraging new technologies such as MongoDB, CouchDB, Oracle NoSQL Database, and Apache Cassandra. New systems are sometimes referred to as document-oriented database systems or distributed key-value databases.
16 What is Big Data? Traditional Data: gigabytes to terabytes; centralized; structured; stable data model; known complex interrelationships. BIG DATA: petabytes to exabytes; distributed; semi-structured and unstructured; flat schemas; few complex interrelationships. Real-time: transactional, online, low-latency data. Analytical: aggregated data from real-time feeds or other sources. Search: supporting data, both external and internal, used for locating desired information and/or objects.
17 Technology for Structured Data: SSD / Flash technology: all-flash arrays, hybrid storage arrays. SSD / Flash is getting cheaper, more reliable, and larger in capacity. Incredible performance: 10s to 100s of thousands of IOPS. Inline compression and/or deduplication: store more data in less space. Snapshots = reduced RTO/RPOs and less; Cloning = less data consumed for development and test. Energy efficient: SSD uses less than ¼ the power of hard drives; SSD requires less cooling. Hard drives: how much longer until we remember them as fondly as floppy drives, dot-matrix printers, Betamax, and 8-track?
18 Unstructured Data Challenges: How do you store billions of files? How do you store 100s of TBs or PBs of data? How long does it take to migrate 100s of TBs of data every 3-5 years? No structure to data. Legacy file system approach to file organization. Resource limitations. Data has lots of duplication. How do you find data that isn't organized or searchable? Lack of retention policies adds to massive data explosion. Data is getting too big to back up. How do you back up PBs of unstructured data?
19 Unstructured Data Current Improvements: External search engines (MS Enterprise Search or Google Search Appliance). Archive data into cheaper solutions. Back up data less frequently. Implement deduplication technologies. Purge data using retention policies.
20 Trends in Unstructured Data: Object Storage. Treating files as objects. Creating data describing unstructured data. Metadata: data about data (creation date, owner, subject, retention period, importance). Leverage commodity hardware to create clusters to store data. Store replicas of objects for data protection. Store replicas between multiple sites for DR / BC. Store revisions of data. Retention can allow for automatic purging of old data. Back up data less frequently, if at all.
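The ideas on this slide can be sketched as a toy in-memory object store. This is an illustration only and does not model any real product: the `ObjectStore` class is hypothetical, each "node" is a plain dictionary standing in for a commodity server, and the use of a SHA-256 content hash as the object ID is an assumption chosen so that identical content is stored once.

```python
import hashlib


class ObjectStore:
    """Toy object store: content-addressed objects, metadata, and replicas."""

    def __init__(self, replicas: int = 2):
        # Each dict stands in for one storage node (assumption for the demo).
        self.nodes = [dict() for _ in range(replicas)]
        self.metadata = {}  # object_id -> metadata dict

    def put(self, data: bytes, **metadata) -> str:
        """Store data on every node and record metadata about it."""
        object_id = hashlib.sha256(data).hexdigest()
        for node in self.nodes:
            node[object_id] = data  # replica for data protection
        self.metadata.setdefault(object_id, {}).update(metadata)
        return object_id

    def get(self, object_id: str) -> bytes:
        """Read from the first node that still holds the object."""
        for node in self.nodes:
            if object_id in node:
                return node[object_id]
        raise KeyError(object_id)

    def search(self, **criteria):
        """Return IDs of objects whose metadata matches all criteria."""
        return [
            object_id
            for object_id, md in self.metadata.items()
            if all(md.get(key) == value for key, value in criteria.items())
        ]


if __name__ == "__main__":
    store = ObjectStore(replicas=2)
    report = store.put(b"Q3 report", owner="dave", subject="finance")
    store.put(b"site photo", owner="ann", subject="facilities")
    print(store.search(owner="dave") == [report])
    store.nodes[0].clear()  # simulate losing one node
    print(store.get(report))
```

Because the metadata lives alongside the object, search works on attributes such as owner or subject rather than on file paths, and a lost node does not lose the data.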
21 Object Storage
22 Traditional vs. Object storage
23 Sharing Objects
24 Structure to Unstructured Data: Object storage has data to describe the data. Object storage is searchable. Object storage is shareable. Object storage can be stored once. Object storage doesn't need to be migrated. Object storage doesn't need to be backed up.
25 What can you do? Data isn't going away; growth is inevitable. Implement energy-efficient storage that utilizes data reduction technology (compression & deduplication). Summarize data into useful information. Implement ways to reduce data clutter. Implement more efficient methods of storing data. Bring structure to unstructured data. Archive and purge data over time.
26 Dave Larson, Solutions Architect. PH: (800) x104. Thank You.