Texas Digital Government Summit Data Analysis Structured vs. Unstructured Data Presented By: Dave Larson
Speaker Bio: Dave Larson, Solutions Architect with Freeit Data Solutions. In the IT industry for over 20 years. Specializing in Data and Storage Technologies. Worked with IT Manager, SAN technology, ERP Applications, Database Admin, UNIX Admin, Enterprise Architecture, Data Warehousing.
Data & Information. What is Data? Raw, unorganized facts that need to be processed. What is Information? Processed, organized, structured data that is useful. Data is plain facts that are processed, organized, structured, or presented to become useful information.
Facts about Data: Data is growing at an incredible rate. Gartner and IDC state that data is doubling every 18 months. The current estimate is that there are over 4 zettabytes of data in the world. If the trend continues, by 2020 data will be over 40 zettabytes.
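The doubling claim on this slide can be turned into a quick projection. The following is a minimal sketch, assuming a constant 18-month doubling period and the slide's 4 zettabyte starting point; the function names are illustrative and not from the presentation.

```python
import math

def years_to_reach(current_zb, target_zb, doubling_years=1.5):
    """Years needed to grow from current_zb to target_zb at a fixed doubling period."""
    doublings = math.log2(target_zb / current_zb)
    return doublings * doubling_years

def projected_size(current_zb, years, doubling_years=1.5):
    """Projected size in zettabytes after the given number of years."""
    return current_zb * 2 ** (years / doubling_years)

# Going from 4 ZB to 40 ZB is a 10x increase, or about 3.3 doublings.
print(round(years_to_reach(4, 40), 2))  # 4.98
```

Under these assumptions a tenfold increase takes roughly five years, so the 40 zettabyte figure depends on which year the 4 zettabyte estimate refers to.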
What is a Zettabyte? 1 zettabyte = 1 billion terabytes = 1,000,000,000,000,000,000,000 bytes. 4 zettabytes is equivalent to: 2 quintillion JPG images; 456 billion hours of digitally recorded music; 1 trillion HD digital movies; 166 billion 32 GB iPads.
4 Zettabytes visualized: 1 billion 4 TB hard drives. 250 billion DVDs stacked on top of one another would reach the moon 3 times. All data printed on 8 x 10 paper and laid end to end is 210 trillion miles, or 35.8 light years. All data printed would require 16.4 trillion trees; NASA estimates there are 400 billion trees on Earth.
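The unit arithmetic behind these comparisons can be checked in a few lines. This is a minimal sketch, assuming decimal (SI) units, where 1 TB is 10^12 bytes and 1 ZB is 10^21 bytes; the helper names are illustrative.

```python
TB = 10**12  # bytes in a terabyte (decimal definition)
ZB = 10**21  # bytes in a zettabyte (decimal definition)

def terabytes_per_zettabyte():
    """How many terabytes fit in one zettabyte."""
    return ZB // TB

def drives_needed(total_bytes, drive_bytes):
    """Number of drives needed to hold total_bytes (rounded up)."""
    return -(-total_bytes // drive_bytes)  # ceiling division on integers

print(terabytes_per_zettabyte())        # 1000000000
print(drives_needed(4 * ZB, 4 * TB))    # 1000000000
```

Holding 4 zettabytes on 4 TB drives takes one billion drives, which matches the "1 zettabyte = 1 billion terabytes" definition.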
Imagine what 40 Zettabytes would look like
What is causing the Data Explosion? Internet: connecting everything to everyone; billions of people to billions of devices. Online shopping (Amazon, Wal-Mart, eBay, Best Buy). File sharing (Dropbox, Google Drive, iCloud, SkyDrive). Social media: Facebook, Google+, Twitter, YouTube. Store everything, delete nothing, multiple copies of it all.
Structured vs. Unstructured. Structured: information with a degree of organization that is readily searchable and quickly consolidated into facts. Examples: RDBMS, spreadsheet. Unstructured: information with a lack of structure that is time and energy consuming to search, find, and consolidate into facts. Examples: email, documents, images, reports.
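The difference shows up as soon as you try to answer a question from each kind of data. The sketch below is illustrative only: the table, the customer names, and the email text are made up, and it uses Python's built-in sqlite3 module as a stand-in for an RDBMS.

```python
import sqlite3

# Structured: rows with named, typed columns that a query can consolidate directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("Acme", 120.0), ("Acme", 80.0), ("Globex", 45.5)],
)

def total_for(customer):
    """Sum of order totals for one customer, computed by the database."""
    row = conn.execute(
        "SELECT SUM(total) FROM orders WHERE customer = ?", (customer,)
    ).fetchone()
    return row[0]

# Unstructured: free text, so every document has to be scanned.
emails = [
    "Hi, Acme just placed another order, this one for $80.",
    "Reminder: lunch on Friday.",
    "Globex asked about their invoice of $45.50.",
]

def emails_mentioning(term):
    """Return the emails whose text contains the term (case-insensitive)."""
    term = term.lower()
    return [text for text in emails if term in text.lower()]

print(total_for("Acme"))               # 200.0
print(len(emails_mentioning("acme")))  # 1
```

The query returns a fact directly. The text scan only returns candidate documents, and someone still has to read them to extract the amounts.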
Expansion of data? Structured data (databases): production DB, test DB, dev DB, reporting DB; multiple backups of data; generations of DB backups; replicated copies of DB. Every production database has between 3 and 12 copies. Unstructured data (files, media, images): desktop, network share, email, mobile device, cloud; copies sent to other people; backup copies.
Current controls of data expansion: data compression; data deduplication; data cloning; data archiving.
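Two of these controls, deduplication and compression, are easy to illustrate. The following is a minimal sketch using Python's standard hashlib and zlib modules; it works on whole in-memory blobs, whereas storage products typically work on blocks or chunks, and the function names are illustrative.

```python
import hashlib
import zlib

def deduplicate(blobs):
    """Store each distinct blob once, keyed by its SHA-256 digest.

    Returns the store and, for every input blob, the digest that refers to it.
    """
    store = {}
    refs = []
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        store.setdefault(digest, blob)  # keep only the first copy
        refs.append(digest)
    return store, refs

def compress(blob):
    """Compress a blob with zlib at the highest compression level."""
    return zlib.compress(blob, 9)

report = b"quarterly report " * 1000
store, refs = deduplicate([report, report, b"meeting notes"])
print(len(refs), len(store))                 # 3 2
print(len(compress(report)) < len(report))   # True
```

Three files are referenced but only two are stored, and the repetitive report shrinks further once compressed.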
How to control data growth? Change data management policies; create data retention procedures; store data more efficiently; purge data that is no longer needed; back up data less often; archive data; develop more efficient backup policies.
Analyzing Structured Data (RDBMS). Challenges: DB growth impacts data analysis; too much data to analyze; analyze only relevant (current) data. Improvements: purge data that is no longer relevant; historical data should be summarized; compress data to store less on disk; improve DB performance with caching technologies and flash storage.
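Summarizing historical data and then purging the detail can be done in one step. This is a minimal sketch using sqlite3; the `sales` and `sales_monthly` tables, the sample rows, and the cutoff date are all hypothetical.

```python
import sqlite3

def summarize_and_purge(conn, cutoff):
    """Roll detail rows older than `cutoff` (ISO date string) into monthly totals,
    then delete those detail rows."""
    with conn:  # one transaction: the summary and the purge succeed or fail together
        conn.execute(
            "INSERT INTO sales_monthly (month, total) "
            "SELECT substr(day, 1, 7), SUM(amount) FROM sales "
            "WHERE day < ? GROUP BY substr(day, 1, 7)",
            (cutoff,),
        )
        conn.execute("DELETE FROM sales WHERE day < ?", (cutoff,))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")
conn.execute("CREATE TABLE sales_monthly (month TEXT PRIMARY KEY, total REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2012-01-05", 10.0), ("2012-01-20", 15.0), ("2012-02-03", 7.5), ("2013-06-01", 99.0)],
)

summarize_and_purge(conn, "2013-01-01")
print(conn.execute("SELECT month, total FROM sales_monthly ORDER BY month").fetchall())
# [('2012-01', 25.0), ('2012-02', 7.5)]
```

Three detail rows become two summary rows, and only current data remains in the table that analysis queries have to scan. A production version would also need to handle months that already have a summary row.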
Improved Analysis of Structured Data: Normalize databases to minimize redundancy and dependency; divide large tables into smaller tables; partition data; move data into third normal form (3NF), generally used in a data warehouse; utilize and leverage Business Intelligence applications on normalized data; remove source data once normalized.
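Dividing a large table into smaller ones can be shown with a small example. The sketch below uses sqlite3 and made-up order data: in the flat rows the customer's city depends on the customer rather than on the order, so the customer details move into their own table and are stored once.

```python
import sqlite3

# Flat source rows: (order_id, customer_name, customer_city, amount).
# Customer details are repeated on every order.
flat_rows = [
    (1, "Acme", "Austin", 120.0),
    (2, "Acme", "Austin", 80.0),
    (3, "Globex", "Dallas", 45.5),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT UNIQUE, city TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
""")

for order_id, name, city, amount in flat_rows:
    # Insert each customer once; later rows for the same name are ignored.
    conn.execute("INSERT OR IGNORE INTO customers (name, city) VALUES (?, ?)", (name, city))
    (customer_id,) = conn.execute(
        "SELECT id FROM customers WHERE name = ?", (name,)
    ).fetchone()
    conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer_id, amount))

customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
totals = conn.execute(
    "SELECT c.name, SUM(o.amount) FROM orders o "
    "JOIN customers c ON c.id = o.customer_id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()

print(customer_count)  # 2
print(totals)          # [('Acme', 200.0), ('Globex', 45.5)]
```

The same totals are still available through a join, but each customer's city is now stored in a single place.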
Trends in Structured Data: Structured data is getting too big for traditional RDBMS, requiring Big Data solutions. Big Data is handled with applications like Hadoop. Big Data is leveraging new technologies such as MongoDB, CouchDB, Oracle NoSQL Database, and Apache Cassandra. The new systems are sometimes referred to as document-oriented database systems or distributed key-value databases.
What is Big Data? Traditional data: gigabytes to terabytes; centralized; structured; stable data model; known complex interrelationships. Big Data: petabytes to exabytes; distributed; semi-structured and unstructured; flat schemas; few complex interrelationships. Real-time transactional, online, low-latency data. Analytical aggregated data from real-time feeds or other sources. Search-supporting data, both external and internal, used for locating desired information and/or objects.
Technology for Structured Data: SSD / flash technology: all-flash arrays; hybrid storage arrays. SSD / flash is getting cheaper, more reliable, and larger in capacity. Incredible performance: 10s to 100s of thousands of IOPS. Inline compression and/or deduplication: store more data in less space. Snapshots = reduced RTO/RPOs and less. Cloning = less data consumed for development and test. Energy efficient: SSD uses less than ¼ the power of hard drives; SSD requires less cooling. Hard drives: how much longer until we remember them as fondly as floppy drives, dot-matrix printers, Betamax, and 8-track?
Unstructured Data Challenges: How do you store billions of files? How do you store 100s of TBs or PBs of data? How long does it take to migrate 100s of TBs of data every 3-5 years? No structure to data. Legacy file system approach to file organization. Resource limitations. Data has lots of duplication. How do you find data that isn't organized or searchable? Lack of retention policies adds to massive data explosion. Data is getting too big to back up. How do you back up PBs of unstructured data?
Unstructured Data Current Improvements: external search engines (MS Enterprise Search or Google Search Appliance); archive data into cheaper solutions; back up data less frequently; implement deduplication technologies; purge data using retention policies.
Trends in Unstructured Data: Object storage: treating files as objects. Creating data describing unstructured data: metadata (data about data), such as creation date, owner, subject, retention period, importance. Leverage commodity hardware to create clusters to store data. Store replicas of objects for data protection. Store replicas between multiple sites for DR / BC. Store revisions of data. Retention can allow for automatic purging of old data. Back up data less frequently, if at all.
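The ideas on this slide (objects with metadata, storing content once, retention-driven purging) can be sketched in a few dozen lines. This is a toy in-memory model, not the API of any object storage product; the `ObjectStore` class, its methods, and the sample data are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class StoredObject:
    data: bytes
    metadata: dict        # data about the data: owner, subject, and so on
    created: date
    retention_days: int

class ObjectStore:
    """Toy object store: content-addressed, searchable by metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, data, created, retention_days, **metadata):
        """Store data once, keyed by its SHA-256 digest, and return the object ID."""
        object_id = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(
            object_id, StoredObject(data, metadata, created, retention_days)
        )
        return object_id

    def search(self, **criteria):
        """Return IDs of objects whose metadata matches every criterion."""
        return [
            oid for oid, obj in self._objects.items()
            if all(obj.metadata.get(key) == value for key, value in criteria.items())
        ]

    def purge_expired(self, today):
        """Delete objects whose retention period has passed and return their IDs."""
        expired = [
            oid for oid, obj in self._objects.items()
            if obj.created + timedelta(days=obj.retention_days) < today
        ]
        for oid in expired:
            del self._objects[oid]
        return expired

store = ObjectStore()
budget_id = store.put(b"budget draft", date(2013, 1, 1), 30, owner="dave", subject="budget")
photo_id = store.put(b"site photo", date(2013, 6, 1), 365, owner="dave", subject="photos")

print(len(store.search(owner="dave")))                       # 2
print(store.purge_expired(date(2013, 6, 15)) == [budget_id])  # True
```

Because the object ID is derived from the content, storing the same bytes twice returns the same ID, and the retention period recorded at write time is what makes automatic purging possible. Replication across nodes and sites is left out of this sketch.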
Object Storage
Traditional vs. Object storage
Sharing Objects
Structure to Unstructured Data: Object storage has data to describe the data. Object storage is searchable. Object storage is shareable. Object storage can be stored once. Object storage doesn't need to be migrated. Object storage doesn't need to be backed up.
What can you do? Data isn't going away; growth is inevitable. Implement energy-efficient storage that utilizes data reduction technology (compression and deduplication). Summarize data into useful information. Implement ways to reduce data clutter. Implement more efficient methods of storing data. Bring structure to unstructured data. Archive and purge data over time.
Dave Larson, Solutions Architect. PH: (800) 478-5161 x104. Email: dave@freeitdata.com. Thank You.