Texas Digital Government Summit. Data Analysis Structured vs. Unstructured Data. Presented By: Dave Larson

Speaker Bio
Dave Larson
- Solutions Architect with Freeit Data Solutions
- In the IT industry for over 20 years
- Specializing in data and storage technologies
- Worked with: IT Manager, SAN technology, ERP applications, Database Admin, UNIX Admin, Enterprise Architecture, Data Warehousing

Data & Information
- What is Data? Raw, unorganized facts that need to be processed.
- What is Information? Processed, organized, structured data that is useful.
- Data is plain facts; once processed, organized, structured, or presented, it becomes useful information.

Facts about Data
- Data is growing at an incredible rate
- Gartner and IDC state that data is doubling every 18 months
- The current estimate is that there are over 4 zettabytes of data in the world
- If the trend continues, by 2020 data will be over 40 zettabytes
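The doubling claim can be sanity-checked with a few lines of arithmetic. This is a minimal sketch that assumes steady exponential growth with an 18-month doubling period; the function names are illustrative and not from the presentation.

```python
import math

def projected_size(start_zb: float, years: float, doubling_months: float = 18.0) -> float:
    """Data volume after `years`, assuming it doubles every `doubling_months`."""
    return start_zb * 2 ** (years * 12 / doubling_months)

def years_to_reach(start_zb: float, target_zb: float, doubling_months: float = 18.0) -> float:
    """Years needed to grow from start_zb to target_zb at the same doubling rate."""
    return math.log2(target_zb / start_zb) * doubling_months / 12

# Starting from 4 ZB, three years means two doublings.
print(projected_size(4, 3))
# Time for 4 ZB to become 40 ZB (a tenfold increase).
print(round(years_to_reach(4, 40), 2))
```

Under these assumptions a tenfold increase takes just under five years, which is in line with the 4 ZB to 40 ZB trend quoted on the slide.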

What is a Zettabyte?
- 1 zettabyte = 1 billion terabytes = 1,000,000,000,000,000,000,000 bytes
- 4 zettabytes is equivalent to:
  - 2 quintillion JPG images
  - 456 billion hours of digitally recorded music
  - 1 trillion HD digital movies
  - 166 billion 32 GB iPads
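The unit relationship can be verified directly. The sketch below assumes decimal (SI) units, where each prefix step is a factor of 1,000; the helper `units_needed` is a hypothetical name used only for illustration.

```python
# Decimal (SI) storage units, in bytes.
GB = 10**9
TB = 10**12
ZB = 10**21

def units_needed(total_bytes: int, unit_bytes: int) -> int:
    """Number of fixed-size units needed to hold total_bytes (rounded up)."""
    return -(-total_bytes // unit_bytes)

# One zettabyte expressed in terabytes.
print(ZB // TB)
# Number of 4 TB drives needed to hold 4 ZB.
print(units_needed(4 * ZB, 4 * TB))
```

The same helper can be applied to any device size to produce equivalents like those listed on the slide, which should be read as rough estimates.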

4 Zettabytes Visualized
- 1 billion 4 TB hard drives
- 250 billion DVDs; stacked on top of one another, they would reach the moon 3 times
- All data printed on 8 x 10 paper and laid end to end is 210 trillion miles, or 35.8 light years
- All data printed would require 16.4 trillion trees
- NASA estimates there are 400 billion trees on Earth

Imagine what 40 Zettabytes would look like

What is causing the Data Explosion?
- Internet: connecting everything to everyone
- Billions of people to billions of devices
- Online shopping (Amazon, Wal-Mart, eBay, Best Buy)
- File sharing (Dropbox, Google Drive, iCloud, SkyDrive)
- Social media
  - Facebook
  - Google+
  - Twitter
  - YouTube
- Store everything, delete nothing, multiple copies of it all

Structured vs. Unstructured
- Structured: information with a degree of organization that is readily searchable and quickly consolidated into facts. Examples: RDBMS, spreadsheet
- Unstructured: information that lacks structure, making it time- and energy-consuming to search, find, and consolidate into facts. Examples: email, documents, images, reports

Expansion of data?
- Structured data (databases)
  - Production DB, Test DB, Dev DB, Reporting DB
  - Multiple backups of data
  - Generations of DB backups
  - Replicated copies of DB
  - Every production database has between 3 and 12 copies
- Unstructured data (files, media, images)
  - Desktop, network share, email, mobile device, cloud
  - Copies sent to other people
  - Backup copies

Current controls of data expansion
- Data compression
- Data deduplication
- Data cloning
- Data archiving
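To make deduplication and compression concrete, here is a minimal sketch. It assumes fixed-size blocks identified by a SHA-256 content hash and uses zlib for compression; real storage products use their own chunking and hashing schemes, so treat this only as an illustration of the idea.

```python
import hashlib
import zlib

def deduplicate(blocks):
    """Store each distinct block once, keyed by its content hash.

    Returns (store, refs): store maps hash -> block, and refs lists the
    hash of every original block in order, so the data can be rebuilt.
    """
    store = {}
    refs = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy
        refs.append(digest)
    return store, refs

# Four 4 KiB blocks, three of which are identical (illustrative data).
blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
store, refs = deduplicate(blocks)

raw_bytes = sum(len(b) for b in blocks)
deduped_bytes = sum(len(b) for b in store.values())
compressed_bytes = sum(len(zlib.compress(b)) for b in store.values())

print(raw_bytes, deduped_bytes, compressed_bytes)
```

Deduplication removes repeated blocks, and compression then shrinks the blocks that remain, which is why the two techniques are commonly combined.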

How to control data growth?
- Change data management policies
- Create data retention procedures
- Store data more efficiently
- Purge data that is no longer needed
- Back up data less often
- Archive data
- Develop more efficient backup policies

Analyzing Structured Data (RDBMS)
- Challenges
  - DB growth impacts data analysis
  - Too much data to analyze
  - Analyze only relevant data (current)
- Improvements
  - Purge data that is no longer relevant
  - Historical data should be summarized
  - Compress data to store less on disk
  - Improve DB performance with caching technologies and flash storage
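Summarizing historical data and then purging the detail rows might look like the following sketch. It uses Python's built-in sqlite3 module with an in-memory database; the table names, columns, and cutoff date are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2012-01-15", 100.0), ("2012-01-20", 50.0),
     ("2012-02-03", 75.0), ("2014-06-01", 20.0)],
)
conn.execute(
    "CREATE TABLE sales_monthly (month TEXT PRIMARY KEY, total REAL, n INTEGER)"
)

def summarize_and_purge(conn, cutoff):
    """Roll detail rows older than `cutoff` into monthly totals, then delete them."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            """INSERT INTO sales_monthly
               SELECT substr(sale_date, 1, 7), SUM(amount), COUNT(*)
               FROM sales WHERE sale_date < ?
               GROUP BY substr(sale_date, 1, 7)""",
            (cutoff,),
        )
        conn.execute("DELETE FROM sales WHERE sale_date < ?", (cutoff,))

summarize_and_purge(conn, "2014-01-01")
print(conn.execute("SELECT * FROM sales_monthly ORDER BY month").fetchall())
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])
```

Running the summary and the delete in one transaction avoids losing detail rows that were never summarized.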

Improved Analysis of Structured Data
- Normalize databases to minimize redundancy and dependency
- Divide large tables into smaller tables
- Partition data
- Move data into third normal form (3NF), generally used in a data warehouse
- Utilize and leverage Business Intelligence applications on normalized data
- Remove source data once normalized
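The normalization steps above can be sketched with a small example. It assumes a hypothetical flat orders table in which customer details repeat on every row, splits it into two related tables, and removes the source table afterwards. It uses Python's built-in sqlite3 module, and all table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flat source table: customer name and city repeat on every order.
CREATE TABLE orders_flat (
    order_id INTEGER, customer_name TEXT, customer_city TEXT, amount REAL
);
INSERT INTO orders_flat VALUES
    (1, 'Acme', 'Austin', 10.0),
    (2, 'Acme', 'Austin', 25.0),
    (3, 'Globex', 'Dallas', 5.0);

-- Normalized tables: each customer is stored once.
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY, name TEXT UNIQUE, city TEXT
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    amount REAL
);

INSERT INTO customers (name, city)
    SELECT DISTINCT customer_name, customer_city FROM orders_flat;
INSERT INTO orders
    SELECT f.order_id, c.customer_id, f.amount
    FROM orders_flat f JOIN customers c ON c.name = f.customer_name;

-- Remove the source data once it has been normalized.
DROP TABLE orders_flat;
""")

totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM orders o JOIN customers c USING (customer_id)
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(totals)
```

Reporting queries then join the smaller tables, so a change to a customer's city is made in one row rather than on every order.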

Trends in Structured Data
- Structured data is getting too big for traditional RDBMS, requiring Big Data solutions
- Big Data is handled with applications like Hadoop
- Big Data is leveraging new technologies such as:
  - MongoDB
  - CouchDB
  - Oracle NoSQL Database
  - Apache Cassandra
- These new systems are sometimes referred to as document-oriented database systems or distributed key-value databases

What is Big Data?
- Traditional data
  - Gigabytes to terabytes
  - Centralized
  - Structured
  - Stable data model
  - Known complex interrelationships
- Big Data
  - Petabytes to exabytes
  - Distributed
  - Semi-structured and unstructured
  - Flat schemas
  - Few complex interrelationships
- Real-time: transactional, online, low-latency data
- Analytical: aggregated data from real-time feeds or other sources
- Search: supporting data, both external and internal, used for locating desired information and/or objects

Technology for Structured Data
- SSD / flash technology
  - All-flash arrays
  - Hybrid storage arrays
- SSD / flash is getting cheaper, more reliable, and larger in capacity
- Incredible performance: 10s to 100s of thousands of IOPS
- Inline compression and/or deduplication: store more data in less space
- Snapshots = reduced RTO/RPOs and less
- Cloning = less data consumed for development and test
- Energy efficient
  - SSD uses less than ¼ the power of hard drives
  - SSD requires less cooling
- Hard drives: how much longer until we remember them as fondly as floppy drives, dot-matrix printers, Betamax, and 8-track?

Unstructured Data Challenges
- How do you store billions of files?
- How do you store 100s of TBs or PBs of data?
- How long does it take to migrate 100s of TBs of data every 3-5 years?
- No structure to data
- Legacy file system approach to file organization
- Resource limitations
- Data has lots of duplication
- How do you find data that isn't organized or searchable?
- Lack of retention policies adds to the massive data explosion
- Data is getting too big to back up
- How do you back up PBs of unstructured data?

Unstructured Data: Current Improvements
- External search engines (MS Enterprise Search or Google Search Appliance)
- Archive data into cheaper solutions
- Back up data less frequently
- Implement deduplication technologies
- Purge data using retention policies

Trends in Unstructured Data
- Object storage
  - Treating files as objects
  - Creating data describing unstructured data
  - Metadata: data about data
  - Creation date, owner, subject, retention period, importance
- Leverage commodity hardware to create clusters to store data
- Store replicas of objects for data protection
- Store replicas between multiple sites for DR / BC
- Store revisions of data
- Retention can allow for automatic purging of old data
- Back up data less frequently, if at all
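The object storage ideas above (metadata, replicas, retention-based purging) can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not how any particular product works: the class names, the content-hash object ID, and the placement rule are all hypothetical, and `replicas` must not exceed the number of nodes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
import hashlib

@dataclass
class StoredObject:
    data: bytes
    metadata: dict        # e.g. owner, subject, importance
    created: datetime
    retention: timedelta  # how long the object must be kept

class ObjectStore:
    """Toy cluster: each node is a dict of object_id -> StoredObject."""

    def __init__(self, node_names, replicas=2):
        self.nodes = {name: {} for name in node_names}
        self.replicas = replicas

    def put(self, data, metadata, created, retention):
        """Write the object to `replicas` distinct nodes and return its ID."""
        object_id = hashlib.sha256(data).hexdigest()
        obj = StoredObject(data, metadata, created, retention)
        names = sorted(self.nodes)
        start = int(object_id, 16) % len(names)  # simple placement rule
        for i in range(self.replicas):
            self.nodes[names[(start + i) % len(names)]][object_id] = obj
        return object_id

    def search(self, **criteria):
        """Return IDs of objects whose metadata matches every criterion."""
        return {
            oid
            for objects in self.nodes.values()
            for oid, obj in objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        }

    def purge_expired(self, now):
        """Delete every replica of objects whose retention period has passed."""
        removed = set()
        for objects in self.nodes.values():
            expired = [oid for oid, obj in objects.items()
                       if obj.created + obj.retention <= now]
            for oid in expired:
                del objects[oid]
                removed.add(oid)
        return removed
```

A short usage example: create `ObjectStore(["site-a", "site-b", "site-c"])`, call `put` with a metadata dictionary such as `{"owner": "dave"}`, find the object again with `search(owner="dave")`, and call `purge_expired` with a later date to remove objects whose retention period has ended.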

Object Storage

Traditional vs. Object Storage

Sharing Objects

Structure to Unstructured Data
- Object storage has data to describe the data
- Object storage is searchable
- Object storage is shareable
- Object storage can be stored once
- Object storage doesn't need to be migrated
- Object storage doesn't need to be backed up

What can you do?
- Data isn't going away; growth is inevitable
- Implement energy-efficient storage that utilizes data reduction technology (compression & deduplication)
- Summarize data into useful information
- Implement ways to reduce data clutter
- Implement more efficient methods of storing data
- Bring structure to unstructured data
- Archive and purge data over time

Dave Larson
Solutions Architect
PH: (800) 478-5161 x104
Email: dave@freeitdata.com
Thank You.