Why NoSQL databases are needed for the Internet of Things

Similar documents
REAL-TIME BIG DATA ANALYTICS

How To Handle Big Data With A Data Scientist

How To Make Data Streaming A Real Time Intelligence

Machina Research. Where is the value in IoT? IoT data and analytics may have an answer. Emil Berthelsen, Principal Analyst April 28, 2016

Architecting for the Internet of Things & Big Data

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

No-SQL Databases for High Volume Data

Transforming the Telecoms Business using Big Data and Analytics

Machina Research Viewpoint. The critical role of connectivity platforms in M2M and IoT application enablement

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

onetransport 2016 InterDigital, Inc. All Rights Reserved.

Find the Information That Matters. Visualize Your Data, Your Way. Scalable, Flexible, Global Enterprise Ready

INTRODUCTION TO CASSANDRA

Blueprints and feasibility studies for Enterprise IoT (Part Two of Three)

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

BIG DATA TECHNOLOGY. Hadoop Ecosystem

Evolving from SCADA to IoT

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server

Master big data to optimize the oil and gas lifecycle

IoT and Big Data- The Current and Future Technologies: A Review

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Why NoSQL? Your database options in the new non- relational world IBM Cloudant 1

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns

Complex, true real-time analytics on massive, changing datasets.

Big Data: Overview and Roadmap eglobaltech. All rights reserved.

M2M Communications and Internet of Things for Smart Cities. Soumya Kanti Datta Mobile Communications Dept.

Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper

Getting Real Real Time Data Integration Patterns and Architectures

Cloud Big Data Architectures

Data Modeling for Big Data

Essential Elements of an IoT Core Platform

secure intelligence collection and assessment system Your business technologists. Powering progress

FORRESTER CONSULTING INTERNET OF THINGS SURVEY - KEY FINDINGS. Building Value from Visibility: 2012 Enterprise Internet of Things Adoption Outlook

SAP and Hortonworks Reference Architecture

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Infrastructures for big data

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo

Big Data - Infrastructure Considerations

JOURNAL OF OBJECT TECHNOLOGY

Big Data and Healthcare Payers WHITE PAPER

Luncheon Webinar Series May 13, 2013

Integrating a Big Data Platform into Government:

Time-Series Databases and Machine Learning

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

A new agenda item for enterprise executives: Enterprise IoT (Part One of Three)

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale

Explore the Art of the Possible Discover how your company can create new business value through a co-innovation partnership with SAP

Horizontal IoT Application Development using Semantic Web Technologies

Big Data and Data Science: Behind the Buzz Words

Real-Time Handling of Network Monitoring Data Using a Data-Intensive Framework

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions

The emergence of big data technology and analytics

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Understanding traffic flow

Industry 4.0 and Big Data

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

Informix The Intelligent Database for IoT

Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Data Refinery with Big Data Aspects

Ganzheitliches Datenmanagement

Big Data Analytics. Rasoul Karimi

Big Data Analytics in Health Care

White Paper April 2006

BIGDATA GREENPLUM DBA INTRODUCTION COURSE OBJECTIVES COURSE SUMMARY HIGHLIGHTS OF GREENPLUM DBA AT IQ TECH

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

Cloud Scale Distributed Data Storage. Jürmo Mehine

Accenture and Oracle: Leading the IoT Revolution

SAP HANA Data Center Intelligence - Overview Presentation

The Power And Use of FireScope Unify ESB

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

Performance Management of SQL Server

Big Systems, Big Data

Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May Santa Clara, CA

I D C T E C H N O L O G Y S P O T L I G H T

Choosing The Right Big Data Tools For The Job A Polyglot Approach

THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

ENZO UNIFIED SOLVES THE CHALLENGES OF REAL-TIME DATA INTEGRATION

Dominik Wagenknecht Accenture

CLOUD BASED SEMANTIC EVENT PROCESSING FOR

Making Sense of Big Data in Insurance

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

CA Technologies Big Data Infrastructure Management Unified Management and Visibility of Big Data

Towards Smart and Intelligent SDN Controller

Transcription:

MachinaResearch//ResearchNote MachinaResearch,2014 MachinaResearch ResearchNote WhyNoSQLdatabasesareneededfor theinternetofthings 1 EmilBerthelsen,PrincipalAnalyst April2014 Theissue Ourview BigDataintheInternetofThingsissnowballing.Theamountofdataisever-increasing and becoming more and more varied. The impact on traditional relational database managementsystems(rdbms)willbesignificant.databasesneedtoadoptandmeet these new IoT requirements with greater data processing agility, multiple analytical tools including real-time analytics, and aligned and consistent views of the data. This Research Note examines what database capabilities will be required to address data managedintheinternetofthings,andhownosqlsystemslikemongodb,cassandra andhbasearemeetingthischallenge. Thetraditionalrelationaldatabasemanagementsystemswillcontinuetohavearole in the Internet of Things when processing structured, highly uniform data sets, generated from a vast number of enterprise IT systems and where this data is managed in a relatively isolated manner. When it comes to managing more heterogeneous data generated by millions and millions of sensors, devices and gateways, each with their own data structures and potentially becoming connected and integrated over the course of many years, databases will require new levels of flexibility, agility and scalability. In this environment, NoSQL databases are proving theirvalue. ThesignificanceoftheInternetofThingsisnotthatmoreandmoredevices,peopleandsystemsare connected with one another. It is that the data generated from these things is shared, processed, analysedandacteduponthrough newandinnovativeapplications,applying completelynew analysis methods and within significantly altered timeframes. The Internet of Things will drive Big Data, providingmoreinformation,frommanydifferentsources,inreal-time,andallowustogaincompletely newperspectivesontheenvironmentsaroundus. The difference between machine-to-machine (M2M) and the Internet of Things (IoT) is driven by significant changes in thehandling of data. In M2M, machine-generated data generally reflectswell- defineddatasets,communicatedwithinestablishedprotocolsandformats,anddeliverswell-defined alertsandnotificationswhenvaluesexceedtheirparameters.applicationsinm2mmakeefficientuse ofthisdataastheseapplicationshavebeendevelopedhand-in-handwithwhatthecharacteristicsof

MachinaResearch//ResearchNote MachinaResearch,2014 2 thedata.ineffect,applicationanddataareintrinsicallydesignedasonetomeetthespecificpurposes oftheapplicationinfairlyrobustyetstaticmodel. IntheInternetofThings,theworldofdataandapplicationsrequiressignificantlymoreflexibility,agility andscalability.theserequirementsaredrivenbyalimitednumberofunderlyingneeds.thisresearch Noteexplorestheserequirementsandunderlyingneedsinmoredetail,including:! Diversity of devices and data Data generated from an exponentially growing number of diversesensors,devices,applications,andthingswillbeaccompaniedbyagrowingdiversityin thestructureandscaleofthatdata andmoreandmoresourcesofadditionaldata ranging fromdatasourcedfromcorporatesystemstocrowdsourceddatawillneedtobecombinedwith thisdata.! Flexibleandagilesystems AstheInternetofThingsbecomesamoreopensystemwherenew sensors,devices,applicationsandthingsareaddedandconnected,and evolve what weterm Subnets of Things, 1 thesupportingapplicationsanddatabasesthatmanagethedatagenerated from these things will need to remain flexible and agile, allowing businesses to meet their requirementswithouthavingtoreinventtheapplicationeverytime.! Sophisticatedanalytics Where simple alerts and notifications were the backbone of M2M systems,analyticsbecomesthecornerstoneoftheinternetofthings,requiringenhancedand multipleanalyticalapproachestoaddresstherequirementsoftheapplications. It should come as no surprise that developments in the Internet of Things and databases have run parallelwithdiscussionsaroundbigdata,atopicexploredinotherpublicationsbymachinaresearch. 2 TherelationshipbetweenIoTandBigData,andthechangesfromM2MtoIoTarecapturedinFigure1. Figure1:M2M,IoTandBigDataconnected[Source:MachinaResearch,2014] 1 MachinaResearchcreatedthe term Subnets of Things in contextofour BigDatainM2M:Tippingpointsand Subnets of Things White Paper released in February 2013, in referring to an island of interconnected devices, driven either by a single point of control, single point of data aggregation, or potentially a common cause or technologystandard 2 MachinaResearchhaspublishedaseriesofResearchNotes,aStrategyReportandaWhitePaperonthetopicof Big Data as this remains a significant opportunity area in M2M and IoT for operators, system integrators, and solutionandplatformproviders.

MachinaResearch//ResearchNote MachinaResearch,2014 3 What is important to take notice of is that improvements in technologies as illustrated by M2M/IoT ApplicationPlatforms 3 andnosqldatabasesareenablingthesedevelopmentstotakeplace. 1 WhatisNoSQL? SQL stands for Structured Query Language, designed for managing data in relational database management systems (RDBMS). NoSQL, referred to as Not Only SQL,wasdesignedformanagingdata whichdidnotnecessarilyhavethestructureofrdbms.someoftheleadingplayersinrdbmsarewell- establishednamessuchasoracle,ibm,microsoftandsap. Instead of data being structured in fixed relational columns, data could be stored in objects of, for example,graphformat(neo4jandvirtuoso),keyvalueformat(riakanddynamodb)orwidecolumn formats (Accumulo, Cassandra and HBase). One additional and widely adopted format of NoSQL databaseisthedocument-basedsolutionaspresentedbymongodbandcouchdb. ToillustratethefundamentaldifferencebetweenRDBMSandadocument-basedNoSQLsolution,the formerstoresdatainhighlystructuredrelationaldatabaseswheredataschemesare fixed, andcurated datahastoconformtospecificcharacteristics(fieldlength,characters,etc.)defined.nosqldatabases, inanutshell,allowfordatatobestoredinwhatessentiallymaybedefinedasform-freeformats,i.e. withoutarigidityorstructureofrdbms,andallowingforalltypesofstructured,semi-structuredand completely unstructured data to be stored in an object. The significant benefit of this approach becomesclearwhenconsideringthevastamountsofdatageneratedby things which may exhibit an equally extensive range of structures and qualities and from which diverse data needs to be pulled togetherandanalysedinordertocreatetheiot.thisdiversitymayrangefromsmallerdifferencesin data structures (for example different field lengths or which data fields are captured) to more substantial differences such as data without any immediate structure as exhibited by images, audio files,randomcontentintextmessages,facebooklikes,andsoon.nosqlprovidessignificantflexibility withregardstodatamanagement. 2 Extendingdatamodelschemaandanalytical processing Another challenge presented by the Internet of Things involves the extensibility in the software. As businessesrecognizeandrealizenewopportunitiesfromtheexpandingestateofsensorsanddevices implemented and the data generated, additional requirements from applications and supporting 3 For more detailed information about M2M/IoT Application Platforms, see Machina Research s White Paper on TheEmergenceofM2M/IoTApplicationPlatforms published in September 2013

MachinaResearch//ResearchNote MachinaResearch,2014 4 systems emerge. As seen with the development in M2M/IoT Application Platforms, 4 managing and developingapplicationsandaddressingscalabilityaresignificantandnewcapabilitiesforplatformsin the age of the Internet of Things. These developments will drive innovations in databases, data analytics,andthestructureofdatastorage. Responding to the Internet of Things with relational database management systems (RDBMS) is an optionbutpresentsalimitingfactorwhichintimewillbecomeasignificantobstacletorealizationofthe full opportunities available from all types of data. Databases in the Internet of Things require the flexibilityofthenosqlapproach,allowingasexploredearlier,differenttypesofdatatobestoredbut moreimportantly,theagilityandflexibilitytoadapttheunderlyingdatamodelstonewandchanging businessrequirementsandapplications. Applications and data models follow the requirements of businesses, and another key development areaneededwillbeindataanalytics.nolongerlimitedtohistoricaldataanalysis,dataanalyticsmoves to as close to real-time analytics as possible, introducing and requiring tools such as data streaming analysis, and with the growing numbers of multiple data sources, Complex Event Processing. In the Internet of Things, data analysis will require multiple analytical approaches, and in some cases, significantvalueisachievedfromreal-timedataanalytics(forexample,inmissioncriticalorlife-saving scenarios where information may guide rescue services within dangerous environments) or from historicalanalysisforpredictivemaintenanceservices. TheInternetofThingsgeneratessignificantlymoredatatobestored,andwhilecloudbasedservices havecertainlyprovidedanefficientandhighlyscalablesolution,developingtoolstomanagedistributed databases (rather than located on a single server) emerge asonemore capability required from the platforms.inthiscorner,hadoopwithitshdfsarchitectureallowsforhundredsandeventhousandsof nodestobecomepartofthedatacluster,anewanddistributedwayofmanagingdata. Finally,thelocationofdatastorageandstorageoptimizationwillbecomeimportantfactors.Thespeed and frequency with which data will need to be accessed by different applications may change, from instancetoinstance,theoptimalconfigurationofdatastorage.essentially,theidealdatabasestructure requiredtosupport(forexample)offlinehistoricalanalysesisdifferenttothatrequiredtosupportreal time analyses. However, specific datasets may be required to support either off line or real time applications(or,morerealistically,aneverevolvingmixofrealtimeandnon-realtimeapplicationsthat changesovertimeasnewiotapplicationsaredeveloped).theimplicationsthatanevolvingmixofreal timeandofflineapplicationsaccessingrelateddatasetsmayhaveonoptimaldatabasestructureisjust oneillustrationofawiderdynamic:everchangingapplicationrequirementsintheinternetofthingswill resultinaneedtodynamicallymanagetheoptimalstructureofassociateddatabasesonanongoing basis. 4 For more detailed information about M2M/IoT Application Platforms, see Machina Research s White Paper on TheEmergenceofM2M/IoTApplicationPlatforms published in September 2013

MachinaResearch//ResearchNote MachinaResearch,2014 3 Dataintegrationbecomesthechallenge 5 Howtohandledatainterrogationisnotanewchallenge,butitisasignificantlygreateronegiventhe volumeofdatainvolved,itsdiversity,multiplestoragelocationsanddifferentanalyticalprocesses.the unifiedviewofdatabecomesthechallengethatdataanalyticsplatformswillneedtoaddress. This challenge involves, at a fundamental level, the development and application of semantics technologies,recognising,forexample,commonvocabulariesorlinkeddata.atmoreadvancedlevels, dataintegrationinvolvesmappingandovercomingtheheterogeneityofthedata,andpresentingthis withaunifiedview. Theadditionalchallengecomesfromunifyingthedataattwolevels,thefirstofwhichisclearlyinterms of the actual data produced by sensors and devices such as temperature readings, on/off status, vibration,etc. Atasecondlevel,andperhapssignificantwhenlookingtogroupsensorsanddevicesaccordingtoany numberofcriteriathatmaybechosen,thetaskthenbecomesoneofaugmentingdataasitarrivesin supporting platforms or the databases, and ensuring that these data identifiers (or metadata) are correctlyaggregatedandprocessedalongsideassociateddata,enhancingthecapabilityofplatformsin verifyingandcontextualisingdata. 4 Conclusionsandrecommendations Platformsmanagingdataandapplicationsrequiresignificantlymoreflexibility,agilityandscalabilityto meet the requirementsof businesses in the Internet of Things. These requirementswill be dynamic, constantly changing as the opportunities from implemented sensors and devices, and their data are identified. Businesses will look to gain and maintain competitive advantage from innovative applications,requiringquickerapplicationdevelopmentandagiledatamodels. In this environment of the Internet of Things and Big Data, Machina Research makes the following recommendations:! EnterprisesshouldcarefullyconsiderdifferenttypesofdatabasesasillustratedinFigure2to addressthedatageneratedfromtheexponentiallygrowingnumberofdiversesensors,devices, andthingsaswellasthegrowingdiversityinstructureandscaleofthatdata.insomeinstances, SQLandhybriddatabaseswillmeettherequirementsofbusinesses,andinothers,asaddressed inthisresearchnote,anosqldatabasewillbethewayforward.

MachinaResearch//ResearchNote MachinaResearch,2014 6 Figure2:NewCapabilitiesintheInternetofThings[Source:MachinaResearch,2014]! Providersofdatabaseswillneedtoensurethattheirsolutionsareflexibleandagile,meeting thedynamicrequirementsofbusinesswithouthavingtoreinventtheapplicationordatamodel everytime,andhavingrobustintegrationswithm2m/iotapplicationplatforms! Adiverseapproachto dataanalyticswillpaydividends.forprovidersofdataanalytics, the abilitytosupportmultipleanalyticalapproachestoaddressthevariedrequirementsofbusiness applicationsanddatawillbecomeanincreasinglyimportantfeatureinthefuture.