WHITE PAPER: Building your Big Data analytics strategy: Block-by-Block!
Abstract
In this white paper, Impetus discusses how you can handle Big Data problems. It explains how analytics on Big Data is changing the way technology evolves, and the new challenges this brings. It discusses best practices for addressing Big Data analytics concerns and how strategies can be created to cope with these challenges. The white paper guides you in choosing the right strategy, with the optimal technology stack, to address Big Data analytics problems.
Impetus Technologies, Inc. February
Table of Contents
Introduction
Building a Big Data strategy
Big Data and the three Vs
Volume
Variety
Velocity
The Big Data Analytics lifecycle
Key concerns in Big Data Analytics lifecycle
Selecting the right Big Data Analytics strategy
The Hadoop eco-system
Putting it all together
Conclusion

Introduction
With the number of software, Internet, and mobile users growing exponentially, there is huge demand on software infrastructure to deal with the voluminous data being created, often at high speed. Ever-increasing Internet access bandwidth allows huge data sets to flow onto the World Wide Web. Just ten years ago, terms like Gigabytes and Petabytes were meant only for academics and R&D experts; now even small handheld devices use up Gigabytes. All of this has resulted in a unique problem: not only do we have to deal with very large data, we have to do it quickly, and in a manner that yields business insights.
Building a Big Data strategy
As with almost all software strategies, the first step in building a Big Data plan is to gather the requirements correctly, so that the business problems can be thoroughly understood. This means identifying and defining what needs to be done and laying down the expectations from the solution. The next step is to assess and select the right strategy. This involves finding the right patterns and best practices to architect, design, and implement the relevant solution. Another important step is to determine the right tool-sets and/or technology stacks that will fulfill all the defined requirements as well as support the best practices. Finally, it is important to implement the chosen strategy and resolve the business problems at hand in a cost-effective manner.

Big Data and the three Vs
One of the major problems faced by data architects and stakeholders is ascertaining whether their problem is actually a Big Data issue. Unfortunately, there is no magical volume limit or algorithm to help you decide when a data problem evolves into a Big Data problem. The usual trend is to define Big Data in terms of data volumes or sizes. Also, whenever a regular RDBMS breaks down under excessive data, Big Data solutions appear to be the right choice. However, a better way of classifying Big Data is through the 3Vs model: the Variety, Volume, and Velocity of data.

Volume
Volume is simply the size of the data that we are capturing, measured in bytes. While the Gigabyte was once considered big, today terms like Terabytes, Petabytes, and Exabytes are heard in the context of Big Data.

Variety
Variety means the different kinds of data that we are trying to capture. A simple example of variety is a social web site that captures data from its own site, draws inputs from Twitter or Facebook using Google Analytics, and internally uses data from other third-party products. This results in a number of data formats, ranging from text, audio, and video to databases, log files, web service calls, and so on.
Velocity
The last V stands for velocity, which means the speed at which the data is being captured. Using the earlier example again, a Twitter feed might deliver tens of tweets for a given user, while some keyword-driven feeds can go viral, with thousands of tweets arriving simultaneously. Therefore, when classifying a data problem as a Big Data problem, we need to consider all three factors in the 3V model: velocity, volume, and variety. A pure volume problem might not be a Big Data problem at all, while even a moderately large data problem might become a Big Data problem if velocity and variety are important parts of it.

The Big Data Analytics lifecycle
Every software product has a life cycle, and the same holds true for Big Data. This life cycle starts with the Creation of data, which can be created in multiple ways and in several formats. The second step after creation is Ingestion, where the data may undergo complex transformation, filtering, or enrichment before it becomes suitable for the third stage, Analytics. Analytics may also call for some processing of the data before valuable insights can be derived from it. Visualization is the final stage, through which the results of Analytics are understood, and hence is an important step.

Key concerns in Big Data Analytics lifecycle
There are underlying concerns and problems that need to be addressed in every stage of the Big Data lifecycle. The data is usually created as part of external systems such as RDBMSs, server logs, audio/video streams, or third-party data sources.
Since the data volumes are huge, concerns such as how to store the data, and how to optimize and compress it, need to be addressed in the data creation stage. We also need to monitor the data creation phase and make important decisions, such as whether to use Cloud-like elasticity and which data backup and disaster recovery strategies to adopt. The Ingestion phase has its own challenges, where various transformations and integrations play a major role. The data warehousing industry has traditionally used ETL (extract, transform, and load) techniques, which help overcome most of the challenges of the Ingestion phase. Here, some of the key decisions involve finding the right tools and technologies. The same holds true for the Analysis phase, where suitable tool and technology decisions need to be made. These decisions may involve the classical build-versus-buy question and an assessment of how existing investments can be re-used. Moreover, the data may have hidden trends and traits that are immensely useful. Statistical data mining, machine learning, and NLP (natural language processing) are becoming essential parts of the Analytical phase today. The last and very critical phase is the packaging or presentation of Analytics. Here too, tools and technologies play a pivotal role, and standardization is one of the key requirements. With so many new modes of data delivery available, visualization for the various channels also needs serious consideration. For example, it is important to understand whether a graphical view or a classical tabular report better represents the given problem. Similarly, mobile and other handheld devices may require a different representation.
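The ETL pattern mentioned for the Ingestion phase can be sketched in a few lines of Python. This is a minimal, single-machine illustration, not the API of any ETL tool or of Hadoop. The CSV log format, the field names, and the JSON-lines sink are hypothetical stand-ins. They are chosen only to show filtering, type conversion, and enrichment happening between extraction and loading.

```python
import csv
import io
import json

# Hypothetical raw server-log extract (output of the creation stage), as CSV text.
RAW_LOG = """timestamp,user,action,bytes
2012-02-01T10:00:00,alice,view,512
2012-02-01T10:00:05,bob,view,
2012-02-01T10:00:09,alice,download,2048
"""


def extract(text):
    """Extract: parse raw CSV text into one dict per record."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: filter incomplete records, convert types, enrich with derived fields."""
    cleaned = []
    for row in rows:
        if not row["bytes"]:
            continue  # filtering: drop records that have no size
        cleaned.append({
            "user": row["user"],
            "action": row["action"],
            "kb": int(row["bytes"]) / 1024,  # type conversion plus a derived unit
            "date": row["timestamp"][:10],   # enrichment: day bucket for later grouping
        })
    return cleaned


def load(rows, sink):
    """Load: write each record as one JSON line to a file-like sink."""
    for row in rows:
        sink.write(json.dumps(row) + "\n")


warehouse = io.StringIO()  # hypothetical stand-in for the target store
load(transform(extract(RAW_LOG)), warehouse)
print(warehouse.getvalue(), end="")
```

In a real pipeline, each function would typically be replaced by a tool from the stack discussed later, for example Flume or Sqoop for extraction. The shape of the flow stays the same.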
Addressing the 3Vs with the Big Data Analytics lifecycle
Having understood the 3Vs, as well as how the foundation of the Big Data Analytics lifecycle has to be laid, it is important to see how the two can be combined to define and create a Big Data strategy. Organizations can begin by creating a matrix in which they capture answers to straightforward questions about the volume, variety, and velocity of data for each phase of the Big Data lifecycle. These questions can be as simple as how much, what type, and at what rate. Once the relevant cells of the matrix are filled in, the matrix can serve as the foundation for a strategy that addresses Big Data problems.
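The 3V-by-lifecycle matrix described above can be kept as a simple data structure that also reports which cells are still unanswered. The paper does not prescribe a format for the matrix. The class name, column width, and sample answers below are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass, field

# Lifecycle phases and the question each V asks, as described in the text.
PHASES = ("Creation", "Ingestion", "Analytics", "Visualization")
VS = {"Volume": "How much?", "Variety": "What type?", "Velocity": "At what rate?"}


@dataclass
class StrategyMatrix:
    # Maps (phase, V) -> free-text answer.
    cells: dict = field(default_factory=dict)

    def record(self, phase: str, v: str, answer: str) -> None:
        """Fill one cell, rejecting unknown phases or Vs."""
        if phase not in PHASES or v not in VS:
            raise ValueError(f"unknown cell: {phase}/{v}")
        self.cells[(phase, v)] = answer

    def missing(self) -> list:
        """List the (phase, V) cells that still have no answer."""
        return [(p, v) for p in PHASES for v in VS if (p, v) not in self.cells]

    def render(self, width: int = 26) -> str:
        """Format the matrix as a plain-text table; '?' marks open questions."""
        lines = ["Phase".ljust(14) + "".join(f"{v} ({q})".ljust(width) for v, q in VS.items())]
        for p in PHASES:
            lines.append(p.ljust(14) + "".join(self.cells.get((p, v), "?").ljust(width) for v in VS))
        return "\n".join(lines)


matrix = StrategyMatrix()
matrix.record("Creation", "Volume", "~2 TB/day")            # hypothetical answer
matrix.record("Creation", "Variety", "logs, tweets")        # hypothetical answer
matrix.record("Ingestion", "Velocity", "1000s/sec bursts")  # hypothetical answer
print(matrix.render())
print("Open questions:", len(matrix.missing()))
```

Keeping the open cells explicit makes it easy to see which phase still lacks a volume, variety, or velocity estimate before a strategy is chosen.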
Selecting the right Big Data Analytics strategy
Impetus Technologies has been working with Big Data problems for many years now and has fashioned a master strategy that can address almost all the major Big Data issues and problems. Impetus has been using this strategy successfully for many of its customers, providing them with a failure-proof solution to their pertinent Big Data challenges. An ideal Big Data Analytics solution needs to scale easily to support large data, which can run into Terabytes or Petabytes. The system should also be distributed across processors in a location-transparent way. It should be able to respond quickly to highly complex queries, as well as support a wide variety of data types, including images and arbitrary data structures. The ideal analytical solution should provide data scientists with all the necessary tools to explain the significance of data in a manner that others can easily understand. An analytical solution should be able to incorporate machine learning, provide recommendations, and execute analytics on real-time incoming data such as logs, as well as provide domain-specific canned reports.
It should also be able to handle data from heterogeneous sources, whether structured or unstructured, while providing a high rate of loading and analysis, as well as the ability to handle software or hardware failures. A Big Data Analytics strategy therefore involves creating a platform or solution that covers all aspects of the Big Data lifecycle and manages the 3Vs: the variety, volume, and velocity of data.

The ideal solution for the strategy can be a platform that allows different kinds of data to be ingested. One way of implementing such a solution is to use a Service Oriented Architecture (SOA) in the form of an extensible, connector-based mechanism. This connector mechanism allows new connectors to be added, or existing ones modified, making it possible to cater to new kinds of data sources efficiently and robustly.

Another requirement, which is gradually gaining importance, is real-time analytics. The ideal solution should also facilitate complex, real-time processing and transformation before the data is used for complex analytics. Complex event processing and rule engine integration is a related requirement that can be used to solve a variety of real-world problems. Hence, the ideal solution should also provide CEP (Complex Event Processing) support.

The analytical phase should enable easy data modeling and transformation, helping data scientists derive maximum value. Therefore, the solution needs user-friendly interfaces for data modeling, as well as easy-to-use, configurable workflow management interfaces. Finally, integration with existing visualization tools completes the entire life cycle. The solution must therefore allow easy integration with visualization tools, which enable analytical data to be understood easily and also provide deeper insights into sparse or complex data sets.
To create the ideal Big Data Analytics strategy and achieve optimal results, users will need to handpick the tools and technologies. They must also create a framework built on a leading open source solution for Big Data problems: Apache Hadoop.
The Hadoop eco-system
Hadoop has certainly come a long way from its humble origins. It was initially introduced as a simple file system in the Apache Nutch project, a massive web crawler that needed a file system to store the large volumes of data it gathered from across the Internet. Several tools and components are an integral part of the Hadoop ecosystem. These tools and components are aligned with the Big Data lifecycle and serve different purposes across Creation, Ingestion, Analytics, and Visualization. Sqoop, Flume, or Chukwa allow users to procure the data to be ingested and place it in a Hadoop-based data warehouse. The Ingestion and Analytical phases may use Hive, Pig, or programmatic processing, or workflow systems like Oozie, for data transformation and enrichment. Apache Mahout can be used for a wide range of machine learning and data mining algorithms, including clustering, classification, collaborative filtering, and frequent pattern mining. These cover the advanced data analytics requirements of the Analytics phase.
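In the Hadoop stack this paper describes, Hive and Pig scripts, as well as hand-written programmatic processing, ultimately run as MapReduce jobs. The single-process Python sketch below only simulates the model's three steps (map, shuffle, reduce) for a word count. It is not Hadoop's Java API, and the function names are hypothetical. A Hive query such as `SELECT word, COUNT(*) FROM words GROUP BY word` conceptually follows the same shape, with the grouping column as the intermediate key.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle/sort: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def run_job(records):
    """Run map over every record, shuffle the pairs, then reduce each key group."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


lines = ["big data big insights", "data at velocity"]
counts = run_job(lines)
print(counts)
```

On a real cluster, the mappers and reducers run in parallel on the nodes that hold the data. The key/value contract between the phases is what lets the framework distribute the work.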
Currently, Hadoop is the leader in the open source Big Data technology world. However, there are many other products and initiatives, both commercial and open source, that are foraying into this space. There have been attempts, and even some successes, at running Hadoop or similar distributed processing technologies faster and at adding real-time processing support. MapR, DataRush, HStreaming, HPCC, Platform Computing, and DataStax are examples of faster technologies that can serve as alternatives to Apache Hadoop. The major database and data warehouse vendors, such as Oracle, IBM, Microsoft, HP, and EMC, have also jumped on the Big Data bandwagon and come up with their own customized solutions. These are usually categorized as MPP (Massively Parallel Processing) databases.

NoSQL is another important Big Data technology. While some read NoSQL as "No to SQL", Impetus prefers the term "Not only SQL", because slowly but surely the gap between regular RDBMSs and the NoSQL world is narrowing. There are other options, such as graph databases like Neo4j, which can help users address Big Data issues arising from exploding social media data. There are also faster versions of SQL databases, such as VoltDB, which combine the ACID guarantees of an RDBMS with the power of Big Data. Hardware or appliance-based solutions also offer alternatives for Big Data problems.

Putting it all together
Now that we have the strategy, tools, and technologies in place, it is all a matter of putting them together. Essentially, this is about using Hadoop as the Big Data Analytics solution. As explained earlier, Hadoop is an excellent Big Data technology that is slowly becoming the de facto leader in the open source Big Data domain. There are multiple ways in which the power of Hadoop can be used or combined in the Analytical and Visualization phases of the Big Data lifecycle.
Impetus has been using Hadoop to clean and transform data into a structured form, and then load it into RDBMS databases. Here, Hadoop's capabilities are harnessed to handle Ingestion and part of the Analytical phase. The rest of the analytical processing is handled at the RDBMS level, with the RDBMS acting as the data sink. Any existing visualization technique or tool from the rich world of RDBMS visualization products can then be used, so the Visualization phase can be handled by existing toolsets. Hadoop can efficiently move data between RDBMS data sources and Hadoop systems through the DBInputFormat and DBOutputFormat interfaces. Once the unstructured data is processed, it can be pushed to an RDBMS database, which can then act as a data source for any BI solution.

This approach gives the end user the flexibility of parallel processing with Hadoop and an SQL interface at the summarized data level. It works well when the summarized data is not big enough to pose a challenge for the RDBMS database being used, and it is not as expensive as some of the other options. It is also suitable for high-touch queries, where the user wants to perform real-time, ad hoc analytics, because most RDBMS databases come with a comprehensive set of performance enhancement techniques. However, when the summarized data is very large, this approach might fail to deliver. Also, if batch analysis is the key requirement, moving data to an RDBMS database could be a redundant activity.

Consider a scenario where the processed and summarized data, which is itself very large, is placed on the Hadoop system. What can be done when this summarized data must be used for batch reporting without the complication of moving it out of Hadoop, either to an MPP DW or to an RDBMS? The answer is to use Hive as an interface to the data on the Hadoop system. Hive provides a very promising interface for executing SQL-like queries by converting them into MR (MapReduce) jobs. These MR jobs are executed on the Hadoop cluster against data that is itself stored on Hadoop. This approach allows users to perform batch and asynchronous analytics over the same data present in the Hadoop system.
It is very cost-effective, because it does not involve managing any data sources other than the existing Hadoop system. It also gives users the flexibility to scale their summarized data to any level.

Today, several options are available in the market that allow Massively Parallel Processing Data Warehouses (MPP DWs) to be integrated with Hadoop. This is worth considering if you still have a large amount of data even after summarizing it.
Using Hadoop to clean and transform the data into a structured form allows users to load it into any of the available MPP DWs. Alongside loading the data, users can write User Defined Functions (UDFs) to perform database-level analytics. They can then integrate these with Business Intelligence (BI) solutions using ODBC/JDBC connectivity for end-user analytics and reporting. Using MPP data warehouses also allows users to apply various performance enhancement techniques, such as index compression, materialized views, result set caching, and I/O sharing. Alternatively, some MPP DWs provide a framework that supports executing MR jobs within their own clusters, giving users a second level of parallel processing at the MPP level. This feature works well for high-touch queries and provides an excellent framework for end-user ad hoc analytics.

The disadvantage of this approach, however, could be the cost involved. Most MPP DWs are expensive to acquire, and some also require high-end servers for deployment, which adds to the expense. This approach also calls for an expert team with hands-on experience in MPP data warehouse management and development. That could turn out to be a challenge in itself in today's rapidly changing technology space, where open source technologies like Hadoop are being widely accepted and adopted.
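Both the indirect approach (summarized Hadoop output loaded into an RDBMS) and the MPP DW approach end with the same step. A structured summary is loaded into a SQL engine, optionally extended with User Defined Functions, and queried from BI tools.

The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for the target database. The table, the summary rows, and the `to_mb` UDF are hypothetical. Real RDBMSs and MPP DWs each have their own UDF mechanisms and loading tools.

```python
import sqlite3

# Hypothetical summary rows, as a Hadoop job might emit them after cleaning
# and aggregating raw logs: (day, action, event count, total bytes).
summary = [
    ("2012-02-01", "view", 1200, 5_242_880),
    ("2012-02-01", "download", 300, 104_857_600),
    ("2012-02-02", "view", 1500, 6_291_456),
]


def to_mb(n_bytes):
    """Scalar UDF: convert a byte count to megabytes."""
    return n_bytes / (1024 * 1024)


conn = sqlite3.connect(":memory:")  # stand-in for the target RDBMS or MPP DW
conn.create_function("to_mb", 1, to_mb)  # register the UDF for use inside SQL
conn.execute("CREATE TABLE daily_summary (day TEXT, action TEXT, events INTEGER, bytes INTEGER)")
conn.executemany("INSERT INTO daily_summary VALUES (?, ?, ?, ?)", summary)

# SQL interface at the summarized data level, as a BI tool would use it.
rows = conn.execute(
    "SELECT action, SUM(events), ROUND(to_mb(SUM(bytes)), 1) "
    "FROM daily_summary GROUP BY action ORDER BY action"
).fetchall()
conn.close()
print(rows)
```

The heavy lifting (cleaning and summarizing) stays in Hadoop. The SQL engine only has to handle the much smaller summary, which is the trade-off the indirect approach relies on.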
Conclusion
In summary, an ideal Big Data strategy can lead users to create a platform or solution that covers, and manages, all aspects of the Big Data lifecycle. Organizations today are using the Hadoop ecosystem, or a blend of alternative technologies, to address Big Data problems. These include FOSS and commercial technologies such as NoSQL, DataRush, and HStreaming. There are three strategies for using Hadoop as the Big Data Analytics solution:
- Indirect analytics over Hadoop provides the flexibility of Hadoop's parallel processing and an SQL interface at the summarized data level. This solution is not very expensive compared with the other options.
- Direct analytics over Hadoop allows you to perform batch and asynchronous analytics over the same data present on the Hadoop system. It is a very cost-effective approach, because it involves no expense for managing separate data sources.
- Integrating MPP DWs with Hadoop suits situations with a large amount of data. It is an expensive option compared with the two approaches above.

Impetus has successfully used the Hadoop ecosystem to create a comprehensive Big Data platform that provides the capabilities required to address all concerns in the various stages of the Big Data lifecycle.

About Impetus
Impetus Technologies offers Product Engineering and Technology R&D services for software product development. With ongoing investments in research and application of emerging technology areas, innovative business models, and an agile approach, we partner with our client base, comprising large-scale ISVs and technology innovators, to deliver cutting-edge software products. Our expertise spans the domains of Big Data, SaaS, Cloud Computing, Mobility Solutions, Test Engineering, Performance Engineering, and Social Media, among others.
Impetus Technologies, Inc
Stevens Creek Boulevard, Suite 450, San Jose, CA 95129, USA
Tel:
inquiry@impetus.com
Regional Development Centers - INDIA: New Delhi, Bangalore, Indore, Hyderabad
To know more visit:

Disclaimers
The information contained in this document is the proprietary and exclusive property of Impetus Technologies Inc. except as otherwise indicated. No part of this document, in whole or in part, may be reproduced, stored, transmitted, or used for design purposes without the prior written permission of Impetus Technologies Inc.
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationKeywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop
Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Transitioning
More informationBig Data at Cloud Scale
Big Data at Cloud Scale Pushing the limits of flexible & powerful analytics Copyright 2015 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For
More informationBig Data Defined Introducing DataStack 3.0
Big Data Big Data Defined Introducing DataStack 3.0 Inside: Executive Summary... 1 Introduction... 2 Emergence of DataStack 3.0... 3 DataStack 1.0 to 2.0... 4 DataStack 2.0 Refined for Large Data & Analytics...
More informationWHITE PAPER. Four Key Pillars To A Big Data Management Solution
WHITE PAPER Four Key Pillars To A Big Data Management Solution EXECUTIVE SUMMARY... 4 1. Big Data: a Big Term... 4 EVOLVING BIG DATA USE CASES... 7 Recommendation Engines... 7 Marketing Campaign Analysis...
More informationThe Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
More informationOffload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper
Offload Enterprise Data Warehouse (EDW) to Big Data Lake Oracle Exadata, Teradata, Netezza and SQL Server Ample White Paper EDW (Enterprise Data Warehouse) Offloads The EDW (Enterprise Data Warehouse)
More informationwww.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage
www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage If every image made and every word written from the earliest stirring of civilization
More informationNavigating Big Data business analytics
mwd a d v i s o r s Navigating Big Data business analytics Helena Schwenk A special report prepared for Actuate May 2013 This report is the third in a series and focuses principally on explaining what
More informationExtending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012
Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster Nov 7, 2012 Who I Am Robert Lancaster Solutions Architect, Hotel Supply Team rlancaster@orbitz.com @rob1lancaster Organizer of Chicago
More informationA Big Data Storage Architecture for the Second Wave David Sunny Sundstrom Principle Product Director, Storage Oracle
A Big Data Storage Architecture for the Second Wave David Sunny Sundstrom Principle Product Director, Storage Oracle Growth in Data Diversity and Usage 1.8 Zettabytes of Data in 2011, 20x Growth by 2020
More informationAssociate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
More informationThe 3 questions to ask yourself about BIG DATA
The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.
More informationHadoop. http://hadoop.apache.org/ Sunday, November 25, 12
Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using
More informationBig Data. Lyle Ungar, University of Pennsylvania
Big Data Big data will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. McKinsey Data Scientist: The Sexiest Job of the 21st Century -
More informationCost-Effective Business Intelligence with Red Hat and Open Source
Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,
More informationBuilding Your Big Data Team
Building Your Big Data Team With all the buzz around Big Data, many companies have decided they need some sort of Big Data initiative in place to stay current with modern data management requirements.
More informationManaging Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
More informationExecutive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...
Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data
More informationWhite Paper: Evaluating Big Data Analytical Capabilities For Government Use
CTOlabs.com White Paper: Evaluating Big Data Analytical Capabilities For Government Use March 2012 A White Paper providing context and guidance you can use Inside: The Big Data Tool Landscape Big Data
More informationNoSQL and Hadoop Technologies On Oracle Cloud
NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath
More informationANALYTICS BUILT FOR INTERNET OF THINGS
ANALYTICS BUILT FOR INTERNET OF THINGS Big Data Reporting is Out, Actionable Insights are In In recent years, it has become clear that data in itself has little relevance, it is the analysis of it that
More informationApache Hadoop: The Big Data Refinery
Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data
More informationTRENDS IN THE DEVELOPMENT OF BUSINESS INTELLIGENCE SYSTEMS
9 8 TRENDS IN THE DEVELOPMENT OF BUSINESS INTELLIGENCE SYSTEMS Assist. Prof. Latinka Todoranova Econ Lit C 810 Information technology is a highly dynamic field of research. As part of it, business intelligence
More informationTRAINING PROGRAM ON BIGDATA/HADOOP
Course: Training on Bigdata/Hadoop with Hands-on Course Duration / Dates / Time: 4 Days / 24th - 27th June 2015 / 9:30-17:30 Hrs Venue: Eagle Photonics Pvt Ltd First Floor, Plot No 31, Sector 19C, Vashi,
More informationMicrosoft Analytics Platform System. Solution Brief
Microsoft Analytics Platform System Solution Brief Contents 4 Introduction 4 Microsoft Analytics Platform System 5 Enterprise-ready Big Data 7 Next-generation performance at scale 10 Engineered for optimal
More informationTransforming the Telecoms Business using Big Data and Analytics
Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe
More informationBIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
More informationHDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
More informationThe Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress
More informationBig Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
More informationHadoop Big Data for Processing Data and Performing Workload
Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer
More informationApplication and practice of parallel cloud computing in ISP. Guangzhou Institute of China Telecom Zhilan Huang 2011-10
Application and practice of parallel cloud computing in ISP Guangzhou Institute of China Telecom Zhilan Huang 2011-10 Outline Mass data management problem Applications of parallel cloud computing in ISPs
More informationCIO Guide How to Use Hadoop with Your SAP Software Landscape
SAP Solutions CIO Guide How to Use with Your SAP Software Landscape February 2013 Table of Contents 3 Executive Summary 4 Introduction and Scope 6 Big Data: A Definition A Conventional Disk-Based RDBMs
More informationTraditional BI vs. Business Data Lake A comparison
Traditional BI vs. Business Data Lake A comparison The need for new thinking around data storage and analysis Traditional Business Intelligence (BI) systems provide various levels and kinds of analyses
More informationModernizing Your Data Warehouse for Hadoop
Modernizing Your Data Warehouse for Hadoop Big data. Small data. All data. Audie Wright, DW & Big Data Specialist Audie.Wright@Microsoft.com O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking
More informationTap into Hadoop and Other No SQL Sources
Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data
More informationInformatica and the Vibe Virtual Data Machine
White Paper Informatica and the Vibe Virtual Data Machine Preparing for the Integrated Information Age This document contains Confidential, Proprietary and Trade Secret Information ( Confidential Information
More informationHow To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI
More informationBig data for the Masses The Unique Challenge of Big Data Integration
Big data for the Masses The Unique Challenge of Big Data Integration White Paper Table of contents Executive Summary... 4 1. Big Data: a Big Term... 4 1.1. The Big Data... 4 1.2. The Big Technology...
More informationGetting Started with Hadoop. Raanan Dagan Paul Tibaldi
Getting Started with Hadoop Raanan Dagan Paul Tibaldi What is Apache Hadoop? Hadoop is a platform for data storage and processing that is Scalable Fault tolerant Open source CORE HADOOP COMPONENTS Hadoop
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A SURVEY ON BIG DATA ISSUES AMRINDER KAUR Assistant Professor, Department of Computer
More informationWhite Paper: Hadoop for Intelligence Analysis
CTOlabs.com White Paper: Hadoop for Intelligence Analysis July 2011 A White Paper providing context, tips and use cases on the topic of analysis over large quantities of data. Inside: Apache Hadoop and
More informationParallel Data Warehouse
MICROSOFT S ANALYTICS SOLUTIONS WITH PARALLEL DATA WAREHOUSE Parallel Data Warehouse Stefan Cronjaeger Microsoft May 2013 AGENDA PDW overview Columnstore and Big Data Business Intellignece Project Ability
More informationIntegrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013
Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the
More informationBringing Big Data to People
Bringing Big Data to People Microsoft s modern data platform SQL Server 2014 Analytics Platform System Microsoft Azure HDInsight Data Platform Everyone should have access to the data they need. Process
More informationInternational Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop
ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com
More informationEMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.
EMC Federation Big Data Solutions 1 Introduction to data analytics Federation offering 2 Traditional Analytics! Traditional type of data analysis, sometimes called Business Intelligence! Type of analytics
More informationA Case Study of Hadoop in Healthcare
Leading a Healthcare Company to the Big Data Promised Land: A Case Study of Hadoop in Healthcare Mohammad Quraishi (IT Senior Principal - Cigna) atif71@gmail.com About me BS in Computer Science and Engineering
More informationBIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data
More informationManaging Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges
Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and
More informationHow Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns
How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns Table of Contents Abstract... 3 Introduction... 3 Definition... 3 The Expanding Digitization
More informationIntroduction to Big Data! with Apache Spark" UC#BERKELEY#
Introduction to Big Data! with Apache Spark" UC#BERKELEY# So What is Data Science?" Doing Data Science" Data Preparation" Roles" This Lecture" What is Data Science?" Data Science aims to derive knowledge!
More informationQUICK FACTS. Delivering a Unified Data Architecture for Sony Computer Entertainment America TEKSYSTEMS GLOBAL SERVICES CUSTOMER SUCCESS STORIES
[ Consumer goods, Data Services ] TEKSYSTEMS GLOBAL SERVICES CUSTOMER SUCCESS STORIES QUICK FACTS Objectives Develop a unified data architecture for capturing Sony Computer Entertainment America s (SCEA)
More information