Big Automotive Data: Leveraging large volumes of data for knowledge-driven product development
Mathias Johanson, Stanislav Belenki, Jonas Jalminger, Magnus Fant
Alkit Communications AB, Mölndal, Sweden
{mathias, stanislav, jonas,

Mats Gjertz
Volvo Car Corporation, Gothenburg, Sweden

Abstract: To be successful in the increasingly competitive consumer vehicle market, automotive manufacturers must be highly responsive to customer needs and market trends, while responding to the challenges of climate change and sustainable development. One key to achieving this is to promote knowledge-driven product development through large-scale collection of data from connected vehicles, to capture customer needs and to gather performance data, diagnostic data and statistics. Since the volume of data collected from fleets of vehicles using telematics services can be very high, it is important to design the systems and frameworks in a way that is highly scalable and efficient. This can be described as a Big Data challenge in an automotive context. In this paper, we explore the opportunities of leveraging Big Automotive Data for knowledge-driven product development, and we present a technological framework for capture and online analysis of data from connected vehicles.

Keywords: big data; automotive telematics; analytics.

I. INTRODUCTION

The globalization of markets, resources and knowledge requires product development companies to be highly responsive to customer needs and to environmental changes. In the automotive industry, the major challenges of limiting CO2 emissions while delivering high-quality products to a growing number of customers in expanding and highly heterogeneous markets require very efficient and powerful tools and methods to capture customer needs, and also to gather performance data and statistics to improve product development.
In the testing, verification and validation phases of automotive product development, large volumes of measurement data are gathered from fleets of connected test vehicles [1, 2]. With the advent of telematics systems and more or less ubiquitous wireless vehicular communication, the opportunities to capture and collect data have improved tremendously over the past few years. This has enormous potential to improve automotive product development, by making reliable performance data, statistics and customer behavior information available as quickly and efficiently as possible in the development process. The ability to make good use of this valuable resource can clearly be identified as a key means to competitiveness in the automotive industry. The big challenge is how to efficiently capture, collect, manage, analyze and make good use of the large volumes of data, i.e. how to convert collected data into useful knowledge. The term Big Data has recently been popularized, referring to data sets that are so large and complex that they are difficult to handle using conventional database management systems and traditional data processing tools. In this paper we study Big Data applications in an automotive context. This involves a number of applications that can be expected to benefit from large-scale capture and analysis of data from vehicles, driven by connectivity and onboard telematics services.

II. BIG DATA

The design of scalable and distributed data management systems has long been a goal of the database research community. Initial approaches include distributed databases for update-intensive applications, and parallel database systems for analytics-oriented tasks. Whereas parallel database systems have matured into large commercial systems, distributed database systems were never very successfully commercialized. Instead, different ad-hoc approaches to achieving scalability were developed.
New database approaches to achieving scalability are sometimes collectively referred to as NoSQL solutions, to distinguish them from traditional relational database management systems (RDBMS), which are usually based on the query language SQL. However, since SQL is frequently used as the query language in NoSQL solutions as well, the term is sometimes interpreted as "Not only SQL". One breed of NoSQL database systems, motivated by changes in the data access patterns of applications and the need to scale out to large clusters of distributed processing units, is the new class of systems referred to as key-value stores [3], which are now being widely adopted. Examples include Amazon's Dynamo and LinkedIn's Voldemort. A particularly popular type of key-value store is the document store database, which relies on the basic concept of a document for encapsulating data. The document encoding can be any kind of structured data container, including XML, JSON or Word, etc., or, in the automotive domain, various measurement data
file formats, such as MDF. Examples of document-oriented database systems include MongoDB and Couchbase. Another approach is column-oriented database management systems [4], which store data tables as sections of columns of data, rather than as rows of data, as most RDBMSs do. Examples include Google's BigTable and Apache Cassandra. In the field of data analytics, the MapReduce paradigm [5], pioneered by Google, and its open source implementation Hadoop [6] have seen widespread adoption in both industrial and academic contexts. There are many initiatives to improve Hadoop-based systems in terms of usability and performance, and to integrate them with mathematical and statistical analysis frameworks such as R [7, 8]. Apache Spark [9] is another analytics cluster computing framework, which is based on the same distributed file system as Hadoop (HDFS), but is not limited to the two-stage MapReduce paradigm, which makes it considerably faster for certain applications. An important point to make about Big Data is that it is not merely about data volume. Sometimes Big Data is described as spanning three dimensions: Volume, Velocity and Variety (the three Vs of Big Data, originally proposed by Doug Laney in a 2001 paper [10]). Frequently a fourth V is added, for Veracity, and sometimes also a fifth, for Value. In this multi-dimensional definition, volume refers to the size of the data sets, velocity highlights the need for time-sensitive processing of some data, variety implies that data can be of many different kinds with widely different characteristics, and veracity refers to the problem of deciding whether data from different sources can be trusted. Ultimately, the goal of any Big Data application is to create added value in terms of increased revenue, new services, higher-quality products, or other benefits, which is captured by the fifth V, representing value.
The three original Vs are present in Gartner's oft-cited definition of Big Data [11]: "Big Data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making."

III. BIG DATA IN AN AUTOMOTIVE CONTEXT

To arrive at a definition of Big Automotive Data, we will explore the dimensions of the five-V model in the context of automotive data application scenarios.

A. Volume of Automotive Data

To get a feeling for what volumes of data can be expected in a Big Automotive Data context, we have estimated the size of the data sets for a number of relevant application scenarios. The estimates should be seen as rough order-of-magnitude approximations intended to illustrate the need for unconventional data processing and storage techniques.

1) Extensive CAN bus monitoring

The first application scenario is an extensive CAN¹ bus monitoring scenario, intended to serve as an example of the magnitude of data that results if data is captured at a high rate in a fleet of vehicles. Let us assume that there are five CAN buses in a modern vehicle, where each CAN bus supports communication at 500 kilobits per second (kbps), and let us furthermore assume an average bus load of 50%. For an application monitoring CAN data in customer vehicles, let us assume the vehicles are on average used for one hour each day. For a test vehicle, we assume the vehicle is used for 8 hours every day. Then, for a customer vehicle fleet of one million vehicles, the total size of the CAN signal data set will grow by about 560 terabytes per day, or about 200 petabytes per year. For test vehicle applications (where extensive CAN bus monitoring is already commonplace today), a fleet of 1000 test vehicles will produce 4.5 terabytes of data per day, or equivalently, around 1.6 petabytes per year.
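These figures follow from simple arithmetic and can be reproduced with a short back-of-the-envelope script, using only the assumptions stated above (five buses, 500 kbps per bus, 50% average load):

```python
# Back-of-the-envelope estimate of CAN monitoring data volumes,
# using the assumptions stated in the text.

def can_bytes_per_vehicle_day(buses=5, bus_kbps=500, load=0.5, hours=1.0):
    """Raw CAN data in bytes generated by one vehicle per day."""
    bits_per_second = buses * bus_kbps * 1000 * load
    return bits_per_second / 8 * hours * 3600

TB = 1e12
# Customer fleet: one million vehicles, driven one hour per day.
customer_daily = can_bytes_per_vehicle_day(hours=1) * 1_000_000
# Test fleet: 1000 vehicles, driven eight hours per day.
test_daily = can_bytes_per_vehicle_day(hours=8) * 1000

print(f"Customer fleet: {customer_daily / TB:.0f} TB/day, "
      f"{customer_daily * 365 / 1e15:.0f} PB/year")
print(f"Test fleet: {test_daily / TB:.1f} TB/day, "
      f"{test_daily * 365 / 1e15:.2f} PB/year")
```

A single vehicle thus generates roughly 560 megabytes of raw CAN data per driving hour, which is what makes the fleet-level totals grow so quickly.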
2) Remote Diagnostic Read-Out

A Diagnostic Read-Out (DRO) application is a concept whereby a chunk of vehicle-related data is accessed from a vehicle using the vehicle's diagnostic system. In a usage scenario where DRO can be performed remotely using onboard telematics services, DRO can be performed on a regular basis to find faults (by reading Diagnostic Trouble Codes, DTCs), and to access statistics and performance data. Currently, DRO is typically performed when a vehicle is brought in for service to a repair shop, whereupon the DRO data set is uploaded to the data warehouse of the car manufacturer. A future remote DRO scenario, in which a fleet of one million customer vehicles is read out once a day with a data chunk size of 100 kilobytes, will result in an aggregate data volume increase of about 100 gigabytes per day, or about 36 terabytes per year. For a test vehicle fleet of 1000 vehicles, assuming 10 read-outs per day and a data chunk size of one megabyte (reflecting the fact that testing and validation typically require more extensive data at higher sampling rates), the corresponding numbers are 10 gigabytes per day, or 3.6 terabytes per year. Although less dramatic than the CAN monitoring scenario, this is still a lot of data to manage.

3) State-of-Health

As a final example, an automotive State-of-Health application monitors a number of carefully selected parameters in the vehicles, and triggers data upload to allow the status of the vehicle to be assessed. This typically requires substantially lower data rates. For a vehicle fleet of one million customer vehicles, the estimated data size grows by about one gigabyte per day (i.e. 365 gigabytes per year). For a test vehicle fleet of 1000 vehicles, the corresponding

¹ CAN (Controller Area Network) is an in-vehicle communication bus technology heavily used for broadcasting sensor signal values between Electronic Control Units, and for diagnostic services.
estimates are 100 megabytes per day, or 36 gigabytes per year. The numbers are summarized in Table I.

TABLE I. ESTIMATED SIZE OF DATA SETS FOR THREE DIFFERENT AUTOMOTIVE APPLICATIONS IN FLEETS OF TEST VEHICLES AND CUSTOMER VEHICLES RESPECTIVELY

  Application                  Fleet     Per day   Per year
  CAN bus monitoring           Customer  560 TB    206 PB
  CAN bus monitoring           Test      4.5 TB    1.6 PB
  Remote Diagnostic Read-Out   Customer  100 GB    36 TB
  Remote Diagnostic Read-Out   Test      10 GB     3.6 TB
  State-of-Health              Customer  1 GB      365 GB
  State-of-Health              Test      100 MB    36 GB

B. Velocity of Automotive Data

Some automotive applications require quick response times for processing of data, whereas others have more relaxed real-time requirements. Examples of delay-sensitive applications include collaborative active safety functions and autonomous driving services, whereby vehicles communicate safety-related data, used by in-vehicle components, to other vehicles in order to make safety-related decisions. For such use cases, the processing of the data must typically be performed with very low latency. In general, use cases based on the positioning of vehicles are time-sensitive to varying degrees, since the position is constantly updated while the vehicle is moving. The need for real-time, or near real-time, processing therefore applies to Big Automotive Data applications.

C. Variety of Automotive Data

Automotive data sets are very diverse. Common types of data are time-based signals, such as CAN bus signals from the multitude of onboard sensors in the vehicle; scalar data, such as DTCs or parameters that are accessed through diagnostic services; images; video; statistical data; legal and administrative data; and more. Moreover, there is a plethora of different data formats and standards in use in the automotive industry. Variety of data is definitely a relevant issue for big automotive data sets.
D. Veracity of Automotive Data

Veracity of data is highly relevant in the automotive context, in order to guarantee security and safety in the use of customer vehicles and connected services. To ensure the relevance and quality of data captured for the benefit of knowledge-driven product development, provenance and traceability of data are very important. For instance, when carrying out performance tests of vehicle components or subsystems, it is of vital importance that the captured data originate from the correct components, with the correct configuration, software versions, etc. Another aspect of veracity in this context is that automotive companies tend to be very secretive about their engineering data, requiring sophisticated information security mechanisms when communicating data over public network infrastructures.

E. Value of Automotive Data

There is a broad range of possible benefits enabled by Big Automotive Data services. New aftermarket services and product features can be designed based on information resources generated from data captured from connected vehicles, aggregated and analyzed using cloud-based data processing and management services. This includes predictive and preventive maintenance services, various infotainment services, and active safety and autonomous driving support services, to name just a few. In addition to novel aftermarket services, product development processes will benefit from access to large volumes of data captured from test vehicles as well as customer vehicles. This makes it possible to make better informed decisions in more or less all stages of the product development process, from early concept development to testing, validation and verification. This is what we refer to as knowledge-driven product development, which is the focus of this paper and the main driving force behind the technological framework described in section IV.
In addition to the value of Big Automotive Data in supporting knowledge-driven product development and as an enabler of new services, we can also identify a more direct value of the data itself. Automotive companies are increasingly exploring the opportunities of selling carefully selected and processed data sets to third parties. Potential customers for such data sets include road administration authorities, insurance companies and automotive e-service developers. From this perspective, Big Automotive Data becomes a new source of revenue for the automotive companies.

F. Properties of Big Automotive Data

As we have seen, the five-V model of Big Data applies well in an automotive context, so Big Automotive Data can be loosely defined simply as Big Data for automotive applications. Probing a little deeper into the distinguishing characteristics of Big Automotive Data, to see what might make it different from other Big Data applications, we can immediately identify a few salient points. The data sources are typically mobile (i.e. moving vehicles), in many cases requiring wireless communication networks to collect data. Compared to many other Big Data applications, there is often (but not always) an emphasis on time-series data, typically originating from sensors connected to the in-vehicle communication buses (e.g. CAN bus signals). This is particularly pronounced for automotive testing and validation
applications. Generally, automotive data is characterized by a multitude of formats and data types and therefore, as previously noted, has a pronounced Variety dimension. Automotive applications that have a direct impact on the performance of vehicles are generally very concerned with safety issues. Automotive technology and systems in general have a very strong safety focus, and this is reflected in Big Automotive Data applications as well.

IV. BAUD: A BIG DATA FRAMEWORK FOR AUTOMOTIVE TELEMATICS AND ANALYTICS

To explore the opportunities of leveraging Big Automotive Data for knowledge-driven product development, we have developed a prototype telematics and analytics framework in the BAuD project. The design of the BAuD framework is based on a number of use cases identified at Volvo Car Corporation, with an emphasis on R&D use cases. The core functionality developed so far is concerned with capturing and analyzing data from vehicles for use in the product development process. The hypothesis is that Big Automotive Data is a resource that can be exploited to improve product quality and reduce time to market. Although this is the initial focus of the BAuD framework, we also foresee use cases where the information resources are exploited for novel aftermarket services.

A. BAuD Applications

For a typical R&D use case, the BAuD framework is used as follows. A stakeholder in the automotive development process has a specific question regarding how the product is used or how some subsystem is performing. To gain knowledge about the particular phenomenon of interest, an engineer designs a measurement task, defining what data will be captured, and then assigns this measurement task to be executed in a fleet of test vehicles. The measurement assignment is uploaded to a server and scheduled for download to the target vehicles.
In parallel, an analytics task is defined, describing what kind of analysis will be performed on the data collected by the corresponding measurement task. Depending on the type of analysis requested, the result of the analytics task can be different kinds of visualizations or reports, assembled and made available to the users through a web-based user interface.

B. BAuD Framework Architecture

The BAuD framework is a complex technological platform designed to support the applications discussed above, with a specific design goal of being flexible and extensible to cater for new needs and novel applications. Moreover, the framework has been designed to scale to large numbers of connected vehicles and large volumes of captured data. Although the primary focus is heavily on preproduction test vehicles and R&D use cases, the emphasis on scalability will also allow aftermarket use cases with millions of connected vehicles to be supported by the platform. A schematic overview of the architecture is shown in Fig. 1. The core components of the BAuD framework are:

- a telematic service platform allowing wireless access to data from vehicles in use,
- a cloud-based back-end infrastructure, including application programming interfaces that provide controlled access to the information resources and framework services,
- a Task Manager, handling the execution of measurement and analytics services,
- a Data Broker mechanism, handling the relay of data from data sources to data sinks,
- an analytics service architecture, enabling automated data-driven analysis of data originating from connected vehicles,
- a web-based user interface front-end.

For the main BAuD use case described above, the Task Manager component of the architecture shown in Fig. 1 keeps track of all measurement and analytics tasks, and configures the Data Broker to forward the data to the proper analytics service as it is uploaded from the vehicles of the fleet.
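As an illustration of how a measurement task and its companion analytics task might be paired, consider the following sketch (expressed in Python; all field names are our own illustrative assumptions, not the actual BAuD task schema):

```python
# Hypothetical sketch of a paired measurement task and analytics task.
# Field names are illustrative assumptions, not the actual BAuD schema.

measurement_task = {
    "task_id": "mt-001",
    "signals": ["EngineSpeed", "VehicleSpeed", "CoolantTemp"],  # CAN signals to capture
    "sample_rate_hz": 10,
    "trigger": "ignition_on",       # start capturing when the vehicle starts
    "upload": "ignition_off",       # bulk upload when the vehicle is parked
    "fleet": "test-fleet-hybrid",   # the fleet the task is assigned to
}

analytics_task = {
    "task_id": "at-001",
    "input_task": "mt-001",         # consumes data from the measurement task
    "analysis": "histogram",        # e.g. distribution of engine speed
    "output": "web_report",         # result surfaced in the web front-end
}

# The analytics task must reference data produced by its measurement task:
assert analytics_task["input_task"] == measurement_task["task_id"]
print("task pair is consistent")
```

The consistency check at the end reflects the constraint, noted below in section IV.D, that the two tasks are designed in concert.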
The telematic service layer handles the capture and upload or streaming of data from connected vehicles to the back-end infrastructure. The presentation layer and user interface components provide the means by which the end-users and administrators of the system access the framework services.

Fig. 1. BAuD framework architecture

C. BAuD Telematics Framework

The core component of the telematics framework is a Linux-based data capture and communication unit installed in vehicles. The unit executes measurement tasks that support data capture both by passive in-vehicle communication bus monitoring and by active diagnostic services. The captured data is uploaded to the cloud-based infrastructure using 2G, 3G or 4G wireless mobile data
communication. The telematics unit also provides additional services such as GPS-based positioning. The telematics software architecture is designed in a modular fashion with a strong emphasis on portability, for the purpose of stepwise integration into future production vehicle ECU architectures. This means that it will be possible to realize a common telematic service platform for both aftermarket services and R&D services. There is also a strong emphasis on security in the design, to prevent malicious unauthorized access to both in-vehicle information sources and the cloud infrastructure. Since mobile wireless data communication can be expensive, and since the available bandwidth in many parts of the world is still very limited, data compression techniques are used to reduce the required network bandwidth. To some extent there is also data reduction due to onboard preprocessing analytics. These mechanisms are mainly statistical processing functions, like histogram generation, which can be performed directly in the telematics units before data is uploaded. The communication architecture can handle both bulk upload of data and streaming of data without the need to store it on the solid-state disks of the telematics units. For most data capture services, the data is stored to disk while the measurement assignment is running, and then preprocessed and uploaded when the ignition of the vehicle is turned off. (The triggering of data upload is configurable, but the most common trigger is ignition off.) The streaming mode is used for time-sensitive applications, such as positioning services where it is important to show the current location of moving vehicles.
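As an illustration of this kind of onboard reduction, a histogram collapses an arbitrarily long signal trace into a handful of bin counts before upload (a minimal sketch; the signal, sample rate and bin width are illustrative assumptions):

```python
# Sketch of onboard statistical preprocessing: reducing a raw signal
# trace to a histogram before upload (illustrative only).
import random

def histogram(samples, bin_width):
    """Reduce a list of signal samples to a {bin_start: count} mapping."""
    bins = {}
    for s in samples:
        b = int(s // bin_width) * bin_width
        bins[b] = bins.get(b, 0) + 1
    return bins

# One hour of a 10 Hz vehicle-speed signal is 36,000 samples; the
# histogram stays at a handful of bins regardless of trace duration.
random.seed(0)
trace = [random.uniform(0, 130) for _ in range(36_000)]  # simulated speed, km/h
h = histogram(trace, bin_width=10)
print(len(trace), "samples reduced to", len(h), "bins")
```

The reduction ratio grows linearly with the measurement duration, which is why such preprocessing is attractive when mobile bandwidth is scarce or expensive.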
D. BAuD Analytics Framework

The analytics framework of the BAuD architecture is based on a data-driven approach, whereby data sets uploaded from connected vehicles are automatically analyzed based on an analytics task definition, and the result of the analysis is incrementally refined and stored in a knowledge base. Typically, the measurement tasks and the analytics tasks are designed in concert, since the input signals referenced in the analytics task must be the result of the corresponding measurement task. The core of the analytics framework implemented so far is concerned with analysis of CAN bus signals and diagnostic data, such as Diagnostic Trouble Codes, originating from the measurement assignments executed in the in-vehicle telematics units. The captured data sets are uploaded using telematics services and accessed by the analytics framework in the form of MDF files. MDF (Measurement Data Format) is a standardized binary data format widely used in the automotive industry, which supports trigger events, data conversion formulas, and sample reduction. It provides a very compact representation of time-series data (compared to, e.g., text-based formats) and as such is highly suitable for the data capture and telematics services. However, due to the binary representation and the need for data conversion, extracting signal values from MDF files requires non-negligible processing resources and can therefore be time-consuming for large files. Specifically, when a large number of files need to be processed for one analysis, which is often the case, we see the need for parallel processing in a distributed computing environment to achieve scalability and performance.

1) Scaling the analytics framework with Spark

As our investigation in section III reveals, the volumes of data collected from connected vehicles can potentially be very large, which calls for a distributed and scalable approach to both storage and data processing.
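To make the selection step concrete, the following plain-Python sketch (deliberately free of any Spark dependency; the record layout is an assumption for illustration) filters decoded signal records by vehicle set, time range and signal set, and keys them by (VIN, signal) so that downstream analysis can be partitioned across a cluster:

```python
# Illustrative sketch: select time-series records by VIN set, time range
# and signal set, keyed by (VIN, signal) for partitioned processing.
# The flat tuple layout stands in for records decoded from MDF files.

def select_records(records, vins, t_range, signals):
    """Filter measurement records and key them by (VIN, signal)."""
    t0, t1 = t_range
    keyed = {}
    for vin, t, sig, val in records:
        if vin in vins and sig in signals and t0 <= t <= t1:
            keyed.setdefault((vin, sig), []).append((t, val))
    return keyed

records = [
    ("VIN1", 0.0, "EngineSpeed", 800.0),
    ("VIN1", 0.1, "EngineSpeed", 820.0),
    ("VIN1", 0.1, "CoolantTemp", 85.0),
    ("VIN2", 5.0, "EngineSpeed", 1500.0),
]
parts = select_records(records, vins={"VIN1"}, t_range=(0.0, 1.0),
                       signals={"EngineSpeed"})
print(parts)  # {('VIN1', 'EngineSpeed'): [(0.0, 800.0), (0.1, 820.0)]}
```

Keying by (VIN, signal) means each partition holds one vehicle's trace for one signal, so per-signal analyses parallelize without cross-partition communication.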
To achieve this, we have chosen Apache Spark as the Big Data processing platform for BAuD, considering its superiority over Hadoop MapReduce for machine learning and iterative data mining applications. The architecture is based on MLlib, Spark's machine learning library, and SparkR, an R front-end for Spark, to perform the analytics operations requested by the user. With this approach, the Automotive Data Broker submits MDF files into an HDFS file system, Hadoop's distributed file system, as a data source for Spark. Since Spark operates on RDDs (Resilient Distributed Datasets), a mapping function is required to convert a set of MDF files into a set of RDDs. In our design, a set of MDF files is defined by a set of vehicles identified by VIN (Vehicle Identification Number), a time range (Δt), and a set of CAN signals (S1 to Sn). When the analysis assignment returns, the results are submitted to the presentation layer of the system. The conceptual design is illustrated in Fig. 2.

Fig. 2. BAuD Scalable Analytics Framework Architecture

E. Presentation Layer / User Interface

To interact with the framework, a user interface and an administrative infrastructure are needed. In this particular case we have chosen to use a web-based user interface. The rationale behind this decision is based on several factors, including end-user familiarity with web-based interfaces, simplified remote access to the system in the presence of corporate firewalls, and simplified software maintenance. Another strongly contributing factor is the
multitude of frameworks and toolkits available for building web front-ends, which speeds up user interface development and promotes reusability of code. For this reason we have chosen the framework SmartGWT, which is based on Google Web Toolkit, an open source set of tools supporting development and deployment of complex JavaScript front-end applications in Java. For users or administrators of the system to be able to manage a large number of vehicles, some user-interface mechanism is needed to handle these data sources in an aggregated form. Therefore, we have introduced the concept of resource groups. A resource group is a way to aggregate a set of data sources so that they can be handled in a uniform manner through the user interface. In the example of configuring the telematic units of a set of vehicles, the administrator simply manipulates the configuration of the resource group, and when the configuration is saved it is fanned out to all vehicles in the resource group. Another example is when a user is about to start a new measurement task on a set of vehicles. Instead of selecting the individual vehicles, a resource group is selected, and the system makes sure that the measurement task is propagated to each vehicle of the group. The notion of resource groups has been found to be a very powerful abstraction, which can be used throughout almost all user interface parts of the system. Yet another effect of resource groups is that the grouping of data sources can be used to handle a set of vehicles based on geographical location, vehicle model, target market or other selection criteria. This way, the administration of vehicle fleets can be separated from the administration of measurement tasks.

F. Scalability Issues

As the number of data sources (i.e. the number of vehicles) grows, there are a number of scalability challenges to address in the design of the BAuD framework.
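The fan-out behavior of resource groups described above can be sketched as follows (an illustrative sketch; class and field names are our own assumptions, not the actual BAuD implementation):

```python
# Sketch of the resource-group abstraction: a configuration change saved
# on a group fans out to every vehicle in it (illustrative names only).

class ResourceGroup:
    def __init__(self, name, vehicles):
        self.name = name
        self.vehicles = list(vehicles)  # e.g. a list of VINs
        self.config = {}

    def save_config(self, config, deploy):
        """Store the group configuration and fan it out to each vehicle."""
        self.config = dict(config)
        for vin in self.vehicles:
            deploy(vin, self.config)

deployed = []
group = ResourceGroup("sweden-test-fleet", ["VIN1", "VIN2", "VIN3"])
group.save_config(
    {"sample_rate_hz": 10},
    deploy=lambda vin, cfg: deployed.append((vin, cfg["sample_rate_hz"])),
)
print(deployed)  # [('VIN1', 10), ('VIN2', 10), ('VIN3', 10)]
```

The same fan-out mechanism applies when a measurement task, rather than a configuration, is assigned to a group.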
The main server-side approach to handling the large volumes of data produced by measurement tasks in a large fleet of vehicles is to scale out the corresponding processing and analytics tasks over a cluster of computational units using a distributed computing framework. To improve the scalability of the communication architecture, the upload of data from vehicles is designed using a multi-stage approach, whereby the data is uploaded to different access network servers depending on the geographical location of the vehicle. Data is then successively aggregated by the broker into the core of the cloud-based server architecture. The downloading of measurement tasks and configuration data to vehicles is also performed using a hierarchical, multi-tiered architecture. Apart from technological scalability issues, we have also identified many usability and user interface related scalability challenges. The approach we have employed to allow users to manage large numbers of vehicles and other resources in a scalable and efficient way is based on the aggregation of similar resources into resource groups.

V. CONCLUSIONS AND FUTURE DIRECTIONS

In this paper we have explored the opportunities and challenges of leveraging Big Automotive Data for knowledge-driven product development. The BAuD framework, a scalable and efficient Big Automotive Data platform including integrated telematics and analytics services, is currently being evaluated in two case studies conducted at Volvo Car Corporation. The two case studies are focused on active safety development and on battery performance for hybrid vehicles, respectively. As part of our future work we will explore how the BAuD framework can be extended to capture not only objective measurement data from connected vehicles, but also subjective usage information from customers.
We intend to do this by designing a smartphone app, which will allow specialized questionnaires to be presented to selected customers, to capture the usage experience and feed the subjective data into the BAuD analytics framework.

ACKNOWLEDGMENT

This work was co-funded by VINNOVA, the Swedish Governmental Agency for Innovation Systems.

REFERENCES

[1] M. Johanson, "Information and Communication Support for Automotive Testing and Validation," in M. Chiaberge (ed.), New Trends and Developments in Automotive System Engineering, INTECH, January.
[2] M. Johanson, L. Karlsson, "Improving Vehicle Diagnostics through Wireless Data Collection and Statistical Analysis," IEEE International Symposium on Wireless Vehicular Communications, Baltimore, MD, USA.
[3] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, "Dynamo: Amazon's highly available key-value store," in SOSP.
[4] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, "Bigtable: A Distributed Storage System for Structured Data," in OSDI.
[5] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," in OSDI.
[6] The Apache Hadoop Project.
[7] A. Abouzeid, K. B. Pawlikowski, D. J. Abadi, A. Rasin, and A. Silberschatz, "HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads," PVLDB, 2(1).
[8] V. Prajapati, Big Data Analytics with R and Hadoop, Packt Publishing.
[9] The Apache Spark Project.
[10] D. Laney, "3D Data Management: Controlling Data Volume, Velocity and Variety," Application Delivery Strategies, META Group, February 2001.
[11] Gartner Big Data definition.
INTRODUCTION TO CASSANDRA
INTRODUCTION TO CASSANDRA This ebook provides a high level overview of Cassandra and describes some of its key strengths and applications. WHAT IS CASSANDRA? Apache Cassandra is a high performance, open
Review of Query Processing Techniques of Cloud Databases Ruchi Nanda Assistant Professor, IIS University Jaipur.
Suresh Gyan Vihar University Journal of Engineering & Technology (An International Bi Annual Journal) Vol. 1, Issue 2, 2015,pp.12-16 ISSN: 2395 0196 Review of Query Processing Techniques of Cloud Databases
Big Data and Hadoop with components like Flume, Pig, Hive and Jaql
Abstract- Today data is increasing in volume, variety and velocity. To manage this data, we have to use databases with massively parallel software running on tens, hundreds, or more than thousands of servers.
Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges
Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and
Oracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
BIG DATA WEB ORGINATED TECHNOLOGY MEETS TELEVISION BHAVAN GHANDI, ADVANCED RESEARCH ENGINEER SANJEEV MISHRA, DISTINGUISHED ADVANCED RESEARCH ENGINEER
BIG DATA WEB ORGINATED TECHNOLOGY MEETS TELEVISION BHAVAN GHANDI, ADVANCED RESEARCH ENGINEER SANJEEV MISHRA, DISTINGUISHED ADVANCED RESEARCH ENGINEER TABLE OF CONTENTS INTRODUCTION WHAT IS BIG DATA?...
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information
Slave. Master. Research Scholar, Bharathiar University
Volume 3, Issue 7, July 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper online at: www.ijarcsse.com Study on Basically, and Eventually
Big Data With Hadoop
With Saurabh Singh [email protected] The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme
Big Data Analytics Prof. Dr. Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany 33. Sitzung des Arbeitskreises Informationstechnologie,
A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS
A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS Dr. Ananthi Sheshasayee 1, J V N Lakshmi 2 1 Head Department of Computer Science & Research, Quaid-E-Millath Govt College for Women, Chennai, (India)
The 4 Pillars of Technosoft s Big Data Practice
beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed
BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand?
BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand? The Big Data Buzz big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database
Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions
Big Data Solutions Portal Development with MongoDB and Liferay Solutions Introduction Companies have made huge investments in Business Intelligence and analytics to better understand their clients and
Data Refinery with Big Data Aspects
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data
Big Data and Hadoop with Components like Flume, Pig, Hive and Jaql
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 7, July 2014, pg.759
Big Data Integration: A Buyer's Guide
SEPTEMBER 2013 Buyer s Guide to Big Data Integration Sponsored by Contents Introduction 1 Challenges of Big Data Integration: New and Old 1 What You Need for Big Data Integration 3 Preferred Technology
Big Data Explained. An introduction to Big Data Science.
Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of
Databricks. A Primer
Databricks A Primer Who is Databricks? Databricks was founded by the team behind Apache Spark, the most active open source project in the big data ecosystem today. Our mission at Databricks is to dramatically
Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
NoSQL Data Base Basics
NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS
Data Challenges in Telecommunications Networks and a Big Data Solution
Data Challenges in Telecommunications Networks and a Big Data Solution Abstract The telecom networks generate multitudes and large sets of data related to networks, applications, users, network operations
Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.
Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics
Why Big Data in the Cloud?
Have 40 Why Big Data in the Cloud? Colin White, BI Research January 2014 Sponsored by Treasure Data TABLE OF CONTENTS Introduction The Importance of Big Data The Role of Cloud Computing Using Big Data
Assignment # 1 (Cloud Computing Security)
Assignment # 1 (Cloud Computing Security) Group Members: Abdullah Abid Zeeshan Qaiser M. Umar Hayat Table of Contents Windows Azure Introduction... 4 Windows Azure Services... 4 1. Compute... 4 a) Virtual
Rotorcraft Health Management System (RHMS)
AIAC-11 Eleventh Australian International Aerospace Congress Rotorcraft Health Management System (RHMS) Robab Safa-Bakhsh 1, Dmitry Cherkassky 2 1 The Boeing Company, Phantom Works Philadelphia Center
NoSQL for SQL Professionals William McKnight
NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to
TV INSIGHTS APPLICATION OF BIG DATA TO TELEVISION
TV INSIGHTS APPLICATION OF BIG DATA TO TELEVISION AN ARRIS WHITE PAPER BY: BHAVAN GANDHI, ALFONSO MARTINEZ- SMITH, & DOUG KUHLMAN TABLE OF CONTENTS ABSTRACT... 3 INTRODUCTION INTERSECTION OF TV & BIG DATA...
What is Analytic Infrastructure and Why Should You Care?
What is Analytic Infrastructure and Why Should You Care? Robert L Grossman University of Illinois at Chicago and Open Data Group [email protected] ABSTRACT We define analytic infrastructure to be the services,
Scalable Cloud Computing Solutions for Next Generation Sequencing Data
Scalable Cloud Computing Solutions for Next Generation Sequencing Data Matti Niemenmaa 1, Aleksi Kallio 2, André Schumacher 1, Petri Klemelä 2, Eija Korpelainen 2, and Keijo Heljanko 1 1 Department of
Databricks. A Primer
Databricks A Primer Who is Databricks? Databricks vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache Spark, a powerful
THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES
THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon [email protected] [email protected] XLDB
A survey of big data architectures for handling massive data
CSIT 6910 Independent Project A survey of big data architectures for handling massive data Jordy Domingos - [email protected] Supervisor : Dr David Rossiter Content Table 1 - Introduction a - Context
So What s the Big Deal?
So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data
Talend Real-Time Big Data Sandbox. Big Data Insights Cookbook
Talend Real-Time Big Data Talend Real-Time Big Data Overview of Real-time Big Data Pre-requisites to run Setup & Talend License Talend Real-Time Big Data Big Data Setup & About this cookbook What is the
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
Open source large scale distributed data management with Google s MapReduce and Bigtable
Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: [email protected] Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory
BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS
BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS WHAT IS BIG DATA? describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information
Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
A Brief Outline on Bigdata Hadoop
A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is
You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.
What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
MANAGEMENT OF DATA REPLICATION FOR PC CLUSTER BASED CLOUD STORAGE SYSTEM
MANAGEMENT OF DATA REPLICATION FOR PC CLUSTER BASED CLOUD STORAGE SYSTEM Julia Myint 1 and Thinn Thu Naing 2 1 University of Computer Studies, Yangon, Myanmar [email protected] 2 University of Computer
How Big Is Big Data Adoption? Survey Results. Survey Results... 4. Big Data Company Strategy... 6
Survey Results Table of Contents Survey Results... 4 Big Data Company Strategy... 6 Big Data Business Drivers and Benefits Received... 8 Big Data Integration... 10 Big Data Implementation Challenges...
AN EFFECTIVE PROPOSAL FOR SHARING OF DATA SERVICES FOR NETWORK APPLICATIONS
INTERNATIONAL JOURNAL OF REVIEWS ON RECENT ELECTRONICS AND COMPUTER SCIENCE AN EFFECTIVE PROPOSAL FOR SHARING OF DATA SERVICES FOR NETWORK APPLICATIONS Koyyala Vijaya Kumar 1, L.Sunitha 2, D.Koteswar Rao
Industry 4.0 and Big Data
Industry 4.0 and Big Data Marek Obitko, [email protected] Senior Research Engineer 03/25/2015 PUBLIC PUBLIC - 5058-CO900H 2 Background Joint work with Czech Institute of Informatics, Robotics and
Scalable Multiple NameNodes Hadoop Cloud Storage System
Vol.8, No.1 (2015), pp.105-110 http://dx.doi.org/10.14257/ijdta.2015.8.1.12 Scalable Multiple NameNodes Hadoop Cloud Storage System Kun Bi 1 and Dezhi Han 1,2 1 College of Information Engineering, Shanghai
Massive Cloud Auditing using Data Mining on Hadoop
Massive Cloud Auditing using Data Mining on Hadoop Prof. Sachin Shetty CyberBAT Team, AFRL/RIGD AFRL VFRP Tennessee State University Outline Massive Cloud Auditing Traffic Characterization Distributed
An Approach to Implement Map Reduce with NoSQL Databases
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh
Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum
Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms
W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON
SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON 2 The V of Big Data Velocity means both how fast data is being produced and how fast the data must be processed to meet demand. Gartner The emergence
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,
Bringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks [email protected] 2015 The MathWorks, Inc. 1 Data is the sword of the
Big Data: Tools and Technologies in Big Data
Big Data: Tools and Technologies in Big Data Jaskaran Singh Student Lovely Professional University, Punjab Varun Singla Assistant Professor Lovely Professional University, Punjab ABSTRACT Big data can
An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop
ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: [email protected]
Networking in the Hadoop Cluster
Hadoop and other distributed systems are increasingly the solution of choice for next generation data volumes. A high capacity, any to any, easily manageable networking layer is critical for peak Hadoop
A CLOUD-BASED FRAMEWORK FOR ONLINE MANAGEMENT OF MASSIVE BIMS USING HADOOP AND WEBGL
A CLOUD-BASED FRAMEWORK FOR ONLINE MANAGEMENT OF MASSIVE BIMS USING HADOOP AND WEBGL *Hung-Ming Chen, Chuan-Chien Hou, and Tsung-Hsi Lin Department of Construction Engineering National Taiwan University
Big Data. Fast Forward. Putting data to productive use
Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize
Hadoop implementation of MapReduce computational model. Ján Vaňo
Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed
Navigating the Big Data infrastructure layer Helena Schwenk
mwd a d v i s o r s Navigating the Big Data infrastructure layer Helena Schwenk A special report prepared for Actuate May 2013 This report is the second in a series of four and focuses principally on explaining
A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel
A Next-Generation Analytics Ecosystem for Big Data Colin White, BI Research September 2012 Sponsored by ParAccel BIG DATA IS BIG NEWS The value of big data lies in the business analytics that can be generated
Step by Step: Big Data Technology. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015
Step by Step: Big Data Technology Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015 Data Sources IT Infrastructure Analytics 2 B y 2015, 20% of Global 1000 organizations
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
Business Intelligence and Column-Oriented Databases
Page 12 of 344 Business Intelligence and Column-Oriented Databases Kornelije Rabuzin Faculty of Organization and Informatics University of Zagreb Pavlinska 2, 42000 [email protected] Nikola Modrušan
From Spark to Ignition:
From Spark to Ignition: Fueling Your Business on Real-Time Analytics Eric Frenkiel, MemSQL CEO June 29, 2015 San Francisco, CA What s in Store For This Presentation? 1. MemSQL: A real-time database for
Search and Real-Time Analytics on Big Data
Search and Real-Time Analytics on Big Data Sewook Wee, Ryan Tabora, Jason Rutherglen Accenture & Think Big Analytics Strata New York October, 2012 Big Data: data becomes your core asset. It realizes its
A Study on Data Analysis Process Management System in MapReduce using BPM
A Study on Data Analysis Process Management System in MapReduce using BPM Yoon-Sik Yoo 1, Jaehak Yu 1, Hyo-Chan Bang 1, Cheong Hee Park 1 Electronics and Telecommunications Research Institute, 138 Gajeongno,
Big Data a threat or a chance?
Big Data a threat or a chance? Helwig Hauser University of Bergen, Dept. of Informatics Big Data What is Big Data? well, lots of data, right? we come back to this in a moment. certainly, a buzz-word but
Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies
Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image
Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island
Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY ANN KELLY II MANNING Shelter Island contents foreword preface xvii xix acknowledgments xxi about this book xxii Part 1 Introduction
Generic Log Analyzer Using Hadoop Mapreduce Framework
Generic Log Analyzer Using Hadoop Mapreduce Framework Milind Bhandare 1, Prof. Kuntal Barua 2, Vikas Nagare 3, Dynaneshwar Ekhande 4, Rahul Pawar 5 1 M.Tech(Appeare), 2 Asst. Prof., LNCT, Indore 3 ME,
CSE-E5430 Scalable Cloud Computing Lecture 2
CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University [email protected] 14.9-2015 1/36 Google MapReduce A scalable batch processing
Big Data Analytics in LinkedIn. Danielle Aring & William Merritt
Big Data Analytics in LinkedIn by Danielle Aring & William Merritt 2 Brief History of LinkedIn - Launched in 2003 by Reid Hoffman (https://ourstory.linkedin.com/) - 2005: Introduced first business lines
Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia
Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing
Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution
, pp. 93-102 http://dx.doi.org/10.14257/ijseia.2015.9.7.10 Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution Mi-Jin Kim and Yun-Sik
Approaches for parallel data loading and data querying
78 Approaches for parallel data loading and data querying Approaches for parallel data loading and data querying Vlad DIACONITA The Bucharest Academy of Economic Studies [email protected] This paper
An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov
An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research
How To Make Data Streaming A Real Time Intelligence
REAL-TIME OPERATIONAL INTELLIGENCE Competitive advantage from unstructured, high-velocity log and machine Big Data 2 SQLstream: Our s-streaming products unlock the value of high-velocity unstructured log
Convergence of Big Data and Cloud
American Journal of Engineering Research (AJER) e-issn : 2320-0847 p-issn : 2320-0936 Volume-03, Issue-05, pp-266-270 www.ajer.org Research Paper Open Access Convergence of Big Data and Cloud Sreevani.Y.V.
NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015
NoSQL Databases Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 Database Landscape Source: H. Lim, Y. Han, and S. Babu, How to Fit when No One Size Fits., in CIDR,
COMP9321 Web Application Engineering
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411
BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &
BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research & Innovation 04-08-2011 to the EC 8 th February, Luxembourg Your Atos business Research technologists. and Innovation
How To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5
Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark
Hadoop Big Data for Processing Data and Performing Workload
Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer
