Research Statement Immanuel Trummer

www.itrummer.org

We are collecting data at unprecedented rates. This data contains valuable insights, but we need complex analytics to extract them. My research focuses on optimization and planning problems that arise in the context of big data analytics and data science. I apply a broad portfolio of techniques to tackle those problems, ranging from approximation algorithms through massive parallelization to quantum computing. The latter research branch is based on a grant giving me access to a D-Wave adiabatic quantum annealer. Beyond optimization, I am interested in text mining and machine learning approaches that allow us to extract novel insights from large data sets. I have recently completed a research project in that space in collaboration with several researchers from Google Mountain View. The primary goal of my research is to make big data analytics more efficient and to save users from daunting configuration and tuning tasks.

1 Dissertation Work

My dissertation focuses on generalizations of the classical query optimization problem that address the specific context of big data analytics. Those generalizations lead to excessively hard optimization problems. When I started working in this area, existing approaches required hours to solve a single problem instance. In my dissertation, I have developed various methods that make those problems tractable. To achieve that, I also experimented with several novel techniques, namely massive parallelization and quantum computing, that had never been used before for query optimization or related problems.

The goal of query optimization is to map a declarative query (describing the data to generate) into an optimal query plan (describing how to generate the data). Many recently released systems for big data analytics (e.g., Hive, Google's BigQuery, Facebook's Presto, or Pivotal's Greenplum) allow users to analyze data via declarative query languages.
All those systems must solve the query optimization problem, and my dissertation work is therefore relevant to all of them.

The query optimization problem has been intensively studied in my community. Nearly all work on query optimization assumes, however, that alternative query plans are compared according to a single cost metric. This model is appropriate for desktop analytics but not for big data analytics. Large data sets are often analyzed in the Cloud, using approximate processing techniques to reduce the computational burden. In this context, execution fees and result precision or recall become important cost metrics in addition to run time. Cloud providers, for instance, need to consider tradeoffs between the amount of system resources dedicated to a single user and the performance perceived by that user. Query optimization is therefore a multi-objective optimization problem in the context of big data analytics.

Current query optimizers do not consider multiple cost metrics in a principled fashion. This forces users to explore many related options in a trial-and-error fashion. From my own experience with big data analytics, for instance during a recent internship at Google Mountain View, I can say that this is a painful process. During that internship, I had to tune the execution of a workflow that processed large amounts of data and consisted mainly of binary join operations. I had to choose the join order, the amount of system resources dedicated to execution, and sample sizes for several input data sets. All those decisions were dependent upon each other and led to different tradeoffs between execution time, resource consumption, and output quality. Having a query optimizer that is able to automatically find optimal tradeoffs between those cost metrics would have saved me a lot of time. It would also have led to better decisions, as I was certainly never able to find optimal cost tradeoffs.
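The notion of an optimal cost tradeoff can be made concrete with a small sketch. The Python snippet below is an illustration only, not the optimizer developed in the dissertation: the plan names, the three metrics, and all cost values are made up, and the brute-force filter merely keeps the plans that no other plan dominates on every metric (the Pareto frontier).

```python
from typing import List, NamedTuple


class Plan(NamedTuple):
    """A candidate query plan with hypothetical cost estimates (lower is better)."""
    name: str
    time_s: float   # estimated execution time in seconds
    fee_usd: float  # estimated monetary execution fee
    error: float    # estimated result imprecision caused by sampling


def dominates(a: Plan, b: Plan) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    costs_a, costs_b = a[1:], b[1:]
    no_worse = all(x <= y for x, y in zip(costs_a, costs_b))
    strictly_better = any(x < y for x, y in zip(costs_a, costs_b))
    return no_worse and strictly_better


def pareto_frontier(plans: List[Plan]) -> List[Plan]:
    """Keep every plan that is not dominated by any other plan (quadratic scan)."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


# Made-up plans that vary join order, allocated resources, and sample size.
plans = [
    Plan("full scan, 4 nodes", 120.0, 2.0, 0.00),
    Plan("full scan, 16 nodes", 40.0, 3.5, 0.00),
    Plan("10% sample, 4 nodes", 15.0, 0.3, 0.05),
    Plan("10% sample, 16 nodes", 6.0, 0.5, 0.05),
    Plan("full scan, 16 nodes, bad join order", 90.0, 6.0, 0.00),
]

for plan in pareto_frontier(plans):
    print(plan)
```

On this hypothetical input, the filter discards only the plan with the poor join order, because another plan is at least as good on all three metrics. The remaining four plans are incomparable, so choosing among them depends on user preferences. An optimizer that computes such a frontier automatically would spare the user the manual exploration described above.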
Building such an optimizer requires, however, efficient algorithms for multi-objective query optimization. Such algorithms did not exist prior to my dissertation. At the start of my dissertation, there was only one query optimization algorithm that would have been generic enough to consider all relevant cost metrics in big data analytics. This algorithm had not been experimentally evaluated prior to my PhD. I integrated that algorithm into the Postgres database system and tested it on standard queries. Even the optimization of relatively simple queries took several hours. This algorithm was therefore not suitable for use in a real system.

During my dissertation, I explored various approaches to make query optimization with multiple cost metrics practical. I also explored novel approaches (e.g., massive parallelization and quantum computing) to make classical query optimization and related optimization problems more efficient. Figure 1 shows an overview of my work in that space. I discuss other research on large-scale text mining and machine learning at the end of this section.

[Figure 1: Overview of my dissertation work with corresponding publication references. Labels in the figure: User; Pre-Computed Solutions [4, 8, 9]; Interactive Optimization [3]; Optimizer; Multi-Objective Approximation Schemes [2]; Multi-Objective Randomized Algorithms [7]; Optimization Platform; Probing on Data Samples [5]; Massively Parallel Optimization [11]; Quantum Computing [10]; Linear Programming [6].]

I developed approximation schemes for multi-objective query optimization that allow users to gradually relax optimality guarantees to decrease optimization time [2]. They represent a sweet spot between the aforementioned algorithm (which guarantees to find an optimal plan but has prohibitive optimization time) and pure heuristics (which can produce arbitrarily bad query plans in the worst case). It turns out that we can find guaranteed near-optimal plans in seconds where finding guaranteed optimal plans takes hours. Later, I proposed an incremental algorithm that is based on those approximation schemes [3]. The latter algorithm divides approximate optimization into many small incremental computation steps. Users can integrate their feedback after each step to guide the optimizer quickly towards interesting parts of the search space. The main goal of that algorithm is to enable interfaces that let users find their preferred execution cost tradeoff in an interactive process.

Both of the aforementioned approaches make multi-objective optimization practical by reducing optimization time. Another possibility is to move optimization before run time. Optimization time may be high, but the time constraints for optimization are relaxed. Moving optimization before run time is possible if queries correspond to a query template that is known before run time. I have presented an algorithm that calculates all plans realizing optimal execution cost tradeoffs for each possible instantiation of a given query template [4, 8, 9]. This requires us to consider multiple parameters and multiple cost metrics during optimization. This problem model unifies and generalizes most previously proposed problem variants in query optimization. The publication in which I introduce the problem and propose a corresponding algorithm was selected as ACM SIGMOD Research Highlight 2015 and invited into the Best of VLDB 2015 special issue of the VLDB Journal. A video recording of a talk in which I describe the three aforementioned approaches in greater detail is available online.

All approaches discussed so far work well for medium-sized queries. They are not sufficiently efficient to handle large queries with huge search spaces. To deal with such queries, I have introduced a randomized algorithm for multi-objective query optimization [7].
This algorithm exploits several specific properties of the query optimization problem and significantly outperforms general-purpose algorithms for multi-objective optimization. The drawback of this algorithm is the lack of worst-case guarantees on the quality of the generated plans.

In order to treat large queries while maintaining formal quality guarantees, we can exploit a particularity of big data analytics platforms: they have to be massively parallel. If we use that parallelism for query execution, why shouldn't we use it for optimization as well? Parallel algorithms for query optimization have been proposed prior to my dissertation. Those prior algorithms are, however, only able to exploit very moderate degrees of parallelism. They employ a fine-grained problem decomposition method that requires parallel optimizer threads to share intermediate results. This leads to huge communication overhead when used in the shared-nothing architectures that are typical for large-scale analytics platforms. I proposed a radically different parallelization method that decomposes the search space in the coarsest possible way [11]. The search space is divided into a number of equal-sized partitions that corresponds to the number of optimizer threads. Those partitions can be searched independently without communication between different threads. I was able to show that this approach can parallelize query optimization over large clusters with hundreds of nodes. The decomposition method is not limited to multi-objective query optimization but can be used for classical query optimization and many other variants as well.

Quantum computing can be seen as a different form of parallelization (this intuition is of course highly simplifying). Quantum computers leverage quantum physics to speed up computation. They operate on qubits instead of bits. Qubits can be in a superposition of states (1 and 0) that would be considered mutually exclusive according to the laws of classical physics.
Operating on qubits allows quantum computers to explore multiple computational paths in parallel. I obtained a research grant giving me access to the first commercially available machine that is claimed to exploit quantum physics to solve hard optimization problems: the D-Wave adiabatic quantum annealer. So far I have used that machine to solve the multiple query optimization problem [10]. This is a query optimization variant whose goal is to merge query plans of different tenants into a globally optimal plan, taking into account possibilities to share computation between different tenants. This optimization problem is highly relevant to big data analytics where multiple users often analyze the same centrally stored data set. My research led to the first experimental paper on quantum computing in the database community. The main contribution in that paper is a mapping algorithm that translates multiple query optimization instances into strength values for magnetic fields on and between the qubits of the quantum annealer. Using the quantum annealer is indeed challenging, and many research problems need to be solved before quantum computing becomes useful for analytics optimization. I discuss some of them in the next section.

In my work on quantum computing, I developed a problem transformation that allows us to leverage a sophisticated hardware solver. In another work stream, I am currently developing approaches that enable us to leverage software solvers for query optimization. More precisely, I transform query optimization instances into mixed integer linear programming (MILP) problems. My first results show that such solvers can treat significantly larger search spaces than classical query optimization algorithms [6]. This is not surprising since MILP solvers have steadily improved their performance (independently of hardware) over the past twenty years. By linking query optimization to MILP, we will benefit from all future advances in this highly fruitful research domain.

Most of my work in query optimization addresses the challenge of having large search spaces and many cost metrics.
My most recent (ongoing) work in this area addresses a complementary problem: the problem of missing information. Analytics queries nowadays often contain user-defined predicates that have to be treated as black boxes by the optimizer. Their properties can be estimated by evaluating them on data samples, but we need a formal framework to decide how much to sample and how to prioritize sampling. I recently introduced the probably approximately optimal query optimization framework [5] to guide such decisions. Evaluating predicates on data samples yields only confidence bounds on their properties. Confidence bounds on predicates translate into confidence bounds on the cost of alternative query plans. For that reason, I have argued that the goal of query optimization should be to find a plan whose cost is near-optimal with high probability. In ongoing work, I am exploring optimization approaches for that problem model.

Beyond my work on analytics optimization, I have also developed new analytics applications. Together with researchers from Google Mountain View, I have developed a system that mines Web text to determine for knowledge base entities the subjective properties that the average user associates with them [1]. Knowing such associations enables us to answer Google queries containing subjective predicates (e.g., the query "big cities in California") from structured data. Mining subjective properties is challenging since we find many conflicting statements about the same entity. I developed an approach for learning user behavior models for specific entity types and properties in an unsupervised fashion. Those models are used to reliably infer the majority opinion from conflicting statements. I used that system to mine billions of subjective associations from terabytes of Web text. The mined associations match the opinions of a test user group in the majority of cases. This is not the case when using other state-of-the-art systems for mining objective properties.
This shows that subjective property mining requires specialized systems.

2 Future Work

One of my future research goals is to make data analysis more efficient. I intend to pursue two research directions that contribute towards that goal in different ways: First, I will study optimization approaches that allow us to make better use of current computer technology. Second, I plan to study the potential of an entirely new technology, namely quantum computing, for complex data analytics. Note that I have already taken the first steps in this direction, leading to the first experimental paper on quantum computing in the database community [10]. The first of those two research branches is targeted at immediate impact while the second one lays the foundations for adopting a novel technology for certain analysis tasks in the long term. Beyond optimization, I plan to build novel systems for knowledge extraction. In particular, I plan to start new projects based on my prior research on information extraction at Google Mountain View.

The amount of data keeps growing while the evolution of classical computers is slowing down. This makes it interesting to explore new computational paradigms. Quantum computing currently seems like a promising technology to complement conventional computers for certain tasks in the future. Major IT companies such as Google and IBM have heavily invested in this technology. With the D-Wave adiabatic quantum annealer, the first device that exploits quantum mechanics to solve optimization problems of non-trivial size has recently appeared. Quantum annealing is a fast-evolving technology (e.g., the number of qubits has so far steadily doubled from one annealer model to the next) that, however, suffers from various restrictions. Those restrictions make it challenging to use quantum annealing in practice. It is unclear if, how, for which problems, and within which time frame data analysis can benefit from quantum annealing and related technologies.
I plan to answer those questions in my future research.

I see two ways in which data analysis can benefit from quantum annealing: either directly, by solving certain analysis tasks that can be expressed as optimization problems on the quantum annealer (e.g., problems related to the training of binary classifiers, a key step in many data science applications, have already been solved via quantum annealing), or indirectly, by solving optimization problems that optimize the use of conventional computers for data analysis. The work that I have done so far in this area (solving multiple query optimization on the quantum annealer) falls into the second category. In my future work, I will consider the first possibility as well.

Note that my research goal is not to improve quantum computing hardware. My perspective on quantum computing is that of a user. My goal is to find out how to exploit quantum computing technology that is available today (or will be available in the foreseeable future) for problems that are relevant to data analysis and to the database community. There is a large body of work in the database community describing problem transformations by which existing software solvers can be exploited for database-related optimization problems. My focus is similar in that I plan to develop problem transformations that allow using a very specific hardware solver (the adiabatic quantum annealer).

A first research goal would be to identify and to characterize analytics-related optimization problems that can benefit from quantum annealing in the long term. This is certainly not the case for all problems: for many optimization problems, the transformation into the restrictive input format supported by current quantum annealers causes excessive blowups in the problem representation size. Second, I plan to investigate decomposition methods that divide optimization problems into smaller sub-problems that can be represented with the available number of qubits.
I believe that such methods are required since the number of qubits is still very limited on current machines (the annealer I was experimenting with had around 1000 qubits). Furthermore, I plan to develop problem-specific mapping methods that efficiently map problem instances to the restrictive input format supported by the quantum annealer. Finding a problem mapping that consumes the minimal number of qubits is in the general case an NP-hard optimization problem. As I have shown in the case of multiple query optimization, it is however possible to find efficient, problem-specific transformation algorithms that achieve asymptotically optimal qubit counts.

Finally, my experimental results for the multiple query optimization problem show that the relative performance of the quantum annealer compared to conventional computers varies significantly even between different instances of the same optimization problem. I therefore believe that the first systems to exploit quantum computing in the future will be hybrids that exploit a combination of conventional and quantum computing. Finding out how to combine those two computational paradigms in the best way (e.g., finding out how to decide for specific problem instances which one of them to use) is therefore another important research challenge in this domain. I plan to integrate some of my research in this space into the first prototypical implementation of a hybrid classical-quantum analytics engine that uses a combination of conventional computing and quantum computing to speed up data analysis. I believe that building such a system is required in order to answer many questions relating to the practical long-term potential of quantum annealing and related technologies for large-scale data analysis.

Note finally that I currently have a grant giving me access to a quantum annealer. I might apply for additional grants in the future.
In either case, I do not expect my future university to provide me with access to such a machine. In principle, it would even be possible to gain insights using a simulated annealer instead of a real machine within the system I described before.

Quantum computing has the potential to make complex data analysis more efficient in the long term. I believe that advanced optimization methods can help us to better exploit our current technology in the short and medium term. In my dissertation, I have experimented with several novel techniques to solve classical analytics-related optimization problems. In particular, I have shown that massive degrees of parallelism (which are typical for big data analytics platforms) can be exploited to make query optimization more efficient. In the future, I plan to use those techniques to address novel optimization problems that we have not even dared to address so far.

In particular, I plan to treat classical analytics-related optimization problems in more fine-grained search spaces than before. In query optimization, for instance, we would typically generate query plans that consist of standard operators. If we increase the resolution and consider the sub-functions or even the instructions that implement those operators, then we discover optimization potential that is not visible in the more coarse-grained perspective. This idea connects to approaches for synthesizing low-level code for single operators via cost-based optimization that have recently appeared in the database community. The optimization methods that are currently used require several minutes to synthesize a single operator on a standard computer. Generating entire query plans at run time will require fundamentally different optimization methods. I believe that some of the techniques that I used for classical query optimization could be helpful in that context.
Instead of increasing resolution, we might also zoom out and treat analytics-related optimization problems in a more holistic fashion than we do today. There are many optimization problems that we currently break into sub-problems to reduce optimization overhead (e.g., breaking query optimization into planning and scheduling decisions, considering different tenants independently in a multi-tenant system). Breaking up optimization problems often means that we lose formal guarantees to find optimal solutions to the original problem. Having more powerful optimization methods, we can afford not to make some of those compromises anymore.

Independently of the search space, the optimizer needs to estimate the cost of alternative processing plans to compare alternative solutions. This is increasingly difficult. Queries nowadays include user-defined functions and predicates that can be implemented by complex code, by calls to external services, or even by calls to human crowd workers. In many of those cases, it is impossible for the optimizer to estimate the properties of such operators via static analysis. We need to think of novel ways in which the optimizer can obtain information about them. Evaluating such operators on data samples is one possibility. Another possibility is to make optimization an interactive process in which the optimizer asks well-targeted questions to the user about specific operators. In either case, we need formal frameworks that prioritize information collection and weigh the cost of collecting additional information against the risk of choosing sub-optimal plans due to missing information. I have recently started to develop formal frameworks that enable us to model such scenarios. In the future, I plan to explore novel interaction models between user and optimizer and novel optimization methods that can deal with high degrees of uncertainty. In particular, I plan to study methods from the area of reinforcement learning. This area offers a rich set of approaches for prioritizing information collection that could be helpful for query optimization and related problems.

Query optimization is just the tip of the iceberg. There are various optimization problems that relate to efficiency in data analytics. To name just a few examples, we need to decide where to store data, how to store data, and which auxiliary index structures to create. Similar to query optimization, many of those optimization problems need to be revisited in light of the specific challenges (e.g., having multiple cost metrics) and opportunities (e.g., having massive degrees of parallelism) that characterize the context of big data analytics. I plan to do so in my future research.

Beyond my research on optimization, I am interested in developing novel systems for knowledge extraction.
For instance, I plan to start a new project based on the research that I did at Google Mountain View. We nowadays have large knowledge bases describing objective properties of various objects. As a result of my research at Google, we also have large knowledge bases containing subjective property associations. Combining both data sets would lead to interesting insights about the semantics of subjective properties. In principle, it is possible to infer the semantics of many subjective properties by correlating information about subjective properties with information about objective properties. For instance, by manually correlating the output of the workflow I developed at Google with objective properties, I was able to infer that cities in California are commonly called "big" starting from a population size of around 250,000. How to generate such insights automatically and reliably at a very large scale is an interesting research question. We would have to identify subjective properties that relate to objective properties and, for a given subjective property, we would have to identify the most relevant group of objective properties. A threshold is an example of a simple dependency while more complex relationships are possible. By restricting the input corpus to Web content generated in specific regions, we could infer localized semantics. This might lead to interesting insights about cultural differences. Having translated subjective properties into conditions on objective properties, we will also be able to associate subjective properties to entities based on their objective properties alone. I believe that this approach yields subjective associations of a higher quality than when mining them directly from Web text.

References

[1] I. Trummer, A. Halevy, H. Lee, S. Sarawagi, and R. Gupta. Mining subjective properties on the Web. In SIGMOD. Talk recording available online.
[2] I. Trummer and C. Koch. Approximation schemes for many-objective query optimization. In SIGMOD.
[3] I. Trummer and C. Koch. An incremental anytime algorithm for multi-objective query optimization. In SIGMOD. Talk recording available online.
[4] I. Trummer and C. Koch. Multi-objective parametric query optimization. VLDB, 8(3). Talk recording available online.
[5] I. Trummer and C. Koch. Probably approximately optimal query optimization. Available online.
[6] I. Trummer and C. Koch. Solving the join ordering problem via mixed integer linear programming. Available online.
[7] I. Trummer and C. Koch. A fast randomized algorithm for multi-objective query optimization. In SIGMOD.
[8] I. Trummer and C. Koch. Multi-objective parametric query optimization. ACM SIGMOD Research Highlights. Currently invited.
[9] I. Trummer and C. Koch. Multi-objective parametric query optimization. VLDB Journal. Currently invited.
[10] I. Trummer and C. Koch. Multiple query optimization on the D-Wave 2X adiabatic quantum computer. In VLDB. Conditionally accepted. Preprint available online.
[11] I. Trummer and C. Koch. Parallelizing query optimization on shared-nothing architectures. In VLDB. Conditionally accepted. Preprint available online.


A Framework for Performance Analysis and Tuning in Hadoop Based Clusters A Framework for Performance Analysis and Tuning in Hadoop Based Clusters Garvit Bansal Anshul Gupta Utkarsh Pyne LNMIIT, Jaipur, India Email: [garvit.bansal anshul.gupta utkarsh.pyne] @lnmiit.ac.in Manish

More information

How to Enhance Traditional BI Architecture to Leverage Big Data

How to Enhance Traditional BI Architecture to Leverage Big Data B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...

More information

ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS

ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant

More information

Best Practices for Hadoop Data Analysis with Tableau

Best Practices for Hadoop Data Analysis with Tableau Best Practices for Hadoop Data Analysis with Tableau September 2013 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks

More information

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop 1 Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop 2 Pivotal s Full Approach It s More Than Just Hadoop Pivotal Data Labs 3 Why Pivotal Exists First Movers Solve the Big Data Utility Gap

More information

Berkeley CS191x: Quantum Mechanics and Quantum Computation Optional Class Project

Berkeley CS191x: Quantum Mechanics and Quantum Computation Optional Class Project Berkeley CS191x: Quantum Mechanics and Quantum Computation Optional Class Project This document describes the optional class project for the Fall 2013 offering of CS191x. The project will not be graded.

More information

Auto-Classification for Document Archiving and Records Declaration

Auto-Classification for Document Archiving and Records Declaration Auto-Classification for Document Archiving and Records Declaration Josemina Magdalen, Architect, IBM November 15, 2013 Agenda IBM / ECM/ Content Classification for Document Archiving and Records Management

More information

On some Potential Research Contributions to the Multi-Core Enterprise

On some Potential Research Contributions to the Multi-Core Enterprise On some Potential Research Contributions to the Multi-Core Enterprise Oded Maler CNRS - VERIMAG Grenoble, France February 2009 Background This presentation is based on observations made in the Athole project

More information

MapReduce and Hadoop Distributed File System

MapReduce and Hadoop Distributed File System MapReduce and Hadoop Distributed File System 1 B. RAMAMURTHY Contact: Dr. Bina Ramamurthy CSE Department University at Buffalo (SUNY) bina@buffalo.edu http://www.cse.buffalo.edu/faculty/bina Partially

More information

BUSINESS RULES AND GAP ANALYSIS

BUSINESS RULES AND GAP ANALYSIS Leading the Evolution WHITE PAPER BUSINESS RULES AND GAP ANALYSIS Discovery and management of business rules avoids business disruptions WHITE PAPER BUSINESS RULES AND GAP ANALYSIS Business Situation More

More information

Energy Efficient MapReduce

Energy Efficient MapReduce Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing

More information

Building a Database to Predict Customer Needs

Building a Database to Predict Customer Needs INFORMATION TECHNOLOGY TopicalNet, Inc (formerly Continuum Software, Inc.) Building a Database to Predict Customer Needs Since the early 1990s, organizations have used data warehouses and data-mining tools

More information

Prescriptive Analytics. A business guide

Prescriptive Analytics. A business guide Prescriptive Analytics A business guide May 2014 Contents 3 The Business Value of Prescriptive Analytics 4 What is Prescriptive Analytics? 6 Prescriptive Analytics Methods 7 Integration 8 Business Applications

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

White Paper. How Streaming Data Analytics Enables Real-Time Decisions

White Paper. How Streaming Data Analytics Enables Real-Time Decisions White Paper How Streaming Data Analytics Enables Real-Time Decisions Contents Introduction... 1 What Is Streaming Analytics?... 1 How Does SAS Event Stream Processing Work?... 2 Overview...2 Event Stream

More information

Information Visualization WS 2013/14 11 Visual Analytics

Information Visualization WS 2013/14 11 Visual Analytics 1 11.1 Definitions and Motivation Lot of research and papers in this emerging field: Visual Analytics: Scope and Challenges of Keim et al. Illuminating the path of Thomas and Cook 2 11.1 Definitions and

More information

Modern IT Operations Management. Why a New Approach is Required, and How Boundary Delivers

Modern IT Operations Management. Why a New Approach is Required, and How Boundary Delivers Modern IT Operations Management Why a New Approach is Required, and How Boundary Delivers TABLE OF CONTENTS EXECUTIVE SUMMARY 3 INTRODUCTION: CHANGING NATURE OF IT 3 WHY TRADITIONAL APPROACHES ARE FAILING

More information

Preview of Award 1320357 Annual Project Report Cover Accomplishments Products Participants/Organizations Impacts Changes/Problems

Preview of Award 1320357 Annual Project Report Cover Accomplishments Products Participants/Organizations Impacts Changes/Problems Preview of Award 1320357 Annual Project Report Cover Accomplishments Products Participants/Organizations Impacts Changes/Problems Cover Federal Agency and Organization Element to Which Report is Submitted:

More information

agility made possible

agility made possible SOLUTION BRIEF Converged Infrastructure Management from CA Technologies how can I deliver innovative customer services across increasingly complex, converged infrastructure with less management effort

More information

Load Balancing in Distributed Data Base and Distributed Computing System

Load Balancing in Distributed Data Base and Distributed Computing System Load Balancing in Distributed Data Base and Distributed Computing System Lovely Arya Research Scholar Dravidian University KUPPAM, ANDHRA PRADESH Abstract With a distributed system, data can be located

More information

Optimization applications in finance, securities, banking and insurance

Optimization applications in finance, securities, banking and insurance IBM Software IBM ILOG Optimization and Analytical Decision Support Solutions White Paper Optimization applications in finance, securities, banking and insurance 2 Optimization applications in finance,

More information

The Theory And Practice of Testing Software Applications For Cloud Computing. Mark Grechanik University of Illinois at Chicago

The Theory And Practice of Testing Software Applications For Cloud Computing. Mark Grechanik University of Illinois at Chicago The Theory And Practice of Testing Software Applications For Cloud Computing Mark Grechanik University of Illinois at Chicago Cloud Computing Is Everywhere Global spending on public cloud services estimated

More information

PartJoin: An Efficient Storage and Query Execution for Data Warehouses

PartJoin: An Efficient Storage and Query Execution for Data Warehouses PartJoin: An Efficient Storage and Query Execution for Data Warehouses Ladjel Bellatreche 1, Michel Schneider 2, Mukesh Mohania 3, and Bharat Bhargava 4 1 IMERIR, Perpignan, FRANCE ladjel@imerir.com 2

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

Customized Report- Big Data

Customized Report- Big Data GINeVRA Digital Research Hub Customized Report- Big Data 1 2014. All Rights Reserved. Agenda Context Challenges and opportunities Solutions Market Case studies Recommendations 2 2014. All Rights Reserved.

More information

Detecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches.

Detecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches. Detecting Anomalous Behavior with the Business Data Lake Reference Architecture and Enterprise Approaches. 2 Detecting Anomalous Behavior with the Business Data Lake Pivotal the way we see it Reference

More information

npsolver A SAT Based Solver for Optimization Problems

npsolver A SAT Based Solver for Optimization Problems npsolver A SAT Based Solver for Optimization Problems Norbert Manthey and Peter Steinke Knowledge Representation and Reasoning Group Technische Universität Dresden, 01062 Dresden, Germany peter@janeway.inf.tu-dresden.de

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

Requirements Analysis Concepts & Principles. Instructor: Dr. Jerry Gao

Requirements Analysis Concepts & Principles. Instructor: Dr. Jerry Gao Requirements Analysis Concepts & Principles Instructor: Dr. Jerry Gao Requirements Analysis Concepts and Principles - Requirements Analysis - Communication Techniques - Initiating the Process - Facilitated

More information

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

Questions to be responded to by the firm submitting the application

Questions to be responded to by the firm submitting the application Questions to be responded to by the firm submitting the application Why do you think this project should receive an award? How does it demonstrate: innovation, quality, and professional excellence transparency

More information

Autonomic computing: strengthening manageability for SOA implementations

Autonomic computing: strengthening manageability for SOA implementations Autonomic computing Executive brief Autonomic computing: strengthening manageability for SOA implementations December 2006 First Edition Worldwide, CEOs are not bracing for change; instead, they are embracing

More information

ANALYTICS IN BIG DATA ERA

ANALYTICS IN BIG DATA ERA ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut

More information

Network-Wide Class of Service (CoS) Management with Route Analytics. Integrated Traffic and Routing Visibility for Effective CoS Delivery

Network-Wide Class of Service (CoS) Management with Route Analytics. Integrated Traffic and Routing Visibility for Effective CoS Delivery Network-Wide Class of Service (CoS) Management with Route Analytics Integrated Traffic and Routing Visibility for Effective CoS Delivery E x e c u t i v e S u m m a r y Enterprise IT and service providers

More information

Component visualization methods for large legacy software in C/C++

Component visualization methods for large legacy software in C/C++ Annales Mathematicae et Informaticae 44 (2015) pp. 23 33 http://ami.ektf.hu Component visualization methods for large legacy software in C/C++ Máté Cserép a, Dániel Krupp b a Eötvös Loránd University mcserep@caesar.elte.hu

More information

Cisco Data Preparation

Cisco Data Preparation Data Sheet Cisco Data Preparation Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data. As self-service business intelligence (BI) and

More information

Big Data Optimization at SAS

Big Data Optimization at SAS Big Data Optimization at SAS Imre Pólik et al. SAS Institute Cary, NC, USA Edinburgh, 2013 Outline 1 Optimization at SAS 2 Big Data Optimization at SAS The SAS HPA architecture Support vector machines

More information

A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS

A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS A GENERAL TAXONOMY FOR VISUALIZATION OF PREDICTIVE SOCIAL MEDIA ANALYTICS Stacey Franklin Jones, D.Sc. ProTech Global Solutions Annapolis, MD Abstract The use of Social Media as a resource to characterize

More information

Data Centric Systems (DCS)

Data Centric Systems (DCS) Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems

More information

The Rise of Industrial Big Data

The Rise of Industrial Big Data GE Intelligent Platforms The Rise of Industrial Big Data Leveraging large time-series data sets to drive innovation, competitiveness and growth capitalizing on the big data opportunity The Rise of Industrial

More information

Distributed Aggregation in Cloud Databases. By: Aparna Tiwari tiwaria@umail.iu.edu

Distributed Aggregation in Cloud Databases. By: Aparna Tiwari tiwaria@umail.iu.edu Distributed Aggregation in Cloud Databases By: Aparna Tiwari tiwaria@umail.iu.edu ABSTRACT Data intensive applications rely heavily on aggregation functions for extraction of data according to user requirements.

More information

16.1 MAPREDUCE. For personal use only, not for distribution. 333

16.1 MAPREDUCE. For personal use only, not for distribution. 333 For personal use only, not for distribution. 333 16.1 MAPREDUCE Initially designed by the Google labs and used internally by Google, the MAPREDUCE distributed programming model is now promoted by several

More information

A Case for Static Analyzers in the Cloud (Position paper)

A Case for Static Analyzers in the Cloud (Position paper) A Case for Static Analyzers in the Cloud (Position paper) Michael Barnett 1 Mehdi Bouaziz 2 Manuel Fähndrich 1 Francesco Logozzo 1 1 Microsoft Research, Redmond, WA (USA) 2 École normale supérieure, Paris

More information

Report Data Management in the Cloud: Limitations and Opportunities

Report Data Management in the Cloud: Limitations and Opportunities Report Data Management in the Cloud: Limitations and Opportunities Article by Daniel J. Abadi [1] Report by Lukas Probst January 4, 2013 In this report I want to summarize Daniel J. Abadi's article [1]

More information

GETTING REAL ABOUT SECURITY MANAGEMENT AND "BIG DATA"

GETTING REAL ABOUT SECURITY MANAGEMENT AND BIG DATA GETTING REAL ABOUT SECURITY MANAGEMENT AND "BIG DATA" A Roadmap for "Big Data" in Security Analytics ESSENTIALS This paper examines: Escalating complexity of the security management environment, from threats

More information

Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints

Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints Michael Bauer, Srinivasan Ravichandran University of Wisconsin-Madison Department of Computer Sciences {bauer, srini}@cs.wisc.edu

More information

Cloud Management: Knowing is Half The Battle

Cloud Management: Knowing is Half The Battle Cloud Management: Knowing is Half The Battle Raouf BOUTABA David R. Cheriton School of Computer Science University of Waterloo Joint work with Qi Zhang, Faten Zhani (University of Waterloo) and Joseph

More information

Benchmarking Hadoop & HBase on Violin

Benchmarking Hadoop & HBase on Violin Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages

More information

Integrating Benders decomposition within Constraint Programming

Integrating Benders decomposition within Constraint Programming Integrating Benders decomposition within Constraint Programming Hadrien Cambazard, Narendra Jussien email: {hcambaza,jussien}@emn.fr École des Mines de Nantes, LINA CNRS FRE 2729 4 rue Alfred Kastler BP

More information

A financial software company

A financial software company A financial software company Projecting USD10 million revenue lift with the IBM Netezza data warehouse appliance Overview The need A financial software company sought to analyze customer engagements to

More information

Application of Predictive Analytics for Better Alignment of Business and IT

Application of Predictive Analytics for Better Alignment of Business and IT Application of Predictive Analytics for Better Alignment of Business and IT Boris Zibitsker, PhD bzibitsker@beznext.com July 25, 2014 Big Data Summit - Riga, Latvia About the Presenter Boris Zibitsker

More information

Networking in the Hadoop Cluster

Networking in the Hadoop Cluster Hadoop and other distributed systems are increasingly the solution of choice for next generation data volumes. A high capacity, any to any, easily manageable networking layer is critical for peak Hadoop

More information

BIG DATA THE NEW OPPORTUNITY

BIG DATA THE NEW OPPORTUNITY Feature Biswajit Mohapatra is an IBM Certified Consultant and a global integrated delivery leader for IBM s AMS business application modernization (BAM) practice. He is IBM India s competency head for

More information

Cray: Enabling Real-Time Discovery in Big Data

Cray: Enabling Real-Time Discovery in Big Data Cray: Enabling Real-Time Discovery in Big Data Discovery is the process of gaining valuable insights into the world around us by recognizing previously unknown relationships between occurrences, objects

More information

Big Data and Healthcare Payers WHITE PAPER

Big Data and Healthcare Payers WHITE PAPER Knowledgent White Paper Series Big Data and Healthcare Payers WHITE PAPER Summary With the implementation of the Affordable Care Act, the transition to a more member-centric relationship model, and other

More information

ANALYTICS STRATEGY: creating a roadmap for success

ANALYTICS STRATEGY: creating a roadmap for success ANALYTICS STRATEGY: creating a roadmap for success Companies in the capital and commodity markets are looking at analytics for opportunities to improve revenue and cost savings. Yet, many firms are struggling

More information

Dynamic Network Analyzer Building a Framework for the Graph-theoretic Analysis of Dynamic Networks

Dynamic Network Analyzer Building a Framework for the Graph-theoretic Analysis of Dynamic Networks Dynamic Network Analyzer Building a Framework for the Graph-theoretic Analysis of Dynamic Networks Benjamin Schiller and Thorsten Strufe P2P Networks - TU Darmstadt [schiller, strufe][at]cs.tu-darmstadt.de

More information

An Implementation of Active Data Technology

An Implementation of Active Data Technology White Paper by: Mario Morfin, PhD Terri Chu, MEng Stephen Chen, PhD Robby Burko, PhD Riad Hartani, PhD An Implementation of Active Data Technology October 2015 In this paper, we build the rationale for

More information

Big Data 101: Harvest Real Value & Avoid Hollow Hype

Big Data 101: Harvest Real Value & Avoid Hollow Hype Big Data 101: Harvest Real Value & Avoid Hollow Hype 2 Executive Summary Odds are you are hearing the growing hype around the potential for big data to revolutionize our ability to assimilate and act on

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Comparison of Request Admission Based Performance Isolation Approaches in Multi-tenant SaaS Applications

Comparison of Request Admission Based Performance Isolation Approaches in Multi-tenant SaaS Applications Comparison of Request Admission Based Performance Isolation Approaches in Multi-tenant SaaS Applications Rouven Kreb 1 and Manuel Loesch 2 1 SAP AG, Walldorf, Germany 2 FZI Research Center for Information

More information

Algorithmic Aspects of Big Data. Nikhil Bansal (TU Eindhoven)

Algorithmic Aspects of Big Data. Nikhil Bansal (TU Eindhoven) Algorithmic Aspects of Big Data Nikhil Bansal (TU Eindhoven) Algorithm design Algorithm: Set of steps to solve a problem (by a computer) Studied since 1950 s. Given a problem: Find (i) best solution (ii)

More information

Fairness issues in new large scale parallel platforms.

Fairness issues in new large scale parallel platforms. Fairness issues in new large scale parallel platforms. Denis TRYSTRAM LIG Université de Grenoble Alpes Inria Institut Universitaire de France july 5, 25 New computing systems New challenges from e-science

More information

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process

More information

6.2.8 Neural networks for data mining

6.2.8 Neural networks for data mining 6.2.8 Neural networks for data mining Walter Kosters 1 In many application areas neural networks are known to be valuable tools. This also holds for data mining. In this chapter we discuss the use of neural

More information

CHAPTER 8 CONCLUSION AND FUTURE ENHANCEMENTS

CHAPTER 8 CONCLUSION AND FUTURE ENHANCEMENTS 137 CHAPTER 8 CONCLUSION AND FUTURE ENHANCEMENTS 8.1 CONCLUSION In this thesis, efficient schemes have been designed and analyzed to control congestion and distribute the load in the routing process of

More information

MapReduce and Hadoop Distributed File System V I J A Y R A O

MapReduce and Hadoop Distributed File System V I J A Y R A O MapReduce and Hadoop Distributed File System 1 V I J A Y R A O The Context: Big-data Man on the moon with 32KB (1969); my laptop had 2GB RAM (2009) Google collects 270PB data in a month (2007), 20000PB

More information

Text Analytics. A business guide

Text Analytics. A business guide Text Analytics A business guide February 2014 Contents 3 The Business Value of Text Analytics 4 What is Text Analytics? 6 Text Analytics Methods 8 Unstructured Meets Structured Data 9 Business Application

More information

High-performance local search for planning maintenance of EDF nuclear park

High-performance local search for planning maintenance of EDF nuclear park High-performance local search for planning maintenance of EDF nuclear park Frédéric Gardi Karim Nouioua Bouygues e-lab, Paris fgardi@bouygues.com Laboratoire d'informatique Fondamentale - CNRS UMR 6166,

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

Challenges and Opportunities for formal specifications in Service Oriented Architectures

Challenges and Opportunities for formal specifications in Service Oriented Architectures ACSD ATPN Xi an China June 2008 Challenges and Opportunities for formal specifications in Service Oriented Architectures Gustavo Alonso Systems Group Department of Computer Science Swiss Federal Institute

More information

Chameleon: The Performance Tuning Tool for MapReduce Query Processing Systems

Chameleon: The Performance Tuning Tool for MapReduce Query Processing Systems paper:38 Chameleon: The Performance Tuning Tool for MapReduce Query Processing Systems Edson Ramiro Lucsa Filho 1, Ivan Luiz Picoli 2, Eduardo Cunha de Almeida 2, Yves Le Traon 1 1 University of Luxembourg

More information

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QlikView Technical Case Study Series Big Data June 2012 qlikview.com Introduction This QlikView technical case study focuses on the QlikView deployment

More information

A Benchmark to Evaluate Mobile Video Upload to Cloud Infrastructures

A Benchmark to Evaluate Mobile Video Upload to Cloud Infrastructures A Benchmark to Evaluate Mobile Video Upload to Cloud Infrastructures Afsin Akdogan, Hien To, Seon Ho Kim and Cyrus Shahabi Integrated Media Systems Center University of Southern California, Los Angeles,

More information

Daniel J. Adabi. Workshop presentation by Lukas Probst

Daniel J. Adabi. Workshop presentation by Lukas Probst Daniel J. Adabi Workshop presentation by Lukas Probst 3 characteristics of a cloud computing environment: 1. Compute power is elastic, but only if workload is parallelizable 2. Data is stored at an untrusted

More information

The Next Wave of Data Management. Is Big Data The New Normal?

The Next Wave of Data Management. Is Big Data The New Normal? The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON BIG DATA MANAGEMENT AND ITS SECURITY PRUTHVIKA S. KADU 1, DR. H. R.

More information