Big Data Analytics in a Connected World

Kurt Stockinger, Frank van Lingen, and Marco Valente

Kurt Stockinger is an associate professor at the Zurich University of Applied Sciences, Switzerland. stog@zhaw.ch

Frank van Lingen is a technology strategist for Cisco Systems International, Switzerland. fvanling@cisco.com

Marco Valente is a technology strategist for Cisco Systems International, Switzerland. marcvale@cisco.com

Abstract

Enterprises often struggle with the design of big data architectures. Some argue that the traditional data warehouse is the silver bullet for solving most data-intensive enterprise problems; others claim that the new big data ecosystem based on Hadoop is a much better solution. In this article we discuss industry analytics use cases with a particular focus on the Internet of things (IoT), describe the challenges and opportunities of each case, and provide guidelines and best practices for choosing the most appropriate technology.

Introduction

Modern applications and the sensorization of society are driving unprecedented data growth and fueling the need for big data and analytics. Price/performance improvements in network, storage, and computing, together with the rise of cloud computing, make it more cost-effective to deploy large IT infrastructures and capture large amounts of data. Another driver is the IoT. Cheaper sensors and improved connectivity are bridging the gap between the physical and digital worlds, enabling us to collect data from more devices and environments than ever before. We are now able to gather enormous amounts of data about almost everything, and this is challenging the conventional approach to analytics. Many enterprises are deploying (or at least evaluating) big data[1] technology in their efforts to become data-driven, meaning that decisions are based on data rather than on gut feeling.

Early excitement about big data technology, especially among the most technology-savvy employees, is often followed by unexpected disillusionment once the highly praised technology does not fulfill its promises (such as enabling new business opportunities or improving scalability). An equal challenge for enterprises is determining the best time to begin and deploy a big data strategy. Technology changes rapidly, and the enterprise value of new technology is not always fully understood from the beginning. Several iterations between IT and business departments are required to develop promising use cases and sound business models.

In the following sections, we present an overview of big data architectures, including traditional data warehouses, Hadoop-based technology, and stream processing systems. This is not an exhaustive discussion of all big data architectures; we have chosen a few key technologies that are currently popular. Next, we discuss several analytics use cases to demonstrate the potential of big data across different verticals, and we explore the relationship between the use cases and the architectures. Finally, we advocate for a highly interoperable analytics platform that allows for seamless processing of structured and unstructured data as well as for batch and real-time data analysis.

[1] Big data is a common term with several meanings. We refer here to the software and hardware infrastructure needed to handle more data, the increasing rate at which it is produced, and the increasing range of formats and representations employed. This is often referred to as volume, velocity, and variety. A detailed discussion of the different definitions appears in Ward and Barker (2013).

BUSINESS INTELLIGENCE JOURNAL VOL. 20, NO. 2

Big Data Architectures

In this section we discuss four approaches to handling big data workloads: classical data warehouses, batch processing (e.g., Apache Hadoop), real-time processing (e.g., Twitter Storm), and edge computing. We also discuss a modern big data architecture that integrates both batch and real-time processing capabilities.

The Classical Data Warehouse

The classical approach for analyzing large amounts of structured data in an enterprise is the data warehouse. According to the Inmon architecture (Inmon, 2005), this system typically consists of three layers: the staging area, the integration layer, and the analysis layer (data marts), as depicted in Figure 1.
Figure 1: A typical data warehouse architecture (data sources such as files, XML, databases, and SOAP services flow through staging, integration with historization, enrichment, and analysis layers into data marts and analysis services, then to a presentation front end for reporting, OLAP, and mining; metadata management spans all layers).
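
To make the staging and integration layers named above more concrete, the following Python sketch shows one way a nightly batch load might check incoming rows and historize them, that is, keep time-stamped versions of each record instead of overwriting it. It is a minimal illustration under assumptions: the customer record, its fields, and the function names are hypothetical, and the versioning scheme (valid_from/valid_to dates, similar to a type 2 slowly changing dimension) is one common choice rather than the method of any particular warehouse.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerVersion:
    """One historized version of a hypothetical customer record."""
    customer_id: str          # business key
    city: str                 # attribute whose history is kept
    valid_from: date
    valid_to: Optional[date]  # None marks the current version


def staging_check(row: dict) -> bool:
    """Staging-area quality check: business key present, fields of the expected type."""
    return (
        isinstance(row.get("customer_id"), str)
        and row["customer_id"] != ""
        and isinstance(row.get("city"), str)
    )


def historize(history: list, row: dict, load_date: date) -> None:
    """Integration-layer historization: close the current version and append a new
    one only if the tracked attribute changed."""
    current = next(
        (v for v in history if v.customer_id == row["customer_id"] and v.valid_to is None),
        None,
    )
    if current is not None and current.city == row["city"]:
        return  # unchanged: no new version needed
    if current is not None:
        current.valid_to = load_date  # keep the old version, but mark it as closed
    history.append(CustomerVersion(row["customer_id"], row["city"], load_date, None))


def load_snapshot(history: list, snapshot: list, load_date: date) -> list:
    """Batch-load one daily snapshot; rows failing the staging check are returned."""
    rejected = []
    for row in snapshot:
        if staging_check(row):
            historize(history, row, load_date)
        else:
            rejected.append(row)
    return rejected
```

For example, loading a snapshot in which customer C1 has moved from Zurich to Basel closes the Zurich version with that day's date and appends a new current version, so both remain available for historical analysis; a row with an empty business key is returned as rejected instead of being loaded.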

Data is ingested into the data warehouse in batches via ETL (extract, transform, and load) processing, typically as daily snapshots of various online systems. After ingestion, data quality checks (data types, fields, business keys, etc.) are performed in the staging area. The integration layer harmonizes the data sets from a variety of systems by loading the data into a common data model. In addition, data is historized, i.e., records with different time stamps are stored to allow historical data analysis (Ehrenmann et al., 2012). Finally, the actual analytics takes place in data marts, where data is physically reorganized to optimize query performance. The pros and cons of traditional data warehouse architectures are summarized in Table 1.

Table 1: Pros and cons of a traditional data warehouse architecture.
PROS
- Established technology
- Large user community, midsize to large enterprises
- Enables an enterprisewide view of the business and enhances fact-based decision making
CONS
- Expensive to build and maintain
- Typically long development cycles from data source to final business intelligence report
- Building an enterprisewide data model requires a mature enterprise with backing from all involved business groups

Batch Processing

Hadoop is an open source software framework for storing and processing petabytes of unstructured data across highly distributed commodity hardware. It is a proven, viable extension of or alternative to more traditional relational databases and centralized file systems. Hadoop consists of a wide selection of components for data modeling, processing, management, and development. The core pieces are HBase, a non-relational database for storing large volumes of sparse data, and HDFS (Hadoop Distributed File System), a distributed file system optimized for very large data sets. On top of these runs Hadoop MapReduce (see Figure 2), a software framework for large-scale parallel processing across compute clusters. A MapReduce processing job is first split into subtasks that are executed in parallel on different nodes (the map phase). Partial results are then gathered and aggregated to compose the final output (the reduce phase).

Figure 2: The Hadoop architecture (Hadoop job management and scheduling coordinates map and reduce tasks on nodes 1 through N, on top of HDFS).

One of the main advantages of Hadoop is that it makes handling data at extreme scale economical by utilizing commodity hardware. Unlike data warehouses, where data is first cleaned and integrated, Hadoop does not require up-front data integration. The mantra is often: load first, think about how to analyze later.

Hadoop provides different interfaces for analyzing data. The most basic one is based on the MapReduce paradigm, which allows data to be analyzed with common programming languages such as Java or Python. Users who want to analyze their data in a higher-level language can use Pig Latin, a statement-based query language with SQL-like features that can be extended by user-defined functions. If a user wants to leverage data warehousing features, the best choice is Hive, whose query language is even more similar to SQL. For more complex tasks, such as finding clusters and correlations in log files of customers or applications, the machine learning library Apache Mahout provides a rich set of algorithms.

The Hadoop ecosystem shows good scalability for analyzing large data sets in batch mode: for workloads that parallelize well, adding more nodes reduces processing time roughly in proportion to the number of nodes. However, the high latency of Hadoop makes it poorly suited for real-time processing. Systems such as Impala have been introduced to overcome the latency problems; these systems trade off performance against fault tolerance. For an overview of how various companies have integrated or plan to integrate Hadoop into data warehousing and business intelligence, see Russom (2013). The pros and cons of a Hadoop-based architecture are listed in Table 2.

Table 2: Pros and cons of a Hadoop-based architecture.
PROS
- Highly scalable at low cost
- Flexible, reliable, and distributed
- Because the software is open source, the total cost of ownership of an analytics solution is significantly lower than that of a commercial BI/DW solution
CONS
- Because it is a fairly new technology, Hadoop-knowledgeable workers may be hard to find
- No established enterprisewide security concept is yet in place
- The technology is evolving rapidly, making it difficult for enterprises to choose the right moment for adoption

Real-Time Processing

Systems such as Apache Storm are much better suited to processing large amounts of streaming data than are Hadoop-based systems. These systems were designed from the ground up for real-time processing of data streams rather than fault-tolerant batch processing. The advantage of these systems is that they can process much more data per second. However, Storm does not provide high-level query languages for data analysis such as Pig or Hive; a strong knowledge of programming languages such as Java or Python is required.

See Figure 3 for a topology (data flow) for real-time processing of data streams in Storm. The figure shows two so-called spouts (i.e., data sources such as Twitter feeds or streams of stock prices) and various bolts (i.e., real-time processing units). The pros and cons of a Storm-based architecture are listed in Table 3. An interesting recent development is SAMOA (Scalable Advanced Massive Online Analysis), a distributed machine learning framework for mining big data streams.

Figure 3: Overview of a real-time processing system with Apache Storm.
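
The spout-and-bolt data flow of Figure 3 can be imitated in a few lines of Python with chained generators. This is only a single-process sketch of the idea, not the Apache Storm API: in Storm, spouts and bolts are components of a topology that the cluster runs in parallel across worker processes. The stock-price source, the ticker symbols, and all function names are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

Tick = Tuple[str, float]  # (symbol, price)


def price_spout(source: Iterable[Tick]) -> Iterator[Tick]:
    """Spout: emits tuples from a data source (here, any iterable of ticks)."""
    for tick in source:
        yield tick


def filter_bolt(stream: Iterator[Tick], symbols: set) -> Iterator[Tick]:
    """Bolt: forwards only the tuples for the symbols of interest."""
    for symbol, price in stream:
        if symbol in symbols:
            yield symbol, price


def moving_average_bolt(stream: Iterator[Tick], window: int) -> Iterator[Tick]:
    """Bolt: for each incoming tuple, emits the average of that symbol's last `window` prices."""
    recent = {}
    for symbol, price in stream:
        prices = recent.setdefault(symbol, deque(maxlen=window))
        prices.append(price)
        yield symbol, sum(prices) / len(prices)


def build_topology(source: Iterable[Tick], symbols: set, window: int = 3) -> Iterator[Tick]:
    """Wire spout -> filter bolt -> moving-average bolt into one data flow."""
    return moving_average_bolt(filter_bolt(price_spout(source), symbols), window)
```

Feeding the ticks ("AAA", 27.0), ("BBB", 160.0), ("AAA", 28.0), ("AAA", 29.0), ("AAA", 30.0) through build_topology with symbols {"AAA"} yields a running three-tick average for AAA (27.0, 27.5, 28.0, 29.0) as each tick arrives, while the BBB tick is dropped by the filter bolt. Because generators process one tuple at a time, nothing waits for a complete batch, which is the property that distinguishes this style from MapReduce.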

Table 3: Pros and cons of an Apache Storm-based architecture for real-time processing.
PROS
- Highly scalable architecture for processing streaming data
- Enabling technology for real-time applications
CONS
- Currently no visual programming model available, as there is in data flow systems or ETL tools
- Fairly new technology; lack of know-how

Edge Computing

With the rise of the IoT and the increasing distribution of data sources, systems will have to scale and integrate many more applications, devices, and data sources, deal with data in motion, and handle data with increased volume, velocity, and variety. Edge computing (applications, data, and services residing in, or extending into, the network rather than residing solely in a traditional cloud or data center) will therefore become more relevant. Real-time requirements (fast feedback or response to devices) and limited or ad hoc connectivity, for example in cities (smart traffic lights), wind farms, agriculture, and energy production such as oil and gas, will increase the need for filtering, forwarding, and processing data in the network or in gateways (Bessis and Dobre, 2014). Data will move from the edge (sensors and devices), with filtering, analysis, triggering, transformation, and reduction performed along the way, and come to rest in more traditional data stores such as data warehouses or Hadoop clusters hosted in clouds or data centers (Figure 4).

Figure 4: The role of edge processing within data processing. Data from people, processes, and things is handled in motion and in real time at the edge (network, gateways, devices), then comes to rest in cloud- and data-center-based storage (warehouse, Hadoop, etc.) for batch processing; data and insight flow toward business decisions, while decisions and feedback flow back to the edge.

Edge computing covers a wide range of technologies such as distributed data storage, mobile data acquisition, and fog computing, which extends the cloud computing paradigm to the edge of the network (Bonomi et al., 2012). Content delivery networks (CDNs) such as Akamai or Level3 are examples of edge computing we all use frequently: content viewed in a Web browser is typically not retrieved from the data center or cloud but from caches on so-called edge servers. The edge servers are globally distributed to ensure consistently low latency when you request a page. The use cases in this article focus on other roles edge computing can play in data management and processing. Table 4 lists the pros and cons of an edge-processing architecture.

Table 4: Pros and cons of an edge-processing architecture.
PROS
- Scalable data gathering across multiple gateway and network components
- The edge is closer to devices and suitable for low-latency feedback and automation toward devices, machines, etc.
- Optimizes bandwidth utilization by preventing unnecessary data transfers (especially important with limited connectivity)
CONS
- Processing should be simple, as it runs (within the context of the IoT) on relatively constrained devices
- Early stage technology

A Modern Big Data Architecture

To process a variety of data sets and workloads, such as structured and unstructured data as well as batch- and real-time-oriented workloads, an enterprise needs an architecture that enables the convergence of batch and real-time processing technologies into unified data fabrics. For example, Figure 5 shows the reference architecture of a major European bank (Brändli, 2013).

In this architecture, the data warehouse is used for analyzing structured data that is fully integrated, harmonized, historized, and cleaned. Semi-structured and unstructured data such as log files or text documents are processed with Hadoop. The advantage is that data can be loaded when it arrives and analyzed in an ad hoc way later. Once specific features in the data are discovered, the data can be structured, stored in the data warehouse, and made available for more traditional standard reports, for example. Real-time data analyses such as Twitter stream analysis or online risk calculations are performed by stream-based systems such as Storm.

The key to a modern big data architecture is that the three platforms should not be seen as independent silos but as interoperable systems that allow easy data exchange. For instance, it must be feasible to correlate results from Twitter streams with data from a data warehouse or Hadoop, perhaps by using data virtualization techniques.

Big Data Analytics Use Cases

We will now introduce three specific use cases and discuss which of the big data architectures and technologies just described are best suited for each. The first two use cases focus on the increasing connectivity of our physical world (the IoT). The third focuses on the service provider side of managed IT. Managed IT is already a reality, but with the proliferation of connected devices, the IoT, and enterprises' increasing dependency on technology, this area is likely to grow substantially.

Use Case #1: Smart Factories

The Internet of things has the potential to bring about the fourth industrial revolution, following those triggered by mechanization, electricity, and information technology. In manufacturing environments, businesses will establish global networks spanning connected machinery, data warehouses, and production facility systems to construct so-called cyber-physical systems, or CPS (Kagermann et al., 2013). Companies will connect their manufacturing plants to form global, integrated manufacturing platforms (so-called smart factories) that provide a virtualized view of the physical supply chain as well as production statistics, maintenance or failure prediction, downtime management, and safety and security management (e.g., who is allowed to access or operate which machine). This view will enable companies to optimize operations, identify inefficiencies, and increase safety while lowering operational costs.

In tomorrow's smart factories, manufacturing structures and processes will not be fixed and predefined. Instead, factories will be dynamically reconfigured on a case-by-case basis to automatically create a specific infrastructure for every situation and product, including all the associated requirements in terms of models, data, communication, and algorithms. Companies will increasingly combine their respective logistics and factory infrastructures to create products together. Good examples of such product collaboration are the airplane construction processes of Boeing and Airbus, which involve complex supply chain models and many suppliers. The success of a smart factory will depend on many factors, ranging from automated and integrated decision and logistics models to communication protocols to sensors inside machines and robots.

Figure 5: Modern big data architecture of a major European bank. Traditional capabilities: standard reporting (Cognos, Business Objects, SAS), data marts and cubes, a general-purpose data warehouse (Oracle, IBM DB2), data transformation and integration (proprietary framework, Informatica), and file staging and storage for structured data. New capabilities: ad hoc analysis with Hadoop analytics, real-time alerting (ticker), Hadoop middleware and the Hadoop file system, and messaging and streaming for click streams and semi- and unstructured text and social data.

Most important in the evolution toward smart factories will be the ability to dynamically interconnect data sources, devices, and sensors, as well as to reconfigure factories and production processes to optimize the complete manufacturing process. Factory downtimes and delays, whether related to machine failures, accidents, tool unavailability, or inventory shortages, are costly. Analytics can play a key role in reducing downtime and achieving higher operational efficiency through accurate monitoring of resources and data-driven predictive models. For example, with the help of location-based technologies, enterprises can track how long it takes a worker to obtain a given tool, then use that data to optimize tool placement. Similarly, location-based technology can be useful to pinpoint where accidents happen and which machines and tools are involved.

Figure 6: Smart factories. Sensors on machines, tools, inventory, and people in multiple factories connect through the network infrastructure to an edge layer that performs fast data processing (filtering, feedback) to control, intervene, and locate; scale-out data management (e.g., Hadoop) and a data warehouse support prediction, configuration, reporting, and analysis, and send updated edge rules and triggers back to the edge.

As illustrated in Figure 6, embedded sensors can monitor the health of machines and report to the control system, potentially monitoring multiple factories. Scale-out systems such as Hadoop can be leveraged to analyze the large and varied data to predict maintenance needs and detect outliers (machines that consistently have problems). These analysis results, along with smaller, more structured data, can be stored in traditional data warehouses. Data can be filtered or aggregated at the edge to avoid sending large amounts of unnecessary data to the processing facility. Similarly, fast on-premises data analysis can be used to detect whether machine parameters are under or over a given threshold and to intervene as appropriate (for example, by stopping a machine or alerting a supervisor). Sensors and asset tracking technology can also be used for inventory management, determining which goods need to be procured and when orders must be placed.
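
The two edge tasks just described, local threshold checks and reducing data before upload, can be sketched as one gateway-side function. This Python example assumes a hypothetical reading format (machine, parameter, value) and a hypothetical EdgeRule holding an allowed range; a real gateway would act on the alerts itself (for instance, by stopping a machine) and forward only the summary to the data center.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EdgeRule:
    """Hypothetical rule held on a factory gateway: allowed range for one machine parameter."""
    parameter: str
    low: float
    high: float


def process_at_edge(readings: list, rule: EdgeRule):
    """Check readings locally, then reduce them to a compact summary before upload.

    Returns (alerts, summary): alerts lists (machine, value) pairs outside
    [low, high] that call for local intervention; summary is what is sent to
    the data center instead of every raw reading.
    """
    relevant = [r for r in readings if r["parameter"] == rule.parameter]
    alerts = [
        (r["machine"], r["value"])
        for r in relevant
        if r["value"] < rule.low or r["value"] > rule.high
    ]
    values = [r["value"] for r in relevant]
    summary = {
        "parameter": rule.parameter,
        "count": len(values),
        "min": min(values, default=None),
        "max": max(values, default=None),
        "mean": mean(values) if values else None,
    }
    return alerts, summary
```

Sending the summary instead of every reading is what saves bandwidth, while the readings that triggered alerts are kept in the alert list so the supervisor still sees the exact out-of-range values.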

Through reporting and analysis, the information gathered by different sensors along the workflow can be combined in a data warehouse to identify inefficiencies in the overall supply and production processes. This insight can then be propagated to the edge layer through updates to rules, triggers, and configurations. Once goods have left the factory for customers and retailers, manufacturers can leverage social media data to monitor consumers' feedback and use the information to make product adjustments and improve customer support.
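
The propagation of warehouse insights back to the edge, mentioned above, could in its simplest form mean recomputing a threshold from historical readings and pushing it to the gateways as a versioned rule. The Python sketch below assumes a simple statistical rule (mean plus or minus k population standard deviations) and a hypothetical dictionary format for rules; it illustrates the update flow and is not a recommended anomaly detection method.

```python
from statistics import mean, pstdev
from typing import Optional


def derive_range(history: list, k: float = 3.0) -> tuple:
    """Recompute an allowed range as mean +/- k population standard deviations."""
    mu = mean(history)
    sigma = pstdev(history)
    return mu - k * sigma, mu + k * sigma


def build_rule_update(parameter: str, history: list, version: int, k: float = 3.0) -> dict:
    """Data center side: package a new threshold as a versioned rule update."""
    low, high = derive_range(history, k)
    return {"parameter": parameter, "low": low, "high": high, "version": version}


def apply_update(current: Optional[dict], update: dict) -> dict:
    """Edge side: install an update only if it is newer than the rule already in place."""
    if current is None or update["version"] > current["version"]:
        return update
    return current
```

Versioning lets a gateway ignore an update that arrives out of order, for example after a period of limited connectivity.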

Use Case #2: Webification of Brick-and-Mortar Retail

As with online stores, brick-and-mortar stores are becoming increasingly digitally enabled and connected, capable of tracking and understanding their customers' behavior. You could say that a webification of these stores is taking place. Through sensors in stores, connectivity to mobile devices, and digital loyalty cards, stores can personalize and optimize individual shopping experiences. Opportunities include customer flow monitoring, analysis of buying habits, targeted ad placement, location-based coupons (Kosner, 2013; Dato, 2013), stock management, and offline customer engagement.

A deeper understanding of customer behavior can help retailers drive more foot traffic. Retailers can aggregate individual customer movement patterns to optimize the layout of the store or shopping mall and better understand customers' needs and habits. Similarly, vending machines can be equipped with sensors to track inventory as well as provide a better customer experience and improve engagement (Macsai, 2009).

The customer experience is further improved through customer service. Enterprises can actively mine social networks for customer sentiment analysis and to identify customer service issues (Vizard, 2012). Technology makes it possible to correlate such data with an enterprise's internal databases and build more interactive sites while communicating with customers more proactively.

Direct customer feedback is also important for retailers. As in Web environments, retailers want to tailor the customer experience for individuals or groups. The system needs real-time capabilities to accomplish this. Figure 7 shows how different components might be combined in a connected store.
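
In the connected store of Figure 7, sensing devices report where a customer is, and a rules engine combines that event with the customer's profile to decide on feedback. A minimal Python sketch of that matching step, assuming hypothetical profile, rule, and event structures held in memory, might look like this:

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """In-memory customer profile, updated incrementally (hypothetical fields)."""
    customer_id: str
    interests: set = field(default_factory=set)

    def record_purchase(self, category: str) -> None:
        # Incremental profile update: a purchase marks the category as an interest.
        self.interests.add(category)


@dataclass
class Rule:
    """Hypothetical promotion rule: fires when an interested customer enters a zone."""
    zone: str
    category: str
    message: str


def evaluate(event: dict, profiles: dict, rules: list) -> list:
    """Match one location event against in-memory profiles and rules; return feedback messages."""
    profile = profiles.get(event["customer_id"])
    if profile is None:
        return []  # unknown shopper: no personalized feedback
    return [
        rule.message
        for rule in rules
        if rule.zone == event["zone"] and rule.category in profile.interests
    ]
```

Because profiles and rules live in ordinary in-memory dictionaries and lists here, each event is evaluated without disk access; the resulting messages would then be routed to the customer's mobile device, a cart screen, or a nearby display.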

Connected cameras, shelf tags, and location-based sensors monitor in real time how people move through a store and what they take from the shelves and put into their carts. Based on this data, the system can suggest particular items or point out discounted products that are of interest to a particular customer. The real-time infrastructure could connect sensing devices (such as cameras, motion detectors, and location tags) to a rules engine and the associated customer profile, and could operate in memory (without storing the data on disk) for faster processing. Depending on the customer's actions in the store, the system would send alerts (feedback) to the customer's mobile device or to a screen attached to the shopping cart, or present specific, targeted advertising on digital screens in the store when the customer approaches a particular location.

The data remains useful even after a customer has left the premises. On the back end, the enterprise can perform historical purchase analysis and customer profiling. Customers entering a store of the same chain can be identified and their profiles retrieved from the back end. Hadoop can be a cost-effective option to store and analyze customer behavior data; the analysis results can be stored in more traditional data stores alongside customer profile data. Although there is great potential for brick-and-mortar retail stores, customer data privacy must be addressed when building such systems.

Use Case #3: Outsourced Infrastructure Management

Companies do not always have in-house expertise for infrastructure management or IT operations, often because of the cost of hiring experts. Infrastructure in this case can mean physical infrastructure or services that are hosted within a company or deployed in a public or private cloud.
Instead, enterprises outsource management to a third party (for example, their hardware, software, or services vendors) and sign contracts with specific service-level agreements (SLAs) regarding uptime, security, and maintenance, with specific penalties if the SLAs are not met.

Figure 7: A front-end and back-end retail infrastructure. In the store (front end), cameras, location sensors, and similar devices feed an in-memory store and streaming compute layer with a rules engine and an incrementally updated customer profile, which sends real-time feedback (targeted ads, promotions, etc.) to mobile devices, shopping carts, and screens. In the centralized data center (back end), scale-out data management (e.g., Hadoop) performs historical analysis, stores results in a data warehouse, and feeds rule optimization back to the store.

The advantage for the infrastructure management company is that it can leverage technical, operational, and configuration data from multiple clients. If the management company detects that a certain version of an Apache Web server combined with a certain configuration causes problems at one client, it can apply this knowledge to other clients and, through preventive maintenance, reduce the risk that an SLA will be violated.

As illustrated in Figure 8, customers typically install so-called data collectors that send data back to a data center, where the data is stored and analyzed (model A). The security of this data is important because it is moved outside the customer premises. Traditionally, this data was relatively small, but today, with increasing network capacity and decreasing infrastructure costs, the variety and volume of the data are increasing (for example, large amounts of log files are transferred). Scale-out systems enable infrastructure management companies to analyze this data relatively quickly and cost-effectively and to determine appropriate actions (feedback from the analysis) for their clients' infrastructures. A scale-out infrastructure such as Hadoop is suitable for this kind of workflow. The analysis results, however, are more structured and can be stored (together with the actions taken on the customer side) in a more traditional data warehouse.
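
The cross-client analysis described above (finding, for example, a server version and configuration that cause problems at several clients) fits the MapReduce pattern from the batch processing section. The Python sketch below runs the map, shuffle, and reduce steps in a single process to show the data flow; on Hadoop the same logic would be distributed across nodes. The record fields, version strings, and the threshold of two failing clients are assumptions for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record: dict):
    """Map: emit ((server_version, config), (client, had_error)) for one collected record."""
    key = (record["server_version"], record["config"])
    yield key, (record["client"], record["status"] == "error")


def reduce_phase(key, values):
    """Reduce: count distinct clients, and distinct failing clients, for one key."""
    clients, failing = set(), set()
    for client, had_error in values:
        clients.add(client)
        if had_error:
            failing.add(client)
    return key, {"clients": len(clients), "failing_clients": len(failing)}


def run_job(records):
    """Run map, shuffle (group values by key), and reduce in a single process."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map_phase(r) for r in records):
        groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in groups.items())


def risky_combinations(results: dict, min_failing_clients: int = 2) -> list:
    """Flag combinations that failed at several clients: candidates for preventive maintenance."""
    return [key for key, stats in results.items() if stats["failing_clients"] >= min_failing_clients]
```

With records showing errors at clients A and B for a hypothetical version "2.2" with a "custom" configuration, risky_combinations flags that combination, which could then trigger preventive maintenance at every other client running it. Counting distinct clients rather than raw error lines keeps one noisy client from dominating the result.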

Although this particular use case is very well suited to a scale-out architecture, many enterprises want to reduce the time between data collection (while the data size is increasing) and feedback to the system being managed. This becomes particularly important when dealing with security-related incidents. One approach (model B) is to place an in-memory front end ahead of the scale-out infrastructure and to stream and process certain time-sensitive data directly in memory. An even more streamlined approach is to push rules from the data center to the clients' infrastructure and give the data collectors the functionality to quickly preprocess the collected data based on the rules distributed by the infrastructure manager. This creates another security challenge, because the rules and algorithms of the infrastructure management company (in some sense, part of the company's bread and butter) are transferred to the customers' premises. This potentially leads to a three-step process: at the edge (data collector level), in memory in a data center, and scale-out in a data center. The final management model is that the customer infrastructure is managed in a cloud of the infrastructure manager (model C).

Figure 8: Different models of outsourced infrastructure management. Model A, on premises (low volume/velocity): data collectors send data to scale-out data management (e.g., Hadoop), with historical results in a data warehouse and feedback returned to the infrastructure. Model B, on premises (high volume/velocity): adds an in-memory store and streaming layer, plus rules uploaded to the collectors for fast data processing. Model C, managed cloud (high volume/velocity): the globally distributed infrastructure (network, compute, storage, services) is run in the infrastructure manager's cloud.

Summary

In this article, we presented three use cases to provide examples of different big data challenges: real-time (velocity), scale-out (volume), and edge processing (variety and low latency combined with connectivity challenges). In reality, most use cases deal with more than one challenge. Many actually require data ingestion from multiple sources (variety) and, at the same time, especially within the IoT domain, involve some real-time aspect or a direct feedback component. In addition, data can be stored for historical analysis to extract business insights or to optimize decision making. Data models can vary considerably from one domain (manufacturing, retail, and so on) to another, and legacy applications and infrastructure may affect how and where data is stored and integrated. Different classes of problems can drive different requirements (such as distribution, scalability, and performance) and require different platforms: data warehouse, Hadoop, edge, and so on.

Analytics and a proper data infrastructure are becoming key competitive differentiators. Technology and talent (data scientists) are equally important to unlocking this potential. More non-technical (or non-software) enterprises are acquiring tech startups (Rao, 2013), which are attracting funding thanks to trends in big data, analytics, and sensorization. The traditional data warehouse has been omnipresent for decades, and it will not be phased out soon by new technologies such as Hadoop or real-time systems such as Storm. Traditional data warehouses perform many vital functions within enterprises and are well suited to highly structured data.

Replacing traditional data warehouses with newer technology (where appropriate) will take time. Yet several classes of problems are better served by newer technologies, particularly when dealing with high data volume, velocity, and variety. A classical data warehouse will not always be sufficient to capture and integrate all the data; instead, enterprises will focus on data virtualization strategies. Finally, in many cases, the results of Hadoop processing can be stored in structured databases and could therefore be suitable for classical data warehouses.

The importance of traditional data warehouse technology should not be neglected with the rise of these new technologies. SQL-based query engines are being developed on top of Hadoop, lowering the barrier to adoption for traditional database users and at the same time acknowledging that SQL is still very important and widely used. However, we are seeing an evolution from the classical data warehouse to a logical data warehouse that combines the strengths of the traditional warehouse with alternative data management and access technologies such as those discussed in this article.

References

Bessis, Nik, and Ciprian Dobre, eds. [2014]. "Fog Computing: A Platform for Internet of Things and Analytics," Big Data and Internet of Things: A Roadmap for Smart Environments, Studies in Computational Intelligence, Volume 546, pp

Beyer, Mark [2011]. "Does the 21st-Century Big Data Warehouse Mean the End of the Enterprise Data Warehouse?"

Bonomi, Flavio, et al. [2012]. "Fog Computing and Its Role in the Internet of Things," Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing, pp

Brändli, Pius [2013]. "When Big Data Meets Reality," TDWI Conference, Zurich, Switzerland, November.

Dato, Siraj [2013]. "In-Store Advertising Will Soon Look You Straight in the Eye," Mashable/Quartz, November.

Ehrenmann, Markus, Roland Pieringer, and Kurt Stockinger [2012]. "Is There a Cure-All for Business Analytics? Case Studies of Exemplary Businesses in Banking, Telecommunications, and Retail," Business Intelligence Journal, Vol. 17, No

Inmon, William H. [2005]. Building the Data Warehouse, Wiley.

Kagermann, Henning, Wolfgang Wahlster, and Johannes Helbig [2013]. "Recommendations for Implementing the Strategic Initiative Industrie 4.0," National Academy of Science and Engineering (Germany), April.

Kosner, Anthony W. [2013]. "The Internet of iThings: Apple's iBeacon Is Already in Almost 200 Million iPhones and iPads," Forbes. 12/15/the-internet-of-ithings-apples-ibeacon-is-alreadyin-almost-200-million-iphones-and-ipads/

Macsai, Dan [2009]. "Douwe Egberts Bemoved Vending Machine Will Make You Jump for Joy... For Coffee," Fast Company, October. douwe-egberts-bemoved-vending-machine-will-makeyou-jump-joy-coffee

Rao, Leena [2013]. "As Software Eats the World, Non-Tech Corporations Are Eating Startups," TechCrunch, December

Russom, Philip [2013]. "Integrating Hadoop into Business Intelligence and Data Warehousing," TDWI Best Practices Report, Q2. tdwi-best-practices-report-integrating-hadoop-intobusiness-intelligence-and-data-warehousing.aspx

Ward, Jonathan S., and Adam Barker [2013]. Undefined By Data: A Survey of Big Data Definitions, arxiv.org.
Vizard, Michael [2012]. FedEx CIO Sees Analytics Driving a World of Enterprise Change, Slashdot, October 4.


More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn

The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress

More information

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns Table of Contents Abstract... 3 Introduction... 3 Definition... 3 The Expanding Digitization

More information

Oracle Big Data Discovery Unlock Potential in Big Data Reservoir

Oracle Big Data Discovery Unlock Potential in Big Data Reservoir Oracle Big Data Discovery Unlock Potential in Big Data Reservoir Gokula Mishra Premjith Balakrishnan Business Analytics Product Group September 29, 2014 Copyright 2014, Oracle and/or its affiliates. All

More information

YOU VS THE SENSORS. Six Requirements for Visualizing the Internet of Things. Dan Potter Chief Marketing Officer, Datawatch Corporation

YOU VS THE SENSORS. Six Requirements for Visualizing the Internet of Things. Dan Potter Chief Marketing Officer, Datawatch Corporation YOU VS THE SENSORS Six Requirements for Visualizing the Internet of Things Dan Potter Chief Marketing Officer, Datawatch Corporation About Datawatch NASDAQ: DWCH Pioneer in real-time visual data discovery

More information

CA Technologies Big Data Infrastructure Management Unified Management and Visibility of Big Data

CA Technologies Big Data Infrastructure Management Unified Management and Visibility of Big Data Research Report CA Technologies Big Data Infrastructure Management Executive Summary CA Technologies recently exhibited new technology innovations, marking its entry into the Big Data marketplace with

More information

Internet of Things. Opportunity Challenges Solutions

Internet of Things. Opportunity Challenges Solutions Internet of Things Opportunity Challenges Solutions Copyright 2014 Boeing. All rights reserved. GPDIS_2015.ppt 1 ANALYZING INTERNET OF THINGS USING BIG DATA ECOSYSTEM Internet of Things matter for... Industrial

More information

An Oracle White Paper October 2011. Oracle: Big Data for the Enterprise

An Oracle White Paper October 2011. Oracle: Big Data for the Enterprise An Oracle White Paper October 2011 Oracle: Big Data for the Enterprise Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5

More information

Please give me your feedback

Please give me your feedback Please give me your feedback Session BB4089 Speaker Claude Lorenson, Ph. D and Wendy Harms Use the mobile app to complete a session survey 1. Access My schedule 2. Click on this session 3. Go to Rate &

More information

There s no way around it: learning about Big Data means

There s no way around it: learning about Big Data means In This Chapter Chapter 1 Introducing Big Data Beginning with Big Data Meeting MapReduce Saying hello to Hadoop Making connections between Big Data, MapReduce, and Hadoop There s no way around it: learning

More information