Technical White Paper
Real-Time Discovery in Big Data Using the Urika-GD Appliance
October 2014 | www.cray.com

[Cover graphic: application areas include life sciences, sports analytics, fraud, scientific research, cybersecurity, government, telecommunications, customer insights and financial services.]

Table of Contents

Executive Summary
Discovery Through Human-Machine Collaboration
Using Graphs for Discovery Analytics
Introducing the Cray Urika-GD Graph Analytics Appliance
    Overview of the Multiprocessor, Shared Memory Architecture
    Addressing the Memory Wall Through Massive Multithreading
    Delivering High Bandwidth with a High Performance Interconnect
    Enabling Fine-Grained Parallelism with Word-Level Synchronization Hardware
    Delivering Scalable I/O to Handle Dynamic Graphs
    Comparison: The Urika-GD Appliance's Hardware and Commodity Hardware
The Urika-GD System Software Stack
    The Graph Analytics Database
    Enabling Ad-hoc Queries and Pattern-based Search with RDF and SPARQL
    Augmenting Relationships in the Graph Through Inferencing
    Benefits of the Urika-GD System's Software Architecture
The Benefits of an Appliance
Integrating the Urika-GD Appliance into an Existing Analytics Environment
    Building the Graph
    Visualization
    Integration with Other Analytics Packages
Conclusion

Executive Summary

Discovery, the often accidental revelations that have changed the world since Archimedes, is a vital component of the advancement of knowledge. The recognition of previously unknown linkages between occurrences, objects and facts underpins advances in such diverse areas as life sciences (cancer drug discovery, personalized medicine or understanding the spread of disease), financial services (counterparty credit risk analysis, fraud detection, identity resolution or anti-money laundering) and government operations (cybersecurity threat analysis, person-of-interest identification or counterterrorism threat detection). New discoveries often deliver very high value: consider the harm avoided through the proactive detection of fraud or counterterrorism operations, or the billions of dollars in revenue generated by a new cancer drug. The traditionally slow pace of discovery is being greatly accelerated by the advent of big data.

Discovery takes place when a researcher has a "Eureka!" moment, where a flash of insight leads to the formulation of a new theory, followed by a painstaking validation of that theory against observations in the real world. Big data can assist in both of these phases. Applying analytics and visualization to the huge volume of captured data stimulates insight, and the ability to test new theories electronically can speed validation a thousandfold, fulfilling the true promise of big data, as long as an organization's systems are up to the challenge.

Traditional data warehouses and business intelligence (BI) tools built on relational models are not well suited to discovery, however. BI tools are highly optimized to generate defined reports from operational systems or data warehouses. They require the development of a data model that is designed to answer specific business questions, but the model then limits the types of questions that can be asked. Discovery, in contrast, is an iterative process, where a line of inquiry may raise previously unanticipated questions, which may in turn require new sources of data to be loaded. Accommodating these will likely require time-consuming, error-prone and complex extensions to the data model, for which saturated IT professionals do not have time.

A new approach is needed. An approach that:

- Separates data from its representation, allowing new data sources and new relationships to be included without complex data model changes.
- Supports a wide range of ad-hoc analysis as needed to spark insight and validate new theories. Typically, these will take the form of searching for patterns of relationships, but other types of analytics and visualization will also be required.
- Operates in real time, supporting collaborative, iterative discovery in very large datasets.

Graph analytics are ideally suited to meet these challenges. Graphs represent entities and the relationships between them explicitly, greatly simplifying the addition of new relationships and new data sources, and they efficiently support ad-hoc querying and analysis. Real-time response to complex queries against multi-terabyte graphs is also achievable, given the appropriate platform. Cray's Urika-GD appliance is built to meet the challenging requirements of discovery.
With one of the world's most scalable shared memory architectures, the Urika-GD appliance employs graph analytics to surface unknown linkages and non-obvious patterns in big data, to do so with speed and simplicity, and to facilitate the kinds of breakthroughs that can give any organization a measurable advantage, in activities ranging from national security to fraud detection, medical and pharmaceutical research, financial services and even retail. The Urika-GD appliance complements existing data warehouses and Hadoop clusters by offloading challenging data discovery applications while still interoperating with the existing analytics workflow.

Discovery Through Human-Machine Collaboration

"All truths are easy to understand once they are discovered; the point is to discover them." (Galileo Galilei)

Discovery is the desired outcome of an investigative analytical process. Discovery in big data requires the collaboration of man and machine, where the guiding intellect (the ability to posit and infer) is human. In time, artificial intelligence may be able to make suppositions and draw conclusions but, for now, humans still have the advantage.

The process of discovery is iterative, as shown in Figure 1. The analyst must be able to test a hypothesis against all available data by posing a question that the technology answers in depth and then renders visually, shortening the time between results. This requires the ability to ask questions that were not anticipated by those who built the knowledge base, referred to as ad-hoc queries in the database world. In discovery, you don't know the next question until you get the first answer, and each iteration may require additional datasets for analysis. The addition of those datasets demands fast, flexible and powerful I/O. This cycle continues until the "Eureka!" moment, where the analyst makes a high-value breakthrough discovery. Example 1, on cancer drug discovery, illustrates this process.

Figure 1. The cycle of discovery: discovery through fast hypothesis validation.

With traditional analytics technologies, discovery is challenging because of several interrelated difficulties:

1. Predicting what data is needed. Discovery depends upon the ability to import and combine new datasets, ranging from structured (databases) to semistructured (XML, log files) to unstructured sources (text, audio, video), as needed to support new lines of inquiry. Traditional analytics solutions use a fixed data schema, and the addition of new types of data and relationships between data items involves complex, time-consuming schema extension, often requiring person-weeks or months of effort. Analysts using traditional technologies report spending up to 80 percent of their time on data import and schema manipulation.

2. Predicting what questions will be asked. Discovery depends on the ability to follow up on new lines of questioning, including questions about the relationships implied within the data. Traditional solutions depend upon optimizing data schemata for specific queries in order to deliver acceptable performance. Failure to do so results in nested table joins, which are very damaging to performance. IT groups have described these as "forbidden queries" for their tendency to bring the analytics infrastructure to a grinding halt until the queries are killed.

3. Delivering predictable, real-time performance as data sizes and query complexity grow. Discovery depends upon real-time results being delivered in response to queries. Traditional systems have difficulty achieving deterministic response times to ad-hoc queries, let alone real-time response as dataset sizes and query complexity grow. The result is that analysts cherry-pick their lines of reasoning, driven by system capability rather than investigating all the avenues desired, introducing bias from their own preconceptions.

The result of these challenges is an organizational unwillingness to experiment extemporaneously with data unless the value has been proven beyond a shadow of a doubt. This is a major constraint on innovation.

Example 1: Cancer Drug Discovery Using Graph Analytics

The Institute for Systems Biology (ISB) is approaching the challenge of cancer drug discovery using a systems biology approach, involving the modeling of the formation and growth of tumors at the molecular level. The objective is to understand the gene mutations and the biological processes that lead to cancer in order to discover highly targeted treatments. This is very challenging because the volume of published, relevant scholarly articles and of genomic and protein databases is beyond human ability to digest.

ISB tackled this problem by using natural language processing to extract the relationships contained in Medline articles, which provide journal citations and abstracts for biomedical literature from around the world. They combined these relationships with genomic and proteomic data on healthy and cancerous cells from the Cancer Genome Atlas and other databases, as well as their own experimental wet-lab results, into a very large graph comprising billions of relationships. New sources of data were continually added as their relevancy was determined. Researchers wrote complex, ad-hoc queries, effectively validating hypotheses in silico, in an iterative process where each new set of results suggested new lines of inquiry. Graph analytics served ISB very well for discovery.
They were able to quickly add new sources of data and new types of relationships as they were uncovered, and to write sophisticated, partially specified queries looking for patterns of relationships in the data. Visualization of the results enabled quick comprehension, while the ability to export large sets of results for statistical processing helped guide the discovery process and provided statistical rigor. "In the amount of time it took to validate one hypothesis, we can now validate 1,000 hypotheses, increasing our success rate significantly," remarked Dr. Ilya Shmulevich of the ISB.

This approach led to the discovery that many breast cancers have an increase in the expression of the ABCG2 gene, and that the HIV drug nelfinavir inhibits ABCG2. This drug is a strong candidate for repurposing to treat breast cancer, a discovery with considerable potential revenue opportunity. Repurposing is a very cost-effective way of bringing new drug therapies to market, and graph analytics are now a proven way to identify these opportunities.
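To make the idea of a partially specified, pattern-based query concrete, the sketch below shows the general shape such a SPARQL query might take: find drugs reported in the literature to inhibit genes that expression data shows to be overexpressed in a tumor type. The ex: prefix and the predicate names (ex:overexpressedIn, ex:inhibits) are hypothetical placeholders chosen for illustration; they are not ISB's actual vocabulary or queries.

    # Hypothetical vocabulary (ex:) for illustration only; not ISB's actual schema.
    PREFIX ex: <http://example.org/biograph#>

    SELECT DISTINCT ?drug ?gene
    WHERE {
      ?gene  ex:overexpressedIn  ex:BreastCancer .   # relationship derived from expression data
      ?drug  ex:inhibits         ?gene .             # relationship extracted from the literature
    }

Because the query constrains only the pattern of relationships, not specific genes or drugs, any new data source loaded into the graph is automatically included the next time the query runs.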

Using Graphs for Discovery Analytics

Using graphs in data analytics provides many advantages. "What differentiates Urika from the many graph databases available today is its ability to enable data discovery at scale and on an interactive basis." (Chartis Research, "Looking for Risk: Applying Graph Analytics to Risk Management," Peyman Mestchian)

A graph consists of nodes, representing data items, and edges, representing the relationships between nodes, as shown in Figure 2. Graphs represent the relationships between data explicitly, facilitating analysis of patterns of relationships, a key aspect of discovery. Contrast this with traditional tabular representations, where the focus is on processing data (the rows in the tables), and where relationships are second-class entities, represented indirectly by table column headings and indices.

Graphs address the challenges presented by traditional analytics:

1. Predicting what data is needed. Graphs provide a flexible data model, where new types of relationships are readily added, greatly simplifying the addition of new data sources. Relationships extracted from structured, semistructured or unstructured data can be readily represented in the same graph.

2. Predicting what questions will be asked. Graphs have no fixed schema constraining the universe of queries that can be posed. Relationships are not hidden: it's possible to write queries about the very types of relationships that exist in the data (illustrated in the query sketch following Figure 2). Graphs also enable advanced analytics techniques such as community detection, path analysis, clustering and others.

3. Delivering predictable, real-time performance as data sizes and query complexity grow. Graph analytics can deliver predictable, real-time performance, as long as the hardware and software are appropriate to the task. Cray developed the Urika-GD appliance specifically for this task, as described in the following sections.

These attributes enable graph analytics to deliver value incrementally. As understanding grows, new data sources and new relationships can be added, building an ever more potent and accurate model.

Figure 2. Graphs consist of nodes (data items) and edges (relationships).
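Because relationships are first-class data in the graph, a query can ask directly which kinds of relationships exist, with no schema knowledge required. The following minimal sketch uses only standard SPARQL 1.1 aggregation, with no product-specific assumptions, to list every relationship type present in a graph and how often it occurs.

    # List every relationship (predicate) type in the graph and its frequency.
    SELECT ?relationship (COUNT(*) AS ?occurrences)
    WHERE {
      ?subject ?relationship ?object .
    }
    GROUP BY ?relationship
    ORDER BY DESC(?occurrences)

A result set like this is often the first step of an iterative investigation: it tells the analyst what the graph can be asked before any specific hypothesis is posed.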

Introducing the Cray Urika-GD Graph Analytics Appliance

"Uncovering previously unknown patterns and relationships across increasingly large repositories of multistructured data represents one of the biggest opportunities to derive new sources of innovation, growth and productivity from analytics." (Gartner, "Cool Vendors in Content and Social Analytics," by Rita Salaam)

The Urika-GD appliance was introduced in recognition of the important role that graph analytics can play in discovery. A large governmental organization approached Cray about performing discovery analytics on a large and constantly growing graph. They had investigated several technologies, but none satisfied their needs. Analysis of this organization's needs led to a canonical list of hardware requirements for all graph analytics (the software requirements are discussed in a later section):

Discovery analytics requires real-time response: A multiprocessor solution is required for scale and performance. Many graph analytics solutions are single-computer implementations, very useful for small problems but unusable at scale.

Graphs are hard to partition: A large, shared memory is required to avoid the need to partition graphs. Analyzing graph relationships requires following the edges in the graph. Regardless of the scheme used, partitioning the graph across a cluster will result in edges spanning cluster nodes. In most cases, the number of edges crossing cluster nodes is so large that a time-consuming network transfer is required each time those edges are crossed. Compared to local memory, even a fast commodity network such as 10 Gigabit Ethernet is at least 100 times slower at transferring data. Given the highly interconnected nature of graphs, users gain a significant processing advantage if the entire graph is held in a sufficiently large shared memory.

Graphs are not predictable, and therefore cache-busting: A custom graph processor is needed to deal with the mismatch between processor and memory speeds. Analyzing relationships in large graphs requires the examination of multiple, competing alternatives. These memory accesses are highly data dependent and eliminate the ability to apply traditional performance improvement techniques such as pre-fetching and caching. Given that even RAM is 100 times slower than processors, and that graph analytics consists of exploring alternatives, the processor sits idle most of the time waiting for delivery of data. Cray developed hardware multithreading technology to help alleviate this problem. Threads can explore different alternatives, and each thread can have its own memory access in flight. As long as the processor supports a sufficient number of hardware threads, it can be kept busy. Given the highly nondeterministic nature of graphs, a massively multithreaded architecture provides a tremendous performance advantage.

Graphs are highly dynamic: A scalable, high performance I/O system is required for fast loading. Graph analytics for discovery involves examining the relationships and correlations between multiple datasets and, consequently, requires loading many large, constantly changing datasets into memory. The sluggish speed of I/O systems, roughly 1,000 times slower than the CPU, translates into graph load and modification times that can stretch into hours or days, far longer than the time required for running the analytics. In a dynamic enterprise with constantly changing data, a scalable I/O infrastructure provides a tremendous performance advantage for discovery.
These requirements drove the design of the Urika-GD system's hardware and resulted in a hardware platform proven to deliver real-time performance for complex data discovery applications.

Overview of the Multiprocessor, Shared Memory Architecture

Cray's Urika-GD appliance is a heterogeneous system consisting of Urika-GD appliance services nodes and graph accelerator nodes linked by a high performance interconnect fabric for data exchange (see Figure 3). Graph accelerator nodes ("accelerator nodes") use a purpose-built Threadstorm processor capable of delivering several orders of magnitude better performance on graph analytics applications than a conventional microprocessor. Accelerator nodes share memory and run a single instance of a UNIX-based, compute-optimized OS named the multithreaded kernel (MTK). Urika-GD appliance services nodes ("service nodes"), based on x86 processors, provide I/O, appliance management and database management. Service nodes may be added as desired, enabling connectivity and management functions to scale for larger Urika-GD appliances. Each service node runs a distinct instance of a fully featured Linux operating system.

The interconnect fabric is designed for high-speed access to memory anywhere in the system from any processor, as well as for scaling to large processor counts and memory capacities. The Urika-GD system architecture supports flexible scaling to 8,192 graph accelerator processors and 512 TB of shared memory. Urika-GD systems can be expanded incrementally to this maximum size as data analytics needs and dataset sizes grow.

Figure 3. Urika-GD system architecture: graph accelerator nodes (Threadstorm processors running MTK) and appliance services nodes (x86 processors running SUSE Linux), connected by the interconnect fabric to the network and RAID controllers.

Addressing the Memory Wall Through Massive Multithreading

"A DBMS designed for known relationships and anticipated requests runs badly if the relationships actually discovered are different, and if requests are continually adapted to what is learned." (Gartner, "Urika Shows Big Data Is More than Hadoop and Data Warehouses," by Carl Claunch)

"Memory wall" refers to the growing imbalance between CPU speeds and memory speeds. Starting in the early 1980s, CPU speed improved at an annual rate of 55 percent, while memory speed improved at a rate of only 10 percent. This imbalance has traditionally been addressed either by managing latency or by amortizing latency. Neither approach, however, is suitable for graph analytics.

Managing latency is achieved by creating a memory hierarchy (levels of hardware caches) and by software optimization to pre-fetch data. This approach is not suitable for graph analytics, where the workload is heavily dependent on pointer chasing (the following of edges between nodes in the graph), because the random access to memory results in frequent cache misses and in processors stalling while waiting for data to arrive.

Amortizing latency is achieved by fetching large blocks of data from memory. Vector processors and GPUs employ this technique to great advantage when all the data in the retrieved block is used in computation. This approach is also not suitable for graph analytics, where relatively little data is associated with each graph node other than pointers to other nodes.

In response to the ineffectiveness of managing or amortizing latency on graph problems, a new approach was developed: the use of massive multithreading to tolerate latency. The Threadstorm processor is massively multithreaded, with 128 independent hardware streams. Each stream has its own register set and executes one instruction thread. The fully pipelined Threadstorm processor switches context to a different hardware stream on each clock cycle. Up to eight memory references can be in flight for each thread, and each hardware stream is eligible to execute every 21 clock cycles if its memory dependencies are met. No caches are necessary or present anywhere in the system, since the fundamental premise is that at least some of the 128 threads will have the data required to execute at any given time. Effectively, Threadstorm performs multiple random, dynamic memory references simultaneously without pre-fetching or caching, turning the memory latency problem into a requirement for high bandwidth.

Delivering High Bandwidth with a High Performance Interconnect

The Urika-GD system uses a purpose-built high-speed network. This interconnect links nodes in a 3-D torus topology (Figure 3) to deliver the system's high communication bandwidth. The topology provides excellent cross-sectional bandwidth and scaling without layers of switches. Key performance data for the interconnection network:

- Sustained bidirectional injection bandwidth of more than 4 GB/s per processor and an aggregate bandwidth of almost 40 GB/s through each vertex in the 3-D torus
- Efficient support for Threadstorm remote memory access (RMA), as well as direct memory access (DMA), for rapid and efficient data transfer between the service nodes and accelerator nodes

The combination of a high-bandwidth, low-latency network and massively multithreaded processors makes the Urika-GD appliance ideally suited to handle the most challenging graph workloads.

Comparison: The Urika-GD Appliance's Hardware and Commodity Hardware

The Urika-GD appliance hardware provides a number of key advantages for discovery analytics over commodity cluster systems: a large, global shared memory; extreme processing power; a purpose-built, massively multithreaded graph acceleration processor; extreme memory bandwidth; and extreme tolerance for memory latency. The table below sums up these differentiators and the benefit each provides for graph analytics.

Enabling Fine-Grained Parallelism with Word-Level Synchronization Hardware

The benefits of massive parallelism are quickly lost if synchronization between threads involves serial processing, in accordance with Amdahl's Law. The Threadstorm processors and memory implement fine-grained synchronization to support asynchronous, data-driven parallelism, and they spread the synchronization load physically within the memory and interconnection network to avoid hot spots. Full/empty bits are provided on every 64-bit memory word in the entire global address space for fine-grained synchronization. This mechanism can be used across unrelated processes on a spin-wait basis. Within a process (which can have up to tens of thousands of active threads), multiple threads can also perform efficient blocking synchronization using mechanisms based on the full/empty bits. The Urika-GD system's software uses this mechanism directly, without OS intervention in the hot path.

Delivering Scalable I/O to Handle Dynamic Graphs

Any number of appliance services nodes can be plugged into the interconnect, allowing the appliance's I/O capabilities to be scaled independently of the graph processing engine. A Lustre parallel file system is used to provide scalable, high performance storage. Lustre is an open-source file system designed to scale to multiple exabytes of storage and to provide near-linear scaling in I/O performance as Lustre nodes are added.

Differentiator: Large global shared memory
Urika-GD system capability: Scales up to 512 TB
Significance to discovery analytics: Enables uniform, low-latency access to all the data, regardless of data partitioning, layout or access pattern. A large shared memory holds the entire graph, avoiding the need to partition it and enabling unknown linkages and non-obvious patterns in the data to be easily surfaced with no advance knowledge of the relationships in the dataset.

Differentiator: Extreme processing power
Urika-GD system capability: Scales up to 8,192 processors
Significance to discovery analytics: Achieving real-time performance requires employing as many processors as needed, all sharing the same memory. This scalability ensures interactive response on the most demanding workloads.

Differentiator: Massive multithreading
Urika-GD system capability: 128 hardware threads per processor
Significance to discovery analytics: Graph analytics involves random memory access patterns, which cause individual threads to stall. Processors can tolerate this latency if they have multiple concurrently executing hardware threads, so that there are always threads ready to execute when a memory stall occurs. The Urika-GD appliance is effectively investigating multiple changing hypotheses simultaneously, in real time, enabling it to deliver a two to four orders of magnitude improvement in performance.

Differentiator: Extreme memory performance
Urika-GD system capability: Memory bandwidth scales with the size of the appliance
Significance to discovery analytics: Traditional processors amortize memory latency (they make an inherent assumption that data will have locality, so they retrieve blocks of data into a complex hierarchy of caches). Graphs, and discovery applications generally, do not have locality, so this approach doesn't work. The Urika-GD platform's Threadstorm processors tolerate latency through massive multithreading. However, each thread can issue up to eight concurrent memory references, so massive memory bandwidth is required to keep the processors running at peak performance. Massive multithreading and extreme memory bandwidth go hand in hand to deliver the Urika-GD system's performance advantage.

Word-level memory synchronization hardware enables near-linear scaling to high thread and processor counts. Together, these optimizations deliver an appliance finely tuned to the requirements of discovery in big data.

The Urika-GD System Software Stack

The Urika-GD system's software stack was crafted with several goals in mind:

- Create a standards-based appliance for real-time data discovery using graph analytics
- Facilitate migration of existing graph workloads onto the Urika-GD appliance
- Allow users to easily fuse diverse datasets from structured, semistructured and unstructured sources without upfront modeling, schema design or partitioning considerations
- Enable ad-hoc queries and pattern-based searches across the entire dynamic graph database

The Urika-GD appliance software (Figure 4) is partitioned across the two types of nodes, service nodes and accelerator nodes, with each processing the workload for which it is best suited. The service nodes run discrete copies of Linux and are responsible for all interactions with the external world: database and appliance management, and database services, including a SPARQL endpoint for external query submission. The service nodes also perform network and file system I/O. A Lustre parallel file system enables near-linear scalability across multiple service nodes, allowing even the largest datasets to be loaded into memory in minutes. The accelerator nodes maintain the in-memory graph database, including loading and updating the graph, performing inferencing and responding to queries.

Figure 4. Urika-GD system software architecture: RDF, SPARQL, Java and visualization tools interface with graph analytics application services (database manager, database services and visualization services) on the services nodes running SUSE Linux 11, and with the graph analytics database on the accelerator nodes running the optimized multithreaded kernel.

The Graph Analytics Database

The graph analytics database provides an extensive set of capabilities for defining and querying graphs using the industry-standard RDF and SPARQL. These standards are widely used for storing graphs and performing analytics against them. Cray built the database and query engine from the ground up to take advantage of the massive multithreading of the Threadstorm processors. The standards-based approach and comprehensive feature set ensure that existing graph data and workloads can be migrated onto the Urika-GD platform with minimal or no changes to existing queries and application software. With the Urika-GD appliance, query results can be sent back to the user or written to the parallel file system. The latter capability can be very useful when the set of results is very large.

Figure 5. An example of identifying threat patterns (concept graph linking entities such as fertilizer purchases, transport and factories).
Goal: Proactively identify patterns of activity and threat candidates by aggregating intelligence and analysis.
Datasets: Reference data, people, places, things, organizations, communications...
Technical challenges: Volume and velocity of data; inaccurate, incomplete and falsified data.
Users: Intelligence analysts.
Usage model: Search for patterns of activity and graphically explore relationships between candidate behavior and activities.
Augmenting: Existing Hadoop cluster and multiple data appliances.
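The usage model in Figure 5, searching for patterns of activity across aggregated intelligence data, maps naturally onto SPARQL's graph pattern matching. The sketch below is purely illustrative: the ex: prefix and all predicate and class names are hypothetical and do not represent any actual intelligence schema or any Urika-GD-specific extension.

    # Hypothetical schema (ex:) for illustration only.
    PREFIX ex: <http://example.org/intel#>

    SELECT DISTINCT ?person
    WHERE {
      ?purchase  ex:purchasedBy   ?person ;
                 ex:itemCategory  ex:Fertilizer .   # bulk fertilizer purchase
      ?rental    ex:rentedBy      ?person ;
                 ex:vehicleType   ex:Truck .        # vehicle rental by the same person
      ?person    ex:visited       ?site .
      ?site      ex:siteType      ex:Factory .      # visit to a factory site
    }

As new intelligence sources are loaded into the graph, the same pattern query can be rerun unchanged, which is the behavior the iterative discovery cycle described earlier depends on.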


More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

The IBM Cognos Platform

The IBM Cognos Platform The IBM Cognos Platform Deliver complete, consistent, timely information to all your users, with cost-effective scale Highlights Reach all your information reliably and quickly Deliver a complete, consistent

More information

PEPPERDATA IN MULTI-TENANT ENVIRONMENTS

PEPPERDATA IN MULTI-TENANT ENVIRONMENTS ..................................... PEPPERDATA IN MULTI-TENANT ENVIRONMENTS technical whitepaper June 2015 SUMMARY OF WHAT S WRITTEN IN THIS DOCUMENT If you are short on time and don t want to read the

More information

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created

More information

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System By Jake Cornelius Senior Vice President of Products Pentaho June 1, 2012 Pentaho Delivers High-Performance

More information

Virtual Data Warehouse Appliances

Virtual Data Warehouse Appliances infrastructure (WX 2 and blade server Kognitio provides solutions to business problems that require acquisition, rationalization and analysis of large and/or complex data The Kognitio Technology and Data

More information

Data processing goes big

Data processing goes big Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

More information

How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory)

How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory) WHITE PAPER Oracle NoSQL Database and SanDisk Offer Cost-Effective Extreme Performance for Big Data 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Abstract... 3 What Is Big Data?...

More information

The Next Wave of Data Management. Is Big Data The New Normal?

The Next Wave of Data Management. Is Big Data The New Normal? The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management

More information

Cloudera Enterprise Data Hub in Telecom:

Cloudera Enterprise Data Hub in Telecom: Cloudera Enterprise Data Hub in Telecom: Three Customer Case Studies Version: 103 Table of Contents Introduction 3 Cloudera Enterprise Data Hub for Telcos 4 Cloudera Enterprise Data Hub in Telecom: Customer

More information

Databricks. A Primer

Databricks. A Primer Databricks A Primer Who is Databricks? Databricks vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache Spark, a powerful

More information

Best Practices for Hadoop Data Analysis with Tableau

Best Practices for Hadoop Data Analysis with Tableau Best Practices for Hadoop Data Analysis with Tableau September 2013 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks

More information

Colgate-Palmolive selects SAP HANA to improve the speed of business analytics with IBM and SAP

Colgate-Palmolive selects SAP HANA to improve the speed of business analytics with IBM and SAP selects SAP HANA to improve the speed of business analytics with IBM and SAP Founded in 1806, is a global consumer products company which sells nearly $17 billion annually in personal care, home care,

More information

Big Fast Data Hadoop acceleration with Flash. June 2013

Big Fast Data Hadoop acceleration with Flash. June 2013 Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional

More information