BSC vision on Big Data and extreme scale computing
Jesus Labarta, Eduard Ayguade, Fabrizio Gagliardi, Rosa M. Badia, Toni Cortes, Jordi Torres, Adrian Cristal, Osman Unsal, David Carrera, Yolanda Becerra, Enric Tejedor and Mateo Valero (BSC)

Science needs in Big Data at BSC

Solving new challenges in most application areas requires handling and processing huge amounts of data. Most computational groups at the Barcelona Supercomputing Center have already met these demands, with applications in molecular dynamics, genomic and protein docking analysis, air quality control, data assimilation, oil exploration, physiological human simulation, social graph analytics, smart cities, and HPC performance analysis and modeling. We also cooperate in projects on human brain simulation and neuroscience analytics with equivalent demands. An interesting observation is that each of these research teams has faced the issues from an individual perspective, experimenting with and/or elaborating different solutions for their specific applications. We believe that this situation within our institution is probably representative of the current ad-hoc and fragmented approaches in the field, calling for a unification of efforts. In this direction, BSC is committed to an internal effort (the Severo Ochoa project) to integrate the activities of the application development departments (Life Sciences, Earth Sciences and Engineering) with the Computer Sciences department, which works on system software and architecture. The objective of this co-design project between application, programming model, runtime system and architecture teams is to identify and/or develop an infrastructure that provides a unified, productive, easy-to-use and efficient environment for our broad range of applications.

Strategic considerations

Big data means big issues. Topics such as applications, algorithms, system architecture, capacity/bandwidth balance at the different levels, scalability, middleware and APIs, security, resilience and provenance, to name a few, are extremely relevant to the area. Among them, we would like to stress: the importance of algorithmic optimization and tuning; the increasingly dynamic usage patterns of the infrastructures; the importance of providing clean programming interfaces that integrate the concurrency and data management models; the responsibility of the runtime in optimizing the mapping of computation and data to available resources; and the need to develop architectural support for the runtime and application functionalities. The following subsections elaborate further on these topics, exposing the considerations that drive and inspire our efforts to develop systems and middleware that efficiently and productively support the needs of big data applications.
a) Reducing computational and data movement complexities

Algorithmic developments are extremely important. We believe that brute-force algorithms are too often applied, relying on the capability of scalable hardware to solve a problem. Better algorithms that drastically reduce the computational complexity of solving a problem have always been the best way to reduce time and energy to solution. With the advent of the memory wall, and the high energy cost and long latencies of data movement, data movement complexity needs to be considered in combination with computational complexity. The introduction of asynchrony, the avoidance of global synchronization, and support for non-structured patterns of concurrency and large look-ahead capabilities are features that will allow our algorithms to tolerate the latency limitations of the infrastructures. Today's algorithms are frequently used in very simple linear workflows that often correspond to an analysis process or methodology. We envisage that programs with more flexible control flow structures will be of great relevance and will bring increased intelligence to coarse-grain computational workflows.
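To make the asynchrony point concrete, here is a minimal sketch in plain Python of the restructuring we have in mind: a bulk-synchronous loop is replaced by tasks submitted with large look-ahead and consumed in completion order. The workload (`analyze_block`) and its sizes are hypothetical stand-ins, not code from any BSC application.

```python
# Illustrative sketch (hypothetical workload): asynchronous, completion-driven
# execution instead of a globally synchronized loop.
from concurrent.futures import ProcessPoolExecutor, as_completed

def analyze_block(block_id):
    # Stand-in for a real kernel (e.g., one chunk of a genomics pipeline).
    return block_id, sum(i * i for i in range(10_000))

def run_async(num_blocks=64, workers=8):
    results = {}
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Large look-ahead: all tasks are instantiated up front, so the
        # runtime can keep every resource busy without global barriers.
        futures = [pool.submit(analyze_block, b) for b in range(num_blocks)]
        # Results are consumed in completion order, not submission order,
        # tolerating stragglers and variable latencies.
        for fut in as_completed(futures):
            block_id, value = fut.result()
            results[block_id] = value
    return results

if __name__ == "__main__":
    print(len(run_async()))
```

Because no step waits for all of its peers, a slow task delays only the work that truly depends on it.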
b) Usage models and resource management

In the currently dominant batch-oriented practice, users launch large numbers of computations whose results are analyzed offline, and not always analyzed at all. We anticipate that the usage of big data and compute infrastructures will need to evolve towards higher dynamicity and responsiveness. The incoming rates of streamed data sets may vary significantly over time, depending on external data acquisition and measured system conditions. Interactive and steered usage will dynamically shift the focus and balance of computation and data processing, potentially deciding that computations initially scheduled are no longer relevant while unforeseen analyses or visualizations suddenly become important. Flexible and intelligent computational workflows will also present variable computational and data access demands as they proceed. These usage patterns generate strong requirements on the dynamicity of the infrastructure's resource management. Sharing a large infrastructure among several applications and users has the potential to improve user satisfaction and save substantial resources, if appropriate resource management policies can be implemented. Mechanisms supporting malleability have to be developed and introduced so that applications can adapt their internal parallel and data access structure to changes in the allocated resources (data, communication and compute) as they occur. Dynamic resource management policies will have to match demand and availability, redistributing resources to maximize the efficiency of the overall infrastructure and the quality of service experienced by users.

c) Programming models

Ideally, programming models should provide an interface by which the application developer expresses algorithms and ideas in a platform-agnostic way. The interface should also allow conveying information and hints to the runtime that can be used dynamically to optimize the actual mapping of computations and data accesses/movements to resources. In current Big Data practices, concurrency and data processing are often considered as independent aspects, with most of the efforts focusing on one or the other. Furthermore, they are offered through fairly large APIs, each of them introducing many concepts and essentially targeting a specific level of granularity. As a result, large programs are written in which many lines of code deal with platform specificities rather than problem-related logic. The specialization of developers at a given granularity level can lead to applications that are not optimized at all levels, from global orchestration down to low-level data access and computational kernels. We believe it is important to propose models that better integrate the concurrency and data processing aspects, relying on the same type of abstractions at the different granularity levels. Particularly relevant is the integration of flexible parallel control flow structures with query mechanisms that refer to huge data sets. Support for flexible parallelization strategies (asynchrony, nesting) is needed to free applications from latency limitations, converting them into throughput-based applications where the amount of resources (bandwidth, cores, $) is the limiting factor. By using persistent object-based storage layers we have the potential to closely map the data models in the programming languages to those used to provide data durability, and also to provide a flexible shared communication space between partially coupled or independent applications. Such integration would simplify programs, freeing the programmer from considering the different models and eliminating much of the code that today is devoted to explicit I/O operations. We consider that two features of our designs will be extremely useful to ensure productivity, maintainability, application integration and portability: first, the ability to leverage current programming languages at the different granularities (sequential or parallel programming languages, scripting languages, or DSLs) with minimal extensions; second, the use of similar underlying concepts (i.e., task-based models) at the different granularity levels (workflow, parallel program). The models must allow and encourage holistic optimization considering all levels of granularity. This should even include potential architectural support, with specialized functional units for computational/data movement tasks and potentially hardware support for the runtime. The models should integrate clean and simple mechanisms to enable intelligent data layouts and locality optimizations, allowing the runtime to minimize data movements and/or move computations close to the data.
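The following sketch illustrates the persistent-object idea; `PersistentDict` and its backing store are hypothetical stand-ins, not an existing BSC interface. The point is that application code manipulates durable, shareable data through the language's own object model, with no explicit I/O.

```python
# Hypothetical sketch: persistent objects replacing explicit I/O.
# The standard shelve module provides durability behind an ordinary mapping.
import shelve

class PersistentDict:
    """A dictionary-like object whose contents survive the process.

    Opening the same name from another (partially coupled or independent)
    application yields a view of the same data, so the storage layer also
    serves as a shared communication space.
    """
    def __init__(self, name):
        self._db = shelve.open(name)  # the backing store is an implementation detail

    def __setitem__(self, key, value):
        self._db[key] = value         # no user-visible serialization code

    def __getitem__(self, key):
        return self._db[key]

    def close(self):
        self._db.close()

# Application code sees only the language's data model, never files:
genomes = PersistentDict("genome_index")
genomes["sample_42"] = {"length": 3_000_000_000, "coverage": 30.5}
print(genomes["sample_42"]["coverage"])
genomes.close()
```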
d) Intelligent runtimes

The programming models provide the interface through which the programmer describes the application's requirements in terms of computations, data accesses, and dependences or communications. It is then up to the runtimes to optimize the mapping of those requirements onto the available resources when the application runs on a given architecture. Different runtime implementations may vary in overhead, latency or bandwidth, targeting different platforms (from very distributed to fine-grain parallel) and thus supporting different granularities. Our philosophy is that runtimes should decide actual data placements, replication and transfers (whether data to computation, computation to data, or some intermediate point). As stated in the previous section, the layer above can and should help by providing hints to the runtime, but it is important that this interface is defined in terms as abstract as possible, conveying useful information without requiring the programmer or user to control the actual details of the architecture. This is certainly an area where concurrency and computation have to be handled in an integrated way and where a lot of intelligence has to be developed and embedded in the system.
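As a toy illustration of such a decision (the node names, data sizes and placement policy are invented for the sketch), a runtime that tracks where data lives can ship the computation to the node holding the bulkiest input instead of moving that input across the machine:

```python
# Toy sketch of a locality-aware mapping decision inside a runtime.
# All names, sizes and the cost model are hypothetical.

DATA_LOCATION = {"matrix_A": "node3", "matrix_B": "node7"}
DATA_SIZE_GB = {"matrix_A": 120.0, "matrix_B": 0.5}

def place_task(task_inputs, idle_nodes):
    """Return (node, transfers): run where the largest input lives,
    shipping only the smaller inputs, instead of a fixed assignment."""
    # Prefer the node holding the most data (move computation to data).
    bulkiest = max(task_inputs, key=lambda d: DATA_SIZE_GB[d])
    target = DATA_LOCATION[bulkiest]
    if target not in idle_nodes:          # fall back: any idle node
        target = next(iter(idle_nodes))
    transfers = [d for d in task_inputs if DATA_LOCATION[d] != target]
    return target, transfers

node, moved = place_task(["matrix_A", "matrix_B"], idle_nodes={"node3", "node5"})
print(node, moved)   # -> node3 ['matrix_B']: 0.5 GB is moved instead of 120 GB
```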
e) Architectural support

Finally, research in architecture and hardware support has a potentially huge payoff. This applies both to processor design and to storage system architectures. Of special interest are processor optimizations for functionalities relevant to application algorithms and programming model support. A particular challenge is to study holistically the dimensioning of capacity and bandwidth across the different memory/communication/storage layers, in the light of new technologies, the locality and asynchronous data movement optimizations the runtime can perform, and the characteristics of the algorithms.

BSC technologies

The BSC effort to develop middleware supporting the internal application projects enumerated in the introduction builds around a shared persistent object data management layer cleanly integrated into the generic StarSs programming model. We also explore architectural features that will efficiently support both the data and the computation aspects of these applications.

In the data management area, different teams at BSC have worked in the past on aspects relevant to Big Data: resource management in MapReduce, focusing on task scheduling driven by actual resource consumption and high-level performance metrics; resource-aware task scheduling with completion time goals; decoupling user interfaces from data layout in non-relational databases (using Apache Cassandra as a representative non-relational database); use of specialized hardware architectures such as IBM's BGAS platform for running some of BSC's genomics workflows; and integration of our programming models with the BGAS key/value store. Within the Severo Ochoa project we are integrating the research experience of those groups into a unified data access and management layer. Our global objective is to devise a persistent object storage interface that can be mapped on top of different underlying data management infrastructures and cleanly integrated with the programming model developments described in the following paragraphs.

At the programming model level, the StarSs concept is that the programmer specifies tasks and the directionality of the data accesses they perform. The sequential execution of a control flow instantiates the tasks, and their dependences are computed by the runtime based on the directionality annotations. Key philosophical considerations are malleability, support for automatic locality management based on the same data access annotations that are used to build dependences, and resource independence (the specification of tasks rather than threads, processes or other abstractions). The concept is implemented by COMPSs at the medium/coarse grain level (tens of milliseconds and up) and by OmpSs at the fine/medium grain parallel programming level (from a few microseconds up to tens of milliseconds). We think that their hierarchical combination provides a unified conceptual framework and a simple-to-use interface, leading to programs that focus on the actual scientific problem they target. The models allow a very dynamic mapping of computations and data accesses to the available resources by the intelligent runtimes and data management layers that we provide.

COMPSs aims to support flexible computational workflows. Several language bindings are available: C, C++ and Java for the original implementation, and PyCOMPSs, a recent development that introduces efficient parallel support in Python. The annotations of argument directionality are provided through method interfaces in Java and through decorators in Python. In its original implementation, the COMPSs runtime computed dependences using the files read or written by the application, targeting only coarse granularities. In the current version, the arguments to tasks can also be objects declared in the language's type model, and dependences are also computed based on accesses to such local objects. The execution engine offloads the objects and computations to different cores or nodes, supporting the efficient parallel execution of medium-granularity programs. Tasks from the same computational workflow can be offloaded transparently to different nodes within the local cluster or to external cloud resources. PyCOMPSs offers an elegant solution to automatically parallelize and distribute applications in the widely used Python language. By cleanly integrating the persistent object model into it, we expect to offer application programmers the possibility of incrementally enabling Big Data requirements in existing applications. The environment will also ease and encourage the productive programming of more dynamic workflows.
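A minimal sketch of what such a PyCOMPSs-style program looks like follows. The import paths and decorator arguments are indicative of the decorator-based annotation style described above and may differ between PyCOMPSs versions; the kernels themselves are invented for the example.

```python
# Sketch of a PyCOMPSs-style annotated program (module paths and decorator
# arguments are indicative, not authoritative; kernels are hypothetical).
from pycompss.api.task import task
from pycompss.api.parameter import INOUT
from pycompss.api.api import compss_wait_on

@task(returns=1)                  # a producer task with one return value
def simulate(seed):
    return [seed * i for i in range(1000)]

@task(block=INOUT)                # directionality annotation on an argument
def normalize(block):
    peak = max(block) or 1
    for i, v in enumerate(block):
        block[i] = v / peak

# Plain sequential control flow: the runtime instantiates the tasks, builds
# the dependence graph from the annotations, and executes the tasks
# asynchronously on the available cores, nodes, or cloud resources.
blocks = [simulate(s) for s in range(16)]
for b in blocks:
    normalize(b)
blocks = compss_wait_on(blocks)   # synchronize only when results are needed
```

Note that the application source contains no threads, processes or explicit data transfers: the dependence graph is derived entirely from the sequential control flow and the directionality annotations.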
At the finer granularity level, OmpSs offers C, C++ and FORTRAN bindings. Its design follows the same philosophical considerations as COMPSs: asynchrony (through the dynamic building of dependences and dataflow-based execution), nesting, and heterogeneity (accelerators). Major features of the dependence specification mechanism proposed in OmpSs have been adopted in the latest specification of the OpenMP standard (version 4.0).

Our efforts try to embody the philosophical guidelines presented in the previous section in the design and implementation of the different components of our environment. We expect that the cooperation, within the Severo Ochoa project, of scientific teams in the different areas, working on the algorithmic aspects and building on top of the middleware and architectural developments proposed here, will lead to significant demonstrations of real social impact.