Introduction to urika. Multithreading. SPARQL Database. urika Appliance. XMT-2 Programming. Use Cases
- Bernadette Flowers
- 8 years ago
Transcription
2 Introduction to urika; Multithreading; SPARQL Database; urika Appliance; XMT-2 Programming; Use Cases
3 MTA: Gallium arsenide; proof of concept; first production implementation of latency-tolerant multithreading. MTA: Transition of the microprocessor to CMOS; invention of scaling memory in a multithreading system. XMT: Integrated MTA and the main Cray product line (XT); first hybrid system to support x86 and multithreading CPUs. XMT: Made the system ready for enterprise production; largest shared-memory machine in the world; first installation at CSCS, May 2011. YarcData urika (2012): Optimized for scalable graph analytics.
4 Uses the same cabinets, boards, scalable interconnect, I/O and storage infrastructure, user environment, and administrative tools; just changes the processor. Cabinet: 24 blades per cabinet; vertical airflow with optional liquid assist. Compute blades: 4 Threadstorm processors; GB per processor. Cray XT service and I/O subsystem: PCIe connections to storage and networks; scalable Lustre global file system. Cray XT high-speed 3D torus network. Cray XT power and RAS systems. Linux-based user environment.
5 Social Networking, Intelligence/Security, Telecom/Mobile, Life Sciences/Biology, Healthcare/Medicine, Finance, Internet/WWW, Supply Chain
6 Big Data: structured, semi-structured, unstructured. Scale-out Analytics: unstructured data; highly partitionable datasets; keyword search; example: Hadoop, column stores. Data Warehouses and BI Tools: structured data; OLAP cubes; regular queries, known variables; example: Oracle Exadata. In-memory Analytics: structured and unstructured data; non-partitionable datasets; complex queries (pattern matching); example: YarcData urika.
7 Graphs are hard to partition: high cost to follow relationships that span cluster nodes. Graphs are not predictable: high cost to follow multiple competing paths which cannot be prefetched/cached. Graphs are highly dynamic: high cost to load multiple, constantly changing datasets into in-memory graph models. Network is 100 times slower than memory*. Memory is 100 times slower than processor*. Storage I/O is 1000 times slower than memory I/O*. *Source: Hennessy, J. and Patterson, D., Computer Architecture: A Quantitative Approach, 2012 edition.
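The cost ratios on this slide can be turned into a rough back-of-the-envelope estimate. The sketch below is only an illustration: it takes the two 100x ratios from the slide as given, and the function name and the fraction of edges that cross cluster nodes are assumptions chosen for the example, not measurements.

```python
# Relative access costs built from the slide's ratios, in units of one
# processor operation. The remote-edge fractions below are assumptions.
PROCESSOR = 1
MEMORY = 100 * PROCESSOR   # memory is 100x slower than the processor
NETWORK = 100 * MEMORY     # network is 100x slower than memory


def expected_hop_cost(remote_fraction: float) -> float:
    """Average cost of following one graph edge when `remote_fraction` of
    the edges cross cluster nodes and the rest stay in local memory."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * MEMORY + remote_fraction * NETWORK


if __name__ == "__main__":
    for fraction in (0.0, 0.1, 0.5):
        cost = expected_hop_cost(fraction)
        print(f"{fraction:.0%} remote edges: "
              f"{cost / MEMORY:.1f}x a local memory access")
```

Under these assumed ratios, even a modest share of cross-node edges dominates the traversal cost, which is the partitioning argument the slide makes.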
8 Graphs are hard to partition: large shared memory, up to 512 TB. Graphs are not predictable: massively multithreaded, 128 threads/processor. Graphs are highly dynamic: highly scalable I/O, up to 350 TB/hr. Real-time, interactive analytics on big data graph problems.
9 Cray XMT-2 Hardware Engine: originally designed for deep analysis of large datasets. Very large scalable shared memory: architecture can support 512 TB shared memory; typical systems are 2 TB to 32 TB. Multithreading: unique highly multithreaded architecture; 128 hardware threads per processor; extreme parallelism, hides memory latency. Multithreaded Graph Database: highly parallel in-memory RDF/SPARQL quad store; high-performance inference engine; high-performance parallel I/O. Industry Standard Front End: based on Jena open source semantic DB; all standard Linux infrastructure and languages; Lustre parallel file system.
11 Relative latency to memory continues to increase: vector processors amortize memory latency; cache-based microprocessors reduce memory latency; multithreaded processors tolerate memory latency. Multithreading is most effective when parallelism is abundant and data locality is scarce. Large graph problems perform well on this architecture: graph analysis, semantic databases, social network analysis.
12 Many threads per processor core; small thread state. Thread-level context switch at every instruction cycle. [Figure: conventional processor vs. multithreaded processor; labels: registers, ALU, stream, program counter]
13 Conventional processor: when one or a few threads stall, memory/network bandwidth becomes idle. Multithreaded processor: although some threads stall, others keep issuing local/remote memory requests, keeping the most precious resources busy. [Figure labels: memory, network]
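The latency-tolerance argument on slides 12 and 13 can be illustrated with a toy model. The sketch below is not a description of the Threadstorm processor: it assumes every thread issues one instruction and then waits a fixed number of cycles for memory, and that the processor picks a ready thread round-robin each cycle. The 99-cycle latency is an arbitrary illustrative value; only the 128 threads per processor comes from the slides.

```python
def utilization(threads: int, memory_latency: int) -> float:
    """Closed-form issue rate for the toy model: each thread is busy for
    1 cycle and then stalled for `memory_latency` cycles."""
    if threads < 1 or memory_latency < 0:
        raise ValueError("need threads >= 1 and memory_latency >= 0")
    return min(1.0, threads / (1 + memory_latency))


def simulate(threads: int, memory_latency: int, cycles: int = 10_000) -> float:
    """Cycle-by-cycle check of the same model with round-robin selection."""
    ready_at = [0] * threads   # first cycle at which each thread may issue
    issued = 0
    next_thread = 0
    for cycle in range(cycles):
        for offset in range(threads):
            t = (next_thread + offset) % threads
            if ready_at[t] <= cycle:
                # Issue one instruction, then stall on a memory reference.
                ready_at[t] = cycle + 1 + memory_latency
                issued += 1
                next_thread = (t + 1) % threads
                break
    return issued / cycles


if __name__ == "__main__":
    for n in (1, 50, 128):
        print(n, "threads:", utilization(n, 99), simulate(n, 99))
```

In this model a single thread keeps the processor busy about 1% of the time, while 128 threads are enough to issue on every cycle, which is the sense in which multithreading "tolerates" rather than reduces latency.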
14 To the programmer, a multiple-processor XMT-2 looks like a single processor, except that the number of threads is increased. (Bokhari/Sauer)
16 RDF triples databases are inherently graphical. Some researchers call semantic databases "semantic graph databases". [Figure: example graph; labels: Vancouver, in-country, Canada, is-type, country, object, Edmonton, has-population, 730,000, North America]
17 Resource Description Framework (RDF): designed to enable semantic web searching and integration of disparate data sources. W3C standard formats. Every datum is represented as subject/predicate/object, ideally with each of those expressed with a URI. Standard ontologies exist in some domains, e.g., Open Biological and Biomedical Ontologies (OBO). Examples (subject, predicate, object):
<ncbitax:ncbitaxon_840261> rdf:type owl:Class
<ncbitax:ncbitaxon_195644> rdfs:subClassOf <ncbitax:ncbitaxon_185881>
<ncbitax:ncbitaxon_816681> rdfs:label "Characiformes sp. BOLD:AAG5151"@en
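To make the "triples are a graph" point concrete, here is a minimal sketch that stores the slide's three example triples as plain tuples and groups them into an adjacency view. The `Triple` type and `build_graph` helper are hypothetical names introduced for illustration; they are not part of Jena or urika.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The three example triples from the slide, as plain strings.
TRIPLES = [
    Triple("ncbitax:ncbitaxon_840261", "rdf:type", "owl:Class"),
    Triple("ncbitax:ncbitaxon_195644", "rdfs:subClassOf",
           "ncbitax:ncbitaxon_185881"),
    Triple("ncbitax:ncbitaxon_816681", "rdfs:label",
           '"Characiformes sp. BOLD:AAG5151"@en'),
]


def build_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Adjacency view: each subject maps to its (predicate, object) edges."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.obj))
    return dict(graph)


if __name__ == "__main__":
    for subject, edges in build_graph(TRIPLES).items():
        for predicate, obj in edges:
            print(f"{subject} --{predicate}--> {obj}")
```

Each subject and object acts as a node and each predicate as a labeled, directed edge, so a triple store can be read directly as a graph.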
18 SPARQL Protocol and RDF Query Language (SPARQL): enables matching of graph patterns in the semantic DB; reminiscent of SQL.
# Lehigh University BenchMark (LUBM) Query 9
PREFIX rdf: <
PREFIX ub: <
SELECT ?X ?Y ?Z
WHERE { ?X rdf:type ub:Student . ?Y rdf:type ub:Faculty . ?Z rdf:type ub:Course . ?X ub:advisor ?Y . ?Y ub:teacherOf ?Z . ?X ub:takesCourse ?Z }
Annotations: PREFIX is shorthand for a URI. The SELECT clause lists the variables to be returned from the query. The WHERE clause finds sets of (X, Y, Z) with a subject X of type Student, a subject Y of type Faculty, and a subject Z of type Course, where X is an advisee of Y, Y teaches course Z, and X takes course Z.
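The graph-pattern matching described on this slide can be sketched with a naive matcher over in-memory triples. This is an illustration of the idea only, not how the urika or Jena query engines work: the individuals in `DATA` are made up, and `unify` and `match` are hypothetical helper names implementing a simple nested-loop join.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Toy data; every individual here is invented for illustration.
DATA: List[Triple] = [
    ("alice", "rdf:type", "ub:Student"),
    ("bob", "rdf:type", "ub:Student"),
    ("prof_kim", "rdf:type", "ub:Faculty"),
    ("graphs101", "rdf:type", "ub:Course"),
    ("alice", "ub:advisor", "prof_kim"),
    ("bob", "ub:advisor", "prof_kim"),
    ("prof_kim", "ub:teacherOf", "graphs101"),
    ("alice", "ub:takesCourse", "graphs101"),
]

# The six triple patterns of LUBM Query 9; terms starting with "?" are variables.
QUERY_9: List[Triple] = [
    ("?X", "rdf:type", "ub:Student"),
    ("?Y", "rdf:type", "ub:Faculty"),
    ("?Z", "rdf:type", "ub:Course"),
    ("?X", "ub:advisor", "?Y"),
    ("?Y", "ub:teacherOf", "?Z"),
    ("?X", "ub:takesCourse", "?Z"),
]


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None   # variable already bound to something else
        elif term != value:
            return None       # constant term does not match
    return result


def match(patterns: List[Triple], data: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding that satisfies all patterns (nested-loop join)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in data:
        extended = unify(first, triple, binding)
        if extended is not None:
            yield from match(rest, data, extended)


if __name__ == "__main__":
    for row in match(QUERY_9, DATA):
        print(row["?X"], row["?Y"], row["?Z"])
```

On this toy data only the student who both has the advisor and takes that advisor's course satisfies all six patterns; the other student is filtered out by the last pattern.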
20 [Stack diagram labels: RDF, SPARQL, Java, Linux, Apache Tomcat/Jena, Gadgets/Mashups] Graph Analytic Tools: management, security, data pipeline. Graph Analytic Database: RDF datastore and SPARQL engine. Graph Analytic Platform: shared-memory, multithreaded, scalable I/O, graph-optimized hardware. Data center sees another Linux server. Applications see an industry-standard interface: RDF, SPARQL, Java, gadgets. Reuse existing skillsets: Java, OSGi, app server, SOA, ESB, web toolkit. Standard languages: all applications and artifacts built on urika can be run on other platforms.
21 Example query: SELECT ?p ?x WHERE { (?x type person) (?p sells DVD) (?x shops-at ?p) }. [Query-flow diagram labels] Application: send query; receive results. Jena/Fuseki, HTTP interface (interface to Web): parse, interpret query; set up query engine; generate low-level query; translate, send results. SPARQL engine: ORDER, DISTINCT, OPTIONAL, UNION, FILTER. Database: API, BGP, SPARQL ingest. Hosts: urika service nodes, urika compute nodes.
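The "send query" step of this flow can be sketched from the client side using the standard SPARQL Protocol over HTTP. Everything specific here is an assumption for illustration: the endpoint URL, host, port, and dataset name are placeholders, and the `ex:` prefix and terms are invented to turn the slide's shorthand query into valid SPARQL. The request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; host, port and dataset name are placeholders.
ENDPOINT = "http://urika.example.org:3030/dataset/sparql"

# The slide's shorthand query written as SPARQL with an invented ex: prefix.
QUERY = """PREFIX ex: <http://example.org/>
SELECT ?p ?x
WHERE {
  ?x ex:type ex:person .
  ?p ex:sells ex:DVD .
  ?x ex:shops-at ?p .
}"""


def build_query_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL Protocol query as an HTTP GET.

    The query text travels in the `query` URL parameter, and the Accept
    header asks for the standard JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_query_request(ENDPOINT, QUERY)
    print(request.get_method(), request.full_url)
```

Against a real endpoint, passing the request to `urllib.request.urlopen` would return the result bindings; the server side then performs the parsing, query setup, and result translation steps listed on the slide.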
22 [Diagram] Existing analytic environment(s): data warehouse, Hadoop, other big data appliances. Import graph datasets; export analytic/relationship results. User/app visualization.
24 Automatic parallelizing C/C++ compiler: the compiler runs on the Linux login node; the executable runs on the compute nodes; provides automatic parallelism if proper options are included; parallelization directives are used to provide hints to the compiler. Login nodes run standard SUSE (SLES 11) Linux: all usual Linux languages and applications may be used; a hybrid programming model is possible, and encouraged. Lightweight User Communication (LUC) interface: use LUC to build a client/server interface between the front end and back end; symmetric, RPC-style interface in either direction. Lustre global file system: high-performance parallel file system used across the Cray line.
25 Primary responsibility for machine virtualization: maps a simple programming model (flat memory, infinite processors) to actual hardware; performs general management of resources such as processors and hardware streams. Responsible for thread management: thread creation/destruction/blocking/unblocking; work scheduling (automatic intra-process load balancing); handles hardware traps and OS-induced swaps. Provides a highly optimized parallel memory allocator. Provides support for auxiliary functions: tracing and profiling; client-side debugging.
26 Apprentice2: GUI application for debugging performance problems; consists of one or more reports based on how your program was compiled and executed. Canal report: feedback from the compiler; insight on how latent parallelism was exploited; information on expected resource utilization and scheduling. Tview report: hardware counter plots; actual performance of your application; runtime trap information for detecting hotspots. Bprof report: profile tables in terms of instructions issued and memory references.
28 Discover new drugs and treatments. The Challenge: multiple massive datasets describing biological network graphs in cancer cells, from published literature and experimental data, constantly updated; un-partitionable, densely and irregularly connected graphs; medical research churns out massive quantities of literature, but most researchers don't have time to read even their own specialty publications; over 10,000 genomics publications per month currently. urika Solution: urika holds the un-partitioned genomic network graph in memory; contrast experimental models and theories with published results to discover previously unknown relationships; interactive, real-time access by multiple researchers. [Figure: Confirmation of elevated VEGF levels by tissue microarray]
29 The Cancer Genome Atlas (TCGA) data combined with literature relationships from Medline: protein domain interactions are extracted from TCGA using Random Forest classification; a Normalized Medline Distance (NMD) between proteins is calculated from Medline titles and abstracts; protein data from UniProt and Pfam are also included; 1.2 billion entries that generate 6B triples; new data sets are constantly being added (mutation data, patient data, methylation data, statistical analysis results and more); current database technology does not have the performance for useful full-scale queries across this data. [Diagram: graph query over Mutation, Methylation, Genes (TCGA), Protein Data (UniProt), Literature (Medline), Protein Families (Pfam), Correlations (Statistical), Patients (Health Records)]
30 Quickly adapt surveillance to new rules and patterns. The Challenge: continuous stream of possible compliance events; new compliance rules require constant updates of signals and patterns; current solutions require offline preparation and are based on rigid rules. urika Solution: entire relationship graph in memory; new patterns/templates can be identified and added in real time; graphical interactive exploration of relationships between people, places, things, organizations, communications, etc. Business Value: a responsive, flexible event detection platform that adapts to new knowledge. Figure source: Graph-based technologies for intelligence analysis, T. Coffman, S. Greenblatt, S. Marcus, Commun. ACM, 47(3):45-47,
31 Deploy better trading strategies with enhanced market sensing. The Challenge: trading strategies that can react quickly and continuously to market drivers; market drivers are news and other events which may impact current trading outlook and positions; enhanced market sensing requires semantic analysis of news and a scalable graph-based model to implement. urika Solution: ability to hold the entire market model in memory as a dependency graph; ability to filter news relevant to positions based on semantic analysis; interactive, real-time access from trader workstations. Business Value: better trading strategies that can be based on augmented market data, including news and other secondary events of relevance.
32 Discover similar patients to optimize treatment. The Challenge: longitudinal, historical data spanning all events, symptoms, diagnoses, diseases, treatments, prescriptions, etc. of 10M patients, including genetics and family history; ad hoc, constantly changing definition of similarity based on thousands of parameters; interactive, real-time response during consultation. urika Solution: urika holds the entire relationship graph in memory; identify similar patients based on ad hoc physician-specified patterns; interactive, real-time access by the entire physician community. Business Value: consistent selection of the most effective treatment for each patient by each doctor every time. [Figure labels: Patient: Jean Generic; blood pressure; prior myocardial infarction; hypertension; body mass index; HDL; anti-hypertension meds]
34 SPARQL by Example tutorial, by Lee Feigenbaum, http:// by- example/. Search RDF data with SPARQL, by Philip McCarthy. Semantic Web for the Working Ontologist, by Dean Allemang and James Hendler, ISBN. Learning SPARQL, by Bob DuCharme, O'Reilly, ISBN. RDF: SPARQL: sparql- query/ D2R: www4.wiwiss.fu-berlin.de/bizer/d2r-server
More informationAn Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
More informationCloud Compu?ng & Big Data in Higher Educa?on and Research: African Academic Experience
3 rd SG13 Regional Workshop for Africa on ITU- T Standardiza?on Challenges for Developing Countries Working for a Connected Africa (Livingstone, Zambia, 23-24 February 2015) Cloud Compu?ng & Big Data in
More informationNextGen Infrastructure for Big DATA Analytics.
NextGen Infrastructure for Big DATA Analytics. So What is Big Data? Data that exceeds the processing capacity of conven4onal database systems. The data is too big, moves too fast, or doesn t fit the structures
More informationOracle Spatial and Graph. Jayant Sharma Director, Product Management
Oracle Spatial and Graph Jayant Sharma Director, Product Management Agenda Oracle Spatial and Graph Graph Capabilities Q&A 2 Oracle Spatial and Graph Complete Open Integrated Most Widely Used 3 Open and
More informationScalus A)ribute Workshop. Paris, April 14th 15th
Scalus A)ribute Workshop Paris, April 14th 15th Content Mo=va=on, objec=ves, and constraints Scalus strategy Scenario and architectural views How the architecture works Mo=va=on for this MCITN Storage
More informationIntegrating Open Sources and Relational Data with SPARQL
Integrating Open Sources and Relational Data with SPARQL Orri Erling and Ivan Mikhailov OpenLink Software, 10 Burlington Mall Road Suite 265 Burlington, MA 01803 U.S.A, {oerling,imikhailov}@openlinksw.com,
More informationHow To Use Splunk For Android (Windows) With A Mobile App On A Microsoft Tablet (Windows 8) For Free (Windows 7) For A Limited Time (Windows 10) For $99.99) For Two Years (Windows 9
Copyright 2014 Splunk Inc. Splunk for Mobile Intelligence Bill Emme< Director, Solu?ons Marke?ng Panos Papadopoulos Director, Product Management Disclaimer During the course of this presenta?on, we may
More informationIntroducing Oracle Exalytics In-Memory Machine
Introducing Oracle Exalytics In-Memory Machine Jon Ainsworth Director of Business Development Oracle EMEA Business Analytics 1 Copyright 2011, Oracle and/or its affiliates. All rights Agenda Topics Oracle
More information5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014
5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for
More informationSo#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell
So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell R&D Manager, Scalable System So#ware Department Sandia National Laboratories is a multi-program laboratory managed and
More informationTexas Digital Government Summit. Data Analysis Structured vs. Unstructured Data. Presented By: Dave Larson
Texas Digital Government Summit Data Analysis Structured vs. Unstructured Data Presented By: Dave Larson Speaker Bio Dave Larson Solu6ons Architect with Freeit Data Solu6ons In the IT industry for over
More informationData Center Evolu.on and the Cloud. Paul A. Strassmann George Mason University November 5, 2008, 7:20 to 10:00 PM
Data Center Evolu.on and the Cloud Paul A. Strassmann George Mason University November 5, 2008, 7:20 to 10:00 PM 1 Hardware Evolu.on 2 Where is hardware going? x86 con(nues to move upstream Massive compute
More informationExecutive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...
Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data
More informationNetwork Graph Databases, RDF, SPARQL, and SNA
Network Graph Databases, RDF, SPARQL, and SNA NoCOUG Summer Conference August 16 2012 at Chevron in San Ramon, CA David Abercrombie Data Analytics Engineer, Tapjoy david.abercrombie@tapjoy.com About me
More informationRemoving Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering
Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays Red Hat Performance Engineering Version 1.0 August 2013 1801 Varsity Drive Raleigh NC
More informationAdvanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
More informationCS 5150 So(ware Engineering System Architecture: Introduc<on
Cornell University Compu1ng and Informa1on Science CS 5150 So(ware Engineering System Architecture: Introduc
More informationProtec'ng Communica'on Networks, Devices, and their Users: Technology and Psychology
Protec'ng Communica'on Networks, Devices, and their Users: Technology and Psychology Alexey Kirichenko, F- Secure Corpora7on ICT SHOK, Future Internet program 30.5.2012 Outline 1. Security WP (WP6) overview
More informationHow To Use A Webmail On A Pc Or Macodeo.Com
Big data workloads and real-world data sets Gang Lu Institute of Computing Technology, Chinese Academy of Sciences BigDataBench Tutorial MICRO 2014 Cambridge, UK INSTITUTE OF COMPUTING TECHNOLOGY 1 Five
More informationAlgorithms and Tools for Scalable Graph Analy8cs. Kamesh Madduri
Algorithms and Tools for Scalable Graph Analy8cs Kamesh Madduri Computer Science and Engineering The Pennsylvania State University madduri@cse.psu.edu MMDS 2012 July 13, 2012 This talk: A methodology for
More informationLDIF - Linked Data Integration Framework
LDIF - Linked Data Integration Framework Andreas Schultz 1, Andrea Matteini 2, Robert Isele 1, Christian Bizer 1, and Christian Becker 2 1. Web-based Systems Group, Freie Universität Berlin, Germany a.schultz@fu-berlin.de,
More informationApache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah
Apache Hadoop: The Pla/orm for Big Data Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah 1 The Problems with Current Data Systems BI Reports + Interac7ve Apps RDBMS (aggregated
More informationComputer Networks. Examples of network applica3ons. Applica3on Layer
Computer Networks Applica3on Layer 1 Examples of network applica3ons e- mail web instant messaging remote login P2P file sharing mul3- user network games streaming stored video clips social networks voice
More information.nl ENTRADA. CENTR-tech 33. November 2015 Marco Davids, SIDN Labs. Klik om de s+jl te bewerken
Klik om de s+jl te bewerken Klik om de models+jlen te bewerken Tweede niveau Derde niveau Vierde niveau.nl ENTRADA Vijfde niveau CENTR-tech 33 November 2015 Marco Davids, SIDN Labs Wie zijn wij? Mijlpalen
More informationReturn on Experience on Cloud Compu2ng Issues a stairway to clouds. Experts Workshop Nov. 21st, 2013
Return on Experience on Cloud Compu2ng Issues a stairway to clouds Experts Workshop Agenda InGeoCloudS SoCware Stack InGeoCloudS Elas2city and Scalability Elas2c File Server Elas2c Database Server Elas2c
More informationBeyond Watson: The Business Implications of Big Data
Beyond Watson: The Business Implications of Big Data Shankar Venkataraman IBM Program Director, STSM, Big Data August 10, 2011 The World is Changing and Becoming More INSTRUMENTED INTERCONNECTED INTELLIGENT
More information