Soma: Linked Data Infrastructure




What is Soma?
It's Big Data Candy for the Cloud. The Soma platform helps Data Scientists collaborate to discover and share new facts from large datasets hosted on shared infrastructure, all while lowering development and operations costs.

Meet our Customers
Expert: See themselves as experts or an authority on a subject. Want the big picture and like easy-to-use, specialised applications with great visualisation.
Researcher: See themselves as scientists. People with a deep academic background in maths, machine learning and modeling complex processes. Reluctant coders.
Creative: See themselves as data artists. Need to explain the meaning of the data. Good generalists who can code, with a flair for the visual or the data narrative.
Engineer: See themselves as engineers. Focused on the technical problem of managing data: how to get it, store it, and learn from it. Normally strong software developers with some O/R statistics.

Customers we support now
Engineer: Focused on the technical problem of managing data. Normally strong software developers.
Creative: Need to explain the meaning of the data. Good generalists who can code, with a flair for the visual or the data narrative.
Researcher: People with a deep academic background in science, maths and machine learning. Reluctant coders.

What we deliver to customers
Engineer
  Now: Big Data cluster, container management
  November: Storage frameworks
Creative
  Now: Gitlab integration, web-facing applications from Gitlab
Researcher
  Now: Discovery early adopters
  Early September: Discovery platform rollout

Features
Fully operational big data station, right now
Mesos-based cloud O/S
Cluster of 88 CPUs and 295 GB of memory
Distributed application scheduling
Resource scheduling
Container management
DNS service discovery

Deployment
Gitlab
Mesos cluster
Zookeeper cluster
HDFS cluster
Integrated DNS
CI servers
Docker registry

Deeper Dive
Gitlab: All applications MUST be in Gitlab.
Mesos cluster and container manager: Let's have a look at what is running right now:

Lambda architecture
Can mix both batch and real-time processing
Processes data at both batch and real-time velocity
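To make the batch/real-time mix concrete, here is a minimal sketch of the idea, not Soma's actual implementation. It assumes a toy workload (counting events per dataset key); the function names `batch_view`, `update_realtime_view` and `query`, and the sample keys, are all hypothetical.

```python
from collections import Counter

def batch_view(events):
    """Batch layer: recompute counts from the full historical dataset."""
    return Counter(e["key"] for e in events)

def update_realtime_view(view, event):
    """Speed layer: fold one newly arrived event into the real-time view."""
    view[event["key"]] += 1
    return view

def query(batch, realtime, key):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch.get(key, 0) + realtime.get(key, 0)

# Hypothetical data: events already processed by the last batch run...
history = [{"key": "dbpedia"}, {"key": "dbpedia"}, {"key": "hcls"}]
batch = batch_view(history)

# ...and events that arrived since then, handled incrementally.
realtime = Counter()
for event in [{"key": "dbpedia"}, {"key": "pivot"}]:
    update_realtime_view(realtime, event)

print(query(batch, realtime, "dbpedia"))  # 3
```

The point of the split is that the batch view can be rebuilt from scratch at any time, while the real-time view only has to cover what arrived since the last batch run.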

Data sources

Features
Source control management
Continuous deployment
Service monitoring
Always-available key datasets: DBpedia and Semantic Web Dog Food

Continuous Deployment
1. Have a Gitlab account.
2. Ask Research Ops to add the Soma role to your project.
3. If you are accepted, you will be guided through dockerizing your Gitlab project.
4. Once accepted, every push to your master branch will be deployed and accessible online through Soma.
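The deployment rule in the steps above can be sketched as a simple gate. This is an illustration only, not how Soma's CI servers are actually wired; the `Project` fields and the `should_deploy` function are hypothetical names, and the only real-world detail relied on is that Git names the master branch `refs/heads/master`.

```python
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    has_soma_role: bool   # step 2: role added by Research Ops (assumed flag)
    is_dockerized: bool   # step 3: project builds as a Docker image (assumed flag)

def should_deploy(project: Project, pushed_ref: str) -> bool:
    """Step 4: deploy only accepted, dockerized projects on a push to master."""
    return (
        project.has_soma_role
        and project.is_dockerized
        and pushed_ref == "refs/heads/master"
    )

accepted = Project("pivoty", has_soma_role=True, is_dockerized=True)
pending = Project("new-idea", has_soma_role=False, is_dockerized=True)

print(should_deploy(accepted, "refs/heads/master"))   # True
print(should_deploy(accepted, "refs/heads/feature"))  # False
print(should_deploy(pending, "refs/heads/master"))    # False
```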

Features
Integrated Discovery platform
SOMA Discover: a hosted discovery tool, based on the Smarter Data project, for exploring data and sharing results.
Other internal tools such as Sig.ma and Social Lens, with other projects to follow.

Goals for Research Ops
Nurture a Data Engineering community at Insight with supportive experts, shared tools and best practices
Provide a shared analytics platform for Data Scientists at Insight (Soma)
Encourage new research and engagements with the wider big data analytics research community

Nurture
Provide a structured approach to managing and releasing all Engineering IP (code and data) at Insight
  Source control (Git), release management
  Assist in IP management
Provide Quality Circles for engineering practices
  2 groups: Data Visualisation and Big Data. Workshops to commence this month.

Provide
Build big data infrastructure for Insight
  Soma platform
Support ongoing Hadoop development
  Hadoop clusters, Dataspace support
Support ad hoc projects requiring scale
  Cancer atlas
Provide Big Data expertise to the Linked Data group
  Hadoop, Yarn, Mesos, Spark, Dataspace, Mongo and Virtuoso

Problems being met
High cost in research when data scales to Big Data [P1]
Ad hoc maintenance of big data sets is expensive [P2]
Development complexity of valuable Big Data jobs is prohibitive [P3]
The high cost of operating Big Data infrastructure [P4]
Scarcity of hardware and lack of funds for new hardware [P5]
Inability to maintain a core operations team [P7]
Missed opportunity for researchers to collaborate [P6]

Soma serving our customers
Soma Create - Serves data fresh from the source. Has queryable large datasets that are both highly available and up to date. Has a service to mash these up.
Soma Engineer - Provides a Lambda architecture for consuming, cleaning, processing and loading the data into the data layer.
Soma Discover - Useful blocks of processing that can be connected together using a nice GUI. Works with many datastores.
Soma Expert - Vertical applications solving real-world problems. These apps are built by Insight's Data Researchers and Data Creatives.

The 4 kinds of Data Scientist
Expert: See themselves as experts or an authority on a subject. Want the big picture and like easy-to-use, specialised applications with great visualisation.
Researcher: See themselves as scientists. People with a deep academic background in maths, machine learning and modeling complex processes. Reluctant coders.
Creative: See themselves as data artists. Need to explain the meaning of the data. Good generalists who can code, with a flair for the visual or the data narrative.
Engineer: See themselves as engineers. Focused on the technical problem of managing data: how to get it, store it, and learn from it. Normally strong software developers with some O/R statistics.

Goals
Soma to be a complete ecosystem that helps researchers deliver distributed Big Data applications
Showcase Insight expertise
Standardize best practices for linked data at big data scales
Deliver targeted applications and tools, including tools to build complex analytics apps and manage jobs

Distributed O/S (Better than cloud)
We use Mesos-based infrastructure to provide:
Scheduling: process execution of jobs and applications across the cluster
Resource scheduling: allocation of the CPU, memory and storage these applications need
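As a rough illustration of resource scheduling across a cluster, the sketch below places tasks on the first node that still has enough CPU and memory (a first-fit rule). It does not use the Mesos API and is not a description of Mesos's own allocator; the `Offer` and `Task` classes, the `place` function, and the node and task names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    agent: str      # node advertising spare capacity
    cpus: float
    mem_gb: float

@dataclass
class Task:
    name: str
    cpus: float
    mem_gb: float

def place(tasks, offers):
    """First-fit: give each task the first offer with enough CPU and memory left.

    Mutates the offers to track remaining capacity. Tasks that fit nowhere
    are mapped to None.
    """
    placement = {}
    for task in tasks:
        placement[task.name] = None
        for offer in offers:
            if offer.cpus >= task.cpus and offer.mem_gb >= task.mem_gb:
                offer.cpus -= task.cpus
                offer.mem_gb -= task.mem_gb
                placement[task.name] = offer.agent
                break
    return placement

offers = [Offer("node-1", 8, 32), Offer("node-2", 16, 64)]
tasks = [Task("spark-job", 6, 24), Task("web-app", 4, 8), Task("huge", 32, 128)]
placement = place(tasks, offers)
print(placement)
```

With these sample numbers, `spark-job` lands on `node-1`, `web-app` no longer fits there and goes to `node-2`, and `huge` stays unplaced because no single node can hold it.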

SOMA Discover (Data)

Where we are now
What we have
  Soma Engineer - Standard Mesos platform - Provides a Lambda architecture for consuming, cleaning, processing and loading the data into the data layer.
  Soma Discover - Smarter Data - An interactive, expressive query tool that creates data blocks and visualisations.
What we need help on
  Soma Expert - Pivoty - A medical index built from standard HCLS datasets that uses a Pivot Browser.
  Soma Create - The Insight Standard Dataset - A shared, queryable standard set of big-data sources.