DataBridge
http://databridge.web.unc.edu/
Arcot Rajasekar (rajasekar@unc.edu)
The University of North Carolina at Chapel Hill

Transcription

1 DataBridge
Arcot Rajasekar, The University of North Carolina at Chapel Hill

2 Data Bridge: A Social Network for Long Tail Science Data
Outline of the Talk
- Motivation
- Design
- Implementation Status
- Examples
- Future

3 Big Data
- Well-known problem
- Characteristics:
  - Volume: exponential increase in size & count
  - Velocity: speed of generation & consumption
  - Variety: disparate types of data
  - Veracity: integrity & fidelity
  - Value: worth
Image sources: cognizanti.weber.org, infographics.socialnama.com

4 Three Kinds of Big Data (1)
Archetypal Big Data
- Science Projects: LHC, LSST, SCEC, OOI
- Business/Industry: Genomics, Finance, Pharma
- Government: NASA, NOAA, NCDC
- Volume: High, large datasets
- Velocity: High but predictable
- Variety: Low, standardized, metadata, curated
- Veracity: High fidelity and credible
- Value: High, funded
- Findability: High, known sites and discovery mechanisms
- Availability: High, published API
"Light & Visible Data"
Image sources: infographics.socialnama.com, wired.com

5 Three Kinds of Big Data (2)
Crowd-Sourced Big Data
- Social Media: Facebook, Twitter
- Recommenders: Yelp, Angie's List, Groupon
- Web Commerce: Amazon, eBay, Orbitz, enews
- Volume: High, small data
- Velocity: High and non-predictable
- Variety: High but well managed
- Veracity: Mixed, low to high
- Value: Ephemeral, can be none to high
- Findability: High, known and advertised
- Availability: Immediate Interest, web pages and apps
"Nova-like Data"
Image source: dreamstime.com

6 Three Kinds of Big Data (3)
Long-tail Big Data
- Science Projects: small teams and organizations
- Personal: Hobbies, Amateur/Citizen Science/Arts
- Government: Internal and unpublished
- Volume: High, small data sets
- Velocity: Low
- Variety: High, too many, idiosyncratic
- Veracity: Non-credible until proven
- Value: Unknown
- Findability: None, hidden and not advertised
- Availability: None, on local disks and tapes
"Dark Data"
Image sources: teradata.com, Images.frompo.com

7 Our Interest: Long Tail of Science Data
- Large number of data generators
  - Highly distributed
  - NSF in 2011: 11,150 awards, median size $126K, primarily single PI
- Data individually not petascale but large in aggregate
  - Of possible value
- "Dark Data"
  - Unpublished
  - Used once and forgotten
- "Sunset Data"
  - Even NSF expects retention for only 3 years (at the least) after the project
- "Expired Data"
  - Curated but disposed
  - Social media data
Image source: com2733amandaathens.blogspot.com

8 Problems: Long Tail of Science Data
- First Mile Problem: How to make it available?
  - Where do I upload?
  - Who is in charge?
  - How do I get credit?
  - Can I control access?
  - How do I pool with other like-minded researchers? Community services?
  - How much is long-term?
  - Who pays for it?
- Last Mile Problem: How to make it findable?
  - What is needed to make it more visible? Metadata?
  - Are there other methods to make my data findable?
  - My data has specific ways & characteristics. How do I expose them as finding aids?
  - How can I find similar data?
- Solving the long-tail problem will also help the other two Big Data problems
Image source: blog.enrichconsulting.com

9 Dark Data from the Long Tail of Science
- Long tail data amounts to small data sets produced by numerous investigators.
- Dark data exemplars: from Brahe to Mendel, discovery has come from relatively small data sets.
- Much long tail data is dark data: data not easily found by potential users (Bryan Heidorn).
- Long tail data sets lack the structural advantages of classic Big Data, such as professional curation, homogeneous and well-documented data formats, and populated, well-formed metadata schemas.
- Improving availability and findability will help solve the problems with this type of big data and make it more mainstream.
- Expose the hidden nuggets
Image source: astrosolar.com

10 Data Bridge: A Social Network for Long Tail Science Data
Outline of the Talk
- Motivation
- Design
- Implementation Status
- Examples
- Future

11 Data Bridge: A Solution
- We tackle mainly the last mile problem
  - But show an avenue for solving the first mile problem
- Main aim: improve findability
- Solution: empower data!
  - Empower data to find its own community
  - Community of likeness
  - Similarity in multiple dimensions
  - Look at data from diverse angles
  - Find relationships (strong links) and friendships (weaker links)
- Assist scientists in discovering interesting data sets by automatically forming communities of data

12 Data Bridge Strategy: Automatic Community Detection
- Example: Doc Watson albums at Amazon.com, clustered by Yasiv.com
- Clusters represent related items
- Clusters are connected by some internal metrics

13 Data Bridge: Design
Construct multi-dimensional social networks for data.
Three challenges:
- Evaluate multiple types of metrics on data
  - Domain-specific, genre-specific, project-specific
  - Use Socio-metric Network Algorithms
  - Similar to social networks, but for data
- Find relevance
  - Slices of similarity
  - Explore relationships between Data, Users, Resources, Methods, Workflows
  - Use Relevance Algorithms
- Create communities
  - Use Clustering Algorithms
Provide an extensible, big data framework
Democratize the process
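The "evaluate metrics on data" step can be illustrated with a minimal sketch. It assumes each dataset is described by a set of keywords and uses Jaccard similarity as the metric; the slides do not name a specific metric, so `jaccard`, `similarity_matrix`, and the sample datasets are hypothetical stand-ins.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two keyword collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_matrix(datasets, metric=jaccard):
    """Return {(id1, id2): score} for every unordered pair of dataset ids."""
    return {
        (i, j): metric(datasets[i], datasets[j])
        for i, j in combinations(sorted(datasets), 2)
    }


# Hypothetical keyword metadata for three datasets (not from the talk).
datasets = {
    "ds1": {"voting", "survey", "2008"},
    "ds2": {"voting", "survey", "2012"},
    "ds3": {"genome", "sequence"},
}
scores = similarity_matrix(datasets)
```

Passing a different `metric` function is one way a domain-specific or project-specific measure could be swapped in without changing the rest of the pipeline.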

14 Data Bridge Infrastructure
- Accommodate an extensible number of SNA (socio-metric network), RA (relevance), and CA (clustering) algorithms
- Provide an easy way to add new algorithms
  - Crowd sourcing
  - MyVector (add your own way of defining metrics)
- Provide an easy way to connect data to algorithms
  - Multiple ways of finding similarities
  - Multiple ways of providing search criteria
  - MyBridge (add your own way of finding relevance)
- Provide an easy way to form communities
  - Multiple ways of categorizing data
  - MyCommunity (add your own domain-specific clustering)
- Make it a distributed system
  - Grow and shrink as needed
- Make it easy for third-party setups
  - Federation of Data Bridge
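One way to picture the "add your own metric" idea is a small plug-in registry. This is a sketch under stated assumptions, not the DataBridge API: `METRICS`, `register_metric`, `score_all`, and both sample metrics are hypothetical names chosen for illustration.

```python
# Registry mapping a metric name to a function (record_a, record_b) -> float.
METRICS = {}


def register_metric(name):
    """Decorator that adds a similarity function to the registry under `name`."""
    def decorator(func):
        METRICS[name] = func
        return func
    return decorator


@register_metric("keyword_overlap")
def keyword_overlap(a, b):
    """Jaccard overlap of the 'keywords' fields of two metadata records."""
    ka, kb = set(a.get("keywords", [])), set(b.get("keywords", []))
    return len(ka & kb) / len(ka | kb) if (ka | kb) else 0.0


@register_metric("same_year")
def same_year(a, b):
    """1.0 when both records carry the same 'year' value, else 0.0."""
    return 1.0 if a.get("year") == b.get("year") else 0.0


def score_all(a, b):
    """Apply every registered metric to one pair of records."""
    return {name: func(a, b) for name, func in METRICS.items()}
```

A contributor would add a new measure by decorating one more function; callers of `score_all` pick it up without modification.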

15 Data Bridge Architecture

16 Message-oriented Architecture
Loosely linked processes: messages make the connections
Scenario:
- User A has a novel signature detection algorithm for gene sequences
  - User A wraps the algorithm with the API provided by DataBridge
  - Subscribes to a message for gene sequence data
- User B publishes a new gene sequence into DataBridge
  - A new message is created announcing the addition
- User A's algorithm detects the message and computes relevance to the signature
  - Publishes a new relevance message for this signature
- User C has a relevance algorithm that catches A's message
  - Uses it to add B's sequence to a new data community
- Other signature detectors may also look at B's gene sequence
  - Publish their own relevance, if applicable
- User A can also look at older gene sequences and find relevance
Image from support.oyala.com
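The scenario above can be sketched with a tiny in-process publish/subscribe bus. This is an illustration only: the talk's implementation uses RabbitMQ, whereas `MessageBus`, the topic names, and the toy "signature" (counting occurrences of a fixed motif) are assumptions made for the example.

```python
from collections import defaultdict


class MessageBus:
    """Minimal in-memory stand-in for a message broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.log = []  # every (topic, payload) published, in order

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        self.log.append((topic, payload))
        for handler in list(self._subscribers[topic]):
            handler(payload)


bus = MessageBus()
communities = defaultdict(set)
MOTIF = "GATTACA"  # hypothetical signature that User A's algorithm looks for


def signature_detector(msg):
    """User A: score a new sequence and publish a relevance message if it matches."""
    score = msg["sequence"].count(MOTIF)
    if score > 0:
        bus.publish("relevance.signature", {"id": msg["id"], "score": score})


def community_builder(msg):
    """User C: add any relevant sequence to the community for this signature."""
    communities[MOTIF].add(msg["id"])


bus.subscribe("data.gene_sequence", signature_detector)
bus.subscribe("relevance.signature", community_builder)

# User B publishes a new gene sequence.
bus.publish("data.gene_sequence", {"id": "seqB", "sequence": "CCGATTACATT"})
```

Neither handler knows about the other; they are coupled only through topic names, which is the loose linkage the slide describes.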

17 Data Bridge: A Social Network for Long Tail Science Data
Outline of the Talk
- Motivation
- Design
- Implementation Status
- Examples
- Future

18 Modules: Current Status
- RabbitMQ messaging system
- Ingest Engine
- Relevance Engine
- Network Engine
- Ingestion GUI
- Dataverse Network access
- Metadata database: MongoDB
- Network database: Neo4j
- Viz display

19 Messages
Message | Listener | Originator
Ingest Metadata | Ingest Engine | Any Data Provider (DVN)
Metadata Available | Relevance Engine | Metadatabase
Create Similarity Matrix | Relevance Engine | Any Ingest Engine
Similarity Matrix Available | Network Engine | Any Relevance Engine
Insert Similarity Matrix | Ingest Engine | User/App
Run SNA Algorithm | Network Engine | Network Engine or User/App
SNA Data Available | Network Engine | Network Database
Create Visualization Data | Network Database | Network Visualization
Show Visualization | Visualizer | User (WebApp)

20 Example Message Schema
Name: Insert.Metadata.Java.URI.MetadataDB
System headers | Example value
type | databridge
subtype | ingestmetadata
User-provided headers | Example value
classname | org.renci.databridgecontrib.ingest.mockingest
namespace | system_test
inputuri | /projects/databridge/metadata.xml
Image source: W3.org
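The header layout on this slide can be represented as a plain dictionary. The sketch below reuses the example values from the slide; the helper names `make_message` and `accepts`, and the idea of matching listeners on the (type, subtype) pair, are assumptions for illustration rather than the DataBridge implementation.

```python
def make_message(name, msg_type, subtype, **user_headers):
    """Bundle a message name with system headers and user-provided headers."""
    return {
        "name": name,
        "system": {"type": msg_type, "subtype": subtype},
        "user": dict(user_headers),
    }


def accepts(message, msg_type, subtype):
    """True when a listener registered for (msg_type, subtype) should handle it."""
    system = message["system"]
    return system["type"] == msg_type and system["subtype"] == subtype


# Values copied from the example on the slide.
msg = make_message(
    "Insert.Metadata.Java.URI.MetadataDB",
    "databridge",
    "ingestmetadata",
    classname="org.renci.databridgecontrib.ingest.mockingest",
    namespace="system_test",
    inputuri="/projects/databridge/metadata.xml",
)
```

Keeping system headers separate from user headers lets routing depend only on a small fixed set of fields while each handler reads whatever extra parameters it needs.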

21 Data Bridge: A Social Network for Long Tail Science Data
Outline of the Talk
- Motivation
- Design
- Implementation Status
- Examples
- Future

22 Screenshot: Finding similarities
- Select network data
- Filter connectivity by similarity value
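Filtering connectivity by similarity value amounts to keeping only the edges whose score clears a threshold and then reading off the groups that remain connected. The sketch below assumes pairwise scores in a dictionary and uses connected components as a simple stand-in for community detection; the function names and sample scores are hypothetical.

```python
def filter_edges(scores, threshold):
    """Keep only the pairs whose similarity is at least `threshold`."""
    return [pair for pair, score in scores.items() if score >= threshold]


def connected_groups(nodes, edges):
    """Group nodes into connected components using an iterative depth-first search."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, groups = set(), []
    for node in sorted(nodes):
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(adjacency[current] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical similarity scores between four datasets.
scores = {("a", "b"): 0.8, ("b", "c"): 0.3, ("c", "d"): 0.9, ("a", "d"): 0.1}
nodes = {"a", "b", "c", "d"}

strict = connected_groups(nodes, filter_edges(scores, 0.5))
loose = connected_groups(nodes, filter_edges(scores, 0.2))
```

Raising the threshold splits the network into tighter groups, while lowering it merges them, which mirrors what the filter control in the screenshot does.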

23 Screenshot: Weight of similarity
Similarity measure:

24 Screenshot: Highlights of similarities
- Link to the data

25 Screenshot: Data Access

26 Screenshot: Simple Ingest GUI

27 Data Bridge: A Social Network for Long Tail Science Data
Outline of the Talk
- Motivation
- Design
- Implementation Status
- Examples
- Future

28 Next Steps
- Basic framework implemented
  - Applied to a few thousand datasets
- Work to do, advanced features
  - Documentation
  - Scaling tests
  - More types of data/metadata to be tested
- Ready for new algorithms
- Ready for more data
- Ready for larger usage
- Investigate multiple similarity measures
  - Usage, methods as relevance

29 Players
- Howard Lander
- Justin Zhan
- Merce Crosas
- Gary King
- Jon Crabtree
- Tom Carsey
- Sharlini Sankaran
- Arcot Rajasekar

30 Conclusion
Arcot Rajasekar, The University of North Carolina at Chapel Hill
DataBridge
- Motivation
- Design
- Implementation Status
- Examples
- Future


More information

Tutorials for Project on Building a Business Analytic Model Using Data Mining Tool and Data Warehouse and OLAP Cubes IST 734

Tutorials for Project on Building a Business Analytic Model Using Data Mining Tool and Data Warehouse and OLAP Cubes IST 734 Cleveland State University Tutorials for Project on Building a Business Analytic Model Using Data Mining Tool and Data Warehouse and OLAP Cubes IST 734 SS Chung 14 Build a Data Mining Model using Data

More information

Cloud Computing. What s the Big Deal? Michael J. Carey Information Systems Group CS Department UC Irvine

Cloud Computing. What s the Big Deal? Michael J. Carey Information Systems Group CS Department UC Irvine Cloud Computing and Big Data: What s the Big Deal? Michael J. Carey Information Systems Group CS Department UC Irvine What Is Cloud Computing? Cloud computing is a model for enabling ubiquitous, convenient,

More information

Introduction to Service Oriented Architectures (SOA)

Introduction to Service Oriented Architectures (SOA) Introduction to Service Oriented Architectures (SOA) Responsible Institutions: ETHZ (Concept) ETHZ (Overall) ETHZ (Revision) http://www.eu-orchestra.org - Version from: 26.10.2007 1 Content 1. Introduction

More information

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms

More information

Web Archiving and Scholarly Use of Web Archives

Web Archiving and Scholarly Use of Web Archives Web Archiving and Scholarly Use of Web Archives Helen Hockx-Yu Head of Web Archiving British Library 15 April 2013 Overview 1. Introduction 2. Access and usage: UK Web Archive 3. Scholarly feedback on

More information

Data sharing and Big Data in the physical sciences. 2 October 2015

Data sharing and Big Data in the physical sciences. 2 October 2015 Data sharing and Big Data in the physical sciences 2 October 2015 Content Digital curation: Data and metadata Why consider the physical sciences? Astronomy: Video Physics: LHC for example. Video The Research

More information

A very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect

A very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect A very short talk about Apache Kylin Business Intelligence meets Big Data Fabian Wilckens EMEA Solutions Architect 1 The challenge today 2 Very quickly: OLAP Online Analytical Processing How many beers

More information

Spatio-Temporal Networks:

Spatio-Temporal Networks: Spatio-Temporal Networks: Analyzing Change Across Time and Place WHITE PAPER By: Jeremy Peters, Principal Consultant, Digital Commerce Professional Services, Pitney Bowes ABSTRACT ORGANIZATIONS ARE GENERATING

More information

Cloudera Enterprise Data Hub in Telecom:

Cloudera Enterprise Data Hub in Telecom: Cloudera Enterprise Data Hub in Telecom: Three Customer Case Studies Version: 103 Table of Contents Introduction 3 Cloudera Enterprise Data Hub for Telcos 4 Cloudera Enterprise Data Hub in Telecom: Customer

More information

Big Data Trends A Basis for Personalized Medicine

Big Data Trends A Basis for Personalized Medicine Big Data Trends A Basis for Personalized Medicine Dr. Hellmuth Broda, Principal Technology Architect emedikation: Verordnung, Support Prozesse & Logistik 5. Juni, 2013, Inselspital Bern Over 150,000 Employees

More information

RS MDM. Integration Guide. Riversand

RS MDM. Integration Guide. Riversand RS MDM 2009 Integration Guide This document provides the details about RS MDMCenter integration module and provides details about the overall architecture and principles of integration with the system.

More information

locuz.com Big Data Services

locuz.com Big Data Services locuz.com Big Data Services Big Data At Locuz, we help the enterprise move from being a data-limited to a data-driven one, thereby enabling smarter, faster decisions that result in better business outcome.

More information

Collaboration. Michael McCabe Information Architect mmccabe@gig-werks.com. black and white solutions for a grey world

Collaboration. Michael McCabe Information Architect mmccabe@gig-werks.com. black and white solutions for a grey world Collaboration Michael McCabe Information Architect mmccabe@gig-werks.com black and white solutions for a grey world Slide Deck & Webcast Recording links Questions and Answers We will answer questions at

More information

Multichannel Customer Listening and Social Media Analytics

Multichannel Customer Listening and Social Media Analytics ( Multichannel Customer Listening and Social Media Analytics KANA Experience Analytics Lite is a multichannel customer listening and social media analytics solution that delivers sentiment, meaning and

More information

Creative Director. Inspire artists, programmers, producers and marketing staff to make the highest quality product possible

Creative Director. Inspire artists, programmers, producers and marketing staff to make the highest quality product possible Open positions Creative Director... 2 Level designer... 3 Data scientist... 4 Backend engineer - user acquisition and game management tools... 5 Gameplay programmer... 6 Software engineer Client, tools,

More information

Globus Research Data Management: Introduction and Service Overview

Globus Research Data Management: Introduction and Service Overview Globus Research Data Management: Introduction and Service Overview Kyle Chard chard@uchicago.edu Ben Blaiszik blaiszik@uchicago.edu Thank you to our sponsors! U. S. D E P A R T M E N T OF ENERGY 2 Agenda

More information

Customer Analytics. Turn Big Data into Big Value

Customer Analytics. Turn Big Data into Big Value Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data

More information

Are You Big Data Ready?

Are You Big Data Ready? ACS 2015 Annual Canberra Conference Are You Big Data Ready? Vladimir Videnovic Business Solutions Director Oracle Big Data and Analytics Introduction Introduction What is Big Data? If you can't explain

More information

Online Marketing Module COMP. Certified Online Marketing Professional. v2.0

Online Marketing Module COMP. Certified Online Marketing Professional. v2.0 = Online Marketing Module COMP Certified Online Marketing Professional v2.0 Part 1 - Introduction to Online Marketing - Basic Description of SEO, SMM, PPC & Email Marketing - Search Engine Basics o Major

More information

Big data for the Masses The Unique Challenge of Big Data Integration

Big data for the Masses The Unique Challenge of Big Data Integration Big data for the Masses The Unique Challenge of Big Data Integration White Paper Table of contents Executive Summary... 4 1. Big Data: a Big Term... 4 1.1. The Big Data... 4 1.2. The Big Technology...

More information

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1 Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots

More information

The Way to SOA Concept, Architectural Components and Organization

The Way to SOA Concept, Architectural Components and Organization The Way to SOA Concept, Architectural Components and Organization Eric Scholz Director Product Management Software AG Seite 1 Goals of business and IT Business Goals Increase business agility Support new

More information

How To Write A Blog Post On Globus

How To Write A Blog Post On Globus Globus Software as a Service data publication and discovery Kyle Chard, University of Chicago Computation Institute, chard@uchicago.edu Jim Pruyne, University of Chicago Computation Institute, pruyne@uchicago.edu

More information

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme Big Data Analytics Prof. Dr. Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany 33. Sitzung des Arbeitskreises Informationstechnologie,

More information

Data Warehousing and Data Mining in Business Applications

Data Warehousing and Data Mining in Business Applications 133 Data Warehousing and Data Mining in Business Applications Eesha Goel CSE Deptt. GZS-PTU Campus, Bathinda. Abstract Information technology is now required in all aspect of our lives that helps in business

More information

Cloud computing based big data ecosystem and requirements

Cloud computing based big data ecosystem and requirements Cloud computing based big data ecosystem and requirements Yongshun Cai ( 蔡 永 顺 ) Associate Rapporteur of ITU T SG13 Q17 China Telecom Dong Wang ( 王 东 ) Rapporteur of ITU T SG13 Q18 ZTE Corporation Agenda

More information

Tableau Server 7.0 scalability

Tableau Server 7.0 scalability Tableau Server 7.0 scalability February 2012 p2 Executive summary In January 2012, we performed scalability tests on Tableau Server to help our customers plan for large deployments. We tested three different

More information