Big Data to Decision. Thomas E. Potok, PhD Group Leader Computational Data Analytics Group Oak Ridge National Laboratory
Transcription
1 Big Data to Decision Thomas E. Potok, PhD Group Leader Computational Data Analytics Group Oak Ridge National Laboratory
2 Computational Data Analytics Group
Research: 10 years in data mining and machine learning. Last year organized 7 research workshops and published 70 research papers.
Key University Partnerships: University of Tennessee Center for Intelligent Systems and Machine Learning; North Carolina State University; Emory, University of Chicago, Georgetown
Key Government and Industrial Partnerships: Office of Naval Research; Department of Homeland Security; Department of Defense; Intelligence Community; Department of Energy; Lockheed Martin; Battelle Memorial Institute
Technology Transfer: Piranha license to TextOre; Piranha license to Lockheed Martin; ORCA license to Lockheed Martin
People and Awards: 25 staff members, 15 PhDs in computer science; R&D 100 Award (Oscars of invention) in 2007
Managed by UT-Battelle
3 Recent News Stories and Programs
Major Projects: Centers for Medicare and Medicaid Services (CMS)
Objectives:
- Facilitate easy, consistent analysis for CMS staff and stakeholders
- Decrease turn-around time from data to decision
- Give analysts insight into opportunities for policy improvement
- Identify islands of provider interactions
Future:
- Drive the ability to inform policy and policy makers based on near real-time data
- Provide the ability to perform predictive modeling and simulation of policy scenarios (outcomes and impacts)
- Supply tools and techniques for social network, risk, and other advanced analysis
From left: perspectives for Geographic Variation; Provide interactive reports and visualizations to external users; Identify attributes across massive amounts and types of data
Big Data Intrusion Detection
4 Data is a blessing or curse
Warning signs existed
The data was never fully understood
5 The people who understand their data will succeed
Data Channels: RSS feeds, Blogs, Twitter, LinkedIn, YouTube, Facebook
The cost for not keeping up is high
The value of being informed is high
"Ahhh, I am not, ahhh what was that link again... ahhh, let me get back to you"
6 Recommendations: Apple iTunes Genius
- Arranges your music
- Recommends new music based on your music library
7 Personalized Music Recommendation
My List (List 1): Stairway to Heaven - Led Zeppelin; Johnny B. Goode - Chuck Berry; Like A Rolling Stone - Bob Dylan
Playlist 2 (List 2): Johnny B. Goode - Chuck Berry; Won't Get Fooled Again - Who; All Along The Watchtower - Jimi Hendrix
Playlist 3 (List 3): Purple Haze - Jimi Hendrix; Whole Lotta Love - Led Zeppelin; (I Can't Get No) Satisfaction - Rolling Stones
Term List: Stairway, Heaven, Led, Zeppelin, Johnny, Goode, Chuck, Berry, Like, Rolling, Stone, Bob, Dylan, Wont, Fooled, Who, Along, Watchtower, Jimi, Hendrix, Purple, Haze, Whole, Lotta, Love, Satisfaction
Vector Space Model: one row per term, one column per list (List 1, List 2, List 3); cell values not recoverable from the transcript
Similarity Matrix: List 1, List 2, List 3 against each other, 100% on the diagonal; off-diagonal values not recoverable from the transcript
Genius Recommendations:
- You bought music by Chuck Berry: Won't Get Fooled Again - Who
- You bought music by Chuck Berry: All Along The Watchtower - Jimi Hendrix
- You bought music by Led Zeppelin: Whole Lotta Love - Led Zeppelin
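The playlist comparison on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Genius or Piranha implementation: the per-list term assignment is a reading of the slide, raw term counts stand in for whatever weights were actually used, and cosine similarity is an assumed measure (the slide shows percentages but does not name the metric). `PLAYLISTS`, `cosine`, and `similarity_matrix` are hypothetical names.

```python
import math
from collections import Counter

# Term lists per playlist as read off the slide (stop words already removed).
PLAYLISTS = {
    "List 1": "Stairway Heaven Led Zeppelin Johnny Goode Chuck Berry "
              "Like Rolling Stone Bob Dylan".split(),
    "List 2": "Johnny Goode Chuck Berry Wont Fooled Who Along Watchtower "
              "Jimi Hendrix".split(),
    "List 3": "Purple Haze Jimi Hendrix Whole Lotta Love Led Zeppelin "
              "Satisfaction Rolling Stone".split(),
}


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(term_lists):
    """Pairwise similarity of every list against every other list."""
    vectors = {name: Counter(terms) for name, terms in term_lists.items()}
    return {r: {c: cosine(vectors[r], vectors[c]) for c in vectors}
            for r in vectors}


if __name__ == "__main__":
    for row, cols in similarity_matrix(PLAYLISTS).items():
        print(row, {c: f"{v:.0%}" for c, v in cols.items()})
```

Lists that share artists or song terms score above zero, which is the signal a recommender would use to suggest songs from the closest list.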
8 Piranha Document Analysis
- Terms are weighted according to their frequency.
- TF-ICF was used for this work as an example of one weighting algorithm that does not require a static corpus.
- A document vector is a VSM representation of the document's terms and associated weights.
Similarity Matrix (Documents to Documents, Euclidean distance):
        Doc 1   Doc 2   Doc 3
Doc 1   100%    17%     21%
Doc 2           100%    36%
Doc 3                   100%
Cluster Analysis: D1, D2, D3; most similar documents are grouped first
Time Complexity: O(n^2 log n)
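The document-to-document comparison and cluster analysis described on this slide can be sketched as follows. It is an illustrative stand-in, not Piranha's code: the vectors and their weights are invented, single linkage is an assumed merge rule, and this naive loop is roughly cubic in the number of documents rather than the O(n^2 log n) quoted on the slide, which would need a priority queue of pairwise distances.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two sparse term-weight vectors."""
    terms = a.keys() | b.keys()
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))


def agglomerate(vectors):
    """Naive single-linkage clustering; returns the merges in order."""
    clusters = {name: [name] for name in vectors}
    merges = []
    while len(clusters) > 1:
        # Pick the two clusters whose closest members are nearest.
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: min(
                euclidean(vectors[x], vectors[y])
                for x in clusters[pair[0]]
                for y in clusters[pair[1]]
            ),
        )
        merges.append((tuple(clusters[left]), tuple(clusters[right])))
        clusters[left] = clusters[left] + clusters.pop(right)
    return merges


# Hypothetical document vectors (term -> weight), for illustration only.
DOCS = {
    "D1": {"centrifuge": 1.0, "mechanism": 0.0},
    "D2": {"centrifuge": 0.0, "mechanism": 1.0},
    "D3": {"centrifuge": 0.1, "mechanism": 0.9},
}

if __name__ == "__main__":
    for step, (a, b) in enumerate(agglomerate(DOCS), start=1):
        print(f"merge {step}: {a} + {b}")
```

The first merge joins the two closest documents, and later merges attach the remaining documents to the growing clusters, which is the tree a cluster view would draw.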
9 Term Frequency Weights
Term Frequency/Inverse Document Frequency
Chart axes: Document Frequency; Set Frequency; Inverse Document Frequency
Example terms: Significant term: centrifuge; Interesting term: mechanism; Stop word: the; Theme word: Obama
Inverse Document Frequency
Strengths: Finds significant terms in a random set of documents
Weakness: High O; does not work well for a single-topic set
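A minimal TF-IDF sketch makes the single-topic weakness concrete. The textbook form tf * log(N / df) is assumed here, since the slide does not give a formula, and the toy documents are invented around the slide's example terms.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        tf = Counter(tokens)
        weights.append(
            {t: c * math.log(n_docs / doc_freq[t]) for t, c in tf.items()}
        )
    return weights


# Hypothetical three-document set.
DOCS = [
    ["the", "centrifuge", "mechanism"],
    ["the", "obama", "mechanism"],
    ["the", "obama", "speech"],
]

if __name__ == "__main__":
    for i, w in enumerate(tf_idf(DOCS), start=1):
        print(f"Doc {i}:", {t: round(v, 3) for t, v in w.items()})
```

A term that appears in every document of the set gets weight zero. That is the desired outcome for a stop word, but in a single-topic set the topic's own key terms also appear everywhere and are suppressed in the same way. The document frequencies also depend on the whole set, so they must be recomputed when the set changes.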
10 Term Frequency/Inverse Corpus Frequency
Chart axes: Document Frequency; Corpus Frequency; Inverse Corpus Frequency
Example terms: Significant term: centrifuge, or words not in the corpus; Interesting term: mechanism; Stop word: the; Theme word: Obama
Inverse Corpus Frequency
Strengths: Finds significant terms in common topic sets; parallel
Weakness: Stop words
Managed by UT-Battelle for the U.S. Department of Energy
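TF-ICF replaces the document set's own frequencies with frequencies from a fixed reference corpus, so each document can be weighted on its own. The sketch below assumes the formulation log(1 + tf) * log((N + 1) / (n + 1)), where N is the reference corpus size and n is the number of corpus documents containing the term. The slide does not state the formula, and the corpus statistics here are invented.

```python
import math
from collections import Counter


def tf_icf(tokens, corpus_doc_freq, corpus_size):
    """Weight one document's terms against precomputed corpus statistics."""
    tf = Counter(tokens)
    return {
        term: math.log(1 + count)
        * math.log((corpus_size + 1) / (corpus_doc_freq.get(term, 0) + 1))
        for term, count in tf.items()
    }


# Hypothetical reference-corpus statistics: term -> documents containing it.
CORPUS_SIZE = 1000
CORPUS_DOC_FREQ = {"the": 1000, "mechanism": 120, "obama": 40}

if __name__ == "__main__":
    doc = ["the", "centrifuge", "mechanism", "the"]
    for term, weight in tf_icf(doc, CORPUS_DOC_FREQ, CORPUS_SIZE).items():
        print(f"{term}: {weight:.3f}")
```

Because the function needs only the incoming document and a read-only frequency table, documents can be weighted independently and in parallel, and a word absent from the corpus receives the largest inverse-corpus factor.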
11 Challenge
Highly weighted terms, but not significant terms:
"computer", "ieee", "architecture", "data", "algorithms", "applications", "submitted", "researchers", "energy", "due:", "device", "infrastructures", "library", "methods", "optimized", "symposium", "version", "systems", "processes", "models", "present", "paper", "performance", "scale", "intern", "technology", "communities", "result", "large", "distributed", "standards", "september", "july", "provide", "annual", "interests", "areas:"
Add a query for each term against the repository
Terms that generate few returned documents, or low IDF, are deemed significant
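The query-based filter on this slide can be sketched as a hit-count threshold. Everything here is a stand-in: the repository is a list of term sets rather than a real search index, and `significant_terms` and `max_hits` are hypothetical names.

```python
def significant_terms(candidate_terms, repository, max_hits):
    """Keep terms whose 'query' returns at most max_hits documents."""
    significant = []
    for term in candidate_terms:
        # Stand-in for a repository query: count documents containing the term.
        hits = sum(1 for doc_terms in repository if term in doc_terms)
        if hits <= max_hits:
            significant.append(term)
    return significant


# Hypothetical repository: each document reduced to its set of terms.
REPOSITORY = [
    {"data", "paper", "centrifuge"},
    {"data", "paper"},
    {"data", "systems"},
    {"paper", "systems"},
]

if __name__ == "__main__":
    print(significant_terms(["data", "paper", "centrifuge"], REPOSITORY, max_hits=1))
```

Terms such as "data" and "paper" match many repository documents and are dropped, while a term that matches few documents survives. A term with zero hits also passes this threshold, so a real filter may want a lower bound as well.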
12 Personalized Content Recommendations
Computational Data Analytics Group
Main Idea: What you see is what you want; what you see is what you get
Diagram: Attention Time, User Interest, Personalized Recommendations
Problem Statement: How to detect user interests and automatically recommend interesting content in a personalized way.
Technical Approach:
- Detecting user interests through attention time, i.e. time spent by a user on reading a certain webpage.
- Collaboratively mining semantic content of user reading materials along with one's implicit feedback.
- Advanced data fusion algorithms for user interest inference.
- Dynamic content recommendations according to inferred user interest profile.
Advantage over the State-of-the-Art:
- Leverages an ontology-based approach for noise-tolerant user interest inference.
- Can autonomously recommend interesting content to end users without explicit user participation.
- Capable of detecting dynamic user interest shifts fully automatically and adjusting algorithm behavior accordingly.
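As a rough illustration of inferring interest from attention time, the sketch below spreads each page's reading time over its terms and ranks candidate pages by the resulting profile. It is a deliberately crude stand-in for the ontology-based inference and data fusion the slide describes. The function names, the reading log, and the scoring rule are all assumptions.

```python
from collections import defaultdict


def interest_profile(reading_log):
    """Build {term: share of attention} from (terms, seconds) records."""
    profile = defaultdict(float)
    total = 0.0
    for terms, seconds in reading_log:
        if not terms:
            continue
        share = seconds / len(terms)  # split the page's time across its terms
        for term in terms:
            profile[term] += share
        total += seconds
    return {t: w / total for t, w in profile.items()} if total else {}


def recommend(profile, candidates, top_n=3):
    """Rank candidate pages by summed interest in their distinct terms."""
    scored = [
        (sum(profile.get(t, 0.0) for t in set(terms)), name)
        for name, terms in candidates.items()
    ]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_n] if score > 0]


if __name__ == "__main__":
    log = [(["grid", "outage"], 120.0), (["music"], 10.0)]
    pages = {"a": ["grid", "weather"], "b": ["music"], "c": ["sports"]}
    print(recommend(interest_profile(log), pages))
```

Because the profile is rebuilt from the reading log alone, no explicit ratings are required, and rerunning it over a recent window of the log is one simple way to follow a shift in interest.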
13 Piranha Cluster View
Report Date: 1 April, FBI: Abdul Ramazi is the owner of the Select Gourmet Foods shop in Springfield Mall, Springfield, VA. [Phone number ]. First Union National Bank lists Select Gourmet Foods as holding account number Six checks totaling $35,000 have been deposited in this account in the past four months and are recorded as having been drawn on accounts at the Pyramid Bank of Cairo, Egypt and the Central Bank of Dubai, United Arab Emirates. Both of these banks have just been listed as possible conduits in money laundering schemes.
14 PiranhaG - 1 million documents in 12 minutes on a GPU cluster
30-fold performance improvement in text analysis
Researchers in the Applied Software Engineering Group (ASER), in collaboration with North Carolina State University, have clustered 1M documents in 12 minutes using a 6-node GPU cluster
15 PiranhaX: Petascale text analysis
ORNL's Jaguar is the 2nd fastest computer in the world
- 255,000 cores
- 10 PB (13,400 1 TB drives) of storage
- 362 TB of memory
Google has indexed 1 trillion unique URLs, but has not analyzed the content of the information
We are currently developing petascale text analysis techniques to cluster (deep analysis) 1 trillion documents using Jaguar
16 VERDE: Visualizing Energy Resources Dynamically on Earth; NOM: National Outage Map
Capability: Platform provides wide-area spatiotemporal electric grid situational awareness
- Situational awareness of transmission lines (above 230 kV)
- Situational awareness of distribution outages (status of approximately 40 million customers served)
Wide-Area Power Grid Situational Awareness: Streaming Data; Impact Models and Data Analysis
- Real-time weather overlays
- Predictive and post-event impact modeling and simulation
- Data analysis
- Energy infrastructure views
- Population impacts
Figure panels: Distribution Outages Analysis; Real-time Weather Overlays
17 TRACS: The Resiliency Analysis & Coordination System
Computational Data Analytics Group
Problem Statement:
- Critical Humanitarian Assistance/Disaster Recovery (HA/DR) data resides in multiple domains
- Growing HA/DR information in social media
- Disaster response requires real-time access to common community information
Technical Approach:
- Apply Web 2.0 and social media technologies for sharing heterogeneous data across organizations
- Allow mapping of data to one or more assessment frameworks to track progress towards stated goals
Advantage over state-of-the-art:
- TRACS contextualizes social media, crisis mapping, and network analysis data within one or more common societal models
- Provides visualization and intuitive displays to support pre- and post-event analysis
18 Human Expert Interpretation of Images
Images courtesy of Memorial Sloan-Kettering Cancer Center via
19 Mammography Data Temporal Aspects
Figure panels: Two latest normal reports; Two latest suspicious reports
Band Coefficients:
- Band 1 identifies recent abnormalities
- Band 2 identifies early abnormalities
Wavelet transform of the sequence of abnormal s-gram counts
20 Knowledge Discovery in Linked (Big) Data
Computational Data Analytics Group
Diagram:
- Data: RDF, SPARQL, OWL etc. SEEKER: Schema Exploration and Evolving Knowledge Recorder
- Models and Algorithms: data association, prediction, summarization, data fusion, inference, network-analytic behavior extraction, etc.
- Inference: emergent behaviors. SNAKE: Social Network Analytic Knowledge Extraction
Problem Statement:
- Data is not rectangular; it is freeform.
- Knowledge is buried in associations (links) of disparate data.
- How can we find what is interesting from the data? What is the data trying to tell us?
Technical Approach:
- Domain digitization: machine-traversable (and evolving) data model as taxonomies, vocabulary and links. Automated data schema exploration and analysis.
- Graph-theoretic space-time-topological model: captures relationships, attributes, and temporal/spatial variations. Change detection in time-varying graphs.
- Statistical inference and visualization: hypothesis tests for anomalous behavior.
Advantage over State-of-the-Art:
- Domain inclusion makes knowledge discovery sustainable over time.
- Change-management strategy recommendation.
- Sensing emergent behaviors.
21 Summary
- Processing large volumes of text is a challenging problem
- Missing information is expensive
- Discovering information is profitable
- Piranha can keep you from missing information, and help you to discover new valuable information
Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013
Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation
More informationStatistical Analysis and Visualization for Cyber Security
Statistical Analysis and Visualization for Cyber Security Joanne Wendelberger, Scott Vander Wiel Statistical Sciences Group, CCS-6 Los Alamos National Laboratory Quality and Productivity Research Conference
More informationBig Data Analytics. Prof. Dr. Lars Schmidt-Thieme
Big Data Analytics Prof. Dr. Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany 33. Sitzung des Arbeitskreises Informationstechnologie,
More informationSustainable Development with Geospatial Information Leveraging the Data and Technology Revolution
Sustainable Development with Geospatial Information Leveraging the Data and Technology Revolution Steven Hagan, Vice President, Server Technologies 1 Copyright 2011, Oracle and/or its affiliates. All rights
More informationHow To Make Sense Of Data With Altilia
HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS. ALTILIA turns Big Data into Smart Data and enables businesses to
More informationInformation Visualization WS 2013/14 11 Visual Analytics
1 11.1 Definitions and Motivation Lot of research and papers in this emerging field: Visual Analytics: Scope and Challenges of Keim et al. Illuminating the path of Thomas and Cook 2 11.1 Definitions and
More informationBig Data. Donald Kossmann & Nesime Tatbul Systems Group ETH Zurich
Big Data Donald Kossmann & Nesime Tatbul Systems Group ETH Zurich Goal of Today What is Big Data? introduce all major buzz words What is not Big Data? get a feeling for opportunities & limitations Answering
More informationBig Data Challenges and Success Factors. Deloitte Analytics Your data, inside out
Big Data Challenges and Success Factors Deloitte Analytics Your data, inside out Big Data refers to the set of problems and subsequent technologies developed to solve them that are hard or expensive to
More informationIntroduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
More informationBig Data and Healthcare Payers WHITE PAPER
Knowledgent White Paper Series Big Data and Healthcare Payers WHITE PAPER Summary With the implementation of the Affordable Care Act, the transition to a more member-centric relationship model, and other
More informationBig Data and Semantic Web in Manufacturing. Nitesh Khilwani, PhD Chief Engineer, Samsung Research Institute Noida, India
Big Data and Semantic Web in Manufacturing Nitesh Khilwani, PhD Chief Engineer, Samsung Research Institute Noida, India Outline Big data in Manufacturing Big data Analytics Semantic web technologies Case
More informationText Mining - Scope and Applications
Journal of Computer Science and Applications. ISSN 2231-1270 Volume 5, Number 2 (2013), pp. 51-55 International Research Publication House http://www.irphouse.com Text Mining - Scope and Applications Miss
More informationIntroduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationIntroduction to Big Data the four V's
Chapter 1: Introduction to Big Data the four V's This chapter is mainly based on the Big Data script by Donald Kossmann and Nesime Tatbul (ETH Zürich) Big Data Management and Analytics 15 Goal of Today
More informationSPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
More informationCourse 803401 DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Oman College of Management and Technology Course 803401 DSS Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization CS/MIS Department Information Sharing
More informationThe University of Jordan
The University of Jordan Master in Web Intelligence Non Thesis Department of Business Information Technology King Abdullah II School for Information Technology The University of Jordan 1 STUDY PLAN MASTER'S
More informationBIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
More information可 视 化 与 可 视 计 算 概 论. Introduction to Visualization and Visual Computing 袁 晓 如 北 京 大 学 2015.12.23
可 视 化 与 可 视 计 算 概 论 Introduction to Visualization and Visual Computing 袁 晓 如 北 京 大 学 2015.12.23 2 Visual Analytics Adapted from Jim Thomas s slides 3 Visual Analytics Definition Visual Analytics is the
More informationCYBER4SIGHT TM THREAT INTELLIGENCE SERVICES ANTICIPATORY AND ACTIONABLE INTELLIGENCE TO FIGHT ADVANCED CYBER THREATS
CYBER4SIGHT TM THREAT INTELLIGENCE SERVICES ANTICIPATORY AND ACTIONABLE INTELLIGENCE TO FIGHT ADVANCED CYBER THREATS PREPARING FOR ADVANCED CYBER THREATS Cyber attacks are evolving faster than organizations
More informationScalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens
Scalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens 1 Optique: Improving the competitiveness of European industry For many
More informationDe la Business Intelligence aux Big Data. Marie- Aude AUFAURE Head of the Business Intelligence team Ecole Centrale Paris. 22/01/14 Séminaire Big Data
De la Business Intelligence aux Big Data Marie- Aude AUFAURE Head of the Business Intelligence team Ecole Centrale Paris 22/01/14 Séminaire Big Data 1 Agenda EvoluHon of Business Intelligence SemanHc Technologies
More informationPALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management
PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management INTRODUCTION Traditional perimeter defense solutions fail against sophisticated adversaries who target their
More informationChapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization
More informationThe Big Data Paradigm Shift. Insight Through Automation
The Big Data Paradigm Shift Insight Through Automation Agenda The Problem Emcien s Solution: Algorithms solve data related business problems How Does the Technology Work? Case Studies 2013 Emcien, Inc.
More informationData Intensive Science and Computing
DEFENSE LABORATORIES ACADEMIA TRANSFORMATIVE SCIENCE Efficient, effective and agile research system INDUSTRY Data Intensive Science and Computing Advanced Computing & Computational Sciences Division University
More informationInformation Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)
More informationHow To Create A Data Science System
Enhance Collaboration and Data Sharing for Faster Decisions and Improved Mission Outcome Richard Breakiron Senior Director, Cyber Solutions Rbreakiron@vion.com Office: 571-353-6127 / Cell: 803-443-8002
More informationHOW TO DO A SMART DATA PROJECT
April 2014 Smart Data Strategies HOW TO DO A SMART DATA PROJECT Guideline www.altiliagroup.com Summary ALTILIA s approach to Smart Data PROJECTS 3 1. BUSINESS USE CASE DEFINITION 4 2. PROJECT PLANNING
More informationCyber4sight TM Threat. Anticipatory and Actionable Intelligence to Fight Advanced Cyber Threats
Cyber4sight TM Threat Intelligence Services Anticipatory and Actionable Intelligence to Fight Advanced Cyber Threats Preparing for Advanced Cyber Threats Cyber attacks are evolving faster than organizations
More informationWeb Archiving and Scholarly Use of Web Archives
Web Archiving and Scholarly Use of Web Archives Helen Hockx-Yu Head of Web Archiving British Library 15 April 2013 Overview 1. Introduction 2. Access and usage: UK Web Archive 3. Scholarly feedback on
More informationExploring Big Data in Social Networks
Exploring Big Data in Social Networks virgilio@dcc.ufmg.br (meira@dcc.ufmg.br) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about
More informationSoftware Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo
Software Engineering for Big Data CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo Big Data Big data technologies describe a new generation of technologies that aim
More informationBig Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies
Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08
More informationSURVEY REPORT DATA SCIENCE SOCIETY 2014
SURVEY REPORT DATA SCIENCE SOCIETY 2014 TABLE OF CONTENTS Contents About the Initiative 1 Report Summary 2 Participants Info 3 Participants Expertise 6 Suggested Discussion Topics 7 Selected Responses
More informationInformation Processing, Big Data, and the Cloud
Information Processing, Big Data, and the Cloud James Horey Computational Sciences & Engineering Oak Ridge National Laboratory Fall Creek Falls 2010 Information Processing Systems Model Parameters Data-intensive
More informationAre You Ready for Big Data?
Are You Ready for Big Data? Jim Gallo National Director, Business Analytics April 10, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?
More informationMaking Sense of Big Data. Dr. Thomas E. Potok Computa2onal Data Analy2cs Group Leader Oak Ridge Na2onal Laboratory potokte@ornl.
Making Sense of Big Data Dr. Thomas E. Potok Computa2onal Data Analy2cs Group Leader Oak Ridge Na2onal Laboratory potokte@ornl.gov 865-574- 0834 ORNL s Big Data Legacy Science National Security Energy
More informationHadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services
Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the
More informationBig Data and Analytics: Challenges and Opportunities
Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif
More informationA Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1
A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1 Yannis Stavrakas Vassilis Plachouras IMIS / RC ATHENA Athens, Greece {yannis, vplachouras}@imis.athena-innovation.gr Abstract.
More informationSemantic Data Management. Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies
Semantic Data Management Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies 1 Enterprise Information Challenge Source: Oracle customer 2 Vision of Semantically Linked Data The Network of Collaborative
More informationMEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012
MEDICAL DATA MINING Timothy Hays, PhD Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012 2 Healthcare in America Is a VERY Large Domain with Enormous Opportunities for Data
More informationBIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
More informationOutline. What is Big data and where they come from? How we deal with Big data?
What is Big Data Outline What is Big data and where they come from? How we deal with Big data? Big Data Everywhere! As a human, we generate a lot of data during our everyday activity. When you buy something,
More informationSome Research Challenges for Big Data Analytics of Intelligent Security
Some Research Challenges for Big Data Analytics of Intelligent Security Yuh-Jong Hu hu at cs.nccu.edu.tw Emerging Network Technology (ENT) Lab. Department of Computer Science National Chengchi University,
More informationDatabase Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
More informationIntroduction to Big Data Science
Introduction to Big Data Science 13 th Period Project: Situation Awareness and Statistical Analysis On Big Data Big Data Science 1 Contents What is Situation Awareness (SA)? 3 Levels for SA Role of Data
More informationChapter 5. Warehousing, Data Acquisition, Data. Visualization
Decision Support Systems and Intelligent Systems, Seventh Edition Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization 5-1 Learning Objectives
More informationbigdata Managing Scale in Ontological Systems
Managing Scale in Ontological Systems 1 This presentation offers a brief look scale in ontological (semantic) systems, tradeoffs in expressivity and data scale, and both information and systems architectural
More informationHow In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time
SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first
More informationUnderstanding Your Customer Journey by Extending Adobe Analytics with Big Data
SOLUTION BRIEF Understanding Your Customer Journey by Extending Adobe Analytics with Big Data Business Challenge Today s digital marketing teams are overwhelmed by the volume and variety of customer interaction
More informationPARC and SAP Co-innovation: High-performance Graph Analytics for Big Data Powered by SAP HANA
PARC and SAP Co-innovation: High-performance Graph Analytics for Big Data Powered by SAP HANA Harnessing the combined power of SAP HANA and PARC s HiperGraph graph analytics technology for real-time insights
More informationKeywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.
Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics
More informationBig Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 36 Outline
More informationWhat is Data Science? Data, Databases, and the Extraction of Knowledge Renée T., @becomingdatasci, November 2014
What is Data Science? { Data, Databases, and the Extraction of Knowledge Renée T., @becomingdatasci, November 2014 Let s start with: What is Data? http://upload.wikimedia.org/wikipedia/commons/f/f0/darpa
More informationNiara Security Analytics. Overview. Automatically detect attacks on the inside using machine learning
Niara Security Analytics Automatically detect attacks on the inside using machine learning Automatically detect attacks on the inside Supercharge analysts capabilities Enhance existing security investments
More informationMLg. Big Data and Its Implication to Research Methodologies and Funding. Cornelia Caragea TARDIS 2014. November 7, 2014. Machine Learning Group
Big Data and Its Implication to Research Methodologies and Funding Cornelia Caragea TARDIS 2014 November 7, 2014 UNT Computer Science and Engineering Data Everywhere Lots of data is being collected and
More informationCustomer Lead Generation from Digital Channels for Insurance Dr. Jai Ganesh
Customer Lead Generation from Digital Channels for Insurance By Dr. Jai Ganesh Head, Mphasis Next Labs Contents Introduction...4 Innovation-driven Integrated Digital Customer Analytics...5 Comprehensive,
More informationUncovering Value in Healthcare Data with Cognitive Analytics. Christine Livingston, Perficient Ken Dugan, IBM
Uncovering Value in Healthcare Data with Cognitive Analytics Christine Livingston, Perficient Ken Dugan, IBM Conflict of Interest Christine Livingston Ken Dugan Has no real or apparent conflicts of interest
More informationGETTING REAL ABOUT SECURITY MANAGEMENT AND "BIG DATA"
GETTING REAL ABOUT SECURITY MANAGEMENT AND "BIG DATA" A Roadmap for "Big Data" in Security Analytics ESSENTIALS This paper examines: Escalating complexity of the security management environment, from threats
More informationManjula Ambur NASA Langley Research Center April 2014
Manjula Ambur NASA Langley Research Center April 2014 Outline What is Big Data Vision and Roadmap Key Capabilities Impetus for Watson Technologies Content Analytics Use Potential use cases What is Big
More informationSupercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy?
HPC2012 Workshop Cetraro, Italy Supercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy? Bill Blake CTO Cray, Inc. The Big Data Challenge Supercomputing minimizes data
More informationDigital marketing strategy: embracing new technologies to broaden participation
Communications and engagement strategy Appendix 1 Digital marketing strategy: embracing new technologies to broaden participation NHS Northumberland Clinical Commissioning Group (CCG) is keen to develop
More informationAligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap
Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed
More informationEnabling the SmartGrid through Cloud Computing
Enabling the SmartGrid through Cloud Computing April 2012 Creating Value, Delivering Results 2012 eglobaltech Incorporated. Tech, Inc. All rights reserved. 1 Overall Objective To deliver electricity from
More informationAnalysis of Social Media Streams
Fakultätsname 24 Fachrichtung 24 Institutsname 24, Professur 24 Analysis of Social Media Streams Florian Weidner Dresden, 21.01.2014 Outline 1.Introduction 2.Social Media Streams Clustering Summarization
More informationThe data forest. Application. Application Application DATA. Office of Research
The data forest DATA Unfortunately Data to the rescue The Rensselaer IDEA HPC: Computational Science and Engineering + Data Science and Predictive Analytics + Cognitive Computing + Perceptualization DATA
More informationA Framework of User-Driven Data Analytics in the Cloud for Course Management
A Framework of User-Driven Data Analytics in the Cloud for Course Management Jie ZHANG 1, William Chandra TJHI 2, Bu Sung LEE 1, Kee Khoon LEE 2, Julita VASSILEVA 3 & Chee Kit LOOI 4 1 School of Computer
More informationScholarly Use of Web Archives
Scholarly Use of Web Archives Helen Hockx-Yu Head of Web Archiving British Library 15 February 2013 Web Archiving initiatives worldwide http://en.wikipedia.org/wiki/file:map_of_web_archiving_initiatives_worldwide.png
Text Analytics Beginner's Guide. Extracting Meaning from Unstructured Data
Text Analytics Beginner's Guide Extracting Meaning from Unstructured Data Contents Text Analytics 3 Use Cases 7 Terms 9 Trends 14 Scenario 15 Resources 24 2 2013 Angoss Software Corporation. All rights
The Ontological Approach for SIEM Data Repository
The Ontological Approach for SIEM Data Repository Igor Kotenko, Olga Polubelova, and Igor Saenko Laboratory of Computer Science Problems, Saint-Petersburg Institute for Information and Automation of Russian
Cyber Security Metrics Dashboards & Analytics
Cyber Security Metrics Dashboards & Analytics Feb, 2014 Robert J. Michalsky Principal, Cyber Security NJVC, LLC Proprietary Data UNCLASSIFIED Agenda Healthcare Sector Threats Recent History Security Metrics
Sanjeev Kumar. contribute
RESEARCH ISSUES IN DATA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledge discovery is emerging as a
Cray: Enabling Real-Time Discovery in Big Data
Cray: Enabling Real-Time Discovery in Big Data Discovery is the process of gaining valuable insights into the world around us by recognizing previously unknown relationships between occurrences, objects
Network Big Data: Facing and Tackling the Complexities Xiaolong Jin
Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10
Social Media Implementations
SEM Experience Analytics Social Media Implementations SEM Experience Analytics delivers real sentiment, meaning and trends within social media for many of the world's leading consumer brand companies.
An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
Bayesian networks - Time-series models - Apache Spark & Scala
Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010
Decision Support Optimization through Predictive Analytics - Leuven Statistical Day 2010 Ernst van Waning Senior Sales Engineer May 28, 2010 Agenda SPSS, an IBM Company SPSS Statistics User-driven product
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
Performance Analysis, Data Sharing, Tools Integration: New Approach based on Ontology
Performance Analysis, Data Sharing, Tools Integration: New Approach based on Ontology Hong-Linh Truong Institute for Software Science, University of Vienna, Austria truong@par.univie.ac.at Thomas Fahringer
W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015
W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction
Social Media Analysis and Audience Engagement
Solution in Detail Media and Marketing Executive Summary Contact Us Social Media Analysis and Audience Engagement Analyze Social Media and Engage Customers Audience Engagement Consumer Experiences Social
Visualization, Modeling and Predictive Analysis of Internet Attacks. Thermopylae Sciences + Technology, LLC
Visualization, Modeling and Predictive Analysis of Internet Attacks Thermopylae Sciences + Technology, LLC Administrative POC: Ms. Jeannine Feasel, jfeasel@t-sciences.com Technical POC: George Romas, gromas@t-sciences.com
ICT Perspectives on Big Data: Well Sorted Materials
ICT Perspectives on Big Data: Well Sorted Materials 3 March 2015 Contents Introduction 1 Dendrogram 2 Tree Map 3 Heat Map 4 Raw Group Data 5 For an online, interactive version of the visualisations in
The Human Element in Cyber Security and Critical Infrastructure Protection: Lessons Learned
The Human Element in Cyber Security and Critical Infrastructure Protection: Lessons Learned Marco Carvalho, Ph.D. Research Scientist mcarvalho@ihmc.us Institute for Human and Machine Cognition 40 South
BUSINESS VALUE OF SEMANTIC TECHNOLOGY
BUSINESS VALUE OF SEMANTIC TECHNOLOGY Preliminary Findings Industry Advisory Council Emerging Technology (ET) SIG Information Sharing & Collaboration Committee July 15, 2005 Mills Davis Managing Director
Three Open Blueprints For Big Data Success
White Paper: Three Open Blueprints For Big Data Success Featuring Pentaho's Open Data Integration Platform Inside: Leverage open framework and open source Kickstart your efforts with repeatable blueprints
SOCIAL MEDIA LISTENING AND ANALYSIS Spring 2014
SOCIAL MEDIA LISTENING AND ANALYSIS Spring 2014 EXECUTIVE SUMMARY In this digital age, social media has quickly become one of the most important communication channels. The shift to online conversation
CLUSTER ANALYSIS WITH R
CLUSTER ANALYSIS WITH R [cluster analysis divides data into groups that are meaningful, useful, or both] LEARNING STAGE ADVANCED DURATION 3 DAY WHAT IS CLUSTER ANALYSIS? Cluster Analysis or Clustering
Eindhoven December 4, 2014
Eindhoven December 4, 2014 Waves: Visualizing spatio-temporal Soccer Data Insight Reports of sport events can be enhanced by real-time feature analysis. Solutions Complex spatio-temporal sports-analytics
Big Data and Complex Networks Analytics. Timos Sellis, CSIT Kathy Horadam, MGS
Big Data and Complex Networks Analytics Timos Sellis, CSIT Kathy Horadam, MGS Big Data What is it? Most commonly accepted definition, by Gartner (the 3 Vs) Big data is high-volume, high-velocity and high-variety
Concept and Project Objectives
3.1 Publishable summary Concept and Project Objectives Proactive and dynamic QoS management, network intrusion detection and early detection of network congestion problems among other applications in the
Connecting library content using data mining and text analytics on structured and unstructured data
Submitted on: May 5, 2013 Connecting library content using data mining and text analytics on structured and unstructured data Chee Kiam Lim Technology and Innovation, National Library Board, Singapore.
Semantically Enhanced Web Personalization Approaches and Techniques
Semantically Enhanced Web Personalization Approaches and Techniques Dario Vuljani, Lidia Rovan, Mirta Baranovi Faculty of Electrical Engineering and Computing, University of Zagreb Unska 3, HR-10000 Zagreb,
Module Overview CORPORATE COMMUNICATIONS. Personal Information
CORPORATE COMMUNICATIONS MODULE OUTLINE DURATION: 22nd July to 9th August 2013: 6 sessions of two hours MODULE LEADER: David Kirkham, TD, BA, MSc, PhD E: david.kirkham@calistroconsultants.com T: +44
Data Visualization An Outlook on Disruptive Techniques (Technical Insights)
Data Visualization An Outlook on Disruptive Techniques (Technical Insights) Comprehend Complex Data Sets through Visual Representations June 2014 Contents Section Slide Numbers Executive Summary 3 Research
What is Data Science? Girl Develop It! Meetup Renée M. P. Teate, March 2015
What is Data Science? Girl Develop It! Meetup Renée M. P. Teate, March 2015 Let's start with: What is Data? http://upload.wikimedia.org/wikipedia/commons/f/f0/darpa _Big_Data.jpg https://encryptedtbn2.gstatic.com/images?q=tbn:and9gcs9dku3_tzi-swwyaqee5y0ehuvoiznsya_raknubbd0jyxpx7pw