Introduction to Big Data Science




Introduction to Big Data Science, 13th Period Project: Situation Awareness and Statistical Analysis on Big Data. Big Data Science 1

Contents
- What is Situation Awareness (SA)?
- 3 Levels of SA
- Role of Data Mining and Reasoning in SA
- Extracting Information from Big Data
- Entire Scenario of SA on Facebook Data

Awareness. The goal of computational awareness is to realize awareness in computing machines. Awareness is the ability to perceive, to feel, or to be conscious of events, objects, or sensory patterns.

Situation Awareness. Situation awareness is the perception of environmental elements with respect to time and/or space, the comprehension of their meaning, and the projection of their status in the near future after some variable has changed (Mica Endsley; Wikipedia).

JDL: Data Fusion Levels (figure from A. Steinberg et al., "Rethinking the JDL Data Fusion Levels").

Sources of SA Information (figure from M. R. Endsley, "Theoretical Underpinnings of Situation Awareness: A Critical Review").

Mechanisms and Processes in SA (figure from M. R. Endsley, "Theoretical Underpinnings of Situation Awareness: A Critical Review").

Endsley's model and ontology-based situation awareness (figure): stages Collect Relevant Data, Identify Situation Entities, and Relate Situation Entities, with semantic analysis along thematic, spatio-temporal, provenance, and trust dimensions. Source: M. Kokar et al., "Ontology-based Situation Awareness" (figure modified by A. Sheth).

Three layers for situation awareness

A novel architecture for active situation awareness. Image processing, pattern recognition, data mining, and signal processing techniques can be applied in the perception layer to recognize low-level objects and data patterns. Situation awareness then draws conclusions from the observations made in the perception layer; ontology-based rules are usually used for this comprehension step. The top layer is for projection, which anticipates future events and their implications.

A novel architecture for active situation awareness (figure). Example predicates in each layer:
Projection: recommendtoparticipatetheevent(Building, Event), needreplyto(ITM), checkhisevent(ITM)
Comprehension (situation): givehottopic(ITM, ATopicHisBlog), hasevent(Building, Event), israre(Event), saycelebration(ITM, myblog)
Perception: stand(People, Longline), isat(People, Building), wrote(ITM, myblog)
World: Facebook, Twitter, Google, Web Data Service

Perceptions by mining SNS data (figure). Components: active situation awareness; ontology for comprehension at the upper layer; latent query for SA (time, space, theme); document processing with classification (TF-IDF) and event information extraction; perception information; documents from SNS (Twitter, Facebook) and Web data services.

Perception by mining SNS data. Select a data set from which to extract the information used in the comprehension layer. This information can be exposed through Web APIs that provide facts to the rule engine. For example, we analyzed Facebook users' sentences with data mining techniques to capture a user's intentions or changes of mind. Each layer has its own data and information sets.
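The perception step can be sketched as a latent query over time, space, and theme, followed by conversion of the selected records into facts for the rule engine. This is a minimal illustration, assuming a hypothetical Web API that returns JSON records with `user`, `place`, `time`, `theme`, and `text` fields; in the slides the theme comes from TF-IDF classification, whereas here it is simply given in each record. The names `latent_query`, `to_facts`, and the predicate `posted` are illustrative only, not part of the original system.

```python
import json
from datetime import datetime

# Hypothetical response from a perception Web API; every field name is an assumption.
SAMPLE_RESPONSE = """[
  {"user": "students", "place": "cafeteria", "time": "2015-11-10T12:05:00",
   "theme": "food", "text": "Huge line at the cafeteria!"},
  {"user": "itm", "place": "lab", "time": "2015-11-10T15:30:00",
   "theme": "research", "text": "Experiment results are in"}
]"""


def latent_query(records, start, end, place=None, theme=None):
    """Select records inside the time window [start, end], optionally
    restricted to one place (space) and one theme."""
    hits = []
    for r in records:
        t = datetime.fromisoformat(r["time"])
        if not start <= t <= end:
            continue
        if place is not None and r["place"] != place:
            continue
        if theme is not None and r["theme"] != theme:
            continue
        hits.append(r)
    return hits


def to_facts(records):
    """Turn selected records into ground facts (tuples) for a rule engine."""
    facts = set()
    for r in records:
        facts.add(("areat", r["user"], r["place"]))
        facts.add(("posted", r["user"], r["theme"]))  # hypothetical predicate
    return facts


if __name__ == "__main__":
    records = json.loads(SAMPLE_RESPONSE)
    lunch = latent_query(records, datetime(2015, 11, 10, 12), datetime(2015, 11, 10, 13),
                         place="cafeteria")
    print(to_facts(lunch))
```

Keeping the query and the fact conversion separate mirrors the slide's split between selecting a data set and supplying facts to the comprehension layer.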

Ontology for Comprehension of the information

Comprehension of the information by inference over the ontology and rules
    %% Cafeteria event inference
    %% Rules
    %% longlinestand(Human) :- stand(Human), long(Human).
    mayhaveevent(Place) :- longlinestand(Human), areat(Human, Place).
    hasevent(Place, Event) :- mayhaveevent(Place), foundevent(Place, Event).
    recommendtoparticipatetheevent(Place, Event) :- hasevent(Place, Event), israre(Event).
    %% Facts
    longlinestand(students).
    areat(students, cafeteria).
    foundevent(cafeteria, sinsobamatsuri).
    israre(sinsobamatsuri).
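To see how the cafeteria rules produce a recommendation, here is a minimal sketch in Python. Prolog answers queries by backward chaining; this sketch instead uses naive forward chaining (apply every rule repeatedly until no new fact appears), which derives the same ground facts for function-free rules like these. Facts and rule heads are tuples, and, as in Prolog, terms starting with an uppercase letter are variables. The helper names `match`, `solve`, and `forward_chain` are hypothetical, and the commented-out `longlinestand` rule is left out because that fact is asserted directly.

```python
def is_var(term):
    """Prolog convention: a term starting with an uppercase letter is a variable."""
    return isinstance(term, str) and term[:1].isupper()


def match(pattern, fact, bindings):
    """Unify one body pattern with a ground fact; return extended bindings or None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    new = dict(bindings)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if new.get(p, f) != f:  # variable already bound to a different value
                return None
            new[p] = f
        elif p != f:
            return None
    return new


def solve(body, facts, bindings):
    """Yield every set of bindings that satisfies all patterns in `body`."""
    if not body:
        yield bindings
        return
    for fact in facts:
        extended = match(body[0], fact, bindings)
        if extended is not None:
            yield from solve(body[1:], facts, extended)


def forward_chain(rules, facts):
    """Apply rules until a fixpoint is reached; return all known facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for b in list(solve(body, known, {})):
                derived = tuple(b[t] if is_var(t) else t for t in head)
                if derived not in known:
                    known.add(derived)
                    changed = True
    return known


RULES = [
    (("mayhaveevent", "Place"),
     [("longlinestand", "Human"), ("areat", "Human", "Place")]),
    (("hasevent", "Place", "Event"),
     [("mayhaveevent", "Place"), ("foundevent", "Place", "Event")]),
    (("recommendtoparticipatetheevent", "Place", "Event"),
     [("hasevent", "Place", "Event"), ("israre", "Event")]),
]

FACTS = {
    ("longlinestand", "students"),
    ("areat", "students", "cafeteria"),
    ("foundevent", "cafeteria", "sinsobamatsuri"),
    ("israre", "sinsobamatsuri"),
}

if __name__ == "__main__":
    for fact in sorted(forward_chain(RULES, FACTS)):
        if fact[0] == "recommendtoparticipatetheevent":
            print(fact)  # ('recommendtoparticipatetheevent', 'cafeteria', 'sinsobamatsuri')
```

If the event atom in `foundevent` and in `israre` differ (for example `sinsobamatsuri` versus `sobamatsuri`), the last rule never fires, because the shared variable `Event` cannot bind to two different atoms.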

ASA System Architecture on SNS (figure). Components: smartphone; inference engine with facts, rules, mapping ontologies, and domain ontologies; RESTful services for perception (Facebook service, Twitter service, Web data service).

Scenarios. Scenario I: A student at our university bought a lunch box because he saw a long waiting line in the university cafeteria, but he did not know that it was the line for the new soba festival in the cafeteria. If his smartphone had told him about the new soba festival when he was near the cafeteria, he would have chosen the soba. Scenarios II and III: When I was in my office, a student came in. When I shook my smartphone, the phone told me the following about the student, based on information on Facebook (example):
- Name of the other person: Leo Saito
- He is interested in me
- Saito has events (part-time job, date)
- Saito has changed his topic from food to research

Mining SNS Data (by TF-IDF, for the perception layer)

    Function: Category_calculate   // calculate the category of a writing
      Input:  words      // set of words obtained by splitting the writing
      Output: category   // category of the word set
      Data = learning data set
      for i = 1 to n {               // n = number of words in the word set
        IDF_i = log2(number of documents in Data / number of documents in Data containing word i)
                                     // words that no document in Data contains are skipped (IDF undefined)
      }
      for i = 1 to n {
        for j = 1 to m {             // m = number of documents in Data
          TF_ij = (frequency of word i in Data_j) / (total number of words in Data_j)
          TFIDF_ij = TF_ij * IDF_i
        }
      }
      Max_Sum_of_TFIDF = 0; category = none
      for j = 1 to m {
        Sum_of_TFIDF_j = TFIDF_1j + TFIDF_2j + ... + TFIDF_nj
        if Max_Sum_of_TFIDF < Sum_of_TFIDF_j {
          Max_Sum_of_TFIDF = Sum_of_TFIDF_j
          category = category of Data_j
        }
      }
      return category

    Function: determine the difference between the two categories
      Input:  writing1, writing2   // each writing is a set of documents
      Output: true if the two categories differ, false if they are the same
      for i = 1 to n {             // n = number of documents in writing1
        Category_calculate(writing1_i)
      }
      category_of_writing1 = most common category among the documents in writing1
      for j = 1 to m {             // m = number of documents in writing2
        Category_calculate(writing2_j)
      }
      category_of_writing2 = most common category among the documents in writing2
      if category_of_writing1 = category_of_writing2 return false
      else return true
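The TF-IDF pseudocode can be rendered as runnable Python roughly as follows. It is a sketch under stated assumptions: the toy `TRAINING_DATA`, whitespace tokenization, and returning `None` when no input word occurs in the training data are choices made here, not taken from the slides. As in the pseudocode, the input receives the category of the single training document with the largest summed TF-IDF, so it behaves like a nearest-document classifier rather than a per-category model.

```python
import math
from collections import Counter

# Hypothetical toy training set: (tokens, category) pairs.
TRAINING_DATA = [
    ("ramen soba lunch cafeteria".split(), "food"),
    ("paper experiment lab results".split(), "research"),
    ("soba noodles dinner".split(), "food"),
]


def category_calculate(words, data):
    """Return the category of the training document with the largest summed
    TF-IDF over `words`, or None if no word occurs in `data`."""
    m = len(data)
    doc_sets = [set(tokens) for tokens, _ in data]
    idf = {}
    for w in set(words):
        df = sum(1 for s in doc_sets if w in s)  # documents containing w
        if df:                                   # skip unseen words (IDF undefined)
            idf[w] = math.log2(m / df)
    best_score, best_category = 0.0, None
    for tokens, category in data:
        if not tokens:
            continue
        counts = Counter(tokens)
        # TF = occurrences of w in this document / total words in this document
        score = sum(counts[w] / len(tokens) * idf[w] for w in idf)
        if score > best_score:
            best_score, best_category = score, category
    return best_category


def categories_differ(writing1, writing2, data):
    """True if the majority categories of the two writings differ."""
    def majority(writing):
        if not writing:
            return None
        cats = [category_calculate(doc, data) for doc in writing]
        return Counter(cats).most_common(1)[0][0]
    return majority(writing1) != majority(writing2)


if __name__ == "__main__":
    print(category_calculate(["soba", "lunch"], TRAINING_DATA))
    print(categories_differ([["soba", "lunch"]], [["lab", "results"]], TRAINING_DATA))
```

For example, `category_calculate(["soba", "lunch"], TRAINING_DATA)` picks the first training document, because both words occur in it and "lunch" is rare in the set, so the result is `"food"`.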

Ontology for SA (Example 2)

Rules for SA (Example 2)
1) ITM
    wantsmyreply(itm) :- wrote(itm, myblog) and thereis(questionmark, hiswriting).
    enjoyme(itm) :- wrotenumbermorethan(itm, myblog, threshold).
    givehottopic(itm, ATopicHisBlog) :- wrote(itm, ATopicHisBlog) and therearerepliesmorethan(ATopicHisBlog, threshold).
    givegoodevaluation(itm, ATopicHisBlog) :- wrote(itm, ATopicHisBlog) and therearegoodrepliesmorethan(ATopicHisBlog, threshold).
    saycelebration(itm, myblog) :- wrote(itm, myblog) and thereis(celebration, myblog).
    havenewevent(itm) :- wrote(itm, hiseventblog).
    * Example of an upper-level factor or situation:
    needreplyto(itm) :- wantsmyreply(itm) and saycelebration(itm, myblog) and enjoyme(itm).
    checkhisevent(itm) :- havenewevent(itm) and givehottopic(itm, ATopicHisBlog).
2) MC
    wantsmyreply(mc) :- wrote(mc, myblog) and thereis(questionmark, hiswriting).
    enjoyme(mc) :- wrotenumbermorethan(mc, myblog, threshold).
    givehottopic(mc, ATopicHisBlog) :- wrote(mc, ATopicHisBlog) and therearerepliesmorethan(ATopicHisBlog, threshold).
    givegoodevaluation(mc, ATopicHisBlog) :- wrote(mc, ATopicHisBlog) and therearegoodrepliesmorethan(ATopicHisBlog, threshold).
    saycelebration(mc, myblog) :- wrote(mc, myblog) and thereis(celebration, myblog).
    havenewevent(mc) :- wrote(mc, hiseventblog).
3) IL
    hasnewevent(il) :- wrotesomeblogforevent(il).   --> (large, complex task)
    haschangedmind(il) :- wrotedifferentcontextinblog(il).   --> (large, complex task)
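The ITM rules can be approximated in Python as below, which makes the threshold-based predicates concrete. This is a minimal sketch, assuming a hypothetical `Post` record with a reply count, a hypothetical `THRESHOLD` of 3, and a hypothetical keyword list for detecting celebrations; the slides specify none of these. "More than" is read as strictly greater, a question is detected by a "?" in a post on `myblog`, and a topic blog counts as hot when the replies to its posts sum to more than the threshold.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Post:
    author: str       # e.g. "itm"
    blog: str         # "myblog", "hiseventblog", or one of the author's topic blogs
    text: str
    replies: int = 0  # replies this post received (hypothetical field)


THRESHOLD = 3  # hypothetical; the slides leave `threshold` unspecified
CELEBRATION_WORDS = ("congratulations", "happy birthday")  # hypothetical keywords
SPECIAL_BLOGS = {"myblog", "hiseventblog"}


def wrote(posts, who, blog):
    return [p for p in posts if p.author == who and p.blog == blog]


def wants_my_reply(posts, who):
    # wantsmyreply :- wrote(who, myblog) and a question mark appears in the writing
    return any("?" in p.text for p in wrote(posts, who, "myblog"))


def enjoy_me(posts, who):
    # enjoyme :- wrotenumbermorethan(who, myblog, threshold)
    return len(wrote(posts, who, "myblog")) > THRESHOLD


def say_celebration(posts, who):
    # saycelebration :- wrote(who, myblog) and a celebration appears in it
    return any(word in p.text.lower()
               for p in wrote(posts, who, "myblog")
               for word in CELEBRATION_WORDS)


def give_hot_topic(posts, who):
    # givehottopic :- wrote(who, ATopicHisBlog) and its replies exceed the threshold
    totals = Counter()
    for p in posts:
        if p.author == who and p.blog not in SPECIAL_BLOGS:
            totals[p.blog] += p.replies
    return any(n > THRESHOLD for n in totals.values())


def have_new_event(posts, who):
    # havenewevent :- wrote(who, hiseventblog)
    return bool(wrote(posts, who, "hiseventblog"))


def need_reply_to(posts, who):
    # upper-level situation built from three lower-level predicates
    return wants_my_reply(posts, who) and say_celebration(posts, who) and enjoy_me(posts, who)


def check_his_event(posts, who):
    return have_new_event(posts, who) and give_hot_topic(posts, who)
```

Because each upper-level predicate is a plain conjunction of lower-level ones, the same structure carries over unchanged to MC or any other person.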

Running Example of Projection by ASA (Demonstration)