Semantic Search in E-Discovery. David Graus & Zhaochun Ren
1 Semantic Search in E-Discovery David Graus & Zhaochun Ren
2 This talk: Introduction (David Graus); Understanding email traffic (David Graus); Topic discovery & tracking in social media (Zhaochun Ren)
3 Intro: Semantic Search in E-Discovery. NWO-funded project; 4 years, 2 PhD students. With help/input from: NFI, FIOD, Create-IT Applied Research, FoxIT
6 Semantic Search in E-Discovery = Information Retrieval (finding material of unstructured nature from large collections) + Information Extraction / Text Mining (discovering patterns in data)
7 Focus: Forensic evidence in user-generated content (email, social media, forums, etc.)
8 Challenge: Finding out who knew what, from whom, and when
10 Understanding email traffic (David Graus)
12 Recipient recommendation: Given a sender, an email, and all possible recipients (in an enterprise), predict which recipient(s) are most likely to receive the email
13 Why? Understanding communication in, and the structure of, an enterprise. Applications in: enterprise search, expert finding, community detection, spam classification, anomaly detection
14 How? Gmail: who do you frequently co-address (ego network). Related work: Social Network Analysis (SNA); email content. Us: SNA + email content
15 Part 1: Social Network Analysis?
16 image by Calvinius - Creative Commons Attribution-Share Alike
17 SNA for predicting recipients? 1. Importance of a node in the network: more important people are more likely to be the recipient of an email. 2. Strength of connection between two nodes: given the sender of the email, the recipients who are frequently addressed are more likely to be the recipient.
18 SNA for predicting recipients? 1. Importance of a node in the network: (a) number of received emails; (b) PageRank score of the node. 2. Strength of connection between two nodes: (a) number of emails sent between the nodes; (b) number of times the two nodes are addressed together.
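The four SNA signals listed on this slide can be sketched on a toy email log. This is only an illustration: the log format `(sender, [recipients])`, the names, the damping factor, and the handling of nodes without outgoing mail are assumptions, not details taken from the talk.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy email log: (sender, [recipients]).
EMAILS = [
    ("alice", ["bob", "carol"]),
    ("alice", ["bob"]),
    ("bob", ["carol"]),
    ("carol", ["alice", "bob"]),
]

def received_counts(emails):
    """Importance signal (a): number of emails each node received."""
    counts = Counter()
    for _, recipients in emails:
        counts.update(recipients)
    return counts

def sent_between(emails):
    """Connection signal (a): number of emails sent from one node to another."""
    counts = Counter()
    for sender, recipients in emails:
        for recipient in recipients:
            counts[(sender, recipient)] += 1
    return counts

def co_addressed(emails):
    """Connection signal (b): times two nodes are recipients of the same email."""
    counts = Counter()
    for _, recipients in emails:
        for a, b in combinations(sorted(set(recipients)), 2):
            counts[(a, b)] += 1
    return counts

def pagerank(emails, damping=0.85, iterations=50):
    """Importance signal (b): power-iteration PageRank on the sender -> recipient graph."""
    nodes = sorted({s for s, _ in emails} | {r for _, rs in emails for r in rs})
    out_links = {n: [] for n in nodes}
    for sender, recipients in emails:
        out_links[sender].extend(recipients)  # repeated emails add weight
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out_links[n] or nodes  # assumption: dangling nodes spread rank evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank
```

Calling `received_counts(EMAILS)` and `pagerank(EMAILS)` gives the two importance scores per node, while `sent_between(EMAILS)` and `co_addressed(EMAILS)` give the two pairwise connection scores.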
19 Part 2: Email content. Statistical Language Models (LMs): assign a probability to a sequence of words; compute models for different corpora. Used in lots of places: Information Retrieval, Machine Translation, Speech Recognition.
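As a minimal sketch of "assign a probability to a sequence of words", the snippet below uses a unigram model with add-alpha smoothing. The talk does not say which model order or smoothing was used, so both are assumptions, as are the two toy corpora.

```python
import math
from collections import Counter

def train_unigram(tokens):
    """Count word occurrences; returns (counts, total number of tokens)."""
    counts = Counter(tokens)
    return counts, sum(counts.values())

def sequence_log_prob(words, counts, total, vocab_size, alpha=1.0):
    """log P(w1..wn), treating words as independent (unigram), add-alpha smoothed."""
    denominator = total + alpha * vocab_size
    return sum(math.log((counts[w] + alpha) / denominator) for w in words)

# Hypothetical corpora: one model per corpus, as on the slide.
corpus_legal = "the contract was signed the contract is void".split()
corpus_lunch = "lunch at noon lunch was great".split()
vocab = set(corpus_legal) | set(corpus_lunch)

legal_counts, legal_total = train_unigram(corpus_legal)
lunch_counts, lunch_total = train_unigram(corpus_lunch)

query = ["the", "contract"]
score_legal = sequence_log_prob(query, legal_counts, legal_total, len(vocab))
score_lunch = sequence_log_prob(query, lunch_counts, lunch_total, len(vocab))
```

The same word sequence gets a higher log-probability under the model trained on the corpus it resembles, which is what makes per-corpus models comparable.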
25 Language Models: language models as communication profiles. 1. Incoming LM (how people talk to the user); 2. Outgoing LM (how the user talks to people); 3. Interpersonal LM (how node1 talks with node2); 4. Corpus LM (how everyone talks).
26 Why language models? Comparisons between communication profiles: find the nodes with the most similar communication.
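The four communication profiles and a comparison between them could look roughly like this. The message format, the undirected treatment of the interpersonal profile, the smoothing constant, and the use of Jensen-Shannon divergence as the comparison measure are all assumptions; the slides do not name a specific similarity function.

```python
import math
from collections import Counter, defaultdict

# Hypothetical messages: (sender, recipient, text).
MESSAGES = [
    ("alice", "bob", "quarterly audit report attached"),
    ("bob", "alice", "audit report looks fine"),
    ("alice", "carol", "dinner on friday"),
    ("carol", "alice", "friday dinner sounds great"),
]

def build_profiles(messages):
    """Collect word counts for incoming, outgoing, interpersonal and corpus profiles."""
    incoming = defaultdict(Counter)       # how people talk to a user
    outgoing = defaultdict(Counter)       # how a user talks to people
    interpersonal = defaultdict(Counter)  # how two nodes talk with each other
    corpus = Counter()                    # how everyone talks
    for sender, recipient, text in messages:
        tokens = text.lower().split()
        outgoing[sender].update(tokens)
        incoming[recipient].update(tokens)
        interpersonal[frozenset((sender, recipient))].update(tokens)
        corpus.update(tokens)
    return incoming, outgoing, interpersonal, corpus

def to_distribution(counts, vocab, alpha=0.1):
    """Turn counts into a smoothed probability distribution over vocab."""
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical profiles, at most 1."""
    def kl(a, b):
        return sum(a[w] * math.log2(a[w] / b[w]) for w in a if a[w] > 0)
    mid = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)

incoming, outgoing, interpersonal, corpus = build_profiles(MESSAGES)
vocab = set(corpus)
alice_bob = to_distribution(interpersonal[frozenset(("alice", "bob"))], vocab)
alice_carol = to_distribution(interpersonal[frozenset(("alice", "carol"))], vocab)
everyone = to_distribution(corpus, vocab)
```

A lower `js_divergence` means more similar communication, so "find the nodes with the most similar communication" becomes a search for the profile with the smallest divergence.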
27 SNA: 1. Importance of a node in the network; 2. Strength of connection between nodes. Email content: 1. Incoming LM; 2. Outgoing LM; 3. Interpersonal LM; 4. Corpus-based LM.
28 Approach: time-based. t=0: 1 email, 2 addresses; t=1: 2 emails, 2 addresses; t=2: 3 emails, 4 addresses; t=3: 4 emails, 5 addresses; etc., up to t=n (emails, addresses).
29 At some time interval t: given the email, sender, and network, remove the recipients from the email and rank all nodes in the network by computing, for each candidate (recipient) node: 1. Importance of the candidate; 2. Strength of connection between sender and candidate; 3. Similarity between sender and candidate LMs.
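The ranking step on this slide can be sketched as a weighted sum of the three per-candidate signals. How the signals are actually combined is not stated in the transcript; the max-normalisation, the equal weights, and the toy scores below are assumptions made for illustration.

```python
def normalize(scores):
    """Scale scores to [0, 1] by the maximum so the signals are comparable."""
    top = max(scores.values(), default=0.0)
    return {k: (v / top if top > 0 else 0.0) for k, v in scores.items()}

def rank_candidates(sender, nodes, importance, connection, lm_similarity,
                    weights=(1.0, 1.0, 1.0)):
    """Rank every node except the sender by a weighted sum of three signals."""
    candidates = [n for n in nodes if n != sender]
    imp = normalize({c: importance.get(c, 0.0) for c in candidates})
    con = normalize({c: connection.get((sender, c), 0.0) for c in candidates})
    sim = normalize({c: lm_similarity.get((sender, c), 0.0) for c in candidates})
    w_imp, w_con, w_sim = weights
    scored = {c: w_imp * imp[c] + w_con * con[c] + w_sim * sim[c] for c in candidates}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(candidates, key=lambda c: (-scored[c], c))

# Hypothetical precomputed signals at some time interval t.
nodes = ["alice", "bob", "carol", "dave"]
importance = {"bob": 3, "carol": 2, "dave": 5}                 # e.g. received emails
connection = {("alice", "bob"): 4, ("alice", "carol"): 1}      # e.g. emails sent
lm_similarity = {("alice", "bob"): 0.9, ("alice", "carol"): 0.4, ("alice", "dave"): 0.3}

ranking = rank_candidates("alice", nodes, importance, connection, lm_similarity)
```

Evaluation then amounts to checking where the removed, true recipients appear in `ranking`.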
31 Findings: what works for predicting recipients? Importance of node: number of received emails of the node. Strength of connection: number of emails between nodes. LM similarity: the interpersonal LM is most important.
32 Findings: SNA vs. email content. SNA: SNA signals deteriorate over time; SNA signals are most informative on highly active users. Email content: the LM signal improves over time; the LM signal does worse with highly active users.
33 Finally: combining Social Network Analysis with Language Modeling is better than doing either alone.
34 Why for E-Discovery? Anomaly detection: given a working prediction model, identify unexpected communication. Language models for communication: for a node, find the most different interpersonal communication (friends/family vs. colleagues?); find communication that differs from the corpus-based communication.
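One simple way to read "given a working prediction model, identify unexpected communication" is to flag actual recipients that the model ranked low. The sketch below assumes the model returns a ranked candidate list and uses a hypothetical `top_k` cut-off; neither detail comes from the talk.

```python
def unexpected_recipients(ranked_candidates, actual_recipients, top_k=3):
    """Return the actual recipients that the model did not place in its top_k."""
    expected = set(ranked_candidates[:top_k])
    return [r for r in actual_recipients if r not in expected]

# Hypothetical model output and observed recipients for one email.
model_ranking = ["bob", "carol", "dave", "erin"]
flagged = unexpected_recipients(model_ranking, ["bob", "erin"], top_k=2)
```

Here `flagged` holds the recipients an investigator might look at first; a stricter or looser `top_k` trades recall against the number of emails to review.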
35 Topic Discovery and Tracking in Social Text Streams (Zhaochun Ren, University of Amsterdam)
36 Outline: Motivation; Challenges; Our approach; Conclusion & outlook
37 Motivation: What is social media? Social media is the social interaction among people in which they create, share, or exchange information and ideas in virtual communities and networks.
38 Social text streams: 200 million tweets posted per day; over 5 billion pages on Facebook; over 55 million status updates are made on Facebook.
39 Question: How can we help users understand social text streams?
40 Topic discovery and tracking: What is topic discovery and tracking? Finding topically related material in streams of data (e.g., newswire and broadcast news). Methods: local content analysis algorithm (1998); hidden Markov model (2001); topic models: LDA/pLSA (2003).
41 Topic discovery and tracking in social text streams: find important topics in social text streams. Influence of social media: user behavior modeling; collaborative connections; topic drifting on social media.
42 Outline: Motivation; Challenges; Our approach; Conclusion & outlook
43 Challenges: topic drifting phenomenon; user behavior modeling; volume of social text streams; sparseness of social text streams
44 Outline: Motivation; Challenges; Our approach; Conclusion & outlook
45 Time-aware topic modeling. Topic models: Latent Dirichlet Allocation. Each topic can be represented as a finite mixture of words; each document in the corpus can be represented as a mixture of multiple topics; bag-of-words assumption.
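The three statements on this slide correspond to the generative story of Latent Dirichlet Allocation, sketched below. The vocabulary, the two topic-word distributions, and the symmetric prior `alpha` are made-up values for illustration, and the sketch only generates documents; it does not perform inference.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalising gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(topic_word, vocab, doc_length, alpha, rng):
    """One document: draw a topic mixture, then a topic and a word per position."""
    theta = sample_dirichlet([alpha] * len(topic_word), rng)  # document = mixture of topics
    words = []
    for _ in range(doc_length):
        z = sample_categorical(theta, rng)          # topic for this position
        w = sample_categorical(topic_word[z], rng)  # topic = mixture of words
        words.append(vocab[w])
    return theta, words  # word order carries no information (bag of words)

# Hypothetical vocabulary and two topics (each row sums to 1).
VOCAB = ["train", "delay", "snow", "coffee", "station", "shop"]
TOPIC_WORD = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "disruption" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a "retail" topic
]

theta, words = generate_document(TOPIC_WORD, VOCAB, doc_length=8, alpha=0.5,
                                 rng=random.Random(0))
```

Fitting LDA runs this story in reverse: given only the words, it estimates `theta` per document and the topic-word rows.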
46 Topic modeling in social text streams: dynamic topic modeling on social streams; user behavior modeling on Twitter
48 Dynamic topic modeling. Social text streams: concept drifting phenomenon. Input data: input documents $X_1, X_2, \dots, X_i$ arriving at time periods $t_1, t_2, \dots, t_i$. Output: topic distribution $p(z \mid t)$ at each time period $t$.
49 Application: hierarchical multi-label classification on social text streams. Hierarchical multi-label classification: learn a hypothesis function $f : X \to \{0, 1\}^C$ from training data $\{(x^{(i)}, y^{(i)})\}_{i=1}^{D}$ to predict a $y$ when given an input document $x$; follows the T-property. Some social text streams belong to multiple hierarchical labels. Example tweets: "There are quite cramped trains"; "I think the train will soon stop again because of snow..."; "I really feel like Smullers"; "200,000 people travel with book as ticket". [Figure: label hierarchy with nodes ROOT, Communication, Product, Traveler, Personal report, Personal experience, Retail on station, Parking, Incident, Compliment, Complaint, Product Experience, Smullers]
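A hedged sketch of the hierarchy constraint follows. It assumes the T-property mentioned on the slide is the usual tree constraint (a label can only be positive if its parent is positive), and the parent links in `PARENT` are a guessed arrangement of some of the slide's node names, not the actual hierarchy used in the talk.

```python
# Hypothetical fragment of a label tree: child -> parent.
PARENT = {
    "Product": "ROOT",
    "Traveler": "ROOT",
    "Retail on station": "Product",
    "Smullers": "Retail on station",
    "Complaint": "Traveler",
}

def ancestors(label, parent):
    """All labels on the path from `label` up to the root (excluding `label`)."""
    chain = []
    while label in parent:
        label = parent[label]
        chain.append(label)
    return chain

def satisfies_tree_property(labels, parent):
    """True if every positive label also has its parent marked positive."""
    labels = set(labels)
    return all(parent[l] in labels for l in labels if l in parent)

def close_upwards(labels, parent):
    """Smallest superset of `labels` that satisfies the tree property."""
    closed = set(labels)
    for label in labels:
        closed.update(ancestors(label, parent))
    return closed

predicted = {"Smullers"}                      # raw classifier output for one tweet
consistent = close_upwards(predicted, PARENT)
```

In terms of $f : X \to \{0, 1\}^C$, `consistent` is the set of coordinates of $y$ that must be 1 once "Smullers" is predicted.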
50 Hierarchical multi-label classification on social text streams: hierarchical multi-label classification for short documents in social streams; learn from previous time periods, and predict an output when a new document arrives. Concept drift phenomenon; document expansion; dynamic topic modeling; structural learning based text classification.
51 Experimental setup. Dataset: tweets related to a transportation company from 18th January 2010 to 5th June ,692 tweets posted by 77,161 Twitter users. Annotations: 493 nodes in 13 subsets.
52 Time-aware topic extraction (1). Top-5 topic lists:
(1) train schedule; (2) winter chaos; (3) station; (4) hot drinks; (5) ede-wageningen
(1) station; (2) winter chaos; (3) chocomel; (4) wheel; (5) change
(1) netherlands; (2) train; (3) bomb; (4) NS company; (5) police
(1) train; (2) train cancel; (3) snow fall; (4) froze; (5) clumsy work
(1) bomb; (2) NS; (3) pains; (4) police; (5) train
53 Time-aware topic extraction (2). [Figure: macro F against #days for C SSVM, LTC SSVM, and GTC SSVM]
54 Topic modeling in social text streams: dynamic topic modeling on social streams; user behavior modeling on Twitter
55 Tweet Propagation Model: key idea
56 Tweet Propagation Model. [Figure: candidate tweets and the user's own tweets along a time axis, with retweets (RT) at t1 and t2 and user distributions θ_{u,t1}, θ_{u,t2}; labels: probability of the user's interests at each time period; probability of topics at each time period]
57 Application: personalized time-aware tweets summarization. Time-aware tweets summarization: select the most representative tweets for each time period as the summary. Personalized time-aware tweets summarization: the summary needs to be relevant to the user's interests. Data preprocessing; tweets propagation model; document summarization (sentence extraction).
58 Overall performance. [Table: metric rows (R, R, R-W) for TPM-A, TPM-T, TPM-S, UBM, TLDA, AT, TF-IDF, Centroid, and Lex-R, reported at 40 tweets per period and at a second tweets-per-period setting; the values did not survive transcription]
59 Outline: Motivation; Problem definition; Our approach; Conclusion & outlook
60 Conclusion: dynamic topic modeling in social text streams; a new topic model for synchronously tracking topics and users' interests; experiments on an industrial dataset demonstrate the effectiveness of our proposed method.
61 Future work: contrastive aspect extraction; large-scale topic modeling; parallel processing to improve efficiency; active learning to optimize the period size; continuous time periods.
62 Thanks! Zhaochun Ren, David Graus
Monitoring and Analyzing Customer Feedback Through Social Media Platforms for Identifying and Remedying Customer Problems Sumit Bhatia, Jingxuan Li, Wei Peng, and Tong Sun Xerox Research Centre, Webster,
More informationLatent Dirichlet Markov Allocation for Sentiment Analysis
Latent Dirichlet Markov Allocation for Sentiment Analysis Ayoub Bagheri Isfahan University of Technology, Isfahan, Iran Intelligent Database, Data Mining and Bioinformatics Lab, Electrical and Computer
More informationRole of Social Networking in Marketing using Data Mining
Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:
More informationdm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING
dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on
More informationOPINION MINING IN PRODUCT REVIEW SYSTEM USING BIG DATA TECHNOLOGY HADOOP
OPINION MINING IN PRODUCT REVIEW SYSTEM USING BIG DATA TECHNOLOGY HADOOP 1 KALYANKUMAR B WADDAR, 2 K SRINIVASA 1 P G Student, S.I.T Tumkur, 2 Assistant Professor S.I.T Tumkur Abstract- Product Review System
More informationGrammAds: Keyword and Ad Creative Generator for Online Advertising Campaigns
GrammAds: Keyword and Ad Creative Generator for Online Advertising Campaigns Stamatina Thomaidou 1,2, Konstantinos Leymonis 1,2, Michalis Vazirgiannis 1,2,3 Presented by: Fragkiskos Malliaros 2 1 : Athens
More informationTwitter sentiment vs. Stock price!
Twitter sentiment vs. Stock price! Background! On April 24 th 2013, the Twitter account belonging to Associated Press was hacked. Fake posts about the Whitehouse being bombed and the President being injured
More informationSocial Media Implementations
SEM Experience Analytics Social Media Implementations SEM Experience Analytics delivers real sentiment, meaning and trends within social media for many of the world s leading consumer brand companies.
More informationTackling Big Data with Tensor Methods
Tackling Big Data with Tensor Methods Anima Anandkumar U.C. Irvine Learning with Big Data Data vs. Information Data vs. Information Data vs. Information Missing observations, gross corruptions, outliers.
More informationFINDING EXPERT USERS IN COMMUNITY QUESTION ANSWERING SERVICES USING TOPIC MODELS
FINDING EXPERT USERS IN COMMUNITY QUESTION ANSWERING SERVICES USING TOPIC MODELS by Fatemeh Riahi Submitted in partial fulfillment of the requirements for the degree of Master of Computer Science at Dalhousie
More informationSPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
More informationUser Modeling in Big Data. Qiang Yang, Huawei Noah s Ark Lab and Hong Kong University of Science and Technology 杨 强, 华 为 诺 亚 方 舟 实 验 室, 香 港 科 大
User Modeling in Big Data Qiang Yang, Huawei Noah s Ark Lab and Hong Kong University of Science and Technology 杨 强, 华 为 诺 亚 方 舟 实 验 室, 香 港 科 大 Who we are: Noah s Ark LAB Have you watched the movie 2012?
More informationMachine Learning using MapReduce
Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous
More informationHow To Analyze Sentiment On A Microsoft Microsoft Twitter Account
Sentiment Analysis on Hadoop with Hadoop Streaming Piyush Gupta Research Scholar Pardeep Kumar Assistant Professor Girdhar Gopal Assistant Professor ABSTRACT Ideas and opinions of peoples are influenced
More informationThe Big Data Paradigm Shift. Insight Through Automation
The Big Data Paradigm Shift Insight Through Automation Agenda The Problem Emcien s Solution: Algorithms solve data related business problems How Does the Technology Work? Case Studies 2013 Emcien, Inc.
More informationDEMYSTIFYING BIG DATA. What it is, what it isn t, and what it can do for you.
DEMYSTIFYING BIG DATA What it is, what it isn t, and what it can do for you. JAMES LUCK BIO James Luck is a Data Scientist with AT&T Consulting. He has 25+ years of experience in data analytics, in addition
More information