Keeping Pace with Big Data
- Deirdre Tucker
- 8 years ago
1 - A Data Mining Perspective
Huan Liu, Tempe, AZ
http://
NSF Workshop on Big Data Analytics for Infrastructure and Building Resilience and Sustainability, Beijing, China, Sept 19-20, 2014
NSF Workshop on Big Data Analytics, Beijing
2 Concluding Remarks
- Big Data is a good problem to have
- Data mining is one way of approaching it
- Together, we can harness it for better sci & eng
Arizona State University, Data Mining and Machine Learning Lab
3 Big data is not a new problem, but a persistent one
- Why now? We're overwhelmed, start appreciating data value, and data is generated ubiquitously (we're part of the problem)
- We have been dealing with it since we had data
- Feature selection, as an example, to battle data explosion (mainly for attribute-value data)
- Big data will only become bigger
- Ubiquitous and fast-growing linked data in the age of social media
- Example continued: feature selection for linked data
- Big data is a good problem to have
- And, many a time, big data may not be big enough
4 Data will only become bigger
http://iot.ieee.org/newsletter/september-2014/the-internet-of-things-the-story-so-far.html
5 Begin with Attribute-Value Data
- It is the most familiar form of data we encounter (tables in Excel, databases)
- Data is conveniently collected everywhere
- Some typical challenges:
- Data overload (increasing in both width and length)
- Data is collected for various reasons
- Data accumulates at an unprecedented speed
- Data itself does not offer any insight, but has potential
- To make sense of massive amounts of data is to focus: using only relevant data
- Data preprocessing is an important part of machine learning and data mining
- Feature selection is an effective approach to downsizing data
6 Massive Data and High Dimensionality
Dimensionality of data has increased exponentially
[Figure: max # features of UCI data sets by decade; # features on a log scale]
Decade          1980s   1990s   2000s
Max # features  102     1,558   3,231,961
7 A General Model of KDD (Knowledge Discovery and Data Mining)
Data mining: applying analytical methods and tools to discover actionable patterns, construct statistical or predictive models, and identify relationships among massive data
8 Why Feature Selection?
- Most machine learning and data mining techniques may not be effective for high-dimensional data
- Curse of dimensionality: query accuracy and efficiency degrade rapidly as the dimensionality increases
- The intrinsic dimensionality may be small. For example, the number of genes responsible for a certain type of disease may be small.
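One way to see the curse of dimensionality mentioned above is distance concentration: as the number of dimensions grows, the nearest and farthest points from a query end up at almost the same distance, so nearest-neighbour queries become less meaningful. The sketch below is an illustration only, not taken from the slides; the function name `relative_contrast`, the uniform random data, and the point count are all assumptions.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query
    to n_points random points drawn uniformly from the unit cube [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


if __name__ == "__main__":
    # The contrast shrinks as the dimensionality grows.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

A small contrast means every point looks roughly equally far away, which is one reason to reduce the data to its relevant features first.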
9 Classification
- A process of predicting the classes of unseen instances based on patterns learned from available instances
- Supervised learning with labeled data
[Diagram labels: Training Data, Classification Algorithm, Classification Rules, Test Data, New Data]
Example rule: If Hair = blonde and Location = no, then sunburned
10 Clustering
- A process of grouping objects (or instances) into clusters so that objects are similar to one another within a cluster but dissimilar to objects in other clusters
- Unsupervised learning with unlabeled data
- Clustering tasks
11 Applications of Feature Selection
- Customer relationship management
- Text mining and visual analytics
- Image retrieval
- Microarray data analysis and protein classification
- Face recognition and handwritten digit recognition
- Intrusion detection
- Social media and social networking apps
12 Online Document Classification
[Diagram labels: Web Pages; Internet; Digital Libraries (ACM Portal, IEEE Xplore, PubMed); Documents D1, D2, ..., DM; Terms T1, T2, ..., TN; class C (Sports, Travel, Jobs)]
- Task: to classify unlabeled documents into categories
- Challenge: thousands of terms
- Solution: to apply dimensionality reduction
13 Gene Expression Microarray Analysis
[Figures: Expression Microarray (image courtesy of Affymetrix); Expression Microarray Data Set]
- Task: to classify novel samples into known disease types (disease diagnosis)
- Challenge: hundreds of thousands of genes, but a few samples
- Solution: feature selection
14 Other Types of High-Dimensional Data
- Face images
- Handwritten digits
15 Evaluation Measures for Ranking and Selecting Features
The goodness of a feature or feature subset is dependent on measures
Various measures:
- Information measures
- Distance measures
- Dependence measures
- Consistency measures
- Accuracy measures
16 Information Measures
- Entropy of variable X
- Entropy of X after observing Y
- Information gain
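The three quantities on this slide can be sketched with their standard textbook definitions, since the slide's own formulas did not survive transcription: H(X) = -sum_x P(x) log2 P(x), H(X|Y) = sum_y P(y) H(X|Y=y), and IG(X|Y) = H(X) - H(X|Y). The function names and the toy data below are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """H(X) in bits, estimated from the observed values of X."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """H(X | Y): entropy of X after observing Y, as a weighted average
    of the entropy of X within each group of equal Y values."""
    n = len(x)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(x, y):
    """IG(X | Y) = H(X) - H(X | Y)."""
    return entropy(x) - conditional_entropy(x, y)


# Hypothetical toy data: a class label and two candidate features.
label = ["yes", "yes", "no", "no"]
feature_a = ["s", "s", "r", "r"]   # determines the label completely
feature_b = ["h", "l", "h", "l"]   # independent of the label

if __name__ == "__main__":
    print(information_gain(label, feature_a))
    print(information_gain(label, feature_b))
```

Ranking features by information gain with respect to the class is one simple way to use an information measure for feature selection: here `feature_a` would be ranked above `feature_b`.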
17 How to Validate Selection Results
- Direct evaluation (if we know a priori)
- Often suitable for artificial data sets
- Based on prior knowledge about data
- Indirect evaluation (if we don't know)
- Often suitable for real-world data sets
- Based on a) number of features selected, b) performance on selected features (e.g., predictive accuracy, goodness of resulting clusters), and c) speed
18 Methods for Result Evaluation
- Learning curves: for results in the form of a ranked list of features
[Plot: accuracy vs. number of features, for one ranked list]
- Before-and-after comparison: for results in the form of a minimum subset
- Comparison using different classifiers: to avoid learning bias of a particular classifier
- Repeating experimental results: for non-deterministic results
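A learning curve for a ranked list can be sketched as follows: take the top-k features for k = 1, 2, ..., measure a classifier's accuracy on each prefix, and plot accuracy against k. The sketch below assumes a leave-one-out 1-nearest-neighbour classifier and a tiny hand-made data set; both are illustrative choices, not the evaluation setup used in the talk.

```python
def loo_1nn_accuracy(rows, labels, feature_idx):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that
    only looks at the features listed in feature_idx."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue  # leave the instance itself out
            d = sum((row[f] - other[f]) ** 2 for f in feature_idx)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def learning_curve(rows, labels, ranked_features):
    """Accuracy using the top-1, top-2, ..., top-m ranked features."""
    return [loo_1nn_accuracy(rows, labels, ranked_features[:k])
            for k in range(1, len(ranked_features) + 1)]


# Hypothetical data: feature 0 separates the classes, features 1 and 2 are
# large-scale noise that pairs each instance with one of the other class.
rows = [
    [0.10, 5.0, 1.0], [0.20, 1.0, 4.0], [0.15, 9.0, 7.0],
    [0.90, 5.1, 1.2], [0.80, 1.1, 4.1], [0.85, 9.2, 7.1],
]
labels = ["a", "a", "a", "b", "b", "b"]

if __name__ == "__main__":
    print(learning_curve(rows, labels, ranked_features=[0, 1, 2]))
```

On this toy data the curve drops as soon as the noisy features are added, which is the pattern a learning curve is meant to expose. Repeating the computation with a second classifier would correspond to the "comparison using different classifiers" item.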
19 A Recent Book for Further Information
Six chapters:
1. Data of High Dimensionality and Challenges
2. Univariate Formulation of Spectral Feature Selection (SFS)
3. Multivariate Formulations
4. Connections to Existing Algorithms
5. Large-Scale SFS
6. Multi-Source SFS
Algorithms with software are available at dmml.asu.edu/sfs
20 From Attribute-Value Data to Linked Data
- We are living in an increasingly connected world
21 Traditional Media and Data
- Broadcast media: one-to-many
- Communication media: one-to-one
- Traditional data
22 Linked Data in the Age of Social Media
[Diagram labels: Social Media; Social Networking; Content Sharing; Blogs; Wikis; Forums]
23 Social Media: Many-to-Many
- Everyone can be a media outlet or producer
- Disappearing communication barrier
- Distinct characteristics:
- User-generated content: massive, dynamic, extensive, instant, and noisy
- Rich user interactions: linked data
- Collaborative environment: wisdom of the crowd
- Many small groups: the long tail phenomenon
- Attention is hard to get
24 Noise Removal Fallacy in Social Media
- We often learn that noise should be removed before data mining, and that 99% of Twitter data is useless
- Example tweet: "Had eggs, sunny-side-up, this morning"
- Can we remove noise as we usually do in data mining?
- What is left after noise removal? Twitter data can be rendered useless after conventional noise removal
- As we are certain there is noise in the data and there is a peril in removing it, what can we do?
25 Linked Data and Attribute-Value Data
- They exist for different purposes:
- Linked data: relations, connections, or links
- Attribute-value data: properties, content, etc.
- Classic machine learning and data mining methods assume that attribute-value data is independent and identically distributed (i.i.d.)
- Additional challenges arise with the confluence of attribute-value and linked data:
- User-generated
- Large
- Noisy, short, incomplete
- Unstructured, or free form
26 Feature Selection for Social Media Data
- Massive and high-dimensional social media data poses unique challenges to data mining tasks:
- Scalability
- Curse of dimensionality
- Social media data is inherently linked
- A key difference between social media data and attribute-value data
Jiliang Tang and Huan Liu. "Feature Selection with Linked Data in Social Media", SIAM International Conference on Data Mining (SDM),
27 Feature Selection of Social Media Data
- Feature selection has been widely used to prepare large-scale, high-dimensional data for effective data mining
- Traditional feature selection algorithms deal only with "flat" data (attribute-value data), assumed to be independent and identically distributed (i.i.d.)
- We need to take advantage of linked data for feature selection
28 Representation for Social Media Data
[Diagram: users u1-u4, their posts, features up to f_m and classes up to c_k; highlighted: user-post relations]
29 Representation for Social Media Data
[Same diagram; highlighted: user-user relations]
30 Representation for Social Media Data
[Same diagram; highlighted: social context]
31 Problem Statement
Given labeled data X and its label indicator matrix Y, the dataset F, and its social context, including user-user following relationships S and user-post relationships P,
select the k most relevant features from the m features of dataset F, using its social context S and P
32 How to Use Link Information
- The new question is how to proceed with additional information for feature selection
- Two basic technical problems:
- Relation extraction: what are the distinctive relations that can be extracted from linked data?
- Mathematical representation: how to use these relations in the feature selection formulation?
- Do we have theories to guide us in this effort?
33 Relation Extraction
[Diagram: users u1-u4 and their posts p1-p8]
1. CoPost
2. CoFollowing
3. CoFollowed
4. Following
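The four relation types can be sketched as simple set operations over a follower graph and a user-to-posts map. The readings used below are assumptions based on the relation names: CoPost links posts by the same user, CoFollowing links users who follow a common user, CoFollowed links users followed by a common user, and Following links a follower to a followee. The toy graph is hypothetical and is not the one in the slide's diagram.

```python
from itertools import combinations


def co_post(posts_by_user):
    """Pairs of posts written by the same user."""
    return {pair for posts in posts_by_user.values()
            for pair in combinations(sorted(posts), 2)}


def co_following(follows):
    """Pairs of users who follow at least one common user."""
    return {(a, b) for a, b in combinations(sorted(follows), 2)
            if follows[a] & follows[b]}


def co_followed(follows):
    """Pairs of users who are followed by at least one common user."""
    pairs = set()
    for followees in follows.values():
        pairs.update(combinations(sorted(followees), 2))
    return pairs


def following(follows):
    """Directed pairs (follower, followee)."""
    return {(a, b) for a, followees in follows.items() for b in followees}


# Hypothetical social context: who follows whom, and who wrote which posts.
follows = {"u1": {"u2", "u4"}, "u2": set(), "u3": {"u4"}, "u4": set()}
posts_by_user = {"u1": ["p1", "p2"], "u2": ["p3", "p4", "p5"]}

if __name__ == "__main__":
    print(sorted(co_post(posts_by_user)))
    print(sorted(co_following(follows)))
    print(sorted(co_followed(follows)))
    print(sorted(following(follows)))
```

User-level pairs can then be turned into post-level constraints by pairing the posts of the two related users, which is how link information can enter a feature selection formulation.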
34 Relations, Social Theories, Hypotheses
- Social correlation theories suggest that the four relations may affect the relationships between posts
- Social correlation theories:
- Homophily: people with similar interests are more likely to be linked
- Influence: people who are linked are more likely to have similar interests
- Thus, four relations lead to four hypotheses for verification
35 Modeling CoFollowing Relation
Two co-following users have similar topics of interest
Users' topic interests:
$$\hat{T}(u_k) = \frac{\sum_{f_i \in F_k} T(f_i)}{|F_k|} = \frac{\sum_{f_i \in F_k} W^\top f_i}{|F_k|}$$
Objective:
$$\min_W \; \|X^\top W - Y\|_F^2 + \alpha \|W\|_{2,1} + \beta \sum_{u} \sum_{u_i, u_j \in N_u} \|\hat{T}(u_i) - \hat{T}(u_j)\|_2^2$$
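To make the terms of the CoFollowing objective concrete, the sketch below evaluates it for a given W on plain Python lists. It only computes the objective value; it is not the optimization procedure from the talk. The data layout is an assumption: `posts` holds one feature vector f_i per post (so the rows of `posts` are the columns of X), `user_posts` maps each user u_k to the indices of its posts F_k, and `followers_of` maps each user u to the set N_u of users who follow u. Each unordered pair in N_u is counted once here, which may differ from the slide's convention by a constant factor.

```python
from itertools import combinations


def project(W, f):
    """W^T f: map a post's feature vector f (length m) to c class scores."""
    return [sum(W[r][c] * f[r] for r in range(len(f)))
            for c in range(len(W[0]))]


def topic_interest(W, posts, post_ids):
    """T_hat(u_k): mean of W^T f_i over the posts F_k of user u_k."""
    scores = [project(W, posts[i]) for i in post_ids]
    return [sum(col) / len(scores) for col in zip(*scores)]


def objective(W, posts, Y, user_posts, followers_of, alpha, beta):
    """Value of ||X^T W - Y||_F^2 + alpha ||W||_{2,1} + beta * link term."""
    # Squared Frobenius norm of the fitting error.
    fit = sum((p - y) ** 2
              for f, y_row in zip(posts, Y)
              for p, y in zip(project(W, f), y_row))
    # L2,1 norm: sum of the Euclidean norms of the rows of W.
    l21 = sum(sum(w * w for w in row) ** 0.5 for row in W)
    # Co-following users should have similar topic interests.
    t_hat = {u: topic_interest(W, posts, ids) for u, ids in user_posts.items()}
    link = sum(sum((a - b) ** 2 for a, b in zip(t_hat[ui], t_hat[uj]))
               for group in followers_of.values()
               for ui, uj in combinations(sorted(group), 2))
    return fit + alpha * l21 + beta * link


# Hypothetical toy instance: 2 features, 1 class, 2 posts, 2 co-following users.
W = [[1.0], [0.0]]
posts = [[1.0, 0.0], [0.0, 1.0]]
Y = [[1.0], [0.0]]
user_posts = {"u1": [0], "u2": [1]}
followers_of = {"u3": {"u1", "u2"}}  # u1 and u2 both follow u3

if __name__ == "__main__":
    print(objective(W, posts, Y, user_posts, followers_of, alpha=0.5, beta=2.0))
```

The L2,1 term pushes whole rows of W to zero, so the features whose rows keep the largest norms are the natural candidates to select.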
36 Evaluation Results on Digg
37 Evaluation Results on Digg
38 Summary
- LinkedFS is evaluated under varied circumstances to understand how it works
- Link information can help feature selection for social media data
- Unlabeled data is more common in social media; unsupervised learning is more sensible, but also more challenging
Jiliang Tang and Huan Liu. "Unsupervised Feature Selection for Linked Social Media Data", the Eighteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
Jiliang Tang and Huan Liu. "Feature Selection with Linked Data in Social Media", SIAM International Conference on Data Mining,
39 Looking Ahead
- New, rich data sources like social media present challenges and opportunities
- Feature selection is shown here for illustration
- Challenges abound:
- Data collection (sampling bias, is data enough?)
- Data preparation (what is noise?)
- Pattern discovery (content, context, networks)
- Evaluation (when without ground truth)
- Big data allows more opportunities for researchers of different disciplines to conduct collaborative research
40 Thank You
For this opportunity to share our research
Acknowledgments:
- Grants from NSF, ONR, and ARO, among others
- DMML members and project leaders
- Collaborators
41 Concluding Remarks
- Big Data is a good problem to have
- Data mining is one way of approaching it
- Together, we can harness it for better sci & eng
Arizona State University, Data Mining and Machine Learning Lab
So#ware quality assurance - introduc4on Dr Ana Magazinius 1 What is quality? 2 What is a good quality car? 2 and 2 2 minutes 3 characteris4cs 3 What is quality? 4 What is quality? How good or bad something