Research Statement Constrained Frequent Pattern Mining For Large Graph/Networks
Research Statement

Feida ZHU
School of Information Systems, Singapore Management University
Tel: (65)
Date: 30 April 2013

Introduction

The past decade has seen an unprecedented explosion of data in almost all areas of our lives, from the boom of online social networks drawing hundreds of millions of users to highly accurate GPS systems tracking every move of the attached mobile devices. The concept of Big Data has never attracted more attention from the research community, as its importance grows more palpable each day. Yet, for all the wonders it could make happen, Big Data also poses serious research challenges for mining and analysis tasks. My central research theme has therefore been Big Data Mining and Analytics.

The challenge of Big Data, in my understanding, is best characterized by four V's: Volume, Velocity, Variety and Value, as shown in Figure 1. These four V's also serve as a good map of my current and near-future research, which I present one by one below. The settings are centered on network and social media data, as social networks have been the main data source for my research over the past few years. However, all the results apply equally to other data settings of a similar nature.

[Figure 1: The Four Dimensions of the Big Data Challenge: Volume, Velocity, Variety and Value]

(1) Volume: taming data of societal scale

The most noticeable feature of big data is its sheer volume, which is often of societal scale. Mining and analysis on such data become extremely difficult even for simple tasks like frequent pattern discovery. My research along this dimension has focused on a fundamental problem in data mining, the constrained frequent pattern mining problem, particularly on graph/network data, which is the main data representation for social networks and also the most challenging setting compared with item-sets and sequences. Frequent patterns have proved extremely
powerful in a wide range of network analysis tasks, including network clustering, classification, community detection and evolution. To add to the complexity, the mining task often comes with user-specified constraints on the pattern result. My research in this dimension can be further grouped into the following three topics.

1.1 Constrained Frequent Pattern Mining For Large Graph/Networks

To use frequent patterns for various knowledge discovery tasks, one must first be able to find the set of frequent patterns in the given data. My research on constrained frequent pattern mining started with my two Best Student Paper Awards [ICDE 07][PAKDD 07] during my PhD study, in which I proposed a novel randomized mining framework to find colossal frequent patterns in transaction data and a comprehensive constraint-pushing mining framework for graph data. It is well known that frequent pattern mining in the graph setting is notoriously hard, especially in the face of today's network scale. Most work on graph mining has focused on the graph transaction setting, where the input data is a large collection of small graphs. However,
all the social network applications today present us with large single graphs. It has been shown that frequent pattern mining in the single-network setting is a much more challenging problem than its counterpart in the transaction setting, due to the existence of overlapping embeddings and the accordingly much trickier support computation. My VLDB 2011 paper, "Mining top-k large structural patterns in massive networks" [VLDB 11], proposed the first work that is able to find large patterns in massive graph data. We developed a novel concept called the r-spider and a corresponding algorithm called SpiderMine, which uses small spider-shaped frequent patterns to find the top-k large patterns probabilistically within any user-specified error bound. This work gives users, for the first time, the capacity to reach and study the largest frequent patterns in big graph data within a reasonable amount of time.

With the boom of mobile social data and research on information diffusion, another kind of constrained pattern, the skinny pattern (a graph pattern with a long backbone from which short twigs branch out), has found important applications: the long backbone has the descriptive power to represent spatial and temporal trajectories in heterogeneous information networks, and the short twigs represent the various kinds of associated information. My work in [SIGMOD 13] proposed a new direct mining paradigm for efficient constrained frequent graph mining, in which frequent patterns with certain structural constraints can be generated directly with minimum redundancy, something impossible with traditional mining methodology, in which patterns are grown in order of increasing size.

The research agenda in this direction is to systematically explore and tackle the challenges posed by the constrained pattern mining problem for large networks such as those ubiquitous in our daily life. I have a forthcoming book chapter on Mining Constrained Graph Patterns to be published by Springer later this year
which will be a good summary of my work along this direction.

1.2 Collaborative Pattern Mining In Distributed Environment

Due to the remarkable size of network data, many of these networks are not stored in a centralized fashion. Different parts of a network could be stored in different data centers around the world, or in a machine farm. All existing mining algorithms assume centralized storage of the entire graph and are therefore powerless in such a distributed environment. Besides, one way to handle a huge single network could be to first partition the data carefully and then mine the partitions collaboratively. Under this new setting, even the most classic problems in graph mining become fresh and interestingly challenging. This is a whole new direction with little published research. There is much foundational work to be laid out and many directions to be charted. My research agenda is to develop efficient algorithms for the fundamental mining problems in this setting and make them work on the societal-scale social network data we have here.

1.3 Sampling and Summarization For Large Networks

The size of today's social networks has made it impossible to comprehend them visually as a whole by human examination. Some summarization of the original network becomes necessary for visualization of mining results or navigation in the network. On the other hand, sampling of the network is also essential, as it is often unrealistic to obtain the whole network. My research agenda here is to examine the principles and algorithms of effective and efficient sampling methods to facilitate our data acquisition, and to find intuitive, informative and interesting ways to summarize large network data such as our Twitter data set.
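The overlapping-embedding difficulty noted above for the single-network setting can be made concrete with a small sketch. It is not taken from the cited papers: it assumes one commonly used overlap-aware measure, minimum-image-based support, and the function names `edge_pattern_embeddings` and `mni_support` as well as the toy graph are hypothetical, chosen only for illustration.

```python
def edge_pattern_embeddings(edges, labels, label_u, label_v):
    """List every (u, v) mapping of a labelled single-edge pattern into the graph.

    Assumes label_u != label_v, so each graph edge matches at most once.
    """
    found = []
    for a, b in edges:
        # An undirected edge can match the pattern in either orientation.
        for u, v in ((a, b), (b, a)):
            if labels[u] == label_u and labels[v] == label_v:
                found.append((u, v))
    return found


def mni_support(embeddings):
    """Minimum-image-based support (illustrative assumption, see lead-in):
    the smallest number of distinct graph nodes that any single pattern
    node is mapped to across all embeddings."""
    if not embeddings:
        return 0
    pattern_size = len(embeddings[0])
    return min(len({emb[i] for emb in embeddings}) for i in range(pattern_size))


# Toy single graph: one hub labelled "A" linked to three nodes labelled "B".
labels = {0: "A", 1: "B", 2: "B", 3: "B"}
edges = [(0, 1), (0, 2), (0, 3)]

embeddings = edge_pattern_embeddings(edges, labels, "A", "B")
print(len(embeddings))          # raw embedding count: 3
print(mni_support(embeddings))  # overlap-aware support: 1
```

All three embeddings share the hub node, so counting raw embeddings would give the edge pattern a higher count (3) than its own sub-pattern, the single node "A" (1). A measure of this kind avoids that by counting distinct images instead, at the cost of a more involved computation than in the transaction setting, where support is simply the number of graphs containing the pattern.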
(2) Velocity: conducting real-time analysis in huge-volume data flow

Perhaps the most important and unique feature of social media, compared with traditional news media, is the real-time responsiveness of the data. For example, it has been observed that, in life-critical disasters of societal scale, Twitter is the most important and timely source from which people find out about and track breaking news, before any mainstream media picks up on it and rebroadcasts the footage. Consequently, it is essential that we are able to conduct mining and analysis on the huge-volume data flow in real time.

One important topic in social media study is bursty topics, which capture social events attracting population-wide attention. Our work in [ACL 12] proposed the first algorithm to find such topics from Twitter in an offline fashion. To achieve real-time responsiveness, our work published at KDD 13 proposed a novel mining framework called TopicSketch, which is able to detect bursty topics earlier than traditional news media and can potentially handle hundreds of millions of tweets per day, which is close to the total number of daily tweets on Twitter. One example of bursty topics detected from our data is illustrated in the following figure. To the best of our knowledge, this is the first work that achieves real-time detection on social media at the scale of Twitter. Future work includes incorporating community-awareness and information diffusion structure into the detection algorithm, so that bursty events of different kinds can be distinguished and their potential virality predicted. Other real-time mining and analysis tasks, such as frequent pattern mining and outlier detection, will also be studied as part of the research in this dimension to handle the velocity of big data.

(3) Variety: understanding data of high heterogeneity

The challenge of big data also comes from the fact that the data is usually highly heterogeneous, i.e., they are of different formats, types
and come from different sources. For example, even for the same user, we have text data from his tweets and reviews, multimedia data such as images from his Instagram account and videos from YouTube, trajectory and location data from his mobile devices, and so on. The analytical capacity to integrate, understand and leverage these highly heterogeneous data is immensely important. The key is to find a connecting ingredient or a unifying model to achieve effective integration. My approach in this dimension so far has been to use what I deem the most characteristic feature of social media data, user behavior, as the glue that ties things together. Our tutorial at DASFAA 13, titled "Behavior Driven Social Network Mining and Analysis", gives a selected summary of our recent research along this line. In particular, we pushed the user behavior element into the following three mining tasks and produced interesting results which are otherwise unobtainable.
(1) Behavior-driven Topic Modeling

We proposed in [SDM 13] a B-LDA model that incorporates user behavior into LDA topic modeling to better capture user interactions, which are critically important for topic analysis, user clustering and followee recommendation on social micro-blogging services such as Twitter.

(2) Behavior-driven Anomaly Detection

We used group-level user behavior to characterize anomaly collections and identified spammer groups that are hard to catch with the traditional point anomaly framework [SDM 12, CIKM 12]. We also used collective user rating behavior to model anomalous users and products in online review settings and proposed a unifying framework based on mutual dependency principles [ICDM 12]. Extensions of these pieces of work have been submitted to DMKD and TKDE.

(3) Behavior-driven Relationship Mining

We studied the user follow links in the Twitter network and developed a novel algorithm which, based on this piece of information alone, is able to identify with high accuracy the offline real-life friends of the target user [WebSci 12]. This work has profound potential impact, as we elaborate further in the next part. We also studied user follow linkage to dynamically propagate user attribute/relationship labels with user input [DASFAA 13]. In another work published at [SocInfo 13], we revisited the user ranking problem on social networks and examined it from the user interaction perspective. We provided a new angle on the problem based on the interplay between information and interaction.

(4) Value: translating data analytical results into real-world impact

This dimension of the Big Data challenge has not yet been well explored. In the online social media setting, the central question to ask is: how would all the analytical results about online social data impact our offline real life?
For example, all the research findings on social influence would remain inconsequential if we were not able to establish the linkage between the online and offline worlds. My research agenda here is to fill this gap and establish the connection. As the first effort toward this Holy Grail, we proposed [WebSci 12] a novel algorithm to distinguish a user's online and offline friends from her Twitter follow network, as illustrated in the figure on the right. This work provides a foundation for many exciting applications and future work, including robust user modeling, business competitive analysis, user profile matching, spammer detection, etc. Based on this work, our next work [DASFAA 13] propagates user attribute labels dynamically in the relationship network. The corresponding demo system won the Best Demo Award (Runner-Up) at DASFAA 13.

A fundamental task in bridging the online and offline worlds is to integrate various aspects of information about the same user across different platforms. The problem has a profound impact on user modeling and business intelligence and has begun to attract a huge amount of research interest from the community. We provide the first solution that uses the whole range of user data, and the result will be published in SIGMOD 14.
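As a rough illustration of the idea of propagating user attribute labels over a relationship network, the sketch below spreads a few seed labels to unlabelled neighbours by majority vote. It is a generic textbook-style scheme written under stated assumptions, not the algorithm of [DASFAA 13]; the function name `propagate_labels`, the user names and the labels are all hypothetical.

```python
from collections import Counter


def propagate_labels(adj, seeds, max_iters=10):
    """Spread seed labels to unlabelled nodes by neighbour majority vote.

    adj   : dict mapping each node to a list of its neighbours
    seeds : dict mapping some nodes to a known label
    Labelled nodes never change; each round labels the nodes that have at
    least one labelled neighbour, using only labels from earlier rounds.
    """
    labels = dict(seeds)
    for _ in range(max_iters):
        updates = {}
        for node in sorted(adj):
            if node in labels:
                continue
            votes = Counter(labels[n] for n in adj[node] if n in labels)
            if votes:
                # Highest vote count wins; ties are broken alphabetically
                # so that the result is deterministic.
                updates[node] = min(votes, key=lambda lab: (-votes[lab], lab))
        if not updates:
            break
        labels.update(updates)
    return labels


# Hypothetical relationship network with two user-provided labels.
adj = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol"],
    "carol": ["alice", "bob"],
    "dave": ["erin"],
    "erin": ["dave"],
}
seeds = {"alice": "colleague", "dave": "family"}

result = propagate_labels(adj, seeds)
print(result["bob"], result["carol"], result["erin"])
```

In this toy network the two seed labels reach every other user after a single round. A dynamic variant would rerun or incrementally update the propagation whenever a user supplies a new label.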
Conclusion

My research agenda in the past few years and in the near future is focused on the Big Data challenge along the four dimensions of Volume, Velocity, Variety and Value, with an emphasis on graph/network data. Besides this main theme, I have also been working on other data mining applications, including program parameter tuning [CoCoMile'12, LION'13], churn prediction [ASONAM'12], game strategy mining [CIG'12] and network experimentation [ICWSM 13].

References

1. "A Direct Mining Approach To Efficient Constrained Graph Pattern Discovery", by Feida ZHU, Zequn ZHANG, Qiang QU, 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD'13), New York, USA, June.
2. "Reviving Dormant Ties in an Online Social Network Experiment", by Ee-Peng LIM, Denzil CORRERA, David LO, Michael FINEGOLD, Feida ZHU, The 7th International AAAI Conference on Weblogs and Social Media (ICWSM'13), Boston, USA, July.
3. "It Is Not Just What We Say, But How We Say Them: LDA-based Behavior-Topic Model", by Minghui QIU, Feida ZHU, and Jing JIANG, 05/2013, 2013 SIAM International Conference on Data Mining (SDM'13), Austin, Texas, USA, May.
4. "TwiCube: A Real-time Twitter Online Community Analysis Tool", by Juan DU, Wei XIE, Cheng LI, Feida ZHU, and Ee Peng LIM, 04/2013, The 18th International Conference on Database Systems for Advanced Applications (DASFAA'13), Wuhan, China, April.
5. "Dynamic Label Propagation in Social Networks", by Juan DU, Feida ZHU, and Ee Peng LIM, 04/2013, The 18th International Conference on Database Systems for Advanced Applications (DASFAA'13), Wuhan, China, April.
6. "Automated Parameter Tuning Framework for Heterogeneous and Large Instances: Case study in Quadratic Assignment Problem", by LINDAWATI, Zhi YUAN, Hoong Chuin LAU, and Feida ZHU, 01/2013, Learning and Intelligent OptimizatioN Conference (LION 13), Catania, Italy.
7. "A Survey of Recommender Systems in Twitter", by Su Mon KYWE, Ee Peng LIM, and Feida ZHU, 12/2012, International
Conference on Social Informatics (SocInfo 12), Lausanne, Switzerland.
8. "On Recommending Hashtags in Twitter Networks", by Su Mon KYWE, Tuan Anh HOANG, Ee Peng LIM, and Feida ZHU, 12/2012, International Conference on Social Informatics (SocInfo 12), Lausanne, Switzerland.
9. "Detecting Anomalies in Bipartite Graphs with Mutual Dependency Principles", by Hanbo DAI, Feida ZHU, Ee Peng LIM, and Hwee Hwa PANG, 12/2012, The 12th IEEE International Conference on Data Mining (ICDM'12), Brussels, Belgium.
10. "Impact of Multimedia in Sina Weibo: Popularity and Life Span", by Xun ZHAO, Feida ZHU, Weining QIAN, and Aoying ZHOU, 11/2012, The Joint Conference of the Sixth Chinese Semantic Web Symposium and the First Chinese Web Science Conference (CSWS & CWSC '12), Shenzhen, China.
11. "Mining Coherent Anomaly Collections On Web Data", by Hanbo DAI, Feida ZHU, Ee Peng LIM, and Hwee Hwa PANG, 10/2012, the 21st Int. Conf. on Information and Knowledge Management (CIKM'12), Hawaii, USA.
12. "In-Game Action List Segmentation and Labeling in Real-Time Strategy Games", by Wei GONG, Ee Peng LIM, Feida ZHU, Achananuparp PALAKORN, David LO, and Chong Tat Freddy CHUA, 09/2012, the 8th IEEE Conference on Computational Intelligence and Games (CIG'12), Granada, Spain.
13. "Follow Link Seeking Strategy: A Pattern Based Approach", by Agus Trisnajaya KWEE, Ee Peng LIM, Achananuparp PALAKORN, and Feida ZHU, 08/2012, the 6th ACM workshop on Social Network Mining and Analysis (SNAKDD'12), Beijing, China.
14. "Collective Churn Prediction in Social Network", by Jayadi Oentaryo RICHARD, Ee Peng LIM, David LO, Feida ZHU, and Philips Kokoh PRASETYO, 08/2012, Proc. of the 4th Int. Conf. on Advances in Social Networks Analysis and Mining (ASONAM'12), Istanbul, Turkey.
15. "Instance-specific Parameter Tuning via Constraint-based Clustering", by Lindawati LINDAWATI, Hoong Chuin LAU, and Feida ZHU, 08/2012, Proc. of the 1st Int. Workshop on Combining COnstraint solving with MIning and LEarning (CoCoMile'12), joint with ECAI 2012, Montpellier, France.
16. "Finding Bursty Topics From Microblogs", by Qiming DIAO, Jing JIANG, Feida ZHU, and Ee Peng LIM, 07/2012, 50th Annual Meeting of the Association for Computational Linguistics (ACL 12), Jeju Island, Korea.
17. "Detecting Anomalous Twitter Users by Extreme Group Behaviors", by Hanbo DAI, Ee Peng LIM, Feida ZHU, and Hwee Hwa PANG, 07/2012, Proc. of the 2012 ACM Int. Conf. on Net Science (NetSci'12), Chicago, Illinois, USA.
18. "Detecting Extreme Rank Anomalous Collections", by Hanbo DAI, Feida ZHU, Ee Peng LIM, and Hwee Hwa PANG, 04/2012, SIAM International Conference on Data Mining (SDM 12), Anaheim, California, USA.
19. "When a Friend in Twitter is a Friend in Life", by Wei XIE, Cheng LI, Feida ZHU, Ee Peng LIM, and Xueqing GONG, 04/2012, the 4th ACM Int. Conf. on Web Science (WebSci'12), Chicago, Illinois, USA.
20. "Mining Top-K Large Structural Patterns In Massive Networks", by Feida Zhu, Qiang Qu, David Lo, Xifeng Yan, Jiawei Han and Philip
Yu, in Proc. 2011 Int. Conf. on Very Large Data Bases (VLDB 11), USA, August.
21. "Mining Diversity On Networks", by Liu Lu, Feida Zhu, Chen Chen, Xifeng Yan, Jiawei Han, Philip S. Yu, and Shiqiang Yang, in Proc. 2010 Int. Conf. on Database Systems for Advanced Applications (DASFAA'10), Japan, April.
22. "Efficient Topological OLAP on Information Networks", by Qiang Qu, Feida Zhu, Xifeng Yan, Jiawei Han, Philip Yu and Hongyan Li, in Proc. 2011 Int. Conf. on Database Systems for Advanced Applications (DASFAA'11), Hong Kong, April.
23. "Top-K Aggregation Queries over Large Networks", by Xifeng Yan, Bin He, Feida Zhu, and Jiawei Han, in Proc. 2010 International Conference on Data Engineering (ICDE'10), USA, March.
24. "gPrune: A Constraint Pushing Framework for Graph Pattern Mining", by Feida Zhu, Xifeng Yan, Jiawei Han, and Philip S. Yu, Proc. of the 11th Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD'07), Nanjing, China, May.
25. "Mining Colossal Frequent Patterns by Core Pattern Fusion", by Feida Zhu, Xifeng Yan, Jiawei Han, Philip S. Yu, and Hong Cheng, Proc. of the 23rd Int. Conf. on Data Engineering (ICDE'07), Istanbul, Turkey, April.
More informationData Mining: Concepts and Techniques
Data Mining: Concepts and Techniques Chapter 1 Introduction SURESH BABU M ASST PROF IT DEPT VJIT 1 Chapter 1. Introduction Motivation: Why data mining? What is data mining? Data Mining: On what kind of
More informationResearch of Postal Data mining system based on big data
3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication
More informationBig Data in Pictures: Data Visualization
Big Data in Pictures: Data Visualization Huamin Qu Hong Kong University of Science and Technology What is data visualization? Data visualization is the creation and study of the visual representation of
More informationWeb Database Integration
Web Database Integration Wei Liu School of Information Renmin University of China Beijing, 100872, China gue2@ruc.edu.cn Xiaofeng Meng School of Information Renmin University of China Beijing, 100872,
More informationAutomatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines
, 22-24 October, 2014, San Francisco, USA Automatic Mining of Internet Translation Reference Knowledge Based on Multiple Search Engines Baosheng Yin, Wei Wang, Ruixue Lu, Yang Yang Abstract With the increasing
More informationHow To Use Data Mining For Knowledge Management In Technology Enhanced Learning
Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning
More informationA Way to Understand Various Patterns of Data Mining Techniques for Selected Domains
A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains Dr. Kanak Saxena Professor & Head, Computer Application SATI, Vidisha, kanak.saxena@gmail.com D.S. Rajpoot Registrar,
More informationThe 2006 IEEE / WIC / ACM International Conference on Web Intelligence Hong Kong, China
WISE: Hierarchical Soft Clustering of Web Page Search based on Web Content Mining Techniques Ricardo Campos 1, 2 Gaël Dias 2 Célia Nunes 2 1 Instituto Politécnico de Tomar Tomar, Portugal 2 Centre of Human
More informationA Hybrid Data Mining Approach for Analysis of Patient Behaviors in RFID Environments
A Hybrid Data Mining Approach for Analysis of Patient Behaviors in RFID Environments incent S. Tseng 1, Eric Hsueh-Chan Lu 1, Chia-Ming Tsai 1, and Chun-Hung Wang 1 Department of Computer Science and Information
More informationMINING CLICKSTREAM-BASED DATA CUBES
MINING CLICKSTREAM-BASED DATA CUBES Ronnie Alves and Orlando Belo Departament of Informatics,School of Engineering, University of Minho Campus de Gualtar, 4710-057 Braga, Portugal Email: {alvesrco,obelo}@di.uminho.pt
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 1
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 1 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013 Han, Kamber & Pei. All rights
More informationStatic Data Mining Algorithm with Progressive Approach for Mining Knowledge
Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive
More informationIMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria
More informationJiliang Tang. 701 First Avenue Yahoo!, Voice: (408) 744-2053 E-mail: jlt@yahoo-inc.com Sunnyvale, CA, 94089 US. Contact Information
Jiliang Tang Contact Information Research Interests 701 First Avenue Yahoo!, Voice: (408) 744-2053 Yahoo Labs E-mail: jlt@yahoo-inc.com Sunnyvale, CA, 94089 US URL: http://www.public.asu.edu/~jtang20 Data
More informationPredicting Information Popularity Degree in Microblogging Diffusion Networks
Vol.9, No.3 (2014), pp.21-30 http://dx.doi.org/10.14257/ijmue.2014.9.3.03 Predicting Information Popularity Degree in Microblogging Diffusion Networks Wang Jiang, Wang Li * and Wu Weili College of Computer
More informationResearch Statement: Human-Powered Information Management Aditya Parameswaran (www.stanford.edu/ adityagp)
Research Statement: Human-Powered Information Management Aditya Parameswaran (www.stanford.edu/ adityagp) My research broadly revolves around information management, with special emphasis on incorporating
More informationSPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
More informationResearch Statement Immanuel Trummer www.itrummer.org
Research Statement Immanuel Trummer www.itrummer.org We are collecting data at unprecedented rates. This data contains valuable insights, but we need complex analytics to extract them. My research focuses
More informationKaiquan Xu, Associate Professor, Nanjing University. Kaiquan Xu
Kaiquan Xu Marketing & ebusiness Department, Business School, Nanjing University Email: xukaiquan@nju.edu.cn Tel: +86-25-83592129 Employment Associate Professor, Marketing & ebusiness Department, Nanjing
More informationMining Mobile Group Patterns: A Trajectory-Based Approach
Mining Mobile Group Patterns: A Trajectory-Based Approach San-Yih Hwang, Ying-Han Liu, Jeng-Kuen Chiu, and Ee-Peng Lim Department of Information Management National Sun Yat-Sen University, Kaohsiung, Taiwan
More informationContent-Based Discovery of Twitter Influencers
Content-Based Discovery of Twitter Influencers Chiara Francalanci, Irma Metra Department of Electronics, Information and Bioengineering Polytechnic of Milan, Italy irma.metra@mail.polimi.it chiara.francalanci@polimi.it
More informationMining Association Rules: A Database Perspective
IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.12, December 2008 69 Mining Association Rules: A Database Perspective Dr. Abdallah Alashqur Faculty of Information Technology
More informationThe multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) The multilayer sentiment analysis model based on Random forest Wei Liu1, Jie Zhang2 1 School of
More informationCurriculum Vitae. Summer internship in a financial company that is active in quantitative analysis or development of quantitative
Curriculum Vitae XIAOXIAO SHI Department of Computer Science University of Illinois at Chicago Office: 851 S. Morgan St., Rm 1336 SEO, Chicago, IL 60607 xshi9@uic.edu, xiao.x.shi@gmail.com (preferred)
More informationData Mining in the Application of Criminal Cases Based on Decision Tree
8 Journal of Computer Science and Information Technology, Vol. 1 No. 2, December 2013 Data Mining in the Application of Criminal Cases Based on Decision Tree Ruijuan Hu 1 Abstract A briefing on data mining
More informationRESEARCH INTERESTS Modeling and Simulation, Complex Systems, Biofabrication, Bioinformatics
FENG GU Assistant Professor of Computer Science College of Staten Island, City University of New York 2800 Victory Boulevard, Staten Island, NY 10314 Doctoral Faculty of Computer Science Graduate Center
More informationMining Signatures in Healthcare Data Based on Event Sequences and its Applications
Mining Signatures in Healthcare Data Based on Event Sequences and its Applications Siddhanth Gokarapu 1, J. Laxmi Narayana 2 1 Student, Computer Science & Engineering-Department, JNTU Hyderabad India 1
More informationAvailable online at www.sciencedirect.com Available online at www.sciencedirect.com. Advanced in Control Engineering and Information Science
Available online at www.sciencedirect.com Available online at www.sciencedirect.com Procedia Procedia Engineering Engineering 00 (2011) 15 (2011) 000 000 1822 1826 Procedia Engineering www.elsevier.com/locate/procedia
More informationSEARCH ENGINE OPTIMIZATION USING D-DICTIONARY
SEARCH ENGINE OPTIMIZATION USING D-DICTIONARY G.Evangelin Jenifer #1, Mrs.J.Jaya Sherin *2 # PG Scholar, Department of Electronics and Communication Engineering(Communication and Networking), CSI Institute
More informationManagement of Human Resource Information Using Streaming Model
, pp.75-80 http://dx.doi.org/10.14257/astl.2014.45.15 Management of Human Resource Information Using Streaming Model Chen Wei Chongqing University of Posts and Telecommunications, Chongqing 400065, China
More informationA Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan
, pp.217-222 http://dx.doi.org/10.14257/ijbsbt.2015.7.3.23 A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan Muhammad Arif 1,2, Asad Khatak
More informationThe Design Study of High-Quality Resource Shared Classes in China: A Case Study of the Abnormal Psychology Course
The Design Study of High-Quality Resource Shared Classes in China: A Case Study of the Abnormal Psychology Course Juan WANG College of Educational Science, JiangSu Normal University, Jiangsu, Xuzhou, China
More informationMobile Storage and Search Engine of Information Oriented to Food Cloud
Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 ISSN: 2042-4868; e-issn: 2042-4876 Maxwell Scientific Organization, 2013 Submitted: May 29, 2013 Accepted: July 04, 2013 Published:
More informationDATA PREPARATION FOR DATA MINING
Applied Artificial Intelligence, 17:375 381, 2003 Copyright # 2003 Taylor & Francis 0883-9514/03 $12.00 +.00 DOI: 10.1080/08839510390219264 u DATA PREPARATION FOR DATA MINING SHICHAO ZHANG and CHENGQI
More informationContinuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information
Continuous Fastest Path Planning in Road Networks by Mining Real-Time Traffic Event Information Eric Hsueh-Chan Lu Chi-Wei Huang Vincent S. Tseng Institute of Computer Science and Information Engineering
More informationDynamic Data in terms of Data Mining Streams
International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining
More informationHow To Create A Text Classification System For Spam Filtering
Term Discrimination Based Robust Text Classification with Application to Email Spam Filtering PhD Thesis Khurum Nazir Junejo 2004-03-0018 Advisor: Dr. Asim Karim Department of Computer Science Syed Babar
More informationBPOE Research Highlights
BPOE Research Highlights Jianfeng Zhan ICT, Chinese Academy of Sciences 2013-10- 9 http://prof.ict.ac.cn/jfzhan INSTITUTE OF COMPUTING TECHNOLOGY What is BPOE workshop? B: Big Data Benchmarks PO: Performance
More informationJiexun Li, Ph.D. College of Information Science and Technology, Drexel University, Philadelphia, PA
EDUCATION Jiexun Li, Ph.D. Assistant Professor College of Information Science and Technology Drexel University, Philadelphia, PA 19104 Phone: (215) 895-1459 Fax: (215) 895-2494 Email: jiexun.li@ischool.drexel.edu
More informationEmoticon Smoothed Language Models for Twitter Sentiment Analysis
Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Emoticon Smoothed Language Models for Twitter Sentiment Analysis Kun-Lin Liu, Wu-Jun Li, Minyi Guo Shanghai Key Laboratory of
More informationText Analytics with Ambiverse. Text to Knowledge. www.ambiverse.com
Text Analytics with Ambiverse Text to Knowledge www.ambiverse.com Version 1.0, February 2016 WWW.AMBIVERSE.COM Contents 1 Ambiverse: Text to Knowledge............................... 5 1.1 Text is all Around
More information民 國 九 十 七 年 四 月 第 38 卷 第 2 期
民 國 九 十 七 年 四 月 第 38 卷 第 2 期 1============================================================ Inside of Internet Data Nien-Yi Jan Ming-Tsung Chen Wan-Ting Chang Wei Shen Chow Along with the Internet technology
More informationRobust Outlier Detection Technique in Data Mining: A Univariate Approach
Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,
More informationIMPROVING BUSINESS PROCESS MODELING USING RECOMMENDATION METHOD
Journal homepage: www.mjret.in ISSN:2348-6953 IMPROVING BUSINESS PROCESS MODELING USING RECOMMENDATION METHOD Deepak Ramchandara Lad 1, Soumitra S. Das 2 Computer Dept. 12 Dr. D. Y. Patil School of Engineering,(Affiliated
More informationConTag: Conceptual Tag Clouds Video Browsing in e-learning
ConTag: Conceptual Tag Clouds Video Browsing in e-learning 1 Ahmad Nurzid Rosli, 2 Kee-Sung Lee, 3 Ivan A. Supandi, 4 Geun-Sik Jo 1, First Author Department of Information Technology, Inha University,
More informationSearch Result Optimization using Annotators
Search Result Optimization using Annotators Vishal A. Kamble 1, Amit B. Chougule 2 1 Department of Computer Science and Engineering, D Y Patil College of engineering, Kolhapur, Maharashtra, India 2 Professor,
More informationStatistical Analysis and Visualization for Cyber Security
Statistical Analysis and Visualization for Cyber Security Joanne Wendelberger, Scott Vander Wiel Statistical Sciences Group, CCS-6 Los Alamos National Laboratory Quality and Productivity Research Conference
More informationIncSpan: Incremental Mining of Sequential Patterns in Large Database
IncSpan: Incremental Mining of Sequential Patterns in Large Database Hong Cheng Department of Computer Science University of Illinois at Urbana-Champaign Urbana, Illinois 61801 hcheng3@uiuc.edu Xifeng
More informationSome Research Challenges for Big Data Analytics of Intelligent Security
Some Research Challenges for Big Data Analytics of Intelligent Security Yuh-Jong Hu hu at cs.nccu.edu.tw Emerging Network Technology (ENT) Lab. Department of Computer Science National Chengchi University,
More information