Inference Methods for Analyzing the Hidden Semantics in Big Data. Phuong LE-HONG
2 Introduction
- Grant proposal for basic research project Nafosted, months
- Principal Investigator: KhoatTQ, SoICT, HUST
June 2014 Nafosted Proposal 2
3 Goal
- Develop a class of inference algorithms that enable us to explore and discover hidden structures (semantics) in massive text collections
- Use them to make accurate predictions in practical applications
4 Methodologies
Key directions in Distributed Processing and Machine Learning:
- Topic modeling (Blei, 2012)
- Matrix factorization (Lee & Seung, 1999)
- Online learning (Hazan & Kale, 2012)
- Stochastic inference (Hoffman et al., 2013)
5 Applications
Develop efficient methods for:
- Question answering
- Text and web mining
- Recommendation systems
- Social network analysis
6 Literature Review
Inferring hidden structures from data is an attractive research topic with many applications:
- Exploration of a century of scientific journals (Mimno, 2012; Blei & Lafferty, 2007)
- Exploration of a century of literature (Jockers & Mimno, 2013)
- Exploration of online forums/networks (Cao et al., 2011; Gerrish & Blei, 2012; Sun & Lin, 2013)
- Analyzing political opinions from online forums (Cao et al., 2011; Gerrish & Blei, 2012; Grimmer, 2010; Levy & Franklin, 2013)
- Analyzing behaviors and interests of online users (Gerrish & Blei, 2012; Sun & Lin, 2013; Wang et al., 2011)
7 Literature Review
Many approaches:
- Bayesian networks (Darwiche, 2010)
- Gaussian graphical models (Hsieh et al., 2013)
- Topic modeling (Hofmann, 2001; Blei, 2012)
- Non-negative matrix factorization (NMF) (Lee & Seung, 1999; Wang et al., 2011)
This project will use topic modeling and NMF as the main tools for developing efficient methods to analyze big text collections.
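To make the NMF approach concrete, here is a minimal sketch of the multiplicative update rules commonly attributed to Lee & Seung, minimizing the squared reconstruction error of a small document-term matrix. It is an illustration only, not the project's method: the function names (`nmf`, `frobenius_error`), the toy matrix, and the default parameters are all assumptions, and the pure-Python matrix code is only suitable for tiny inputs.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) as W (n x rank) times H (rank x m).

    Uses multiplicative updates, which keep every entry non-negative.
    eps guards against division by zero (an implementation assumption).
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


if __name__ == "__main__":
    # Toy document-term counts: rows are documents, columns are words.
    docs = [[2, 2, 0, 0], [4, 4, 0, 0], [0, 0, 3, 3], [0, 0, 1, 1]]
    w, h = nmf(docs, rank=2)
    print("reconstruction error:", frobenius_error(docs, w, h))
```

In this reading, each row of `h` plays the role of a topic (a non-negative weighting over words) and each row of `w` gives a document's loadings on those topics.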
8 Literature Review
- Inference for a document: estimation of the variables that are hidden in that document (topics, entities, entity relations)
- Inference for a dataset: learning of the hidden structures (topics, topical networks, social communities, user trends)
- Inference is NP-hard (Sontag & Roy, 2011)
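As a small illustration of inference for a single document, the sketch below estimates a document's topic proportions when the topics themselves are held fixed, using EM on the plain maximum-likelihood objective. The function name `infer_topic_proportions`, the dictionary-based topic representation, and the toy topics are assumptions made for the example. With fixed topics this objective is concave in the proportions, so the sketch does not address the harder setting behind the NP-hardness result cited above.

```python
def infer_topic_proportions(word_counts, topics, iterations=100):
    """Estimate topic proportions theta for one document by EM.

    word_counts: dict mapping word -> count in the document.
    topics: list of dicts, each mapping word -> probability under that topic.
    Topics are treated as known and fixed; only theta is estimated.
    """
    k = len(topics)
    theta = [1.0 / k] * k
    for _ in range(iterations):
        new_theta = [0.0] * k
        for word, count in word_counts.items():
            # E-step: responsibility of each topic for this word.
            weights = [theta[t] * topics[t].get(word, 0.0) for t in range(k)]
            norm = sum(weights)
            if norm == 0.0:
                continue  # word has zero probability under every topic
            for t in range(k):
                new_theta[t] += count * weights[t] / norm
        total = sum(new_theta)
        if total == 0.0:
            return theta  # no word was explained by any topic
        # M-step: renormalize expected counts into proportions.
        theta = [x / total for x in new_theta]
    return theta


if __name__ == "__main__":
    # Two hypothetical topics that share the word "data".
    topics = [
        {"gene": 0.4, "dna": 0.4, "data": 0.2},
        {"ball": 0.4, "goal": 0.4, "data": 0.2},
    ]
    document = {"gene": 3, "dna": 2, "data": 2, "ball": 1}
    print(infer_topic_proportions(document, topics))
```

Shared words such as "data" are split between topics in proportion to the current estimate, which is what lets the estimate settle on a mixture rather than a hard assignment.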
9 Literature Review
Various methods for efficient inference have been proposed:
- Maximum likelihood estimation (ML) (Hofmann, 2001)
- Variational Bayesian (VB) (Blei et al., 2003)
- Collapsed variational Bayesian (CVB) (Asuncion et al., 2009)
- Collapsed Gibbs sampling (CGS) (Griffiths & Steyvers, 2004)
- Maximum a posteriori estimation (MAP) (Chien & Wu, 2008)
10 Literature Review
Some remarks:
- Sampling-based methods are guaranteed to converge to the underlying distributions, but at an unknown rate.
- VB and CVB are much faster.
- CVB0 (Asuncion et al., 2009) often performs the best.
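For readers unfamiliar with the sampling-based family, the following is a minimal sketch of collapsed Gibbs sampling for an LDA-style topic model, in the spirit of the CGS method listed earlier. It is a teaching sketch under stated assumptions: the function name `collapsed_gibbs_lda`, the symmetric hyperparameters `alpha` and `beta`, the number of sweeps, and the toy corpus are all chosen for illustration, and nothing here says how many sweeps are needed for convergence (which is exactly the unknown-rate issue noted above).

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, sweeps=200, seed=0):
    """Run collapsed Gibbs sampling on documents given as lists of word ids.

    Returns (doc_topic, topic_word) count tables from the final sample.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


if __name__ == "__main__":
    # Toy corpus over a 4-word vocabulary (word ids 0..3).
    corpus = [[0, 1, 0, 1], [0, 0, 1], [2, 3, 3, 2], [3, 2, 2]]
    doc_topic, topic_word = collapsed_gibbs_lda(corpus, num_topics=2, vocab_size=4)
    print(doc_topic)
    print(topic_word)
```

Each token update touches only a handful of counters, which is why this scheme is simple to implement; the cost is that the number of sweeps needed is not known in advance.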
11 Literature Review
After more than 20 years of development, many open problems remain.
- Accuracy of inferring a model from data:
  - Attacked by (Arora et al., 2012; Arora et al., 2013; Anandkumar et al., 2012), with breakthrough results;
  - But those results are limited to some restricted models under certain conditions.
  - A large class of topic models and NMF still lacks a theoretical guarantee.
  - And those results do not cover inference for individual documents.
12 Literature Review
Previous work on processing big data collections:
- Focuses mainly on utilizing parallel/distributed architectures
- Works well with million-document collections
Two main limitations:
- LDA models are dense, which might consume huge amounts of memory when the domain dimension is very large;
- Existing methods for inferring individual documents have no theoretical guarantee on either inference quality or inference time.
13 Five Problems
- P1: Can we develop a fast inference method that has provable theoretical guarantees on quality?
- P2: How can we learn a big topic model from big data?
- P3: Can we develop methods with provable guarantees on quality for handling streaming/dynamic text collections?
14 Five Problems
- P4: Can we develop an optimized big data processing framework to handle massive distributed computations of inference methods?
- P5: How can the hidden semantics recovered by our inference methods be useful in fundamental problems of NLP and IR?
  - QA
  - Text and web mining
  - Recommendation
15 Three Groups
- Inference methods: TQ. Khoat, NK. Anh, NV. Linh (P1, P2, P3)
- Large-scale computation: TV. Trung, NB. Minh, TQ. Khoat (P3, P4)
- Applications: LH. Phuong, NV. Linh, NK. Anh, TQ. Khoat (P1, P5)
16 Expected Results
- A fast inference method that has a theoretical guarantee on quality and is general enough to be easily employed in a large class of statistical models
- A family of methods for analyzing the hidden structures/semantics in text collections and non-negative data
- A provably fast method that enables us to work with streaming/dynamic text collections and non-negative data
17 Expected Results
- A new theory that enables us to design fast algorithms for non-convex inference problems, which appear in a large class of probabilistic models
- New effective methods for practical applications such as question answering, text & web mining, recommendation, and social network analysis
18 Expected Results
Publications:
- Articles in ISI-covered journals: 2
- National/International conferences: 5
Training results:
- Masters: 2
- PhD: 3