# Data and Analysis. Informatics 1, School of Informatics, University of Edinburgh. Part III: Unstructured Data. Ian Stark

## Transcription

Inf1-DA III: 1 / 89

Informatics 1, School of Informatics, University of Edinburgh

Data and Analysis, Part III: Unstructured Data

Ian Stark, February 2011

### Part III: Unstructured Data

Data Retrieval:

- III.1 Unstructured data and data retrieval

Statistical Analysis of Data:

- III.2 Data scales and summary statistics
- III.3 Hypothesis testing and correlation
- III.4 χ² and collocations

### Staff-Student Liaison Meeting

- Today, 1pm
- Informatics 1 teaching staff and student reps
- Send mail to the reps with any comments you would like them to make at the meeting

### Coursework Assignment

- Three sample exam questions, download from course web page
- Due 4pm Friday 11 March, to box outside ITO
- Marked by tutors and returned for discussion in week 11 tutorial
- Not for credit; you can discuss and ask for help (do!)

### Examples of Unstructured Data

- Plain text. There is structure, the sequence of characters, but this is intrinsic to the data, not imposed. We may wish to impose structure by, e.g., annotating (as in Part II).
- Bitmaps for graphics or pictures, digitized sound, digitized movies, etc. These again have intrinsic structure (e.g., picture dimensions). We may wish to impose structure by, e.g., recognising objects, isolating single instruments from music, etc.
- Experimental results. Here there may be structure in how the data is represented (e.g., a collection of points in n-dimensional space). But an important objective is to uncover implicit structure (e.g., confirm or refute an experimental hypothesis).

### Topics

We consider two topics in dealing with unstructured data.

1. Information retrieval: how to find data of interest within a collection of unstructured data documents.
2. Statistical analysis of data: how to use statistics to identify and extract properties from unstructured data (e.g., general trends, correlations between different components, etc.).

### Information Retrieval

The information retrieval (IR) task: given a query, find the documents in a given collection that are relevant to it.

Assumptions:

1. There is a large document collection being searched.
2. The user has a need for particular information, formulated in terms of a query (typically keywords).
3. The task is to find all and only the documents relevant to the query.

Example: searching a library catalogue.

- Document collection to be searched: books and journals in library collection.
- Information needed: user specifies query giving details about author, title, subject or similar.
- Search program returns a list of (potentially) relevant matches.
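
The IR task described above can be illustrated with a very small sketch. This is only one possible reading of "relevant" (a document matches if it contains every keyword of the query); the function name `retrieve`, the dictionary `catalogue` and its contents are hypothetical and not part of the lecture.

```python
def retrieve(query, collection):
    """Return the ids of documents that contain every keyword in the query.

    Assumption: documents and queries are plain strings, and matching is
    case-insensitive on whitespace-separated words.
    """
    keywords = set(query.lower().split())
    matches = []
    for doc_id, text in collection.items():
        words = set(text.lower().split())
        if keywords <= words:  # every keyword occurs in the document
            matches.append(doc_id)
    return sorted(matches)


# Hypothetical library catalogue: document id -> title.
catalogue = {
    "b1": "Introduction to information retrieval",
    "b2": "Statistical analysis of experimental data",
    "b3": "Information theory and statistical inference",
}

print(retrieve("statistical", catalogue))
print(retrieve("information retrieval", catalogue))
```

A real system would rank its matches and use an index rather than scanning every document; this sketch only makes the query and collection roles concrete.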

### Key issues for IR

Specification issues:

- Evaluation: how to measure the performance of an IR system.
- Query type: how to formulate queries to an IR system.
- Retrieval model: how to find the best-matching documents, and how to rank them in order of relevance.

Implementation issues:

- Indexing: how to represent the documents searched by the system so that the search can be done efficiently.

The goal of this lecture is to look at the three specification issues in more detail.

### Evaluation of IR

The performance of an IR system is naturally evaluated in terms of two measures:

- Precision: what proportion of the documents returned by the system match the original objectives of the search.
- Recall: what proportion of the documents matching the objectives of the search are returned by the system.

We call documents matching the objectives of the search relevant documents.

### True/false positives/negatives

| | Relevant | Non-relevant |
|---|---|---|
| Retrieved | true positives | false positives |
| Not retrieved | false negatives | true negatives |

- True positives (TP): number of relevant documents that the system retrieved.
- False positives (FP): number of non-relevant documents that the system retrieved.
- True negatives (TN): number of non-relevant documents that the system did not retrieve.
- False negatives (FN): number of relevant documents that the system did not retrieve.
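
The four counts can be computed directly from sets of document ids. The following is a minimal sketch, assuming documents are identified by hashable ids; the function name `confusion_counts` and the toy data are hypothetical.

```python
def confusion_counts(retrieved, relevant, collection):
    """Return (TP, FP, FN, TN) for one query over one collection."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    collection = set(collection)
    tp = len(retrieved & relevant)               # relevant and retrieved
    fp = len(retrieved - relevant)               # retrieved but not relevant
    fn = len(relevant - retrieved)               # relevant but not retrieved
    tn = len(collection - retrieved - relevant)  # neither retrieved nor relevant
    return tp, fp, fn, tn


# Toy data: ten documents, four of them relevant, three retrieved.
tp, fp, fn, tn = confusion_counts(
    retrieved={2, 3, 4},
    relevant={0, 1, 2, 3},
    collection=range(10),
)
print(tp, fp, fn, tn)
```

The four counts always sum to the size of the collection, which is a useful sanity check.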

### Defining precision and recall

Using the counts of true positives (TP), false positives (FP) and false negatives (FN):

$$P = \frac{TP}{TP + FP} \qquad R = \frac{TP}{TP + FN}$$

### Comparing 2 IR systems: example

Document collection with 130 documents. 28 documents relevant for a given theory.

System 1: retrieves 25 documents, 16 of which are relevant.

$$TP_1 = 16, \quad FP_1 = 25 - 16 = 9, \quad FN_1 = 28 - 16 = 12$$

$$P_1 = \frac{TP_1}{TP_1 + FP_1} = \frac{16}{25} = 0.64 \qquad R_1 = \frac{TP_1}{TP_1 + FN_1} = \frac{16}{28} = 0.57$$

System 2: retrieves 15 documents, 12 of which are relevant.

$$TP_2 = 12, \quad FP_2 = 15 - 12 = 3, \quad FN_2 = 28 - 12 = 16$$

$$P_2 = \frac{TP_2}{TP_2 + FP_2} = \frac{12}{15} = 0.80 \qquad R_2 = \frac{TP_2}{TP_2 + FN_2} = \frac{12}{28} = 0.43$$

N.B. System 2 has higher precision. System 1 has higher recall.

### Precision versus Recall

A system has to achieve both high precision and recall to perform well. It doesn't make sense to look at only one of the figures:

- If the system returns all documents in the collection: 100% recall, but low precision.
- If the system returns only one document, which is relevant: 100% precision, but low recall.

Precision-recall tradeoff: a system can optimize precision at the cost of recall, or increase recall at the cost of precision.

Whether precision or recall is more important depends on the application of the system.
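
The arithmetic of the two-system comparison can be checked with a short sketch. The function names `precision` and `recall` are illustrative, and the code assumes the denominators are non-zero (the system retrieved at least one document, and at least one relevant document exists).

```python
def precision(tp, fp):
    """P = TP / (TP + FP); assumes at least one document was retrieved."""
    return tp / (tp + fp)


def recall(tp, fn):
    """R = TP / (TP + FN); assumes at least one relevant document exists."""
    return tp / (tp + fn)


TOTAL_RELEVANT = 28

# (documents retrieved, relevant documents among them) for each system.
systems = {"System 1": (25, 16), "System 2": (15, 12)}

results = {}
for name, (retrieved, tp) in systems.items():
    fp = retrieved - tp        # retrieved but not relevant
    fn = TOTAL_RELEVANT - tp   # relevant but missed
    results[name] = (round(precision(tp, fp), 2), round(recall(tp, fn), 2))
    print(name, results[name])
```

Rounded to two decimal places, this reproduces the precision and recall figures of the example.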

### F-score

The F-score is an evaluation measure that combines precision and recall.

$$F_\alpha = \frac{1}{\alpha \frac{1}{P} + (1 - \alpha) \frac{1}{R}}$$

Here α is a weighting factor with 0 ≤ α ≤ 1. High α means precision is more important. Low α means recall is more important.

Often α = 0.5 is used, giving the harmonic mean of P and R:

$$F_{0.5} = \frac{2PR}{P + R}$$

### Using F-score to compare: example

We compare the two systems from the example "Comparing 2 IR systems" (slide III: 11) using the F-score (with α = 0.5).

$$F_{0.5}(\text{System 1}) = \frac{2 P_1 R_1}{P_1 + R_1} = \frac{2 \times 0.64 \times 0.57}{0.64 + 0.57} = 0.60$$

$$F_{0.5}(\text{System 2}) = \frac{2 P_2 R_2}{P_2 + R_2} = \frac{2 \times 0.80 \times 0.43}{0.80 + 0.43} = 0.56$$

The F-score (with this weighting) rates System 1 as better than System 2.
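
The weighted F-score formula translates almost literally into code. This is a minimal sketch; the function name `f_score` is illustrative, and it assumes both precision and recall are strictly positive so that the reciprocals are defined.

```python
import math


def f_score(p, r, alpha=0.5):
    """F_alpha = 1 / (alpha * (1/P) + (1 - alpha) * (1/R)).

    Assumes 0 <= alpha <= 1 and p, r > 0.
    """
    return 1.0 / (alpha / p + (1.0 - alpha) / r)


# Precision and recall of the two systems, kept as exact fractions.
p1, r1 = 16 / 25, 16 / 28
p2, r2 = 12 / 15, 12 / 28

f1 = f_score(p1, r1)
f2 = f_score(p2, r2)
print(round(f1, 2), round(f2, 2))

# With alpha = 0.5 the formula reduces to the harmonic mean 2PR / (P + R).
assert math.isclose(f1, 2 * p1 * r1 / (p1 + r1))
```

Setting `alpha=1.0` makes the score equal to precision alone, and `alpha=0.0` makes it equal to recall alone, which matches the reading of α as a weighting factor.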
