Rolling the Dice on Big Data. Ilse Ipsen Department of Mathematics
Transcription
1 Rolling the Dice on Big Data Ilse Ipsen Department of Mathematics
2 The Economist, 27 February 2010
3 Science, 11 February 2011
4 McKinsey Global Institute, May 2011
7 Rolling the Dice on Big Data What is Big?
11 Measuring Units
1 byte: 1 character
10 bytes: 1 word
100 bytes: 1 sentence
1 kilobyte = 1,000 bytes: 1 page
1 megabyte = 1,000 kilobytes: complete works of Shakespeare
1 gigabyte = 1,000 megabytes: a big shelf full of books
1 terabyte = 1,000 gigabytes: all books in the Library of Congress
1 petabyte = 1,000 terabytes: 20 million 4-door filing cabinets full of text
13 1 byte: 1 grain of sand
1 terabyte: number of grains to fill a swimming pool
14 Rolling the Dice on Big Data Data
15 Rolling the Dice on Big Data
16 Rolling the Dice on Big Data Not quite
17 The Data in this Talk
Given: Database: collection of documents (data points). Query: single document (data point).
Want: Documents closest to the query.
A tiny example to illustrate a big data problem
18 A Tiny Data Example
Database: emails from known authors
Email 1: shipment of gold damaged in a fire
Email 2: delivery of silver arrived in a silver truck
Email 3: shipment of gold arrived in a truck
Query: email from unknown author
gold silver truck
Which emails match the query best? These emails may give clues about the author of the query.
Simplest approach for matching: word frequency
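19 Tabulating Emails and Query
Database (term document matrix) + Query

Terms      Email 1  Email 2  Email 3  Query
a             1        1        1       0
arrived       0        1        1       0
damaged       1        0        0       0
delivery      0        1        0       0
fire          1        0        0       0
gold          1        0        1       1
in            1        1        1       0
of            1        1        1       0
silver        0        2        0       1
shipment      1        0        1       0
truck         0        1        1       1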
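The tabulation can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the names `documents`, `frequency_vector`, and `matrix` are made up here, words are split on whitespace, and the vocabulary is sorted alphabetically, so "shipment" comes before "silver", unlike on the slide.

```python
from collections import Counter

# The three database documents and the query from the tiny example.
documents = [
    "shipment of gold damaged in a fire",
    "delivery of silver arrived in a silver truck",
    "shipment of gold arrived in a truck",
]
query = "gold silver truck"

# Vocabulary: every distinct word in the database, in alphabetical order.
terms = sorted({word for doc in documents for word in doc.split()})


def frequency_vector(text, terms):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(text.split())
    return [counts[term] for term in terms]


# One row per document; together the rows form the term document matrix.
matrix = [frequency_vector(doc, terms) for doc in documents]
q = frequency_vector(query, terms)

# Print one line per term: the counts in each document, then in the query.
for i, term in enumerate(terms):
    print(f"{term:10s}", *(row[i] for row in matrix), q[i])
```

Each printed line corresponds to one row of the table: a term, its count in each of the three documents, and its count in the query.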
22 Basic Approach for Finding Matching Emails
1. Common words: for each email, count the number of words common to the email and the query.
2. Length: count the number of words in each email, and in the query.
3. Matching score for each email:
$$\text{Matching score} = \frac{\text{Number of common words}}{(\text{Length of email}) \cdot (\text{Length of query})}$$
Emails with the highest matching scores may give clues about the author of the query.
23-25 Count Common Words in Query and Emails 1, 2, 3
For each email, multiply its word frequencies by the query's word frequencies, term by term, and sum.

Terms      E_1  E_2  E_3   Q   E_1*Q  E_2*Q  E_3*Q
a           1    1    1    0     0      0      0
arrived     0    1    1    0     0      0      0
damaged     1    0    0    0     0      0      0
delivery    0    1    0    0     0      0      0
fire        1    0    0    0     0      0      0
gold        1    0    1    1     1      0      1
in          1    1    1    0     0      0      0
of          1    1    1    0     0      0      0
silver      0    2    0    1     0      2      0
shipment    1    0    1    0     0      0      0
truck       0    1    1    1     0      1      1
Sum                              1      3      2

Number of common words in Email 1 and Query: $E_1^T Q = 1$
Number of common words in Email 2 and Query: $E_2^T Q = 3$
Number of common words in Email 3 and Query: $E_3^T Q = 2$
26 Basic Approach for Finding Matching Emails
1. Number of words common to emails and query: $E_1^T Q = 1$, $E_2^T Q = 3$, $E_3^T Q = 2$
2. Length: count the number of words in each email, and in the query
27 Length of Query

Terms      Q   Square
a          0     0
arrived    0     0
damaged    0     0
delivery   0     0
fire       0     0
gold       1     1
in         0     0
of         0     0
silver     1     1
shipment   0     0
truck      1     1
Sum              3

Length of Query: $\|Q\| = \sqrt{3} \approx 1.7$
28 Length of Email 2

Terms      E_2  Square
a           1     1
arrived     1     1
damaged     0     0
delivery    1     1
fire        0     0
gold        0     0
in          1     1
of          1     1
silver      2     4
shipment    0     0
truck       1     1
Sum              10

Length of Email 2: $\|E_2\| = \sqrt{10} \approx 3.2$
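29 Basic Approach for Finding Matching Emails
1. Number of words common to emails and query: $E_1^T Q = 1$, $E_2^T Q = 3$, $E_3^T Q = 2$
2. Length of emails and query: $\|Q\| = \sqrt{3} \approx 1.7$, $\|E_1\| = \sqrt{7} \approx 2.6$, $\|E_2\| = \sqrt{10} \approx 3.2$, $\|E_3\| = \sqrt{7} \approx 2.6$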
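30 Matching Score for each Email
$$\text{Matching score} = \frac{\text{Number of common words}}{(\text{Length of email}) \cdot (\text{Length of query})}$$
$$\frac{E_1^T Q}{\|E_1\|\,\|Q\|} = \frac{1}{\sqrt{7}\,\sqrt{3}} \approx 0.22 \qquad \frac{E_2^T Q}{\|E_2\|\,\|Q\|} = \frac{3}{\sqrt{10}\,\sqrt{3}} \approx 0.55 \qquad \frac{E_3^T Q}{\|E_3\|\,\|Q\|} = \frac{2}{\sqrt{7}\,\sqrt{3}} \approx 0.44$$
Email 2 is the best match for the query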
31 Conclusion for Tiny Data Example
Database: emails from known authors
Email 1: shipment of gold damaged in a fire
Email 2: delivery of silver arrived in a silver truck
Email 3: shipment of gold arrived in a truck
Query: email from unknown author
gold silver truck
Best matching: Email 2: delivery of silver arrived in a silver truck
32 The Reason for the Weird Way of Counting
Vector Space Model: emails and query = vectors
Matching score = cosine of the angle between email and query
$$\frac{E^T Q}{\|E\|\,\|Q\|} = \cos \angle(E, Q)$$
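A minimal sketch of the cosine matching score for the tiny example, in plain Python. The frequency vectors are written out by hand from the three database texts and the query, in the slide's term order. The helper names `dot`, `norm`, and `matching_score` are illustrative assumptions, not taken from the talk.

```python
import math

# Word-frequency vectors over the terms
# a, arrived, damaged, delivery, fire, gold, in, of, silver, shipment, truck
E = [
    [1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0],  # shipment of gold damaged in a fire
    [1, 1, 0, 1, 0, 0, 1, 1, 2, 0, 1],  # delivery of silver arrived in a silver truck
    [1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1],  # shipment of gold arrived in a truck
]
Q = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]   # gold silver truck


def dot(x, y):
    """Inner product: the 'number of common words' count."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Euclidean length: square root of the sum of squared frequencies."""
    return math.sqrt(dot(x, x))


def matching_score(e, q):
    """Cosine of the angle between a document vector and the query vector."""
    return dot(e, q) / (norm(e) * norm(q))


scores = [matching_score(e, Q) for e in E]
best = max(range(len(E)), key=lambda i: scores[i]) + 1  # 1-based document number

for i, score in enumerate(scores, start=1):
    print(f"document {i}: {score:.2f}")
print("best match: document", best)
```

Since all frequencies are nonnegative, every score lies between 0 and 1, and a larger score means a smaller angle between document and query.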
35 What this means in practice
Average number of emails per day: 294 billion
Number of words in the English language: at least 250,000
Matching one query with a single email: 250,000 operations (one for every possible word)
Matching one query with all emails: 250,000 * 294 billion = $7.35 \times 10^{16}$ operations
Fast PC (Intel Core i7 980 XE): 109 Gflops = $109 \times 10^{9}$ floating point operations per second
Matching one query with all emails: about 8 days
US supercomputer (Cray XT5, Opteron quad core 2.3GHz): peak 1,381,400 Gflops
Matching one query with all emails: about 1 minute
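The arithmetic behind these estimates can be checked with a short Python calculation. It assumes, as the slide does, one floating point operation per word and machines running at their quoted rates. The variable names are illustrative.

```python
# Figures quoted on the slide.
documents_per_day = 294e9        # 294 billion
vocabulary_size = 250_000        # one operation per possible word

# Total work for matching one query against every document.
operations = documents_per_day * vocabulary_size

pc_flops = 109 * 1e9             # Intel Core i7 980 XE: 109 Gflops
cray_flops = 1_381_400 * 1e9     # Cray XT5 peak: 1,381,400 Gflops

seconds_per_day = 24 * 60 * 60
pc_days = operations / pc_flops / seconds_per_day
cray_seconds = operations / cray_flops

print(f"operations:   {operations:.3g}")
print(f"fast PC:      {pc_days:.1f} days")
print(f"Cray XT5:     {cray_seconds:.0f} seconds")
```

The PC time comes out just under 8 days and the supercomputer time just under one minute, consistent with the rounded figures above.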
37 Can the Matching be Performed Faster? Yes! Ralph Abbey, Sarah Warkentin, Sylvester Eriksson-Bique, Mary Solbrig, Michael Stefanelli
38 Rolling the Dice on Big Data Rolling the Dice
39 Rolling the Dice on Big Data Rolling the Dice on which words to use for the matching
41 Idea
Randomized Query Matching Algorithm
Do not use every word in query and emails
Monte Carlo sampling: use only selected words {downsize to a smaller database with fewer words}
Justification
Don't need exact matching scores: identify only the emails with the highest matching scores
Database available for offline computation: derive statistics based on word frequencies
Perform query matching online: use the statistics to select the words used for matching
42 Suggestions for Downsizing the Database
Statistics
n: number of words in database
$Q_j$: frequency of word j in query
$W_j$: frequency of word j in database
Suggestion for selecting word j: probability of sampling word j
$$p_j = \frac{W_j Q_j}{W_1 Q_1 + \dots + W_n Q_n}$$
Frequently occurring words are more likely to be sampled
43 Rolling the Dice = Downsizing the Database
User input: s, the number of samples {number of words in downsized database}
Monte Carlo sampling {roll the dice s times}:
For t = 1, ..., s: sample index $j_t$ from {1, ..., n} with probability $p_{j_t}$, independently and with replacement
Downsized database contains only s words: word $j_1$, word $j_2$, ..., word $j_s$
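The sampling step might look as follows in Python. This is a sketch under assumptions rather than the authors' implementation: $W_j$ is taken to be the total count of word j over all database documents, indices are 0-based, and the function names are invented for illustration. `random.Random.choices` draws independently and with replacement.

```python
import random


def sampling_probabilities(W, Q):
    """p_j proportional to W_j * Q_j, normalized to sum to 1."""
    weights = [w * q for w, q in zip(W, Q)]
    total = sum(weights)
    if total == 0:
        raise ValueError("query shares no words with the database")
    return [weight / total for weight in weights]


def sample_indices(p, s, rng):
    """Roll the dice s times: draw s word indices with replacement."""
    return rng.choices(range(len(p)), weights=p, k=s)


# Tiny example, terms in the slide's order:
# a, arrived, damaged, delivery, fire, gold, in, of, silver, shipment, truck
W = [3, 2, 1, 1, 1, 2, 3, 3, 2, 2, 2]   # assumed: total counts over the database
Q = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]   # gold silver truck

p = sampling_probabilities(W, Q)
rng = random.Random(0)                   # fixed seed only for reproducibility
indices = sample_indices(p, s=5, rng=rng)
print(p)
print(indices)
```

With this choice of probabilities, words that do not occur in the query have probability zero and are never sampled. In the tiny example only gold, silver, and truck can be drawn, each with probability 1/3.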
47 Matching with Downsized Database
Downsized database: word $j_1$, word $j_2$, ..., word $j_s$
Word frequency in query: $\hat{Q} = (Q_{j_1} \; Q_{j_2} \; \dots \; Q_{j_s})$
For each email E:
Word frequency: $\hat{E} = (F_{j_1} \; F_{j_2} \; \dots \; F_{j_s})$
Approximate number of words common to email and query:
$$C = \frac{1}{s} \left( \frac{F_{j_1} Q_{j_1}}{p_{j_1}} + \frac{F_{j_2} Q_{j_2}}{p_{j_2}} + \dots + \frac{F_{j_s} Q_{j_s}}{p_{j_s}} \right)$$
{s, $p_{j_1}$, $p_{j_2}$, ..., $p_{j_s}$ compensate for fewer words}
Approximate matching score of email:
$$\frac{C}{\|\hat{E}\|\,\|\hat{Q}\|}$$
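A minimal Python sketch of the estimate C, using the second document of the tiny example. It is an illustration under assumptions: indices are 0-based, the probabilities are the ones proportional to $W_j Q_j$ (1/3 each for gold, silver, and truck in this example), and the normalization by the lengths of the downsized vectors is left out. The function name is hypothetical.

```python
import random


def approximate_common_words(F, Q, p, indices):
    """Monte Carlo estimate of the inner product of F and Q.

    Each sampled term F_j * Q_j is divided by its sampling probability p_j,
    and the sum is divided by the number of samples s, so that the estimate
    equals the exact inner product on average.
    """
    s = len(indices)
    return sum(F[j] * Q[j] / p[j] for j in indices) / s


# Terms: a, arrived, damaged, delivery, fire, gold, in, of, silver, shipment, truck
F = [1, 1, 0, 1, 0, 0, 1, 1, 2, 0, 1]   # delivery of silver arrived in a silver truck
Q = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]   # gold silver truck
p = [0, 0, 0, 0, 0, 1 / 3, 0, 0, 1 / 3, 0, 1 / 3]

exact = sum(f * q for f, q in zip(F, Q))  # exact number of common words

rng = random.Random(0)                    # fixed seed only for reproducibility
indices = rng.choices(range(len(p)), weights=p, k=3000)
estimate = approximate_common_words(F, Q, p, indices)

print("exact:   ", exact)
print("estimate:", round(estimate, 2))
```

A single sample contributes 0, 6, or 3 here (for gold, silver, or truck), so one roll alone is a poor estimate. Averaging over many rolls brings C close to the exact count of 3, which is the sense in which s and the probabilities compensate for using fewer words.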
48 Reuters Collection: Transcribed Subset
201 documents and 5601 words
Number of sampled words: s = 56 (1 percent)
[Table comparing rankings; column headers: Ranking, Deterministic, Uniform, Deterministic q; entries not preserved in the transcription]
Bucket of computed 25 best matches contains the correct 10 best matches in 99% of all cases
49 Wikipedia Dataset
200 documents and 198,853 words
[Plot: average percent of correct 10 best matches as a function of sample size; vertical axis: % correct rankings; horizontal axis: number of samples, c]
Sampling 1% of the words gives the correct 9 best matches. More sampling does not help a lot.
52 Summary
Big data: matching queries against a document database
Rolling the dice: randomized downsizing of the database vocabulary; frequently occurring words are more likely to be kept
But... Why not use a predictable (deterministic) algorithm? Why use a randomized algorithm?
Advantages of the randomized algorithm:
Easy to analyze
Fast, and simple to implement
As good in practice as a deterministic algorithm (for this type of application)
53 The Bigger Picture
Many different methods for fast query matching
Algorithm in this talk: randomized matrix vector multiplication
Other randomized matrix algorithms: matrix multiplication, subset selection, least squares problems (regression), low rank approximation (PCA)
Applications for randomized algorithms: social network analysis, population genetics, circuit testing, ...
54 NSF Web: National Science Foundation, 29 March 2012
Press Release: NSF Leads Federal Efforts In Big Data
At White House event, NSF Director announces new Big Data solicitation, $10 million Expeditions in Computing award, and awards in cyberinfrastructure, geosciences, training
[Image: Hurricane Ike visualization created by Texas Advanced Computing Center (TACC) supercomputer Ranger]
CSCA0102 IT & Business Applications Foundation in Business Information Technology School of Engineering & Computing Sciences FTMS College Global Chapter 2 Data Storage Concepts System Unit The system unit
More informationFinal Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
More informationChapter 3: Computer Hardware Components: CPU, Memory, and I/O
Chapter 3: Computer Hardware Components: CPU, Memory, and I/O What is the typical configuration of a computer sold today? The Computer Continuum 1-1 Computer Hardware Components In this chapter: How did
More informationOptimization of Preventive Maintenance Scheduling in Processing Plants
18 th European Symposium on Computer Aided Process Engineering ESCAPE 18 Bertrand Braunschweig and Xavier Joulia (Editors) 2008 Elsevier B.V./Ltd. All rights reserved. Optimization of Preventive Maintenance
More informationPrime Time: Homework Examples from ACE
Prime Time: Homework Examples from ACE Investigation 1: Building on Factors and Multiples, ACE #8, 28 Investigation 2: Common Multiples and Common Factors, ACE #11, 16, 17, 28 Investigation 3: Factorizations:
More informationSurfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics
Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Dr. Liangxiu Han Future Networks and Distributed Systems Group (FUNDS) School of Computing, Mathematics and Digital Technology,
More informationTHE ROSEN MARKET TIMING LETTER
THE ROSEN MARKET TIMING LETTER PRECIOUS METALS - FOREX - STOCK INDICES - COMMODITIES www.deltasociety.com/product/rosen-market-timing-letter RONALD L. ROSEN June 10, 2016 REPORT AND REVIEW GOLD AND SILVER
More informationStat 20: Intro to Probability and Statistics
Stat 20: Intro to Probability and Statistics Lecture 16: More Box Models Tessa L. Childers-Day UC Berkeley 22 July 2014 By the end of this lecture... You will be able to: Determine what we expect the sum
More informationIntroduction To Computers: Hardware and Software
What Is Hardware? Introduction To Computers: Hardware and Software A computer is made up of hardware. Hardware is the physical components of a computer system e.g., a monitor, keyboard, mouse and the computer
More informationChapter 4 System Unit Components. Discovering Computers 2012. Your Interactive Guide to the Digital World
Chapter 4 System Unit Components Discovering Computers 2012 Your Interactive Guide to the Digital World Objectives Overview Differentiate among various styles of system units on desktop computers, notebook
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
More informationThe following are general terms that we have found being used by tenants, landlords, IT Staff and consultants when discussing facility space.
The following are general terms that we have found being used by tenants, landlords, IT Staff and consultants when discussing facility space. Terminology: Telco: Dmarc: NOC: SAN: GENSET: Switch: Blade
More informationDay 1. Mental Arithmetic Questions. 1. What number is five cubed? 2. A circle has radius r. What is the formula for the area of the circle?
Mental Arithmetic Questions 1. What number is five cubed? KS3 MATHEMATICS 10 4 10 Level 6 Questions Day 1 2. A circle has radius r. What is the formula for the area of the circle? 3. Jenny and Mark share
More informationHardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui
Hardware-Aware Analysis and Optimization of Stable Fluids Presentation Date: Sep 15 th 2009 Chrissie C. Cui Outline Introduction Highlights Flop and Bandwidth Analysis Mehrstellen Schemes Advection Caching
More informationText Analytics using High Performance SAS Text Miner
Text Analytics using High Performance SAS Text Miner Edward R. Jones, Ph.D. Exec. Vice Pres.; Texas A&M Statistical Services Abstract: The latest release of SAS Enterprise Miner, version 13.1, contains
More informationThe 5 P s in Problem Solving *prob lem: a source of perplexity, distress, or vexation. *solve: to find a solution, explanation, or answer for
The 5 P s in Problem Solving 1 How do other people solve problems? The 5 P s in Problem Solving *prob lem: a source of perplexity, distress, or vexation *solve: to find a solution, explanation, or answer
More informationDepartment of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.
Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x
More informationIP Video Rendering Basics
CohuHD offers a broad line of High Definition network based cameras, positioning systems and VMS solutions designed for the performance requirements associated with critical infrastructure applications.
More informationMain Memory & Backing Store. Main memory backing storage devices
Main Memory & Backing Store Main memory backing storage devices 1 Introduction computers store programs & data in two different ways: nmain memory ntemporarily stores programs & data that are being processed
More information