Survey of the Mathematics of Big Data



Similar documents
Math Courses Available After College Algebra. nsm.uh.edu

Consolidation,Work,Group, Final,Report!

G C.3 Construct the inscribed and circumscribed circles of a triangle, and prove properties of angles for a quadrilateral inscribed in a circle.

Allocation of Mathematics Modules at Maynooth to Areas of Study for PME (Professional Masters in Education)

If n is odd, then 3n + 7 is even.

MATH 4330/5330, Fourier Analysis Section 11, The Discrete Fourier Transform

A Negative Result Concerning Explicit Matrices With The Restricted Isometry Property

Applied Mathematics and Mathematical Modeling

3. Interpolation. Closing the Gaps of Discretization... Beyond Polynomials

Credit Number Lecture Lab / Shop Clinic / Co-op Hours. MAC 224 Advanced CNC Milling MAC 229 CNC Programming

Image Compression through DCT and Huffman Coding Technique

So which is the best?

Tim Hsu. Updated Fall 2012

Applications to Data Smoothing and Image Processing I

HANDBOOK FOR THE APPLIED AND COMPUTATIONAL MATHEMATICS OPTION. Department of Mathematics Virginia Polytechnic Institute & State University

Paper 2 Revision. (compiled in light of the contents of paper1) Higher Tier Edexcel

4.1. Title: data analysis (systems analysis) Annotation of educational discipline: educational discipline includes in itself the mastery of the

Prentice Hall Algebra Correlated to: Colorado P-12 Academic Standards for High School Mathematics, Adopted 12/2009

How mathematicians deal with new trends like big data, etc.

Multimedia Databases. Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig

Figure 1: Relation between codec, data containers and compression algorithms.

CS Introduction to Data Mining Instructor: Abdullah Mueen

Introduction to the Mathematics of Big Data. Philippe B. Laval

Zero: If P is a polynomial and if c is a number such that P (c) = 0 then c is a zero of P.

ISU Department of Mathematics. Graduate Examination Policies and Procedures

Conceptual similarity to linear algebra

The Fourth International DERIVE-TI92/89 Conference Liverpool, U.K., July Derive 5: The Easiest... Just Got Better!

Curriculum Map by Block Geometry Mapping for Math Block Testing August 20 to August 24 Review concepts from previous grades.

Student Activity: To investigate an ESB bill

Learning mathematics Some hints from the psychologists Examples:

Sparsity-promoting recovery from simultaneous data: a compressive sensing approach

When is missing data recoverable?

Mathematics. Mathematics MATHEMATICS Sacramento City College Catalog. Degree: A.S. Mathematics AS-T Mathematics for Transfer

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #18: Dimensionality Reduc7on

Diablo Valley College Catalog

Introduction to Medical Imaging. Lecture 11: Cone-Beam CT Theory. Introduction. Available cone-beam reconstruction methods: Our discussion:

Computational Optical Imaging - Optique Numerique. -- Deconvolution --

School of Mathematics, Computer Science and Engineering. Mathematics* Associate in Arts Degree COURSES, PROGRAMS AND MAJORS

ANALYZER BASICS WHAT IS AN FFT SPECTRUM ANALYZER? 2-1

Zeros of Polynomial Functions

Sequence of Mathematics Courses

Spectrum Level and Band Level

Wavelet analysis. Wavelet requirements. Example signals. Stationary signal 2 Hz + 10 Hz + 20Hz. Zero mean, oscillatory (wave) Fast decay (let)

Geometry Course Summary Department: Math. Semester 1

International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XXXIV-5/W10

ANALYSIS OF THE EFFECTIVENESS IN IMAGE COMPRESSION FOR CLOUD STORAGE FOR VARIOUS IMAGE FORMATS

(2) (3) (4) (5) 3 J. M. Whittaker, Interpolatory Function Theory, Cambridge Tracts

Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals. Introduction

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

On the Efficiency of Competitive Stock Markets Where Traders Have Diverse Information

Prerequisite: High School Chemistry.

Copyrighted Material. Chapter 1 DEGREE OF A CURVE

Basic Math for the Small Public Water Systems Operator

FIELDS-MITACS Conference. on the Mathematics of Medical Imaging. Photoacoustic and Thermoacoustic Tomography with a variable sound speed

Graph Based Processing of Big Images

This material is intended for use by undergraduate students who are contemplating taking a course with me.

Resolution Enhancement of Photogrammetric Digital Images

Big Ideas in Mathematics

COLLEGE ALGEBRA IN CONTEXT: Redefining the College Algebra Experience

The following is a reprint of an interview with

Mathematics (MAT) MAT 061 Basic Euclidean Geometry 3 Hours. MAT 051 Pre-Algebra 4 Hours

Common Core Unit Summary Grades 6 to 8

The Applied and Computational Mathematics (ACM) Program at The Johns Hopkins University (JHU) is

SEMESTER PLANS FOR MATH COURSES, FOR MAJORS OUTSIDE MATH.

EASTERN ARIZONA COLLEGE Differential Equations

Welcome to Math 7 Accelerated Courses (Preparation for Algebra in 8 th grade)

DEPARTMENT of CHEMISTRY AND BIOCHEMISTRY

SAN DIEGO COMMUNITY COLLEGE DISTRICT CITY COLLEGE ASSOCIATE DEGREE COURSE OUTLINE

N Q.3 Choose a level of accuracy appropriate to limitations on measurement when reporting quantities.

Prentice Hall: Middle School Math, Course Correlated to: New York Mathematics Learning Standards (Intermediate)

Derive 5: The Easiest... Just Got Better!

You know from calculus that functions play a fundamental role in mathematics.

High School Algebra Reasoning with Equations and Inequalities Solve equations and inequalities in one variable.

How to write a technique essay? A student version. Presenter: Wei-Lun Chao Date: May 17, 2012

ON THE DEGREES OF FREEDOM OF SIGNALS ON GRAPHS. Mikhail Tsitsvero and Sergio Barbarossa

NMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing

GEOFF DIESTEL, PHD. Assistant Professor at Texas A&M-Central Texas(TAMUCT) from 2011-present.

Computer and Systems Engineering (CSE) Master of Science Programs

Conceptual Framework Strategies for Image Compression: A Review

Math 241, Exam 1 Information.

Course overview Processamento de sinais 2009/10 LEA

Math at a Glance for April

Hybrid Lossless Compression Method For Binary Images

November UNIT SELECTION GUIDE Bachelor of Education (Secondary: Mathematics) and Bachelor of Science

Essential Mathematics for Computer Graphics fast

An Application of Analytic Geometry to Designing Machine Parts--and Dresses

PROPOSAL FOR ELECTRICAL ENGINEERING MAJOR

Signal and Information Processing

What s New In Maple. Maple 17: By the Numbers. The Möbius Project TM. Embedded Video. new commands for mathematical problem-solving

Core Maths C2. Revision Notes

Similarity and Diagonalization. Similar Matrices

MATHEMATICS Department Chair: Michael Cordova - mccordova@dcsdk12.org

Introduction to image coding

ELECTRICAL ENGINEERING

Algebra I Credit Recovery

MATHEMATICS FOR ENGINEERING BASIC ALGEBRA

A Robust and Lossless Information Embedding in Image Based on DCT and Scrambling Algorithms

MATH 2 Course Syllabus Spring Semester 2007 Instructor: Brian Rodas

Stable Signal Recovery from Incomplete and Inaccurate Measurements

SALEM COMMUNITY COLLEGE Carneys Point, New Jersey COURSE SYLLABUS COVER SHEET. Action Taken (Please Check One) New Course Initiated

Transcription:

Survey of the Mathematics of Big Data Philippe B. Laval KSU September 12, 2014 Philippe B. Laval (KSU) Math & Big Data September 12, 2014 1 / 23

Introduction We survey some mathematical techniques used with Big Data. The goal here is to make you aware of these techniques rather than giving you detail about them. That task would take several semesters for each technique. Philippe B. Laval (KSU) Math & Big Data September 12, 2014 2 / 23

Outline 1 How Big is Big Data? 2 How Can Mathematics Help Types of Data Mathematics to the Rescue! 3 Acquisition of Data Information Content Compressed Sensing 4 Data Analysis 5 Conclusion High Dimensional Data Imaging Data Philippe B. Laval (KSU) Math & Big Data September 12, 2014 3 / 23

How Big is Big Data? We live in a digital world, which generates a lot of data. New technologies produce enormous amount of data. We are gathering more data than ever, even from old technologies. Problem: Acquisition/storage, analysis and transmission of data. Total data generated > total storage. Increase in generation rate >> increase in communication rate. Analysis can be very complex. Problem: Data is noisy, unstructured and dynamic. Philippe B. Laval (KSU) Math & Big Data September 12, 2014 4 / 23

How Can Mathematics Help? One answer is given in a video from a previous lecture, remembering that many of the skills mathematicians have are needed in programming. But also, mathematics......allows us to formalize both the data and the problem....provides a big "chest" of tools or methodologies....allows validation of the methodologies (proof of functionality). Philippe B. Laval (KSU) Math & Big Data September 12, 2014 5 / 23

How Can Mathematics Help? - Types of Data Signals. Can be represented by a function f : R R Philippe B. Laval (KSU) Math & Big Data September 12, 2014 6 / 23

How Can Mathematics Help? - Types of Data Images. Can be represented by a function f : R 2 R Philippe B. Laval (KSU) Math & Big Data September 12, 2014 7 / 23

How Can Mathematics Help? - Types of Data Data on manifolds. Can be represented by a function f : S 2 R Philippe B. Laval (KSU) Math & Big Data September 12, 2014 8 / 23

Acquisition of Data As noted above, the main problem is the size of the data. However, this cannot be solved by increasing storage capacity. Possible solutions: Search for better ways to compress. Acquire less data. Philippe B. Laval (KSU) Math & Big Data September 12, 2014 9 / 23

Acquisition of Data - Better Ways to Compress Data Compression saves storage but also decreases bandwidth. General ideas behind compression: Take advantage of redundant information. Only keep the most relevant information. Einstein: "Not everything that can be counted counts, and not everything hat counts can be counted." Philippe B. Laval (KSU) Math & Big Data September 12, 2014 10 / 23

Acquisition of Data - Better Ways to Compress The standard for data compression was JPEG. It is based on the discrete cosine transform (DCT) (see Fourier analysis, Fourier transform, fast Fourier transform (FFT) and the discrete fast Fourier transform (DFFT). It expresses the image in terms of building blocks, the functions cos [ π n ( i + 1 2 ) k ]. Definition The DCT is a linear, invertible function f : R n R n. It transforms the numbers (x 0, x 1,..., x n 1 ) into (X 0, X 1,..., X n 1 ) by the formula n 1 [ ( π X k = x i cos i + 1 ) ] k n 2 i=0 for k = 0, 1,..., n 1 Philippe B. Laval (KSU) Math & Big Data September 12, 2014 11 / 23

Acquisition of Data - Better Ways to Compress The latest standard for data compression is JPEG2000. It is similar to JPEG, but it uses wavelets instead of the discrete cosine transform. The branch of mathematics which studies wavelets is called Harmonic Analysis. Like the DCT, wavelets decompose the image in terms of building blocks. But the building blocks are not limited to the cosine function. Both the DCT and wavelets use the divide and conquer method. Decompose an image in terms of its building blocks. Only keep the most relevant building blocks. Philippe B. Laval (KSU) Math & Big Data September 12, 2014 12 / 23

Acquisition of Data - Better Ways to Compress Problem: The raw data still has to be stored first Solution: Compressed sensing (Donoho, Candès, Romberg, Tao (2006) Philippe B. Laval (KSU) Math & Big Data September 12, 2014 13 / 23

Acquisition of Data - Compressed Sensing Main idea: Goal: Assumption: Only a small percentage of the data to be captured is really relevant (sparse data). Problem: We don t know which. Solution: Use random building blocks. This is pushing the sampling theorem beyond its limits! Goal: To capture a signal with as few points as possible. Thus the raw data will be already compressed. Requires a lot of linear algebra, solving sparse systems. Philippe B. Laval (KSU) Math & Big Data September 12, 2014 14 / 23

Applications of Compressed Sensing Business Compression Astronomy Radar Biology Communications Imaging Science Geology Medicine Information theory Remote sensing Optics Philippe B. Laval (KSU) Math & Big Data September 12, 2014 15 / 23

Data Analysis There are many techniques, statistical and mathematical which already exist. Problem: The analysis maybe be too complex for current computing power. Data can be noisy, unstructured and dynamic. However, these problems cannot be simply solved by more computing power. Possible solutions include: Graph Theory. TDA (Topological Data Analysis). Philippe B. Laval (KSU) Math & Big Data September 12, 2014 16 / 23

Data Analysis - High-Dimensional Data Strategy: Detection of inner structure. Reduction of dimension Examples: A circle can be approximated by lines. A sphere can be approximated by flat surfaces. Philippe B. Laval (KSU) Math & Big Data September 12, 2014 17 / 23

Data Analysis - High-Dimensional Data Philippe B. Laval (KSU) Math & Big Data September 12, 2014 18 / 23

Data Analysis - High-Dimensional Data Question: How do we identify inner structure? By using topology. The link will open a website. Scroll down to play the video. Philippe B. Laval (KSU) Math & Big Data September 12, 2014 19 / 23

Data Analysis - Imaging Data Tasks: Removing Noise. Pattern recognition. Reconstructing missing data.... Tools: Statistics. Fourier and harmonic analysis. Partial differential equations. Compressed sensing. Linear Algebra. Philippe B. Laval (KSU) Math & Big Data September 12, 2014 20 / 23

Conclusion Big Data is here to stay. We need to tame it! The techniques presented are fairly diffi cult and require a lot knowledge in various branches of mathematics, statistics and science. Not everybody needs to be a mathematician. But chances are you will work with one. Knowing at least some mathematics will make the communication and the teamwork easier. Questions? Philippe B. Laval (KSU) Math & Big Data September 12, 2014 21 / 23

Some Questions 1 Name some new technologies which generate data which were not mentioned during the talk. 2 Give examples of unstructured data not mentioned during the talk. 3 Why is removing noise from data (images as well as other data) important? 4 Give applications of pattern recognition in images. Philippe B. Laval (KSU) Math & Big Data September 12, 2014 22 / 23

Online Articles to Read 1 The Mathematical Shape of Things to Come from Quanta Magazine. 2 The Real Secret to Unlocking Big Data? Math. 3 The New Shape of Big Data. 4 New Mathematical Method May Help Tame Big Data from Communications of the ACM. 5 Better Way to Make Sense of Big Data from Science Daily. Philippe B. Laval (KSU) Math & Big Data September 12, 2014 23 / 23