What is Big Data? The three(or four) Vs in Big Data In 2013 the total amount of stored information is estimated to be Volume.

Size: px
Start display at page:

Download "What is Big Data? The three(or four) Vs in Big Data In 2013 the total amount of stored information is estimated to be Volume."

Transcription

1 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall CS535/CS581A BIG DATA What is Big Data? PART 0. INTRODUCTION 1. INTRODUCTION TO BIG DATA 2. COURSE INTRODUCTION PART 0. INTRODUCTION 1. INTRODUCTION TO BIG DATA Sangmi Lee Pallickara Computer Science, Colorado State University 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall Big Data Things one can do at a large scale that cannot be done at a smaller one To extract new insights Create new forms of values Big Data is about analytics of huge quantities of data in order to infer probabilities Big Data is NOT about trying to teach a computer to think like humans Providing a quantitative dimension it never had before The three(or four) Vs in Big Data In 2013 the total amount of stored information is estimated to be Volume around 1,200 exabytes Voluminous Only 2 percent is non-digital It does not have to be certain number of petabytes or quantity. Velocity How fast the data is coming in? How fast you need to be able to analyze and utilize it Variety Number of sources or incoming vectors Real-time data analysis Veracity Can you trust the data itself, source of the data, or the process? User entry errors, redundancy, corruption of the values Data cleaning 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall Use all of the data, or just a little? 1/3 John Graunt s population example Instead of counting every person in London, he inferred the population size using sampling Conducting censuses Costly and time-consuming The US Constitution mandated one every decade As the growing country measured itself in millions Use all of the data, or just a little? 2/3 By the late nineteenth century, even that was proving problematic Data outstripped the Census Bureau s ability to keep up. The 1880 census took 8 years to complete The information was obsolete even before it became available! The 1890 census was estimated to require a full 13 years 1

2 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall Use all of the data, or just a little? 3/3 In 2009 a new flu virus was discovered. The Census Bureau contracted with Herman Hollerith An American inventor used his idea of punch cards and tabulation machines for the 1890 census Reduced duration to less than one year Methods of acquiring and analyzing big data is very expensive Image of the newly identified H1N1 influenza virus, taken in the CDC Influenza Laboratory. Source: 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall H1N1 virus Combines elements of the viruses that causes bird flu and swine flu Some projected outbreak to be on the scale of the 1918 Spanish flu that had: Infected half a billion people Killed tens of millions In the United States, The Centers for Disease Control and Prevention (CDC) Requested doctors inform patients of new flu cases Tabulated the numbers once a week What will be the problems with this system? No vaccine was ready Public health authorities needed to know where it already was. 8/26/2014 CS581 Big Data - Fall Detecting influenza epidemics using search engine query data -1/2 J. Ginsberg, et al., Detecting influenza epidemics using search engine query data Nature 457 pp ~ 1014, February nature07634.html 8/26/2014 CS581 Big Data - Fall Detecting influenza epidemics using search engine query data -2/2 Predict the spread of the winter flu in the United States Nationally Specific regions and states Looking at what people were searching for on the Internet Google More than three billion search queries every day Saves all the queries 50 million most common search terms Americans type and compared the list with CDC data On the spread of seasonal flu between 2003 and 2008 Identify areas infected by the flu virus by what people searched for on the Internet 2

3 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall google.org: Flu Trends Example More information How did they build a prediction model? (1/3) Aggregating historical logs of online web search queries Between 2003 and 2008 Computed a time series of weekly counts for 50 million of the most common search queries in the US Separate aggregate weekly counts were kept for every query in each state Simple model that estimates the probability that a random physician visit in a particular region is related to an influenza-like illness(ili) 8/26/2014 CS581 Big Data - Fall How did they build a prediction model? (2/3) The probability that a random search query submitted from the same region is ILI-related 8/26/2014 CS581 Big Data - Fall How did they build a prediction model? (3/3) Each of the 50 million candidate queries was separately tested To identify the search queries that could most accurately model the CDC ILI visit percentage in each region logit(i(t)) = αlogit(q(t))+ε where: I(t) is the percentage of ILI physician visits at time t Q(t) is the ILI-related query fraction at time t α is the multiplicative coefficient ε is the error term logit(p) = ln(p/(1 p)) Generated the highest scoring search queries 8/26/2014 CS581 Big Data - Fall Evaluation of combining top-scoring queries Maximal performance at estimating out-of sample points during cross-validation was obtained by summing top 45 search queries J Ginsberg et al. Nature 000, 1-3 (2008) doi: /nature07634 After adding query 81 oscar nominations 8/26/2014 CS581 Big Data - Fall 2014 In the top 100, 18 topics like high school basket ball was not Included. It coincided with influenza season in the US Most co-related queries Search query topic Top 45 queries Next 55 queries n Weighted n Weighted Influenza complication Cold/flu remedy General influenza symptoms Term for influenza Specific influenza symptom Symptoms of an influenza complication Antibiotic medication remedies General influenza Symptoms of a related disease Antiviral medication Related disease Unrelated to influenza Total The top 45 queries were used in the final model; the next 55 queries are presented for comparison purposes. The number of queries in each topic is indicated, as well as queryvolume-weighted counts, reflecting the relative frequency of queries in each topic. 3

4 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall Data sources Public dataset from the CDC s Influenza Sentinel Provider Surveillance Network Average percentage of all outpatient visits to sentinel providers that were ILI-related Query submitted to the Google search engine 2003 ~ 2008 What makes this work innovative? There have been other approaches: CDC s weekly report Call volume to telephone triage advice lines Tracking over-the-counter drug sales Online activity for influenza surveillance Search queries submitted to a Swedish medical website Tracking search keys ( flu, or influenza ) in Yahoo search queries What is the innovation in Google s approach? Google processes more than 24 Petabytes of data per day 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall Related research areas Storage systems How can we efficiently resolve queries on massive amounts of input data? The input dataset may be presented in the form of a distributed data stream Machine learning How can we efficiently solve large-scale machine learning problems? The input data may be massive; stored in a distributed cluster of machines Distributed computing How can we efficiently solve large-scale optimization problems in distributed computing environments? For example, how can we efficiently solve large-scale combinatorial problems, e.g. processing of large scale graphs? PART 0. INTRODUCTION 1. INTRODUCTION TO BIG DATA 2. COURSE INTRODUCTION PART 0. INTRODUCTION 2. COURSE INTRODUCTION Sangmi Lee Pallickara Computer Science, Colorado State University CS581A BIG DATA 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall Goal of this course Understanding fundamental concepts in Big Data Analytics Learn about existing technologies and how to apply them Related courses Machine Learning/Statistics Distributed Systems Database Systems Computer Communications Computer Security 4

5 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall When is the best time to take this course? About me Research area Big Data Storage, retrievals, analytics and visualization Projects Galileo Distributed data storage system for large scale geospatial time-series datasets Mendel Genomic data storage system GeoLens Visual Analytics 8/26/2014 CS581 Big Data - Fall Galileo: Big Picture 8/26/2014 CS581 Big Data - Fall Galileo: key features Data is voluminous Outpaces what is available on a single hard drive Storage must be over a collection of machines Avoid central coordinators Cope with failures Preserve data locality without introducing storage imbalances And the accompanying query hotspots Support range queries and fast ingest of new data Support for large numbers (10 9 ) of small files High throughput storage and retrieval Data is multidimensional with multiple types Time-series data Support for exact match and range queries (with wildcards) along multiple dimensions Support for multiple data formats netcdf, BUFR, HDF 4/5, and data from the Defense Meteorological Satellite Program 8/26/2014 CS581 Big Data - Fall Galileo Design Considerations 8/26/2014 CS581 Big Data - Fall Distributed Hash Tables (DHTs) Symmetric storage nodes No special-function or controller nodes Storage and retrievals may go to any node, and will be forwarded to the targeted node(s) Incremental scale-up Failure-resiliency These are decentralized systems Each node maintains information about a subset of nodes Each node uses its local information to reach any other node in the system Typically used as <key, value> stores 5

6 8/26/2014 CS581 Big Data - Fall Storage Throughput Block is about 8 KB of data 56,000 blocks per second in a system with 48-nodes 8/26/2014 CS581 Big Data - Fall Communications Course Website Announcements: Check the course website at least twice a week. Schedule (course materials, readings, assignments) Policies RamCT Assignment submission Discussion board Grades Contact Instructor sangmi@cs Office hour: T, TH 11:00AM ~ noon and by appointment Office: CSB456 URL: 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall Course Structure Programming Assignments Assignment 2 Part 0 Part 1 Part 2 Part 3 Part 0 Part 1 Part 2 Part 3 Introduction to Big Data Data Storage models Scalable algorithms and Big Data Analytics Frameworks of Big Data Analytics Introduction to Big Data Data Storage models Scalable algorithms and Big Data Analytics Frameworks of Big Data Analytics Assignment 1 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall Term Project-1/2 Phase 1. Motivating articles Read 3 motivating articles and summarize them Phase 2. Term project proposal Choose your topic Challenging Big Data topics e.g. Simple distributed storage systems, N-gram analysis engine Big Data challenges in your research domain e.g. Simple analytical tool for large scale data analysis in bioinformatics Plan your project Software design Functional requirements Timeline Evaluation method Team/Individual interview Term Project-2/2 Phase 3. Final submission/demo Description and code Phase 4. Presentation You may work on the term project (Phase 2 ~ 4) individually or as a team (only two member teams will be allowed) You should submit phase 1 individually. Group efforts must be correspondingly bigger If you are working with a teammate, please your team by September 8. 6

7 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall Term Project Grading Phase 1 (Motivating articles): 10% Phase 2 (Proposal): 25% Phase 3 (Final submission/demo): 40% Phase 4 (Presentation): 25% Course Timeline Phase 1. September 3 Phase 2. PA1 PA2 Phase 4. September 17 October 6 November 5 Phase 3. Week 16 December 2 August September October November December Midterm I October 2 Midterm 2 December 4 Phase 1. description has been posted. 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall Quizzes and Participation There will be 10+ pop quizzes in class 1-2 simple questions Open-book, open-material, not open-internet Minute survey Grading Policy Category Points Percentage of Final Grade Programming Assignments (15% x 2) % Term Project % Quizzes and participation % Midterm Exams (15% x 2) % 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall Final Letter Grade Course Policy Letter Grade A Total Percentage % and higher No make-up for missed exams Except in extraordinary circumstances (e.g., major illness, family emergency) B ~ % C ~ % D ~ % No make-up for missed quizzes Except for the case where student provided an advance notice to the instructor based on an emergency F Below % 7

8 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall Course Policy No Cell-phones in the class. No Laptops during the exam, and quiz. If you need to use a laptop, please sit in the back row. Course Policy: Assignments Assignment submission RamCT Late policy for the assignments Up to a maximum of 2 day past the deadline. 10% penalty per day will be applied I will ask you to turn off your laptop if needed. Grading will be done by interview. 8/26/2014 CS581 Big Data - Fall /26/2014 CS581 Big Data - Fall More importantly, Questions? Attend the class, ask questions, and discuss Check the course web page and RamCT regularly Try new technologies and apply them Share your experiences with other students in class 8

Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data

Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data CS535 Big Data W1.A.1 CS535 BIG DATA W1.A.2 Let the data speak to you Medication Adherence Score How likely people are to take their medication, based on: How long people have lived at the same address

More information

Healthcare data analytics. Da-Wei Wang Institute of Information Science wdw@iis.sinica.edu.tw

Healthcare data analytics. Da-Wei Wang Institute of Information Science wdw@iis.sinica.edu.tw Healthcare data analytics Da-Wei Wang Institute of Information Science wdw@iis.sinica.edu.tw Outline Data Science Enabling technologies Grand goals Issues Google flu trend Privacy Conclusion Analytics

More information

Ohio Department of Health Seasonal Influenza Activity Summary MMWR Week 10 March 6 th, March 12 th, 2016

Ohio Department of Health Seasonal Influenza Activity Summary MMWR Week 10 March 6 th, March 12 th, 2016 Ohio Department of Health Seasonal Influenza Activity Summary MMWR Week 10 March 6 th, March 12 th, 2016 Current Influenza Activity: Current Ohio Activity Level (Geographic Spread) - Widespread Definition:

More information

Big Data and Analytics (Fall 2015)

Big Data and Analytics (Fall 2015) Big Data and Analytics (Fall 2015) Core/Elective: MS CS Elective MS SPM Elective Instructor: Dr. Tariq MAHMOOD Credit Hours: 3 Pre-requisite: All Core CS Courses (Knowledge of Data Mining is a Plus) Every

More information

Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013

Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013 Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013 Housekeeping 1. Any questions coming out of today s presentation can be discussed in the bar this evening 2. OCF is

More information

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel Big Data and Analytics: Getting Started with ArcGIS Mike Park Erik Hoel Agenda Overview of big data Distributed computation User experience Data management Big data What is it? Big Data is a loosely defined

More information

Statistical Challenges with Big Data in Management Science

Statistical Challenges with Big Data in Management Science Statistical Challenges with Big Data in Management Science Arnab Kumar Laha Indian Institute of Management Ahmedabad Analytics vs Reporting Competitive Advantage Reporting Prescriptive Analytics (Decision

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

Big Data Analytics. Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs

Big Data Analytics. Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs 1 Big Data Analytics Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs Montevideo, 22 nd November 4 th December, 2015 INFORMATIQUE

More information

Collaborations between Official Statistics and Academia in the Era of Big Data

Collaborations between Official Statistics and Academia in the Era of Big Data Collaborations between Official Statistics and Academia in the Era of Big Data World Statistics Day October 20-21, 2015 Budapest Vijay Nair University of Michigan Past-President of ISI vnn@umich.edu What

More information

Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料

Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料 Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料 美 國 13 歲 學 生 用 Big Data 找 出 霸 淩 熱 點 Puri 架 設 網 站 Bullyvention, 藉 由 分 析 Twitter 上 找 出 提 到 跟 霸 凌 相 關 的 詞, 搭 配 地 理 位 置

More information

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo Software Engineering for Big Data CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo Big Data Big data technologies describe a new generation of technologies that aim

More information

PREPARING FOR A PANDEMIC. Lessons from the Past Plans for the Present and Future

PREPARING FOR A PANDEMIC. Lessons from the Past Plans for the Present and Future PREPARING FOR A PANDEMIC Lessons from the Past Plans for the Present and Future Pandemics Are Inevitable TM And their impact can be devastating 1918 Spanish Flu 20-100 million deaths worldwide 600,000

More information

Influenza Surveillance Weekly Report CDC MMWR Week 16: Apr 17 to 23, 2016

Influenza Surveillance Weekly Report CDC MMWR Week 16: Apr 17 to 23, 2016 Office of Surveillance & Public Health Preparedness Program of Public Health Informatics Influenza Surveillance Weekly Report CDC MMWR Week 16: Apr 17 to 23, 2016 Influenza Activity by County, State, and

More information

Internet Search Activity & Crowdsourcing

Internet Search Activity & Crowdsourcing Novel Data Sources for Disease Surveillance and Epidemiology: Internet Search Activity & Crowdsourcing Sumiko Mekaru, DVM, MPVM, MLIS HealthMap, Boston Children s Hospital American College of Epidemiology

More information

COS 116 The Computational Universe Laboratory 9: Virus and Worm Propagation in Networks

COS 116 The Computational Universe Laboratory 9: Virus and Worm Propagation in Networks COS 116 The Computational Universe Laboratory 9: Virus and Worm Propagation in Networks You learned in lecture about computer viruses and worms. In this lab you will study virus propagation at the quantitative

More information

Exploring Big Data in Social Networks

Exploring Big Data in Social Networks Exploring Big Data in Social Networks virgilio@dcc.ufmg.br (meira@dcc.ufmg.br) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about

More information

A.I. in health informatics lecture 1 introduction & stuff kevin small & byron wallace

A.I. in health informatics lecture 1 introduction & stuff kevin small & byron wallace A.I. in health informatics lecture 1 introduction & stuff kevin small & byron wallace what is this class about? health informatics managing and making sense of biomedical information but mostly from an

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Estimating PageRank Values of Wikipedia Articles using MapReduce

Estimating PageRank Values of Wikipedia Articles using MapReduce Estimating PageRank Values of Wikipedia Articles using MapReduce Due: Sept. 30 Wednesday 5:00PM Submission: via Canvas, individual submission Instructor: Sangmi Pallickara Web page: http://www.cs.colostate.edu/~cs535/assignments.html

More information

Preparing for the consequences of a swine flu pandemic

Preparing for the consequences of a swine flu pandemic Preparing for the consequences of a swine flu pandemic What CIGNA is Doing To help ensure the health and well-being of the individuals we serve, CIGNA is implementing its action plan to prepare for the

More information

Understanding Neo4j Scalability

Understanding Neo4j Scalability Understanding Neo4j Scalability David Montag January 2013 Understanding Neo4j Scalability Scalability means different things to different people. Common traits associated include: 1. Redundancy in the

More information

Large-Scale Test Mining

Large-Scale Test Mining Large-Scale Test Mining SIAM Conference on Data Mining Text Mining 2010 Alan Ratner Northrop Grumman Information Systems NORTHROP GRUMMAN PRIVATE / PROPRIETARY LEVEL I Aim Identify topic and language/script/coding

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

Infrastructures for big data

Infrastructures for big data Infrastructures for big data Rasmus Pagh 1 Today s lecture Three technologies for handling big data: MapReduce (Hadoop) BigTable (and descendants) Data stream algorithms Alternatives to (some uses of)

More information

Statistics for BIG data

Statistics for BIG data Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

More information

PUBLIC HEALTH MEETS SOCIAL MEDIA: MINING HEALTH INFO FROM TWITTER

PUBLIC HEALTH MEETS SOCIAL MEDIA: MINING HEALTH INFO FROM TWITTER PUBLIC HEALTH MEETS SOCIAL MEDIA: MINING HEALTH INFO FROM TWITTER Michael Paul (@mjp39) Johns Hopkins University Crowdsourcing and Human Computation Lecture 18 Learning about the real world through Twitter

More information

Intro to Bioinformatics

Intro to Bioinformatics Intro to Bioinformatics Marylyn D Ritchie, PhD Professor, Biochemistry and Molecular Biology Director, Center for Systems Genomics The Pennsylvania State University Sarah A Pendergrass, PhD Research Associate

More information

CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science

CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science Dr. Daisy Zhe Wang CISE Department University of Florida August 25th 2014 20 Review Overview of Data Science Why Data

More information

Major Public Health Threat

Major Public Health Threat GAO United States General Accounting Office Testimony Before the Committee on Government Reform, House of Representatives For Release on Delivery Expected at 10:00 a.m. EST Thursday, February 12, 2004

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Plumas County Public Health Agency. Preparing the Community for Public Health Emergencies

Plumas County Public Health Agency. Preparing the Community for Public Health Emergencies Plumas County Public Health Agency Preparing the Community for Public Health Emergencies Safeguarding Your Investment Local businesses have invested significant time and resources into being successful.

More information

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014 5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for

More information

Why is Internal Audit so Hard?

Why is Internal Audit so Hard? Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets

More information

White Paper. Version 1.2 May 2015 RAID Incorporated

White Paper. Version 1.2 May 2015 RAID Incorporated White Paper Version 1.2 May 2015 RAID Incorporated Introduction The abundance of Big Data, structured, partially-structured and unstructured massive datasets, which are too large to be processed effectively

More information

?????? Data Analytics

?????? Data Analytics ?????? Data Analytics Prof. Dr.-Ing. Lars Linsen Prof. Dr. Adalbert FX Wilhelm Fall 2015 0. Organizational Stuff 0.1 Syllabus and Organization Data Analytics 3 Course website http://www.faculty.jacobsuniversity.de/llinsen/teaching/??????.htm

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics

The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics The Data Engineer Mike Tamir Chief Science Officer Galvanize Steven Miller Global Leader Academic Programs IBM Analytics Alessandro Gagliardi Lead Faculty Galvanize Businesses are quickly realizing that

More information

North Texas School Health Surveillance: First-Year Progress and Next Steps

North Texas School Health Surveillance: First-Year Progress and Next Steps Dean Lampman, MBA, Regional Surveillance Coordinator, Southwest Center for Advanced Public Health Practice* (Fort Worth, TX), dflampman@tarrantcounty.com Tabatha Powell, MPH, Doctoral student, University

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics February 11, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

ECDC SURVEILLANCE REPORT

ECDC SURVEILLANCE REPORT ECDC SURVEILLANCE REPORT Pandemic (H1N1) 2009 Weekly report: Individual case reports EU/EEA countries 31 July 2009 Summary The pandemic A(H1N1) 2009 is still spreading despite the fact that the regular

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Online Communicable Disease Reporting Handbook For Schools, Child-care Centers & Camps

Online Communicable Disease Reporting Handbook For Schools, Child-care Centers & Camps Online Communicable Disease Reporting Handbook For Schools, Child-care Centers & Camps More Information On Our Website https://www.accesskent.com/health/commdisease/school_daycare.htm Kent County Health

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Big Data Systems CS 5965/6965 FALL 2015

Big Data Systems CS 5965/6965 FALL 2015 Big Data Systems CS 5965/6965 FALL 2015 Today General course overview Expectations from this course Q&A Introduction to Big Data Assignment #1 General Course Information Course Web Page http://www.cs.utah.edu/~hari/teaching/fall2015.html

More information

Enhancing Respiratory Infection Surveillance on the Arizona-Sonora Border BIDS Program Sentinel Surveillance Data

Enhancing Respiratory Infection Surveillance on the Arizona-Sonora Border BIDS Program Sentinel Surveillance Data Enhancing Respiratory Infection Surveillance on the Arizona-Sonora Border BIDS Program Sentinel Surveillance Data Catherine Golenko, MPH, Arizona Department of Health Services *Arizona border population:

More information

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011 Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information

Adobe Insight, powered by Omniture

Adobe Insight, powered by Omniture Adobe Insight, powered by Omniture Accelerating government intelligence to the speed of thought 1 Challenges that analysts face 2 Analysis tools and functionality 3 Adobe Insight 4 Summary Never before

More information

Harnessing the power of advanced analytics with IBM Netezza

Harnessing the power of advanced analytics with IBM Netezza IBM Software Information Management White Paper Harnessing the power of advanced analytics with IBM Netezza How an appliance approach simplifies the use of advanced analytics Harnessing the power of advanced

More information

Planning for Pandemic Flu. pandemic flu table-top exercise

Planning for Pandemic Flu. pandemic flu table-top exercise Planning for Pandemic Flu Lawrence Dickson - University of Edinburgh Background Previous phase of health and safety management audit programme raised topic of Business Continuity Management Limited ability

More information

CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof.

CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof. CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing University of Florida, CISE Department Prof. Daisy Zhe Wang Data Science Overview Why, What, How, Who Outline Why Data Science?

More information

Influenza and Pandemic Flu Guidelines

Influenza and Pandemic Flu Guidelines Influenza and Pandemic Flu Guidelines Introduction Pandemic flu is a form of influenza that spreads rapidly to affect most countries and regions around the world. Unlike the 'ordinary' flu that occurs

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:

More information

SWINE FLU: FROM CONTAINMENT TO TREATMENT

SWINE FLU: FROM CONTAINMENT TO TREATMENT SWINE FLU: FROM CONTAINMENT TO TREATMENT SWINE FLU: FROM CONTAINMENT TO TREATMENT INTRODUCTION As Swine Flu spreads and more people start to catch it, it makes sense to move from intensive efforts to contain

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics April 10, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

Creating the Resilient Corporation

Creating the Resilient Corporation Creating the Resilient Corporation Business Continuity Planning and Pandemics Presented by: Eric Millard, Delivery Manager, Business Continuity and Recovery Services, Hewlett-Packard 2006 Hewlett-Packard

More information

Delivering the power of the world s most successful genomics platform

Delivering the power of the world s most successful genomics platform Delivering the power of the world s most successful genomics platform NextCODE Health is bringing the full power of the world s largest and most successful genomics platform to everyday clinical care NextCODE

More information

Big Data Analytics for Space Exploration, Entrepreneurship and Policy Opportunities. Tiffani Crawford, PhD

Big Data Analytics for Space Exploration, Entrepreneurship and Policy Opportunities. Tiffani Crawford, PhD Big Analytics for Space Exploration, Entrepreneurship and Policy Opportunities Tiffani Crawford, PhD Big Analytics Characteristics Large quantities of many data types Structured Unstructured Human Machine

More information

Pandemic Risk Assessment

Pandemic Risk Assessment Research Note Pandemic Risk Assessment By: Katherine Hagan Copyright 2013, ASA Institute for Risk & Innovation Keywords: pandemic, Influenza A, novel virus, emergency response, monitoring, risk mitigation

More information

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are

More information

Of all the data in recorded human history, 90 percent has been created in the last two years. - Mark van Rijmenam, Think Bigger, 2014

Of all the data in recorded human history, 90 percent has been created in the last two years. - Mark van Rijmenam, Think Bigger, 2014 What is Big Data? Of all the data in recorded human history, 90 percent has been created in the last two years. - Mark van Rijmenam, Think Bigger, 2014 Data in the Twentieth Century and before In 1663,

More information

Big Data a threat or a chance?

Big Data a threat or a chance? Big Data a threat or a chance? Helwig Hauser University of Bergen, Dept. of Informatics Big Data What is Big Data? well, lots of data, right? we come back to this in a moment. certainly, a buzz-word but

More information

Four Orders of Magnitude: Running Large Scale Accumulo Clusters. Aaron Cordova Accumulo Summit, June 2014

Four Orders of Magnitude: Running Large Scale Accumulo Clusters. Aaron Cordova Accumulo Summit, June 2014 Four Orders of Magnitude: Running Large Scale Accumulo Clusters Aaron Cordova Accumulo Summit, June 2014 Scale, Security, Schema Scale to scale 1 - (vt) to change the size of something let s scale the

More information

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411

More information

Web Data Mining: A Case Study. Abstract. Introduction

Web Data Mining: A Case Study. Abstract. Introduction Web Data Mining: A Case Study Samia Jones Galveston College, Galveston, TX 77550 Omprakash K. Gupta Prairie View A&M, Prairie View, TX 77446 okgupta@pvamu.edu Abstract With an enormous amount of data stored

More information

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Mobile Phone APP Software Browsing Behavior using Clustering Analysis Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A SURVEY ON BIG DATA ISSUES AMRINDER KAUR Assistant Professor, Department of Computer

More information

Overview. Why this policy? Influenza. Vaccine or mask policies. Other approaches Conclusion. epidemiology transmission vaccine

Overview. Why this policy? Influenza. Vaccine or mask policies. Other approaches Conclusion. epidemiology transmission vaccine Overview Why this policy? Influenza epidemiology transmission vaccine Vaccine or mask policies development and implementation Other approaches Conclusion Influenza or mask policy Receive the influenza

More information

Illinois Influenza Surveillance Report

Illinois Influenza Surveillance Report ILLINOIS DEPARTMENT OF PUBLIC HEALTH Illinois Influenza Surveillance Report Week 8: Week Ending Saturday, February 25, 2012 Division of Infectious Diseases Immunizations Section 3/5/2012 1 Please note

More information

IC05 Introduction on Networks &Visualization Nov. 2009. <mathieu.bastian@gmail.com>

IC05 Introduction on Networks &Visualization Nov. 2009. <mathieu.bastian@gmail.com> IC05 Introduction on Networks &Visualization Nov. 2009 Overview 1. Networks Introduction Networks across disciplines Properties Models 2. Visualization InfoVis Data exploration

More information

BIG DATA FUNDAMENTALS

BIG DATA FUNDAMENTALS BIG DATA FUNDAMENTALS Timeframe Minimum of 30 hours Use the concepts of volume, velocity, variety, veracity and value to define big data Learning outcomes Critically evaluate the need for big data management

More information

Information Visualization WS 2013/14 11 Visual Analytics

Information Visualization WS 2013/14 11 Visual Analytics 1 11.1 Definitions and Motivation Lot of research and papers in this emerging field: Visual Analytics: Scope and Challenges of Keim et al. Illuminating the path of Thomas and Cook 2 11.1 Definitions and

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

Statistical Analysis and Visualization for Cyber Security

Statistical Analysis and Visualization for Cyber Security Statistical Analysis and Visualization for Cyber Security Joanne Wendelberger, Scott Vander Wiel Statistical Sciences Group, CCS-6 Los Alamos National Laboratory Quality and Productivity Research Conference

More information

L1: Introduction to Hadoop

L1: Introduction to Hadoop L1: Introduction to Hadoop Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: December 1, 2014 Today we are going to learn... 1 General

More information

The big data revolution

The big data revolution The big data revolution Friso van Vollenhoven (Xebia) Enterprise NoSQL Recently, there has been a lot of buzz about the NoSQL movement, a collection of related technologies mostly concerned with storing

More information

IST687 Scientific Data Management

IST687 Scientific Data Management 1 IST687 Scientific Data Management Spring 2012 Instructor: Jian Qin Email: jqin@syr.edu Office: 311 Hinds Hall Phone: 315-443-5642 Time: any time Location: anywhere Course Description The Scientific Data

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Big Data Systems CS 5965/6965 FALL 2014

Big Data Systems CS 5965/6965 FALL 2014 Big Data Systems CS 5965/6965 FALL 2014 Today General course overview Q&A Introduction to Big Data Data Collection Assignment #1 General Course Information Course Web Page http://www.cs.utah.edu/~hari/teaching/fall2014.html

More information

Summary of infectious disease epidemiology course

Summary of infectious disease epidemiology course Summary of infectious disease epidemiology course Mads Kamper-Jørgensen Associate professor, University of Copenhagen, maka@sund.ku.dk Public health science 3 December 2013 Slide number 1 Aim Possess knowledge

More information

Swine Influenza Special Edition Newsletter

Swine Influenza Special Edition Newsletter Swine Influenza Newsletter surrounding swine flu, so that you ll have the right facts to make smart decisions for yourself and your family. While the World Health Organization (WHO) and the Centers for

More information

How To Create An Insight Analysis For Cyber Security

How To Create An Insight Analysis For Cyber Security IBM i2 Enterprise Insight Analysis for Cyber Analysis Protect your organization with cyber intelligence Highlights Quickly identify threats, threat actors and hidden connections with multidimensional analytics

More information

CS 207 - Data Science and Visualization Spring 2016

CS 207 - Data Science and Visualization Spring 2016 CS 207 - Data Science and Visualization Spring 2016 Professor: Sorelle Friedler sorelle@cs.haverford.edu An introduction to techniques for the automated and human-assisted analysis of data sets. These

More information

PREPARING YOUR ORGANIZATION FOR PANDEMIC FLU. Pandemic Influenza:

PREPARING YOUR ORGANIZATION FOR PANDEMIC FLU. Pandemic Influenza: PREPARING YOUR ORGANIZATION FOR PANDEMIC FLU Pandemic Influenza: What Business and Organization Leaders Need to Know About Pandemic Influenza Planning State of Alaska Frank H. Murkowski, Governor Department

More information

HIV NOMOGRAM USING BIG DATA ANALYTICS

HIV NOMOGRAM USING BIG DATA ANALYTICS HIV NOMOGRAM USING BIG DATA ANALYTICS S.Avudaiselvi and P.Tamizhchelvi Student Of Ayya Nadar Janaki Ammal College (Sivakasi) Head Of The Department Of Computer Science, Ayya Nadar Janaki Ammal College

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Mining Big Data. Pang-Ning Tan. Associate Professor Dept of Computer Science & Engineering Michigan State University

Mining Big Data. Pang-Ning Tan. Associate Professor Dept of Computer Science & Engineering Michigan State University Mining Big Data Pang-Ning Tan Associate Professor Dept of Computer Science & Engineering Michigan State University Website: http://www.cse.msu.edu/~ptan Google Trends Big Data Smart Cities Big Data and

More information

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop

More information

Using Web and Social Media for Influenza Surveillance

Using Web and Social Media for Influenza Surveillance Using Web and Social Media for Influenza Surveillance Courtney D Corley 1, Diane J Cook 2, Armin R Mikler 3 and Karan P Singh 4 Abstract Analysis of Google influenza-like-illness (ILI) search queries has

More information

ANALYTICS PREDICTIVE. Tool of Providence or the End of Coincidence? He who does not expect the unexpected will not find it out.

ANALYTICS PREDICTIVE. Tool of Providence or the End of Coincidence? He who does not expect the unexpected will not find it out. PREDICTIVE ANALYTICS Tool of Providence or the End of Coincidence? He who does not expect the unexpected will not find it out. Unless you expect the unexpected you will ever find truth, for it is hard

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Data Analytics for Healthcare: Creating understanding from big data

Data Analytics for Healthcare: Creating understanding from big data Data Analytics for Healthcare: Creating understanding from big data Data Analytics for Healthcare Data analytics is an essential resource for any profession. This collection of data and information is

More information

Exploiting the power of Big Data

Exploiting the power of Big Data Exploiting the power of Big Data Timos Sellis School of Computer Science and Information Technology timos.sellis@rmit.edu.au ITECHLAW Asia-Pacific Conference, February 26-28, 2014 Melbourne Australia Timeline

More information

Introduction to infectious disease epidemiology

Introduction to infectious disease epidemiology Introduction to infectious disease epidemiology Mads Kamper-Jørgensen Associate professor, University of Copenhagen, maka@sund.ku.dk Public health science 24 September 2013 Slide number 1 Practicals Elective

More information

Big Data and Analytics: Challenges and Opportunities

Big Data and Analytics: Challenges and Opportunities Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif

More information

CS/COE 1501 http://cs.pitt.edu/~bill/1501/

CS/COE 1501 http://cs.pitt.edu/~bill/1501/ CS/COE 1501 http://cs.pitt.edu/~bill/1501/ Lecture 01 Course Introduction Meta-notes These notes are intended for use by students in CS1501 at the University of Pittsburgh. They are provided free of charge

More information