Big data for beginners An introduction for developmental professionals and researchers

Size: px
Start display at page:

Download "Big data for beginners An introduction for developmental professionals and researchers"

Transcription

1 Big data for beginners An introduction for developmental professionals and researchers Sriganesh Lokanathan, LIRNEasia Research relevant to broadband policy and regulatory processes New Delhi, India 21 August 2014 This work was carried out with the aid of a grant from the International Development Research Centre, Canada and the Department for International Development UK..

2 Learning outcomes for today Some understanding of what big data means An appreciation for some of the developmental opportunities from leveraging big data using different sources A brief overview of the spillover effects of the big data phenomenon An appreciation of some of the analytical and other challenges concerning the use of big data for development 2

3 What is big data? 3

4 Big Data The Vs VELOCITY Rate of change of data VOLUME Size of data VARIETY Different forms of data VERACITY Uncertainty of data VALUE Potential value of data 4

5 What has facilitated the rise of big data? Vast drops in the cost of storing and retrieving information Exponential growth in computer power and memory data can reside in persistent memory instead of disk and tape Major improvements in techniques for performing machine learning and reasoning 5

6 Sources of big data Administrative data E.g. digitized medical records, insurance records, tax records, etc. Commercial transactions E.g. Bank transactions, credit card purchases, supermarket purchases, online purchases, etc. Sensors and tracking devices E.g. road and traffic sensors, climate sensors, equipment & infrastructure sensors, mobile phones, satellite/ GPS devices, etc. Online activities/ social media E.g. online search activity, online page views, blogs/ FB/ twitter posts, online audio/ video/ images, etc. 6

7 Big data isn t necessarily a new area: the census is the original big data First recorded census occurred in Egypt (circa 3340 BC) But the tools didn t exist to fully exploit them The first tools invented for the 1890 US census It took 7 years to process the 1880 census data Herman Hollerith invents the tabulating machine to process the 1890 census data Source: Adam Schuster 7

8 If we want comprehensive coverage of the population, what are the sources of big data in developing economies? Administrative data? E.g. digitized medical records, insurance records, tax records, etc. Commercial transactions? E.g. Bank transactions, credit card purchases, supermarket purchases, online purchases, etc. Sensors and tracking devices? E.g. road and traffic sensors, climate sensors, equipment & infrastructure sensors, mobile phones, satellite/ GPS devices, etc. Online activities/ social media? E.g. online search activity, online page views, blogs/ FB/ twitter posts, online audio/ video/ images, etc. 8

9 Currently only mobile network big data has the widest possible population coverage 9

10 LIRNEasia s Big Data for Development (BD4D) Research LIRNEasiahas negotiated access to historical and anonymized telecom network big data from multiple operators in Sri Lanka In the current research cycle we are conducting exploratory research on answering a few social science questions related to mobility and connectedness developing a framework with privacy and self-regulatory guidelines for the collection, use and sharing of mobile phone data. 10

11 The data sets Multiple mobile operators in Sri Lanka have provided LIRNEasia access to 4 different types of meta-data: Call Detail Records (CDRs) Records of calls, SMS-es, Internet access Airtime recharge records Data sets do not include any Personally Identifiable Information (PII). All phone numbers are anonymized and LIRNEasiadoes not maintain any mappings of identifiers to original phone numbers 11

12 What are some types of big data captured by mobile network operators? Call Detail Record (CDR) Records of all calls made and received by a person created mainly for the purposes of billing Similar records exist for all SMS-essent and received as well as for all Internet sessions The Cell ID in turn has a lat-long position associated with it. 12

13 What are some types of big data captured by mobile network operators (contd.)? Airtime reload records Records of all airtime reloads performed by prepaid SIMs Each row corresponds to a record of one person s activity: 13

14 How can we use mobile network big data for development? 14

15 We can understand the mobility patterns of the population Low Volume of people High 15

16 We can understand land use characteristics People leave digital traces when they use communication devices. Diurnal patterns of user activity in different locations can be used to classify land use characteristics Euclidean distance between two time series is used to cluster base stations in an unsupervised manner using k- means algorithm 16

17 With even a simplistic clustering into just two regions we can clearly see the commercial areas Cluster 1 corresponds to more commercial area Cluster 2 corresponds to more residential (and mixed-used regions) 17

18 We can understand the geospatial distribution of the social networks in the country Each link represents the normalized volume of calls between two DSDs Divisional Secretariat Division (DSD) is a third level administrative division; 331 in total in LK Low No. of calls High 18

19 We can understand community structures The social network is segregated such that overlapping connections between communities are minimized using a modularity approach For Sri Lanka, the optimal number of communities discovered by the algorithm was 11 19

20 How much do these communities mesh with existing administrative boundaries? Southern and Northern provinces have the highest similarity to their respective provincial boundaries. 14

21 How much do these communities mesh with existing administrative boundaries (contd.)? Colombo district is clustered as a single community and Gampaha is merged with North Western Province 15

22 We can leverage the differential patterns of airtime recharge. Average recharge amount Higher household income Lower household income Recharge frequency 22

23 to create poverty maps An example from Cote d Ivoir Source: ThoralfGutierrez, Gautier Krings, Vincent D. Blondel, (2013). Evaluating socio-economic state of a country analyzing airtime credit and mobile phone datasets. Available at Average airtime recharge 23

24 What s the use of such work? Can bring timely evidence into the policy making process in developing economies 24

25 An example of timely policy-relevant evidence The image on the left depicts relative density of people in Colombo city and the surrounding regions at 1300 compared to 0000 (midnight the previous day) on a normal weekday. The yellow to red colors depict areas whose density has increased relative to midnight. The blue color depicts areas whose density has decreased relative to midnight (the darker the blue, the greater the loss in density). The clear areas are those where the overall density has not changed. 25

26 Our findings closely match results from expensive & infrequent transportation surveys 26

27 Taking a broader view: the overall process of leveraging mobile network big data for development Mobile network big data (CDRs, Internet access usage, airtime recharge records) Construct behavioral variables (i) Mobility variables (ii) Social variables (iii) Consumption variables Other data sources (i) Census data (ii) HIES data (iii) Survey maps (iv) Transportation schedules (v) ++++ Analytics Insights (i) Urban planning (ii) Crisis management & DRR (iii) Health monitoring & planning (iv) Socio-economic monitoring & planning 27

28 What are other potential data sources for big data for development?: Some examples 28

29 UN Global Pulse Global Post-2015 Twitter Conversation Searches approximately 500mn tweets daily across 16 key development topics (represented by about 25k keywords in multiple languages) 29

30 Google Dengue Trends ( Uses global search activity Follow-up to the successful (or not) Google Flu Trends ( but more on that later 30

31 That s cool, but who can get such work done? Answer: collaborative inter-disciplinary teams Data mining / computer science Information science Econometrics/ statistics Social science (economics, political science, sociology, anthropology, etc.) Domain expertise (e.g. urban & transport policy, etc.) 31

32 Spillovers from the big data phenomenon Greater emphases on data driven decision making Better tools for data extraction, analysis and visualization even for small data PDF to excel convertors e.g. Tabula Google Fusion Tables Data Driven Documents (D3) 32

33 Analytical challenges of working with big data Data provenance & data cleaning Representativeness Behavioral change Ground context Causation vs. correlation The role of small data for verification as well as for bootstrapping Transparency & replicability 33

34 Data provenance Data provenance is the process of tracing the pathways taken by data from the originating source through all the processes that may have mutated and/or replicated the data up until it is utilized for the analyses in question How relevant is it for big data analyses? Quite relevant though it may not be feasible to establish provenance to the extent desired by the scientific community E.g. when analyzing CDRs, some operators may have multiple records for forwarded calls. Not knowing this, can potentially affect social network analysis 34

35 Data cleaning Same steps apply for big data as for small data : First, verify that quantitative and categorical variables are coded as it should be Second, remove outliers. The methods though might have to be different E.g. when removing outliers, you have to use decision tree algorithms or heuristics such as 3-sigma from the mean, etc. 35

36 Is the data representative? Large volume may make sampling rate irrelevant but doesn t necessarily make the data representative Question: can you think of some representativeness issues in reference to mobile network big data? 36

37 Changes in behavior Once you know the process people will try to beat it E.g. Google page rank and Search Engine Optimization (SEO) Digitized online behavior can be subject to self-censorship and the creation of multiple personas E.g. your Facebook and Twitter posts can mirror how narcissistic you are Transactional data (by-product of doing something else e.g. making phone calls) is therefore the best form of behavioral data But even transactional data can be subject to behavioral changes Question: can you think of an example with respect to mobile network big data? 37

38 Ground context Knowing and understanding real world context is important, otherwise you might make false inferences Example from Sri Lanka: The majority of airtime reloads occur through scratch cards. Higher denomination cards are not easily available Question: Based on what you learnt earlier about how to differentiate between those form lower and higher income households, can you identify a potential problem when trying to develop poverty maps from mobile network big data? 38

39 Causation versus Correlation There are often more police in precincts with high crime, but that does not imply that increasing the number of police in a precinct would increase crime Source: Varian, H. R. (2013). Big Data: New Tricks for Econometrics. Available at Big Data analyses can ONLY reveal correlation NOT causation You can get to causal effect in a Big Data world by running experiments, but you and I cannot do that easily 39

40 But in some instances correlation can be enough to take actions E.g. Street Bump mapp in Boston Released by Boston City Hall for smartphones Once loaded on smartphones, uses the phone s accelerometer to detect potholes when users are driving, and then notifies City Hall of location. Certain changes in accelerometer readings correlates with passing over a pothole, but no guarantee of causal effect. Question: Why is correlation enough to take action in this case? 40

41 The role of traditional small data in verification and bootstrapping Verification: E.g. we need transport survey data to verify if the patterns of mobility discovered using mobile network big data meshes with real conditions Bootstrapping E.g. We can use mobile network big data from around the time of the census to build models to approximate socio-economic levels from just mobile network big data. This is then used to reverse engineer approximate census/ poverty maps in future when other data is not easily available. See Frias-Martinez, V., & Virseda, J. (2012). On the relationship between socio-economic factors and cell phone usage. In both cases, big data is complement and not a substitute for small data, except in some cases. 41

42 Transparency & replicability A lot of interesting big data sources are held by private entities The benefits of peer-review not yet easily available for much big data based research Less transparency and replicabilitymeans less chance of catching errors and improving the models 42

43 An illustration of the challenges using the example of Google Flu Trends (GFT) Since being launched successfully in 2008, subsequent research by others has found problems with how well GFT can predict incidence of Flu Problems: Original results based on bootstrapping with data from CDC, but there was no subsequent fine-tuning Over time more and more people started using Google to search for information on health issues, creating rebound effects post introduction of GFT Influenza-like illness rates did not necessarily correlate with actual influenza virus rates Original research article did not reveal the specific 45 search terms that were used to build the model 43

44 Analytical challenges of working with big data Data provenance & data cleaning Representativeness Behavioral change Ground context Causation vs. correlation The role of small data for verification as well as for bootstrapping Transparency & replicability For more discussion see my guest blog post at 44

45 Other challenges of trying to leverage big data for development Getting access to data Privacy 45

46 Getting access Some, such as Twitter data are accessible to all (with some volume limits) But for most private data sources, initially it will be through ad hoc means, though that is changing E.g. Orange Data for Development competition & Telecom Italia Big Data challenge See ml LIRNEasia and others working towards opening up the playing field for others 46

47 Privacy There are personal and collective privacy implications The difficulty is that: Informed consent is meaningless in a big data world People actually don t really know what their generalizable privacy needs or how their preferences might evolve Ways to deal with the privacy issue: Anonymization (with caveats) Different levels of disaggregation of shared data Evolution from a rights based approach to a harms based approach However more research still required 47

48 What you should have gained from today s lecture (hopefully ) Some understanding of what big data means An appreciation for some of the developmental opportunities from leveraging big data using different sources A brief overview of the spillover effects of the big data phenomenon An appreciation of some of the analytical and other challenges concerning the use of big data for development 48

49 Thank you 49

How big is big data? An introduc+on for developmental professionals and researchers

How big is big data? An introduc+on for developmental professionals and researchers How big is big data? An introduc+on for developmental professionals and researchers Sriganesh Lokanathan, LIRNEasia Young Scholar Seminar CPRsouth 2014 Maropeng, South Africa 7 September 2014 This work

More information

11 th World Telecommunication/ICT Indicators Symposium (WTIS-13)

11 th World Telecommunication/ICT Indicators Symposium (WTIS-13) 11 th World Telecommunication/ICT Indicators Symposium (WTIS-13) Mexico City, México, 4-6 December 2013 Contribution to WTIS-13 Document C/21-E 6 December 2013 English SOURCE: TITLE: LIRNEasia Leveraging

More information

How To Understand The Political And Social Situation In Sri Lanka

How To Understand The Political And Social Situation In Sri Lanka Mobile network big data for urban and transporta4on planning in Colombo, Sri Lanka Rohan Samarajiva & Sriganesh Lokanathan Data for Policy conference, 17 June 2015 This work was carried out with the aid

More information

Big Data for Development: New opportunities for emerging markets

Big Data for Development: New opportunities for emerging markets Big Data for Development: New opportunities for emerging markets International JRC-IPTS Workshop Big Data for the Analysis of Digital Economy and Society Sevilla, 22 September 2015 This work was carried

More information

Collaborations between Official Statistics and Academia in the Era of Big Data

Collaborations between Official Statistics and Academia in the Era of Big Data Collaborations between Official Statistics and Academia in the Era of Big Data World Statistics Day October 20-21, 2015 Budapest Vijay Nair University of Michigan Past-President of ISI vnn@umich.edu What

More information

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo Software Engineering for Big Data CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo Big Data Big data technologies describe a new generation of technologies that aim

More information

Statistical Challenges with Big Data in Management Science

Statistical Challenges with Big Data in Management Science Statistical Challenges with Big Data in Management Science Arnab Kumar Laha Indian Institute of Management Ahmedabad Analytics vs Reporting Competitive Advantage Reporting Prescriptive Analytics (Decision

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Big Data, Official Statistics and Social Science Research: Emerging Data Challenges

Big Data, Official Statistics and Social Science Research: Emerging Data Challenges Big Data, Official Statistics and Social Science Research: Emerging Data Challenges Professor Paul Cheung Director, United Nations Statistics Division Building the Global Information System Elements of

More information

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION Pilar Rey del Castillo May 2013 Introduction The exploitation of the vast amount of data originated from ICT tools and referring to a big variety

More information

Big Data Analytics: 14 November 2013

Big Data Analytics: 14 November 2013 www.pwc.com CSM-ACE 2013 Big Data Analytics: Take it to the next level in building innovation, differentiation and growth 14 About me Data analytics in the UK Forensic technology and data analytics in

More information

Crime Hotspots Analysis in South Korea: A User-Oriented Approach

Crime Hotspots Analysis in South Korea: A User-Oriented Approach , pp.81-85 http://dx.doi.org/10.14257/astl.2014.52.14 Crime Hotspots Analysis in South Korea: A User-Oriented Approach Aziz Nasridinov 1 and Young-Ho Park 2 * 1 School of Computer Engineering, Dongguk

More information

Big Data Systems CS 5965/6965 FALL 2014

Big Data Systems CS 5965/6965 FALL 2014 Big Data Systems CS 5965/6965 FALL 2014 Today General course overview Q&A Introduction to Big Data Data Collection Assignment #1 General Course Information Course Web Page http://www.cs.utah.edu/~hari/teaching/fall2014.html

More information

ICT Perspectives on Big Data: Well Sorted Materials

ICT Perspectives on Big Data: Well Sorted Materials ICT Perspectives on Big Data: Well Sorted Materials 3 March 2015 Contents Introduction 1 Dendrogram 2 Tree Map 3 Heat Map 4 Raw Group Data 5 For an online, interactive version of the visualisations in

More information

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms

More information

Big Data. What is Big Data? Over the past years. Big Data. Big Data: Introduction and Applications

Big Data. What is Big Data? Over the past years. Big Data. Big Data: Introduction and Applications Big Data Big Data: Introduction and Applications August 20, 2015 HKU-HKJC ExCEL3 Seminar Michael Chau, Associate Professor School of Business, The University of Hong Kong Ample opportunities for business

More information

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank Agenda» Overview» What is Big Data?» Accelerates advances in computer & technologies» Revolutionizes data measurement»

More information

Exploring New Methods of Data Gathering in Long-Distance Passenger Travel Data

Exploring New Methods of Data Gathering in Long-Distance Passenger Travel Data Exploring New Methods of Data Gathering in Long-Distance Passenger Travel Data Ben Pierce, Battelle June 8, 2011 1 Why Are New Methods Needed? Declining Response Rates Survey saturation in US Growing cell-only

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

Crowdsourcing mobile networks from experiment

Crowdsourcing mobile networks from experiment Crowdsourcing mobile networks from the experiment Katia Jaffrès-Runser University of Toulouse, INPT-ENSEEIHT, IRIT lab, IRT Team Ecole des sciences avancées de Luchon Networks and Data Mining, Session

More information

BIG DATA FUNDAMENTALS

BIG DATA FUNDAMENTALS BIG DATA FUNDAMENTALS Timeframe Minimum of 30 hours Use the concepts of volume, velocity, variety, veracity and value to define big data Learning outcomes Critically evaluate the need for big data management

More information

What is Big Data? The three(or four) Vs in Big Data In 2013 the total amount of stored information is estimated to be Volume.

What is Big Data? The three(or four) Vs in Big Data In 2013 the total amount of stored information is estimated to be Volume. 8/26/2014 CS581 Big Data - Fall 2014 1 8/26/2014 CS581 Big Data - Fall 2014 2 CS535/CS581A BIG DATA What is Big Data? PART 0. INTRODUCTION 1. INTRODUCTION TO BIG DATA 2. COURSE INTRODUCTION PART 0. INTRODUCTION

More information

Big Data for Social Good. Nuria Oliver, PhD Scientific Director User, Data and Media Intelligence Telefonica Research

Big Data for Social Good. Nuria Oliver, PhD Scientific Director User, Data and Media Intelligence Telefonica Research Big Data for Social Good Nuria Oliver, PhD Scientific Director User, Data and Media Intelligence Telefonica Research 6.8 billion subscriptions 96% of world s population (ITU) Mobile penetration of 120%

More information

(Big) Data Analytics: From Word Counts to Population Opinions

(Big) Data Analytics: From Word Counts to Population Opinions (Big) Data Analytics: From Word Counts to Population Opinions Mark Keane Insight@University College Dublin October 2014 ~ RSS ~ Edinburgh September 2014/EPIC 2 September 2014/EPIC 3 September 2014/EPIC

More information

ENVS 298: Environmental Accountability and Data Visualization

ENVS 298: Environmental Accountability and Data Visualization ENVS 298: Environmental Accountability and Data Visualization Fall Quarter, 2014 S Y L L A B U S CLASS TIME MWF 10:00 10:50 Library 101/CRN 17339 INSTRUCTORS Professor G. Bothun AND CONTACT Office: 417

More information

USING SPATIAL DATA MINING TO DISCOVER THE HIDDEN RULES IN THE CRIME DATA

USING SPATIAL DATA MINING TO DISCOVER THE HIDDEN RULES IN THE CRIME DATA USING SPATIAL DATA MINING TO DISCOVER THE HIDDEN RULES IN THE CRIME DATA Karel, JANEČKA 1, Hana, HŮLOVÁ 1 1 Department of Mathematics, Faculty of Applied Sciences, University of West Bohemia Abstract Univerzitni

More information

Big Data for Development

Big Data for Development Big Data for Development Malarvizhi Veerappan Senior Data Scientist @worldbankdata @malarv Why is Big Data relevant for Development? In developing countries there are large data gaps, both in quantity

More information

How To Create A Data Science System

How To Create A Data Science System Enhance Collaboration and Data Sharing for Faster Decisions and Improved Mission Outcome Richard Breakiron Senior Director, Cyber Solutions Rbreakiron@vion.com Office: 571-353-6127 / Cell: 803-443-8002

More information

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme Big Data Analytics Prof. Dr. Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany 33. Sitzung des Arbeitskreises Informationstechnologie,

More information

Big Data-Challenges and Opportunities

Big Data-Challenges and Opportunities Big Data-Challenges and Opportunities White paper - August 2014 User Acceptance Tests Test Case Execution Quality Definition Test Design Test Plan Test Case Development Table of Contents Introduction 1

More information

The Impact of Big Data on Social Research David Rhind Sharon Witherspoon

The Impact of Big Data on Social Research David Rhind Sharon Witherspoon The Impact of Big Data on Social Research David Rhind Sharon Witherspoon 1 www.nuffieldfoundation.org The landscape to be covered What is Big Data? Just consultants hype? Key questions for SRA Technology

More information

North Highland Data and Analytics. Data Governance Considerations for Big Data Analytics

North Highland Data and Analytics. Data Governance Considerations for Big Data Analytics North Highland and Analytics Governance Considerations for Big Analytics Agenda Traditional BI/Analytics vs. Big Analytics Types of Requiring Governance Key Considerations Information Framework Organizational

More information

Big Data how it changes the way you treat data

Big Data how it changes the way you treat data Big Data how it changes the way you treat data Oct. 2013 Chung-Min Chen Chief Scientist Info. Analysis Research & Services The views and opinions expressed in this presentation are those of the author

More information

UN Global Pulse: Harnessing Big Data for a Revolution in Sustainable Development and Humanitarian Action Robert Kirkpatrick Director @rkirkpatrick

UN Global Pulse: Harnessing Big Data for a Revolution in Sustainable Development and Humanitarian Action Robert Kirkpatrick Director @rkirkpatrick UN Global Pulse: Harnessing Big Data for a Revolution in Sustainable Development and Humanitarian Action Robert Kirkpatrick Director @rkirkpatrick www.unglobalpulse.org @unglobalpulse Global Pulse Vision:

More information

monitoring and for development

monitoring and for development Measuring the Information Society Report 2014 Chapter 5. The role of big data for ICT monitoring and for development 5.1 Introduction One of the key challenges in measuring the information society has

More information

BIG DATA FOR DEVELOPMENT: A PRIMER

BIG DATA FOR DEVELOPMENT: A PRIMER June 2013 BIG DATA FOR DEVELOPMENT: A PRIMER Harnessing Big Data For Real-Time Awareness WHAT IS BIG DATA? Big Data is an umbrella term referring to the large amounts of digital data continually generated

More information

Using multiple models: Bagging, Boosting, Ensembles, Forests

Using multiple models: Bagging, Boosting, Ensembles, Forests Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or

More information

BIG DATA, SMALL DATA, OPEN DATA

BIG DATA, SMALL DATA, OPEN DATA BIG DATA, SMALL DATA, OPEN DATA opportunities and challenges for governance BETHSIMONENOVECK Belk School of Business, Charlotte, North Carolina MAY132014 http://online.northcarolina.edu/unconline/courses.php

More information

Project Proposal: SAP Big Data Analytics on Mobile Usage Inferring age and gender of a person through his/her phone habits

Project Proposal: SAP Big Data Analytics on Mobile Usage Inferring age and gender of a person through his/her phone habits George Mason University SYST 699: Masters Capstone Project Spring 2014 Project Proposal: SAP Big Data Analytics on Mobile Usage Inferring age and gender of a person through his/her phone habits February

More information

How is the World Bank harnessing Big Data for development? Isabelle Huynh Sr Operations Officer World Bank

How is the World Bank harnessing Big Data for development? Isabelle Huynh Sr Operations Officer World Bank How is the World Bank harnessing Big Data for development? Isabelle Huynh Sr Operations Officer World Bank How is the World Bank harnessing Big Data for development? Isabelle Huynh Sr Operations Officer

More information

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance. Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics

More information

The Challenges of Geospatial Analytics in the Era of Big Data

The Challenges of Geospatial Analytics in the Era of Big Data The Challenges of Geospatial Analytics in the Era of Big Data Dr Noordin Ahmad National Space Agency of Malaysia (ANGKASA) CITA 2015: 4-5 August 2015 Kuching, Sarawak Big datais an all-encompassing term

More information

Data Isn't Everything

Data Isn't Everything June 17, 2015 Innovate Forward Data Isn't Everything The Challenges of Big Data, Advanced Analytics, and Advance Computation Devices for Transportation Agencies. Using Data to Support Mission, Administration,

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Machine Learning and Data Mining. Fundamentals, robotics, recognition Machine Learning and Data Mining Fundamentals, robotics, recognition Machine Learning, Data Mining, Knowledge Discovery in Data Bases Their mutual relations Data Mining, Knowledge Discovery in Databases,

More information

INSIGHTS WHITEPAPER What Motivates People to Apply for an MBA? netnatives.com twitter.com/netnatives

INSIGHTS WHITEPAPER What Motivates People to Apply for an MBA? netnatives.com twitter.com/netnatives INSIGHTS WHITEPAPER What Motivates People to Apply for an MBA? netnatives.com twitter.com/netnatives NET NATIVES HISTORY & SERVICES Welcome to our report on using data to analyse the behaviour of people

More information

SEO Marketing Strategy. Keeping you connected through SEO

SEO Marketing Strategy. Keeping you connected through SEO SEO Marketing Strategy Keeping you connected through SEO Table of Contents Popularity Keywords Content Uniqueness Marketing Links Traffic Ranking Innovation Optimization Analysis Popularity In order to

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411

More information

Big Data and Analytics: Challenges and Opportunities

Big Data and Analytics: Challenges and Opportunities Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif

More information

Role of Social Networking in Marketing using Data Mining

Role of Social Networking in Marketing using Data Mining Role of Social Networking in Marketing using Data Mining Mrs. Saroj Junghare Astt. Professor, Department of Computer Science and Application St. Aloysius College, Jabalpur, Madhya Pradesh, India Abstract:

More information

Beyond Watson: The Business Implications of Big Data

Beyond Watson: The Business Implications of Big Data Beyond Watson: The Business Implications of Big Data Shankar Venkataraman IBM Program Director, STSM, Big Data August 10, 2011 The World is Changing and Becoming More INSTRUMENTED INTERCONNECTED INTELLIGENT

More information

Exploring Big Data in Social Networks

Exploring Big Data in Social Networks Exploring Big Data in Social Networks virgilio@dcc.ufmg.br (meira@dcc.ufmg.br) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about

More information

Is big data the new oil fuelling development?

Is big data the new oil fuelling development? Is big data the new oil fuelling development? 12th National Convention on Statistics Manila, Philippines 2 October, 2013 Johannes Jütting PARIS21 Big data (2 The future? Linked data: Is this the future?..

More information

EXECUTIVE REPORT. Big Data and the 3 V s: Volume, Variety and Velocity

EXECUTIVE REPORT. Big Data and the 3 V s: Volume, Variety and Velocity EXECUTIVE REPORT Big Data and the 3 V s: Volume, Variety and Velocity The three V s are the defining properties of big data. It is critical to understand what these elements mean. The main point of the

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

Analyzing HTTP/HTTPS Traffic Logs

Analyzing HTTP/HTTPS Traffic Logs Advanced Threat Protection Automatic Traffic Log Analysis APTs, advanced malware and zero-day attacks are designed to evade conventional perimeter security defenses. Today, there is wide agreement that

More information

EPSRC Cross-SAT Big Data Workshop: Well Sorted Materials

EPSRC Cross-SAT Big Data Workshop: Well Sorted Materials EPSRC Cross-SAT Big Data Workshop: Well Sorted Materials 5th August 2015 Contents Introduction 1 Dendrogram 2 Tree Map 3 Heat Map 4 Raw Group Data 5 For an online, interactive version of the visualisations

More information

Small Business Webinar SEOTrainingSW.com 7 Secrets to Online Marketing

Small Business Webinar SEOTrainingSW.com 7 Secrets to Online Marketing Introduction: Jon Rognerud, the best selling author of The Ultimate Guide to Search Engine Optimization and one of the top SEO s in our industry was our guest during this 90- minute webinar focusing on

More information

Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013

Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013 Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013 Housekeeping 1. Any questions coming out of today s presentation can be discussed in the bar this evening 2. OCF is

More information

KNOWLEDGENT WHITE PAPER. Big Data Enabling Better Pharmacovigilance

KNOWLEDGENT WHITE PAPER. Big Data Enabling Better Pharmacovigilance Big Data Enabling Better Pharmacovigilance INTRODUCTION Biopharmaceutical companies are seeing a surge in the amount of data generated and made available to identify better targets, better design clinical

More information

Healthcare data analytics. Da-Wei Wang Institute of Information Science wdw@iis.sinica.edu.tw

Healthcare data analytics. Da-Wei Wang Institute of Information Science wdw@iis.sinica.edu.tw Healthcare data analytics Da-Wei Wang Institute of Information Science wdw@iis.sinica.edu.tw Outline Data Science Enabling technologies Grand goals Issues Google flu trend Privacy Conclusion Analytics

More information

SEO 2.0 ADVANCED SEO TIPS & TECHNIQUES ABSTRACT»

SEO 2.0 ADVANCED SEO TIPS & TECHNIQUES ABSTRACT» 2.0 ADVANCED TIPS & TECHNIQUES ABSTRACT» Savvy online marketers know that their website is a great tool for branding, content promotion and demand generation. And they realize that search engine optimization

More information

Setting the Standard for Safe City Projects in the United States

Setting the Standard for Safe City Projects in the United States Leading Safe Cities Setting the Standard for Safe City Projects in the United States Edge360 is a provider of Safe City solutions to State & Local governments, helping our clients ensure they have a secure,

More information

Traffic Prediction and Analysis using a Big Data and Visualisation Approach

Traffic Prediction and Analysis using a Big Data and Visualisation Approach Traffic Prediction and Analysis using a Big Data and Visualisation Approach Declan McHugh 1 1 Department of Computer Science, Institute of Technology Blanchardstown March 10, 2015 Summary This abstract

More information

Now, Next and the Future: IT, Big Data and other Implications for RIM. Presented by Michael S. Smith / http://about.me/mikessmith

Now, Next and the Future: IT, Big Data and other Implications for RIM. Presented by Michael S. Smith / http://about.me/mikessmith Now, Next and the Future: IT, Big Data and other Implications for RIM Agenda for This Afternoon Now: What trends are creating implications within the profession? Next: Why is IT now concerned about RIM?

More information

BIG DATA: IT MAY BE BIG BUT IS IT SMART?

BIG DATA: IT MAY BE BIG BUT IS IT SMART? BIG DATA: IT MAY BE BIG BUT IS IT SMART? Turning Big Data into winning strategies A GfK Point-of-view 1 Big Data is complex Typical Big Data characteristics?#! %& Variety (data in many forms) Data in different

More information

Digital Marketing Training Institute

Digital Marketing Training Institute Our USP Live Training Expert Faculty Personalized Training Post Training Support Trusted Institute 5+ Years Experience Flexible Batches Certified Trainers Digital Marketing Training Institute Mumbai Branch:

More information

Big Data and Open Data

Big Data and Open Data Big Data and Open Data Bebo White SLAC National Accelerator Laboratory/ Stanford University!! bebo@slac.stanford.edu dekabytes hectobytes Big Data IS a buzzword! The Data Deluge From the beginning of

More information

Smart Cities. Opportunities for Service Providers

Smart Cities. Opportunities for Service Providers Smart Cities Opportunities for Service Providers By Zach Cohen Smart cities will use technology to transform urban environments. Cities are leveraging internet pervasiveness, data analytics, and networked

More information

International collaboration to understand the relevance of Big Data for official statistics

International collaboration to understand the relevance of Big Data for official statistics Statistical Journal of the IAOS 31 (2015) 159 163 159 DOI 10.3233/SJI-150889 IOS Press International collaboration to understand the relevance of Big Data for official statistics Steven Vale United Nations

More information

From Petabytes (10 15 ) to Yottabytes (10 24 ), & Before 2020! Steve Barber, CEO June 27, 2012

From Petabytes (10 15 ) to Yottabytes (10 24 ), & Before 2020! Steve Barber, CEO June 27, 2012 From Petabytes (10 15 ) to Yottabytes (10 24 ), & Before 2020! Steve Barber, CEO June 27, 2012 Safe Harbor Statement Forward Looking Statements The following information contains, or may be deemed to contain,

More information

Internet Marketing Institute Delhi Mobile No.: 9643815724 DIMI. Internet Marketing Institute Delhi (DIMI)

Internet Marketing Institute Delhi Mobile No.: 9643815724 DIMI. Internet Marketing Institute Delhi (DIMI) Internet Marketing Institute Delhi () LEARN INTERNET MARKETING To earn money at home and work with big entrepreneurs Batches Weekday Batches (60 hours) Morning Batches (60 hours) Evening Batches (60 hours)

More information

Bruhati Technologies. About us. ISO 9001:2008 certified. Technology fit for Business

Bruhati Technologies. About us. ISO 9001:2008 certified. Technology fit for Business Bruhati Technologies ISO 9001:2008 certified Technology fit for Business About us 1 Strong, agile and adaptive Leadership Geared up technologies for and fast moving long lasting With sound understanding

More information

The STC for Event Analysis: Scalability Issues

The STC for Event Analysis: Scalability Issues The STC for Event Analysis: Scalability Issues Georg Fuchs Gennady Andrienko http://geoanalytics.net Events Something [significant] happened somewhere, sometime Analysis goal and domain dependent, e.g.

More information

Demystifying Big Data. James Rawlins Senior Consultant

Demystifying Big Data. James Rawlins Senior Consultant Demystifying Big Data James Rawlins Senior Consultant What s the Big Idea? Objectives for the session Understand what is meant by Big Data Provide some context around what is Big Are we big? Understand

More information

MEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012

MEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012 MEDICAL DATA MINING Timothy Hays, PhD Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012 2 Healthcare in America Is a VERY Large Domain with Enormous Opportunities for Data

More information

Rapid Visualization with Big Data Analytics. Ravi Chalaka VP, Solution and Social Innovation Marketing

Rapid Visualization with Big Data Analytics. Ravi Chalaka VP, Solution and Social Innovation Marketing Rapid Visualization with Big Data Analytics Ravi Chalaka VP, Solution and Social Innovation Marketing Imagine the Future Innovative cities that dramatically enhance the wellbeing of its citizens Safer

More information

TERMS OF REFERENCE (TORs)

TERMS OF REFERENCE (TORs) TERMS OF REFERENCE (TORs) OVERVIEW TITLE LOCATION OF ASSIGNMENT LANGUAGE(S) REQUIRED TRAVEL DURATION OF CONTRACT SECTION & UNIT CONSULTANT REPORTING TO Data Science Researcher & Lead analyst for Information

More information

Recent developments in EU Transport Policy

Recent developments in EU Transport Policy Recent developments in EU Policy Paolo Bolsi DG MOVE - Unit A3 Economic Analysis and Impact Assessment 66 th Session - Working Party on Statistics United Nations Economic Committee for Europe 17-19 June

More information

BIG DATA : Big Opportunity or Big Threat for Official Statistics?* Jose Ramon G. Albert, Ph.D. Secretary General, NSCB Email: jrg.albert@nscb.gov.

BIG DATA : Big Opportunity or Big Threat for Official Statistics?* Jose Ramon G. Albert, Ph.D. Secretary General, NSCB Email: jrg.albert@nscb.gov. BIG DATA : Big Opportunity or Big Threat for Official Statistics?* Jose Ramon G. Albert, Ph.D. Secretary General, NSCB Email: jrg.albert@nscb.gov.ph 1 *Views expressed do not reflect those at NSCB Outline

More information

URBAN MOBILITY IN CLEAN, GREEN CITIES

URBAN MOBILITY IN CLEAN, GREEN CITIES URBAN MOBILITY IN CLEAN, GREEN CITIES C. G. Cassandras Division of Systems Engineering and Dept. of Electrical and Computer Engineering and Center for Information and Systems Engineering Boston University

More information

Sentiment Analysis on Big Data

Sentiment Analysis on Big Data SPAN White Paper!? Sentiment Analysis on Big Data Machine Learning Approach Several sources on the web provide deep insight about people s opinions on the products and services of various companies. Social

More information

Data Mining Techniques in CRM

Data Mining Techniques in CRM Data Mining Techniques in CRM Inside Customer Segmentation Konstantinos Tsiptsis CRM 6- Customer Intelligence Expert, Athens, Greece Antonios Chorianopoulos Data Mining Expert, Athens, Greece WILEY A John

More information

Tutorial for proteome data analysis using the Perseus software platform

Tutorial for proteome data analysis using the Perseus software platform Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information

More information

Web Archiving and Scholarly Use of Web Archives

Web Archiving and Scholarly Use of Web Archives Web Archiving and Scholarly Use of Web Archives Helen Hockx-Yu Head of Web Archiving British Library 15 April 2013 Overview 1. Introduction 2. Access and usage: UK Web Archive 3. Scholarly feedback on

More information

Mobile Monetization Scenario Design & Big Data. Arther Wu Senior Director of Monetization and Business Operation

Mobile Monetization Scenario Design & Big Data. Arther Wu Senior Director of Monetization and Business Operation Mobile Monetization Scenario Design & Big Data Arther Wu Senior Director of Monetization and Business Operation Agenda Quick update of Cheetah Mobile Ad Scenario Design Big Data / Relation with Advertising

More information

The State of Inbound Lead Generation

The State of Inbound Lead Generation www.hubspot.com The State of Inbound Lead Generation Analysis of lead generation best practices used by over 1,400 small- and medium-sized businesses March 2010 Contents Summary... 3 Introduction... 4

More information

BIG DATA: BIG BOOST TO BIG TECH

BIG DATA: BIG BOOST TO BIG TECH BIG DATA: BIG BOOST TO BIG TECH Ms. Tosha Joshi Department of Computer Applications, Christ College, Rajkot, Gujarat (India) ABSTRACT Data formation is occurring at a record rate. A staggering 2.9 billion

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Moreketing. With great ease you can end up wasting a lot of time and money with online marketing. Causing

Moreketing. With great ease you can end up wasting a lot of time and money with online marketing. Causing ! Moreketing Automated Cloud Marketing Service With great ease you can end up wasting a lot of time and money with online marketing. Causing frustrating delay and avoidable expense right at the moment

More information

Big Trouble. Does Big Data spell. for Lawyers? Presented to Colorado Bar Association, Communications & Technology Law Section Denver, Colorado

Big Trouble. Does Big Data spell. for Lawyers? Presented to Colorado Bar Association, Communications & Technology Law Section Denver, Colorado Does Big Data spell Big Trouble for Lawyers? Paul Karlzen Director HR Information & Analytics April 1, 2015 Presented to Colorado Bar Association, Communications & Technology Law Section Denver, Colorado

More information

20 A Visualization Framework For Discovering Prepaid Mobile Subscriber Usage Patterns

20 A Visualization Framework For Discovering Prepaid Mobile Subscriber Usage Patterns 20 A Visualization Framework For Discovering Prepaid Mobile Subscriber Usage Patterns John Aogon and Patrick J. Ogao Telecommunications operators in developing countries are faced with a problem of knowing

More information

Preliminary Syllabus for the course of Data Science for Business Analytics

Preliminary Syllabus for the course of Data Science for Business Analytics Preliminary Syllabus for the course of Data Science for Business Analytics Miguel Godinho de Matos 1,2 and Pedro Ferreira 2,3 1 Cato_lica-Lisbon, School of Business and Economics 2 Heinz College, Carnegie

More information

TECHNICAL MARKETING TRACK

TECHNICAL MARKETING TRACK TECHNICAL MARKETING TRACK Our job is to connect to people, to interact with them in a way that leaves them better than we found them, more able to get where they d like to go. Seth Godin Our technical

More information

Privacy & data protection in big data: Fact or Fiction?

Privacy & data protection in big data: Fact or Fiction? Privacy & data protection in big data: Fact or Fiction? Athena Bourka ENISA ISACA Athens Conference 24.11.2015 European Union Agency for Network and Information Security Agenda 1 Privacy challenges in

More information

SOCIAL MEDIA MEASUREMENT: IT'S NOT IMPOSSIBLE

SOCIAL MEDIA MEASUREMENT: IT'S NOT IMPOSSIBLE SOCIAL MEDIA MEASUREMENT: IT'S NOT IMPOSSIBLE Chris Murdough A lot of excitement and optimism surround the potential of social media for marketers-after all, that is where attractive audience segments

More information

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments Contents List of Figures Foreword Preface xxv xxiii xv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for

More information

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support Rok Rupnik, Matjaž Kukar, Marko Bajec, Marjan Krisper University of Ljubljana, Faculty of Computer and Information

More information