BIG DATA. From hype to reality
Örebro University
School of Business
Statistics, advanced level thesis, 15 credits
Supervisor: Per-Gösta Andersson
Examiner: Sune Karlsson
Spring 2014

BIG DATA
From hype to reality

Sabri Danesh
Abstract

Big data is suddenly everywhere. It is too big to ignore! It has been six decades since the computer revolution, four decades since the development of the microchip, and two decades since the birth of the modern Internet. More than a decade after the dot-com fizz of the 1990s, can big data be the next Big Bang? Big data reveals part of our daily lives. It has the potential to address virtually any problem for a better, more urbanized globe. Big data sources are also very interesting from an official statistics point of view. The purpose of this paper is to explore the concept of big data and the opportunities and challenges associated with using big data, especially in official statistics. A petabyte is the equivalent of 1,000 terabytes, or a quadrillion bytes. One terabyte is a thousand gigabytes, and one gigabyte is made up of a thousand megabytes. There are a thousand thousand, i.e. a million, petabytes in a zettabyte (Shaw 2014). And this is to be continued.
Acknowledgments

I would like to express my gratitude to my supervisor Per-Gösta Andersson for his assistance and guidance throughout my thesis. I cannot thank him enough for his remarkable support and help. I would especially like to thank Ingegerd Jansson (Statistics Sweden) for guiding me throughout my thesis, directing me towards new ideas and keeping me motivated and encouraged. I would like to thank my examiner Sune Karlsson for providing valuable suggestions and corrections. I would also like to thank Panagiotis Mantalos for his caring attitude and support. Furthermore, I would like to thank my partner, Kamal, for the love, kindness and support he has shown during my studies, which has carried me through to the completion of this thesis. I would also like to thank all my friends for their endless support. Last but not least, I would like to dedicate this thesis to my beloved brother, Zhiaweh Danesh ( ), who was a great economist-statistician and my life's greatest hero. May he rest in peace.
Table of contents

1. Introduction
2. Big Data
   2.1 Definition
3. Previous studies
4. Big Data and official statistics
   4.1 Big Data at Statistics Sweden
   4.2 Big Data at other agencies
5. Methods for inference
   5.1 Selectivity
   5.2 Method
6. Challenges
   6.1 Data
   6.2 Privacy
   6.3 Analysis
7. Discussion & Conclusion
References
1. Introduction

Data is everywhere. As the world goes modern, more and more data are being generated. Data are produced by phones, credit cards, computers, sensors, trains, buses, planes, bridges and factories; the list goes on. Marc Andreessen argued eloquently that "software is eating the world" in his 2011 essay. According to Andreessen (2011), within the next decade at least five billion people worldwide will own smartphones, giving every one of them direct access to the Internet at any time (Andreessen 2011). Figure 1 shows the digital data created annually worldwide.

Figure 1: Digital Data Created Annually Worldwide. Source: Energy-Facts.org (2012).

The amount of data and the frequency at which they are produced have led to the introduction of the term Big Data. Everyone seems to be curious about it and willing to collect and analyze it (Jansson & Isaksson 2013). Big data is a data source with at least three features: extremely large volumes of data, extremely high velocity of data and extremely wide variety of data. It is important because it allows for gathering, storing and managing enormous amounts of data in real time in order to gain a deeper understanding of the information (Hurwitz, Nugent, Halper & Kaufman 2013).
The data is here; its challenges and the ways to make it useful have so far been regarded as an IT problem rather than a statistical issue (Jansson & Isaksson 2013). Big data has been looked at from an IT perspective where the focus is mainly on software and hardware issues (Daas et al. 2012). IT people have designed new methods for processing, evaluating and presenting the data, collectively called Big Data Analytics. The statistical offices are now also beginning to address the big data problem (Jansson & Isaksson 2013). But the question is whether the same statistical methods are applicable to big data sources and whether big data will meet the goals of official statistics.

The aim of this paper is to investigate the term big data and the opportunities and challenges associated with using big data, especially in official statistics production. Three options for using big data in official statistics production have been proposed by Robert M. Groves [1]: ignoring big data as the first option, destroying all official statistical structures and replacing them with big data as the second option, or combining big data with traditional sources as the third. Groves came to the conclusion that the first two options are unacceptable and irrational. The third option, using big data to improve or partly replace traditional data sources, is therefore the most plausible case (Jansson & Isaksson 2013). The theory so far is that, by combining the power of modern computing with the overflowing data of the digital era, big data promises to solve almost any problem (Cheung 2012).

This paper provides an overview of the concept of big data in general in section 2. Section 3 presents some big data case studies, followed by a discussion of big data in the world of official statistics in section 4. Some methods for inference and the selectivity problem are explored in section 5. Section 6 explores the dark sides of big data: the problems with the data itself, privacy, and the analysis of big data. Section 7 discusses the paper along with some conclusions.

[1] Robert M. Groves's speech at the opening session of NTTS in March.
2. Big Data

The world today is oversupplied with information. There are cellphones in almost every pocket, computers in every home and office, and Wi-Fi everywhere. The scale of information is growing faster than ever before, and this quantitative shift has led to a qualitative one. The term Big Data was first coined in the 2000s by sciences such as astronomy after they experienced the data explosion (Cukier & Mayer-Schonberger 2013).

2.1 Definition

There is no precise definition of big data. Every paper on big data defines the phenomenon differently. The various existing definitions usually include the three Vs: volume, velocity and variety. Volume refers to the data sets being large, much larger than usual. Velocity points to the short time lag between the occurrence of an event and its analysis; it can also refer to the regularity at which data are generated. Variety indicates the wide mixture of data sources and formats, from financial transactions to text and video messages (Cukier & Mayer-Schonberger 2013). Figure 2 illustrates the three Vs.

Figure 2: The 3 Vs at an increasing rate.
IBM has a fourth V, veracity, in its definition of big data, which takes account of the accuracy of the information and whether the data can be trusted enough to make important decisions (IBM 2012). Gartner, Inc., the world's leading information technology research and advisory company, defines big data as: "Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." Cukier and Mayer-Schönberger choose the following definition of big data in their book: "Big data refers to things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value, in ways that change markets, organizations, the relationship between citizens and governments, and more." And statistical organizations regard big data as (Jansson & Isaksson 2013): "Data that is difficult to collect, store or process within the conventional systems of statistical organizations. Either their volume, velocity, structure or variety requires the adoption of new statistical software, processing techniques and/or IT infrastructure to enable cost-effective insights to be made."

An important factor that makes big data different from official statistics is that big data sources often contain information not necessarily directly related to statistical elements such as households, persons or enterprises. The information in big data is often a by-product of some process not principally aimed at data collection, while survey sampling and registers clearly are. Therefore, analysis of big data is more data driven than hypothesis based (Buelens, Daas, Burger, Puts & van den Brakel 2014). Table 1 compares big data sources with traditional data sources such as sample surveys and administrative registers. Apart from the three Vs, three additional categories are listed. Records refers to the scale at which information is observed and stored. Generating mechanism refers to how the data source is generated.
The last difference listed in Table 1, fraction of population, refers to the coverage of the data source in relation to the population of interest. The most important dissimilarity is between registers and big data: registers often have almost complete coverage of the population, while big data generally do not. For some big data sources, it may even be unclear what the target population is (Buelens et al. 2014).

Table 1: Comparing data sources for official statistics. Source: Buelens et al. (2014).

                         Sample survey   Register           Big data
Volume                   Small           Large              Big
Velocity                 Slow            Slow               Fast
Variety                  Narrow          Narrow             Wide
Records                  Units           Units              Events or units
Generating mechanism     Sample          Administration     Various
Fraction of population   Small           Large, complete    Large, incomplete

One more category is not present in Table 1: the measurement of error for each of the three data sources. In survey sampling, all the sources of error, such as sampling variance, non-response bias, interviewer effects and measurement errors, are included in the concept of Total Survey Error (Buelens et al. 2014). For big data, no complete approach to error budgeting or quality frameworks has been developed yet. The bias due to selectivity affects the error accounting of big data, but there are also other features to consider. For example, the measurement instruments for big data sources differ from those of survey sampling, where the survey design, capable interviewers and well-defined hypotheses are the key elements (Buelens et al. 2014).
3. Previous studies

Almost all previous studies about big data highlight the great opportunities that come with it. Big data brings the newfound ability to crunch a vast quantity of information, query it instantly, and draw surprising conclusions from it. Big data is a developing approach; it can capture numerous phenomena, from the price of airline tickets to the text of millions of books, allowing them to be searched effectively, and by using our growing computing techniques it uncovers insights that we never could have seen before. Big data is a revolution on the same level as the Internet; it will change the way we think about many important matters such as business, health, politics, education and innovation in the years to come (Cukier & Mayer-Schonberger 2013).

Cukier and Mayer-Schönberger, two leading experts on big data, explain what big data is, how it will change our lives, and what we can do to protect ourselves from its hazards. Their book, Big Data: A Revolution That Will Transform How We Live, Work, and Think, is the first big book about the next big bang. Cukier and Mayer-Schönberger argue that the more data there is, the more useful it becomes. By analyzing facts about 100 million observations rather than just one, a dozen, or a small sample, diseases can be cured, elections can be won, billions of dollars can be earned, and much more. The authors believe that by analyzing huge amounts of data, more patterns and relationships can be discovered, patterns that are mostly invisible when using smaller amounts of information. These insights will guide us to new solutions and opportunities we would never otherwise have suspected.

Cukier and Mayer-Schönberger describe many examples. One involves the store Walmart and the notorious breakfast snack Pop-Tarts. Walmart decided to record every purchase by every customer for future analysis. After a while, the company's analysts observed that when the National Weather Service warned of a tornado, the sale of Pop-Tarts rose significantly in Walmart stores in the affected area. Therefore, store managers put Pop-Tarts near the entrance of the store during hurricane season, and sales soared. This is big data at its coolest: no one would have guessed the link.

The power-tracking company Efergy USA is a big seller of monitors and hardware that connect to fuse boxes wirelessly. The monitor shows the energy consumption up to 255 days into the past. It calculates hourly energy usage, consumption trends and the price.
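The kind of calculation such a monitor performs can be illustrated with a short sketch. This is not Efergy's actual software; the readings, the field layout and the flat tariff below are hypothetical, and the example only shows how hourly usage and cost might be derived from raw meter readings.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw readings: (timestamp, kilowatt-hours consumed since last reading)
readings = [
    ("2014-05-01 17:05", 0.4),
    ("2014-05-01 17:35", 0.7),
    ("2014-05-01 18:10", 1.1),
    ("2014-05-01 18:50", 0.9),
]

PRICE_PER_KWH = 0.15  # assumed flat tariff, in some currency

# Aggregate consumption per clock hour
hourly_kwh = defaultdict(float)
for stamp, kwh in readings:
    hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M").strftime("%Y-%m-%d %H:00")
    hourly_kwh[hour] += kwh

# Report usage and cost per hour, plus the peak hour -- the "trend" a monitor would plot
for hour in sorted(hourly_kwh):
    print(f"{hour}: {hourly_kwh[hour]:.2f} kWh, cost {hourly_kwh[hour] * PRICE_PER_KWH:.2f}")
print("Peak hour:", max(hourly_kwh, key=hourly_kwh.get))
```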
According to Juan Gonzalez, president of Efergy USA, "It makes you realize when you're using too much electricity and see how you can reduce." Their system can be set to alert customers when they reach their target consumption. This way, it becomes easier to save on electricity bills (Wakefield 2014). In Efergy's case, big data makes it possible to see what is happening on a larger scale and to find solutions. For example, a customer who wants to cut down the energy bill can see where the cost can be cut. The data collected also shows the client's peak hours. "When you put data in a larger context, which is big data, it allows them to help make more sense of that information and make it more actionable; the only way we can detect all these things in our home is looking at many homes and developing an algorithm to determine the connection," states Ali Kashani, co-founder and vice president of software development at Energy Aware, an energy monitoring business (Wakefield 2014).

Cukier and Mayer-Schonberger discuss how cheap and easy it is to store gigantic amounts of information nowadays, which once was impossible. As a result, we can now record almost everything. The authors also explain that simply throwing more data at a problem can create remarkable results. Microsoft Corporation found that the spell checker in word-processing software could be greatly improved by having it process a database of one billion words. Google Inc. boosted its language translation service by harvesting billions of pages of translated documents from the Internet and analyzing them. Amazon.com used customers' individual shopping preferences to suggest new books to each customer by using computers to analyze millions of transactions, which was not only a cheap method but also gave excellent results (Cukier & Mayer-Schonberger 2013). Why? Who knows? Knowing what, not why, is good enough, the authors stress. Big data analysis does not care about causality but about correlation. It often uncovers surprising results. Computers, however, do not care, and therefore statistical methods are required to unveil the hidden connections (Cukier & Mayer-Schonberger 2013).

In 2008, Wired magazine's editor-in-chief, Chris Anderson, described how the data deluge makes the traditional scientific method inefficient. In his article, "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete", he claimed that with enormous amounts of data the scientific method would become outdated. Anderson emphasized that observing, developing a model, formulating a hypothesis, testing the hypothesis by conducting experiments and collecting data, and analyzing and interpreting the data are all going to be replaced by statistical analysis of correlations, without any theory.
He argues that all the old models and theories are invalid and that, by using more information, the modelling step can be skipped; instead, statistical methods can be used to find patterns without first making hypotheses. He values correlation over causality (Anderson 2008). Anderson wrote: "Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves."

Data analysis is inspiring, but not perfect. Cukier and Mayer-Schönberger open their book by writing about Google's Flu Trends service, which studies billions of Internet searches to estimate the prevalence of flu in the United States. However, even this overhyped technique failed badly when its estimate of flu cases was twice the actual number. In addition to finding trends, big data analysis is getting better and better at forecasting behaviour, the book points out. Police use the technology to put patrols at certain times of day on certain streets in some cities around the world, including in some US states. It is also used to decide which prisoners are too dangerous to release and which can be released conditionally (Cukier & Mayer-Schonberger 2013).

As with every great opportunity, there are some drawbacks too. Big data can be rather creepy. The authors discuss the issue of privacy. In chapter 8 of the book, they argue that big data can destroy privacy and threaten freedom. In the following chapter, Cukier and Mayer-Schönberger discuss how the benefits of big data can be obtained without losing privacy. Their sparkling book leaves no doubt that big data is the next big thing!
4. Big Data and official statistics

"Big Data is an increasing challenge. The official statistical community needs to better understand the issues, and develop new methods, tools and ideas to make effective use of Big Data sources" (UNECE [2]).

Apart from creating new opportunities in the private sector, big data could also be a very interesting input for official statistics, either used on its own or combined with traditional data sources such as sample surveys and administrative registers. Otherwise, the private sector may benefit more from the big data era by producing more and more statistics that may even beat official statistics. It is doubtful that national statistics offices will lose the "official statistics" label, but they could risk losing their position and importance as time passes, even with all the precision, reliability and interpretability of the statistics produced in these national offices. However, selecting information from big data and fitting it into a statistical production process is not easy (UNECE 2011).

4.1 Big Data at Statistics Sweden

Big data already exists at Statistics Sweden (SCB). Statistics Sweden uses data from cash registers when calculating the Consumer Price Index (CPI). The data come weekly from more than 300 suppliers. Jansson and Isaksson (2013) point out that the data are modified before arriving at Statistics Sweden. They emphasize that the big data entering Statistics Sweden has been reduced in volume and comes in structured form, and although the data are produced rapidly they arrive at fixed time intervals, such as once a week or once a month (Jansson & Isaksson 2013). According to Jansson & Isaksson (2013), big data at Statistics Sweden is used as a complement or as auxiliary data for traditional sources in order to obtain more exhaustive and/or cheaper data. The data are even used for modelling in some cases. Furthermore, they underline the fact that these kinds of data have not been used for direct analysis or rapid estimates, and neither have they required a complete redesign or extra production systems so far. But they still differ from the traditional data sources.

[2] United Nations Economic Commission for Europe.
The suppliers of big data to Statistics Sweden are either firms (stores that sell the goods of interest) in Sweden or companies providing sensor data and credit card information located overseas. Neither of these information providers can be expected to consider the needs of Statistics Sweden. They are not even expected to report changes in their datasets, which in the long run might have serious negative effects on the future ability to produce official statistics using time series data (Jansson & Isaksson 2013).

Statistics Sweden is interested in the future use of big data and is therefore taking part in an association called The Swedish Big Data Analytics Network [3]. The emphasis is mainly on supporting the possibilities of big data through research, enhanced infrastructure, capability building and other key elements for future progress (Jansson & Isaksson 2013). In their report, Jansson & Isaksson also note the idea of using electricity data as a complement to housing statistics, but there is a need for improvement before taking any action (Jansson & Isaksson 2013). The idea is similar to the use of electricity consumption data in Ireland, where an estimated 1.5 million to 2.2 billion records arrive monthly. It contributes to the improvement of the household register, which in turn leads to a better estimation of electricity usage (Dunne 2013). The project included using the time series of electricity usage between July 2009 and January 2011 for around 6,000 monitoring meters placed around Ireland. The goal of the project was to describe electricity consumption behaviour and predict electricity usage in Ireland (Silipo & Winters 2013).

4.2 Big Data at other agencies

Big data is a BIG issue for statistical agencies around the world, especially at Statistics Netherlands (CBS) (Jansson & Isaksson 2013). Statistics Netherlands has investigated both the possibilities and the limitations of big data. They have analysed data from traffic sensors [4] and from social media (Daas et al. 2013). Population distribution and movement can be studied by analysing mobile phone call activity data; however, the representativeness of the data should be considered (De Jong et al. 2012). After an earthquake in Christchurch, New Zealand, in 2011, mobile phone data were used to observe the population's activities following the earthquake. Those data made it possible to report on the movement of people in order to know where in the country help was most needed (Statistics New Zealand 2012).

At the Nordic Chief Statisticians' meeting in Bergen in August 2013, big data was discussed as one of the hottest subjects. It showed that the Nordic countries do not fully agree about the characteristic features of big data, and no policy for big data has been made yet, but it is on everyone's agenda. The history of administrative data sources is long in the Nordic countries, which counts as valuable experience proving the usefulness of big data (Jansson & Isaksson 2013). There are also a lot of big data discussions taking place at different levels in Eurostat and the UNECE (Jansson & Isaksson 2013). The Director Generals of the National Statistical Institutes within the EU "acknowledge that Big Data represent new opportunities and challenges for Official Statistics, and therefore encourage the European Statistical System and its partners to effectively examine the potential of Big Data sources in that regard." Further, they "recognise that Big Data is a phenomenon which is impacting on many policy areas. It is therefore essential to develop an Official Statistics Big Data strategy and to examine the place and the interdependencies of this strategy within the wider context of an overall government strategy at national as well as at EU level" (DGINS 2013). There are other aspects of big data as well: the plan is to adopt an action plan and a road map, and there is a project running during 2014 within the UNECE (Jansson & Isaksson 2013).

[3] The purposes are chiefly to "highlight the recent and increasing importance of advanced analysis of very large data sets in society and business, and the excellent position of Sweden to potentially be at the forefront in this area by leveraging national areas of strength in research and business development; to address the limiting factors that hinder us from realising this potential; and to propose national efforts for remedying these factors and creating a fertile ground for future businesses, services, and societal applications based on Big Data Analytics." (The Swedish Big Data Analytics Network 2013, p. 2).

[4] At almost 13,000 locations, the number of vehicles per minute, their speed and their length were measured. All the data from all the locations during one day were used for analysis, and it was concluded that, despite issues with missing data and noise, the data gave useful information about traffic flows and types of vehicles. Social media data were used to analyse the sentiment of the Dutch people, giving results that were highly correlated with official numbers compiled by traditional methods. A separate study of Twitter messages showed that the data contained a lot of noise. A number of methodological problems were identified through the above projects, but the data sources were still viewed as useful (Daas et al. 2012).
5. Methods for inference

According to UNECE (2011), big data has the potential to produce more appropriate and suitable statistics than traditional sources of official statistics. Official statistics has long relied on survey data collection and administrative data [5], which differs from big data, where most data are freely available or held by private companies. When the velocity of the data-generating process increases [6], administrative data becomes "big". By including relevant big data sources in the official statistics process, National Statistics Offices can achieve higher accuracy and confirm the consistency of the output (UNECE 2011).

As mentioned in the previous parts of this paper, big data is mostly unstructured, meaning that there is no predefined model and/or that it does not come in the usual database form (UNECE 2011). Traditional indexes are predesigned with a limited search query, whereas big data comes in any form, not necessarily structured or searchable. This huge amount of data of varying types and quality does not fit into neatly defined categories. The most common databases have for a long time been based on SQL, the Structured Query Language, but the data tsunami of recent years has led to something called NoSQL, which does not impose the same demands as SQL databases. It accepts data of all types and sizes and makes the data searchable (Cukier & Mayer-Schönberger 2013). However, picking data from big data sources and fitting it into a statistical production process is challenging (UNECE 2011).

5.1 Selectivity

In a finite population, sample data are representative with respect to some variable if the variable of interest has the same distribution in the sample as in the population. All other subsets are known as selective samples. It is much easier to work with representative subsets, and they give unbiased inference about the whole population, but this is not the case with selective samples (Buelens et al. 2014).

[5] Administrative data is one of the main data sources used by National Statistics Offices (NSOs) for statistical purposes. Administrative data is collected at regular periods of time by statistical offices and is used to produce official statistics. Traditionally, it has been received, often from public administrations, processed, stored, managed and used by the NSOs in a very structured manner (UNECE 2011).

[6] For instance, using administrative data where data is collected daily or weekly instead.
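As a hedged formalization of this representativeness condition (my own notation, not a formula taken from Buelens et al. 2014), one can require that the empirical distribution of the variable of interest in the observed subset matches its distribution in the finite population:

```latex
% s \subseteq U is the observed subset (a sample or a big data set) of the
% finite population U, with sizes n and N; x_k is the variable of interest.
\[
\underbrace{\frac{1}{n}\sum_{k \in s} \mathbf{1}\{x_k \le t\}}_{\text{distribution in } s}
\;=\;
\underbrace{\frac{1}{N}\sum_{k \in U} \mathbf{1}\{x_k \le t\}}_{\text{distribution in } U}
\quad \text{for all } t .
\]
```

A subset for which this equality fails, even approximately, for the variable of interest is a selective sample in the sense used above.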
One of the concerns that arises with big data is whether it is representative. As discussed in section 2 of this study, a big data source often resembles an infinite population and the reference population is not clear. The questions that arise are: what is the population, who generates the data, and can we draw a sample and recover population properties? In traditional methods and probability sampling, the focus is on getting a representative sample of the population of interest. This is done by developing a survey design that is expected to give a representative sample. Estimation theory in sample surveys is built on the representativeness assumption (Buelens et al. 2014). This assumption is invalid when using big data. In big data, correlations may reflect what is happening, but standard statistical inference is not possible (Cukier & Mayer-Schonberger 2013). Some methods have been developed for correcting errors of representativeness, for example errors caused by selective non-response. The Generalised Regression Estimator (GREG) [7] is currently used at Statistics Netherlands (Bethlehem, Cobben & Schouten 2011). Classical estimation methods are essentially grounded in survey design and are known as design-based methods. Unless a dataset covers the whole population of interest, it is uncertain whether the data are representative when the data set is collected in some way other than random sampling. Therefore, when using a big data source in official statistics, the issue of selectivity needs to be considered (Buelens et al. 2014).

5.2 Method

Big data can be part of the production of official statistics. As discussed in the previous part, the selectivity of big data can pose a problem depending on how the data are used (Buelens et al. 2014). In their discussion paper, Selectivity of Big data, Buelens et al. (2014) discuss four different cases where big data is used as an information resource in the production of official statistics.

The first case is where big data are the only source of data used for the production of some statistics. In this setting, careful assessment and selection of the data are crucial, and even more important is taking care of selection bias by choosing a suitable method of inference (Buelens et al. 2014).

[7] A model-assisted estimator designed to improve the accuracy of the estimates by means of auxiliary information.
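For reference, the GREG estimator of footnote 7 has the following standard textbook form for a population total. This is the general model-assisted formula (of the kind treated in Bethlehem, Cobben & Schouten 2011), not a formula specific to the Statistics Netherlands implementation:

```latex
% Horvitz-Thompson part plus a regression adjustment using auxiliary variables x_k
% with known population totals t_x; pi_k are the sample inclusion probabilities and
% \hat{B} is a (survey-weighted) regression coefficient estimated from the sample s.
\[
\hat{t}_{y,\mathrm{GREG}}
\;=\;
\sum_{k \in s} \frac{y_k}{\pi_k}
\;+\;
\Bigl( \mathbf{t}_x - \sum_{k \in s} \frac{\mathbf{x}_k}{\pi_k} \Bigr)^{\!\top} \hat{\mathbf{B}} .
\]
```

The auxiliary information pulls the estimate towards the known population totals, which is why the same idea is attractive for correcting selective non-response and, potentially, selective big data sources.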
Buelens et al. (2012) argue for the importance and power of the right method of inference, which could overcome the problem of representativeness (Buelens et al. 2012). Model-based and algorithmic methods are designed to predict parameter values for the unobserved parts of the population, and are usually encountered in data mining and machine learning contexts (Hastie et al. 2003). Selecting a proper method and validating its assumptions in concrete situations is not a straightforward task (Baker et al. 2013), and there are also limits to what can be achieved when correcting for selectivity. The results will still be biased if particular subpopulations are entirely missing from the big data set. According to Statistics Netherlands, none of their big data sources contains identifying variables, and so far it has been impossible to link big data sources to register databases; therefore, an assessment of and correction for the selectivity problem has not yet been achieved (Buelens et al. 2014).

As the second case, Buelens et al. (2014) consider using big data as auxiliary data in a process largely based on sample survey data, where statistics based on big data are used purely as a covariate in model-based estimation methods applied to the traditional survey sample data. By doing so, the sample size can be reduced, which in turn leads to cost reduction and a reduction of the non-response error. This idea arose when data from GPS tracking devices were used to measure connectivity between geographical areas. The degree to which an area is connected to other areas was found to be a good predictor of the variable of interest (in their case poverty). This means that big data in the form of GPS tracks can be used as a predictor for survey-based measurements. A risk with this method is the instability of the big data source over time, or sudden changes due to technical upgrades or other unexpected circumstances. This is a classic problem for secondary data sources that has also been observed in administrative data (Buelens et al. 2014).

The next case concerns big data applications that can be used as a data collection strategy in sample surveys, for example geographic location data collected via GPS devices in smartphones to measure movement, where only the parts of the data that have been selected by means of a probability sample are observed (Arends et al. 2013). Schutt and O'Neil (2013) claim that smartphones and in-built tracking devices are replacing the traditional survey, but all elements of survey sampling and the connected estimation methods remain valid.
The size of a data set collected in this way is not necessarily big, but it has a number of properties typical of big data sets (Schutt and O'Neil 2013).

Buelens et al. (2014) mention the fourth case as using big data regardless of selectivity complications. It is argued that any claim that the resulting statistics also apply to the population not covered by the big data source is false. However, such statistics may still be of interest and may enhance official publications (Buelens et al. 2014). It is also important to keep in mind that the value of the Internet as a data source lies not so much in producing new statistics as in its potential for improving existing statistics. There are some considerable problems, such as double counting, sorting, causality, estimation and, in particular, representativeness (Heerschap 2013). Furthermore, Buelens et al. (2014) argue that internet searches are selective because not everyone in the population of interest uses the internet, not all of them use Google as a search engine, and, most importantly, not everybody who looks for information does so through the internet or Google (Buelens et al. 2014).

As the cost of collecting and acquiring data decreases quickly, the importance of big data will increase, and companies creating and implementing big data approaches gain an inexpensive advantage. Big data methods need to find a place in official statistics, and the focus needs to go beyond using big data to answer known problems, towards finding patterns that could support decisions and reveal opportunities that could never have been imagined before (Parise, Iyer & Vesset 2012).

Dunne (2013) suggests that organising big data into a large number of groups or pools of data could be a solution to dealing with big data streams. This way, the data become manageable with traditional processing methods. The effective way to achieve this is to know the volume and number of the groups available, the capacities in which the data are processed, and whether it is necessary to keep the original data once processed (Dunne 2013).
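A minimal sketch of this pooling idea might look as follows. The grouping size, the record layout and the aggregation are all hypothetical, and this is not a method from Dunne (2013); the point is only that a continuous stream is split into bounded pools that conventional tooling can process one at a time.

```python
from itertools import islice

def pools(stream, pool_size=100_000):
    """Split an unbounded stream of records into bounded pools (lists)."""
    it = iter(stream)
    while True:
        pool = list(islice(it, pool_size))
        if not pool:
            return
        yield pool

def process_pool(pool):
    """A conventional aggregate over one pool: mean of a hypothetical 'value' field."""
    values = [record["value"] for record in pool]
    return sum(values) / len(values)

# Hypothetical stream: in practice this would be a file, a queue or a sensor feed.
stream = ({"meter_id": i % 500, "value": (i * 7) % 13} for i in range(1_000_000))

for i, pool in enumerate(pools(stream, pool_size=250_000)):
    print(f"pool {i}: {len(pool)} records, mean value {process_pool(pool):.3f}")
```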
6. Challenges

There are some drawbacks attached to the promising assets of big data. Questions about its analytical value and about policy issues are raised as an effect of big data. There are concerns over whether the data are representative, over their reliability, and over the overarching privacy issues of using personal data (Cukier & Mayer-Schonberger 2013). Along with big data come computational challenges as well. Besides finding a way to generate manageable structured data from unstructured data, statistical analysis tools such as R and SAS must be integrated to be able to process big data. Furthermore, there is another reason to worry: the risk of too many correlations. If correlations between pairs of variables are examined over 100 times, there is a risk of unintentionally finding about five false correlations that appear statistically significant [8] even when there is no real connection between the variables. Lack of careful control can seriously increase such errors (Cukier & Mayer-Schonberger 2013). Some of the dark sides of big data are explored in the following parts.

[8] Type I error in hypothesis testing.
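The multiple-comparisons risk described above (the Type I errors of footnote 8) is easy to demonstrate with a small simulation. This is an illustrative sketch, not an analysis from any of the cited sources: it tests 100 pairs of completely independent variables at the 5 % level and typically flags about five of them as "significant".

```python
import random

random.seed(1)

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

n, tests = 100, 100
r_crit = 0.197  # approximate critical |r| for p < 0.05 (two-sided) when n = 100

false_hits = 0
for _ in range(tests):
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [random.gauss(0, 1) for _ in range(n)]  # independent of x by construction
    if abs(corr(x, y)) > r_crit:
        false_hits += 1

print(f"{false_hits} of {tests} independent pairs look 'significant' at the 5% level")
```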
6.1 Data

Along with big data comes a very old problem: relying on the numbers when they are far more fallible than we think (Cukier & Mayer-Schönberger 2013). Managing and analyzing data have always brought high benefits and great challenges to systems of all sizes and types. Capturing information about customers, products and services is valuable for businesses. Indeed, a lot of complexity comes along with data. Some data are structured and kept in a traditional database, while other data are unstructured. For instance, it would be much easier if all customers always bought the same products in the same way, but that is far from reality. Companies and sales markets have developed over time and are complicated. As more product lines were added to cope with this complexity, the data became "big". Data difficulties are not limited to sales markets: research and development (R&D) organizations are another example of those who struggle to get sufficient computing power in order to use scientific data and run sophisticated models on it (Hurwitz et al. 2013).

It is also important to consider other new sources of data produced by machines, such as sensors, and the huge amounts of information generated by humans, for example data from social media (Hurwitz et al. 2013). In addition, as discussed in the previous parts, the world is oversupplied with information as newer, more powerful mobile devices become available and access to global networks increases, which will drive the creation of new sources of data. While each data source could be managed independently, the challenge is how analytics can interpret and manage the intersection of all these different sorts of data. In the big data case, because of the volume, it is impossible to manage data in traditional ways. Although there have been huge databases in traditional data registers, the difference with big data is that it varies considerably in type and timeliness (Hurwitz et al. 2013). In many circumstances relating to big data, unsystematic failures and data loss are an issue. According to Justin Erickson, senior product manager at Cloudera [9]: "If I'm bringing data in from many different systems, data loss could skew my analysis pretty dramatically. When you have lots of data moving across multiple networks and many machines, there's a greater chance that something will break and portions of the data won't be available." (Barlow 2013)

6.2 Privacy

Privacy is a currency that we all now routinely spend to purchase convenience (Frankel 2013). Thanks to regulation that concerns privacy, particularly in Europe, users have the option to choose what information about them is collected when they go online. This matters because of the instability and changing behaviour of people using social media and other websites.

[9] Cloudera Inc. is an American-based software company that provides Apache Hadoop-based software, support and services, and training to business customers. Source: Wikipedia.
As information collection and the use of big data become more widely known, people become more concerned about sharing private information freely (Couper 2013). Wilson, Gosling and Graham (2012) describe the changes in Facebook privacy settings over time in their paper. They conclude that an increase in blocking cookies changes the amount and type of information shared. The development of tools giving users control over what is shared makes it possible to hide events. They also discuss how Microsoft got a negative reaction from advertisers for making the "do not track" option the default in the Internet Explorer 10 browser (Wilson, Gosling & Graham 2012). Big data has raised privacy concerns relating to the ways data are collected and to the use of data by governments for national security purposes. There are also concerns about the profit-making and other non-commercial uses of big data (Lenard & Rubin 2013).

Edith Ramirez [10] addressed the privacy of big data in her first major speech as Chairwoman (Ramirez 2013): "the challenges big data poses to privacy are familiar, even though they may be of a magnitude we have yet to see." Further, she adds (Ramirez 2013): "the solutions are also familiar, and, with the advent of big data, they are now more important than ever." Chairwoman Ramirez's speech raises the question of whether big data is associated with new privacy issues and a related surge in the need for government action. She discusses the risks of identity theft and data breaches, which increase with big data (Lenard & Rubin 2013). Lenard & Rubin (2013) interpreted Ramirez's speech as follows: "It suggests that we should look to the familiar solutions the Fair Information Privacy Practices (FIPPs) involving notice and choice, use specification and limits, and data minimization to solve any privacy problems brought about by big data" (Lenard & Rubin 2013). In theory, identity theft and data breaches could either increase or decrease with big data. These safety issues constitute a market failure because of the difficulty of imposing costs on the perpetrators.

[10] Edith Ramirez was sworn in as a Commissioner of the Federal Trade Commission on April 5, 2010 (to a term expiring on September 25), and was designated to serve as Chairwoman of the Federal Trade Commission effective March 4, 2013, by President Barack H. Obama. Source:
However, data holders have strong incentives to protect their data, while the data themselves can be useful in preventing fraud (Lenard & Rubin 2013). Moreover, according to Lenard and Rubin (2013), there is no sign that identity fraud has gone up with the appearance of big data. Actually, it is more likely that the use of big data would decrease identity fraud. For instance, credit card companies, who bear most of the costs, make things more secure for their consumers by observing their purchases and informing them when their spending seems to be outside of normal activity, which is exactly a type of big data analysis. It is important to note that this policing includes use of data for purposes other than the original reason for data collection (Lenard & Rubin 2013).

6.3 Analysis

The use of big data comes with a number of analytical challenges. The weight and severity of those challenges differ depending on the type of data, the type of analysis being conducted, and the type of outcome (UN Global Pulse 2012). The main aspect of research and hypothesis-based study is the question "what is the data really telling us?" (UN Global Pulse 2012), but when analyzing big data, the first question that arises is "what problems are we trying to solve?" It is hard to know what to look for in this huge amount of data that can undoubtedly give valuable insight, but patterns can appear in the data before we understand why they are there (Hurwitz, Nugent, Halper & Kaufman 2013). There is an understanding that new digital data sources pose more challenges. Therefore, these concerns must be brought out in an entirely clear way (UN Global Pulse 2012). UN Global Pulse (2012), in their article Big Data for Development: Challenges & Opportunities, puts the challenges into three separate categories:

- getting the whole picture right, i.e. summarizing the data;
- understanding and interpreting the data through inference;
- defining and detecting anomalies.

Big data is at its finest when analyzing common things, but often not as good at analyzing uncommon things. For example, programs such as search engines and translation programs that deal with text using big data often depend on something called trigrams [11]. As trigrams often appear in texts, consistent statistical material can be collected about common ones. But it is impossible that any existing body of data will ever be sufficient to include all the possible trigrams that people use, mostly because of the lasting ingenuity of language (Marcus & Davis 2014).

[11] Sequences of three words in a row (like "in a row").
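A trigram count of the kind footnote 11 refers to can be sketched in a few lines. The example sentence is made up; the sketch only shows how trigram statistics are collected from text, and why unseen word combinations get no statistics at all, no matter how much text is counted.

```python
from collections import Counter

def trigrams(text):
    """Return all sequences of three consecutive words in the text."""
    words = text.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

corpus = (
    "big data is everywhere and big data is growing "
    "because big data is generated by almost everything"
)

counts = Counter(trigrams(corpus))
print(counts.most_common(3))             # frequent trigrams get reliable statistics
print(counts[("data", "will", "rule")])  # an unseen trigram: count 0, no statistics
```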
Another challenge when analyzing big data is that the data processing cycle in most cases happens in real time. A fast-evolving universe of new technologies has dramatically reduced the time in which data are processed. That allows exploring and experimenting with data in ways that would never have been practical, or even possible, before. Despite the availability of new tools and methods for dealing with enormous volumes of data at unbelievable speeds, the real promise of advanced data analytics lies beyond technology (Barlow 2013). "Real-time big data isn't just a process for storing petabytes or exabytes of data in a data warehouse. It's about the ability to make better decisions and take meaningful actions at the right time. It's about detecting fraud while someone is swiping a credit card, or triggering an offer while a shopper is standing on a checkout line, or placing an ad on a website while someone is reading a specific article. It's about combining and analyzing data so you can take the right action, at the right time, and at the right place," states Michael Minelli, co-author of the book Big Data, Big Analytics (Minelli, Chambers & Dhiraj 2013).

But how fast is fast? What is the meaning of "real time"? The definition can vary depending on the situation in which big data is used. In theory, real time stands for the ability to process data as it arrives, meaning in the present rather than in the future. This removes the step of storing the data and saving it for later use. However, "the present" is also defined differently from different perspectives (Barlow 2013). Joe Hellerstein, chancellor's professor of computer science at UC Berkeley, says, "Real time is for robots. If you have people in the loop, it's not real time. Most people take a second or two to react, and that's plenty of time for a traditional transactional system to handle input and output." Barlow (2013) argues that Hellerstein's remark does not mean that the pursuit of speed has been abandoned. For instance, Spark, an open source cluster computing system that can be programmed quickly and runs quickly, relies on resilient distributed datasets (RDDs) and has been used to search 1 to 2 terabytes of data in no more than one second.
Twitter, a social media site, uses Storm, an open source low-latency stream processing system, in order to detect correlations in almost real time. The idea is to know an individual's interests in almost real time. For example, if someone tweets about going snowboarding, Storm helps Twitter figure out the most appropriate ad for that person at just the right time (Barlow 2013).
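The near-real-time matching described here can be illustrated with a deliberately simplified sketch. It does not use Storm's or Spark's actual APIs; the incoming messages, the interest keywords and the "ad" lookup are all hypothetical. The point is only that each message is handled as it arrives, instead of being stored first and analysed later.

```python
import time

# Hypothetical mapping from interest keywords to ad categories
AD_CATEGORIES = {"snowboarding": "winter sports gear", "coffee": "espresso machines"}

def incoming_messages():
    """Stand-in for a live feed: yields (user, text) pairs as they 'arrive'."""
    feed = [
        ("anna", "heading out snowboarding this weekend!"),
        ("omar", "third coffee of the day"),
        ("li",   "reading about big data and official statistics"),
    ]
    for user, text in feed:
        time.sleep(0.1)  # simulate messages arriving over time
        yield user, text

def handle(user, text):
    """Process one message immediately: match interests and pick an ad."""
    for keyword, ad in AD_CATEGORIES.items():
        if keyword in text.lower():
            print(f"{user}: matched '{keyword}' -> show ad for {ad}")
            return
    print(f"{user}: no match, no ad selected")

for user, text in incoming_messages():
    handle(user, text)  # act at arrival time; nothing is stored for later batch analysis
```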
7. Discussion & Conclusion

"There is a big data revolution," says Weatherhead University Professor Gary King. "But it is not about the quantity of data. The big data revolution is that now we can do something with the data" (Shaw 2014).

Big data does not yet have a fixed definition; in fact, it is still debated. It is usually defined by the Vs:

- Volume, which is mainly the jump from terabytes to petabytes and beyond.
- Velocity, meaning the high speed of data in and out: at what speed the data are received and at what speed they will be used. Different businesses might have different requirements depending on the type of industry.
- Variety, as the data are unstructured, come in different formats, and are not easily integrated. This is considered the biggest issue with big data, because every data source requires its very own kind of handling.
- Veracity, as big data flows are highly unreliable.

Big data is complex. It is messy. It requires cleansing, linking and matching the data across systems. Training is required in order to get the best results when using big data. Since the birth of big data, the focus has been on getting a definition for this new, exciting phenomenon. The Vs define what big data is, but they say little about how to exploit its capacity. Maybe now is the time to master it and to focus on the how and the why of big data. The amount of data will continue to grow, and so will the tools processing it; but big data has so far been treated as a technological matter, focusing on hardware and software. The weight now needs to be put on what the data are telling us. With such a new, non-standard type of data, there will be a need for new computational and analytical methods. It is important to consider whether this data overflow makes scientific methods outdated. Today official statistics depends on classical statistical methods. The question that arises is whether social science data models and methods are obsolete in the age of big data.

So is big data big news for official statistics, or rather a big mess? What happens when big data and official statistics meet? Could big data replace traditional data sources? The answer is not clear yet, as big data is not a reliable source at this moment. It has weaknesses such as non-representativeness and/or unreliability. However, big data has a huge potential of being faster and cheaper. This new data source could replace traditional sources; it may just be a matter of time.
Data mining with multiple sources of data gives new understanding and is improving the data sources of official statistics. New emphases in data sources, such as multi-mode data collection, have been introduced in official statistics around the world, like the combination of Internet-based surveys and administrative sources, alongside the continued emphasis on surveys and traditional approaches. Dunne (2013) believes that big data is coming into official statistics. He argues that it can begin with sensor and transactional types of data, because the populations behind sensor and transaction data can be defined, in the same way as the underlying populations in administrative data sources are defined.

One of the other main concerns with big data is whether informational privacy law will survive big data. Big data will be innovative, but this revolution must be consistent with standards that have long valued privacy. The privacy issue applies to everything from customer information to predicting sensitive health situations. Big data will stand out more and more in the years to come. It is here to stay! Big data is not important in itself; rather, it is a by-product, and it might change the way we solve problems. Therefore, it is essential to regard big data and real-time analytics as an important resource and not as some modern magic silver bullet for traditional development challenges. However, the flow of data offers a genuine opportunity to bring powerful new tools to global development. It is the statisticians' and social scientists' job to take advantage of this new data source. A makeover of computational and quantitative analytical skills is needed. The ultimate goal is to gain better vision and understanding to help global development fight poverty, hunger and disease. It is a contest between truth and falsehood.

National statistics offices have always been challenged to improve their statistics for a better society. Now that big data is about to enter the official statistical world, the task is to be proactive and open to using non-traditional data sources. There has been a lot of hype about big data, which is used mainly in commercial and security applications. However, as Andreessen argued in his 2011 essay that "software is eating the world", I would now say that, in short, big data will rule the world!
References

Anderson, Chris (2008). The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. Wired Magazine, 16.07, July. magazine/16-07/pb_theory (accessed ).

Andreessen, M. (2011). Why Software Is Eating The World. Essay, August 20 (accessed ).

Barlow, Mike (2013). Real-Time Big Data Analytics: Emerging Architecture. O'Reilly Media, Inc., February 2013 (accessed ).

Bethlehem, J., Cobben, F. & Schouten, B. (2011). Handbook of Nonresponse in Household Surveys. Wiley.

Buelens, B., Boonstra, H.J., van den Brakel, J. & Daas, P. (2012). Shifting paradigms in official statistics: from design-based to model-based to algorithmic inference. Discussion paper, Statistics Netherlands, The Hague/Heerlen.

Buelens, B., Daas, P., Burger, J., Puts, M. & van den Brakel, J. (2014). Selectivity of Big data. Discussion paper, Statistics Netherlands. A8E8316CFEF0/0/201411x10pub.pdf (accessed ).

Cheung, P. (2012). Big Data, Official Statistics and Social Science Research: Emerging Data Challenges. Presentation at the December 19th World Bank meeting, Washington (accessed ).

Couper, Mick P. (2013). Is the Sky Falling? New Technology, Changing Media, and the Future of Surveys. Survey Research Methods, Vol. 7, No. 3. Survey Research Center, University of Michigan.

Cukier, Kenneth & Mayer-Schonberger, Viktor (2013). Big Data: A Revolution That Will Transform How We Live, Work, and Think.

Daas, P.J.H., Puts, M.J., Buelens, B. & van den Hurk, P. (2012). Big Data and Official Statistics. Sharing Advisory Board, Software Sharing Newsletter, 7 (accessed ).

Director Generals of the National Statistical Institutes (DGINS) (2013). Scheveningen Memorandum: Big Data and Official Statistics. M%20Final%20version.pdf (accessed ).

Dunne, J. (2013). Big data coming soon to an NSI near you. 59th ISI World Statistics Congress, Hong Kong, August (accessed ).

Edith Ramirez (2013). The Privacy Challenges of Big Data: A View from the Lifeguard's Chair. Speech at Technology Policy Institute's Aspen Forum (accessed ).

Energy-Facts.org (2012). Rise of the Machines & the Explosion of Data (accessed ).

Frankel, Max (2013). Where Did Our Inalienable Rights Go? [Published: June 22, 2013] (accessed ).

Hastie, T., Tibshirani, R. & Friedman, J. (2003). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition, Springer.

Heerschap, N. (2013). Internet as a new source of information for the production of official statistics. Experiences of Statistics Netherlands. 59th ISI World Statistics Congress, Hong Kong, August (accessed ).

Hurwitz, J., Nugent, A., Halper, F. & Kaufman, M. (2013). Big Data For Dummies (accessed ).

IBM (2012). Big data: Why it matters to the midmarket (accessed ).

Jansson, Ingegerd & Isaksson, Annica (2013). Big Data in Official Statistics Production. Paper for discussion, Advisory Scientific Board, SCB.

Lenard, Thomas M. & Rubin, Paul H. (2013). The Big Data Revolution: Privacy Considerations. December 2013 (accessed ).

Manyika, James, Chui, Michael, Brown, Brad, Bughin, Jacques, Dobbs, Richard, Roxburgh, Charles & Hung Byers, Angela (2011). Big data: The next frontier for innovation, competition, and productivity (accessed ).

Marcus, Gary & Davis, Ernest (2014). Eight (No, Nine!) Problems With Big Data (accessed ).

Minelli, Michael, Chambers, Michele & Dhiraj, Ambiga (2013). Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses.

Parise, Salvatore, Iyer, Bala & Vesset, Dan (2012). Four strategies to capture and create value from big data (accessed ).

Schutt, Rachel & O'Neil, Cathy (2013). Doing Data Science: Straight Talk from the Frontline.

Shaw, Jonathan (2014). Why Big Data Is a Big Deal (accessed ).

Silipo, Rosaria & Winters, Phil (2013). Big Data, Smart Energy, and Predictive Analytics: Time Series Prediction of Smart Energy Data (accessed ).

Statistics New Zealand (2012). Using cellphone data to measure population movements. Wellington: Statistics New Zealand (accessed ).

UNECE - United Nations Economic Commission for Europe, Conference of European Statisticians (2013). What does big data mean for official statistics? (accessed ).

UN Global Pulse (2012). Big Data for Development: Challenges & Opportunities (accessed ).

United Nations Economic Commission for Europe - UNECE (2012). (accessed ).

Wakefield, Kylie Jane (2014). How alternative energy companies use big data. The latest monitors can help homeowners track their energy consumption in greater detail than before. Tech Page One (accessed ).

Wilson, R., Gosling, S. & Graham, L. (2012). A review of Facebook research in the social sciences. Perspectives on Psychological Science, 7(3).
What is Big Data Outline What is Big data and where they come from? How we deal with Big data? Big Data Everywhere! As a human, we generate a lot of data during our everyday activity. When you buy something,
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
WHAT DOES BIG DATA MEAN FOR OFFICIAL STATISTICS?
UNITED NATIONS ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS 10 March 2013 WHAT DOES BIG DATA MEAN FOR OFFICIAL STATISTICS? At a High-Level Seminar on Streamlining Statistical Production
Data Centric Computing Revisited
Piyush Chaudhary Technical Computing Solutions Data Centric Computing Revisited SPXXL/SCICOMP Summer 2013 Bottom line: It is a time of Powerful Information Data volume is on the rise Dimensions of data
Big Data Hope or Hype?
Big Data Hope or Hype? David J. Hand Imperial College, London and Winton Capital Management Big data science, September 2013 1 Google trends on big data Google search 1 Sept 2013: 1.6 billion hits on big
Opportunities and Limitations of Big Data
Opportunities and Limitations of Big Data Karl Schmedders University of Zurich and Swiss Finance Institute «Big Data: Little Ethics?» HWZ-Darden-Conference June 4, 2015 On fortune.com this morning: Apple's
Big Data Integration: A Buyer's Guide
SEPTEMBER 2013 Buyer s Guide to Big Data Integration Sponsored by Contents Introduction 1 Challenges of Big Data Integration: New and Old 1 What You Need for Big Data Integration 3 Preferred Technology
!!!!! BIG DATA IN A DAY!
BIG DATA IN A DAY December 2, 2013 Underwritten by Copyright 2013 The Big Data Group, LLC. All Rights Reserved. All trademarks and registered trademarks are the property of their respective holders. EXECUTIVE
Annex: Concept Note. Big Data for Policy, Development and Official Statistics New York, 22 February 2013
Annex: Concept Note Friday Seminar on Emerging Issues Big Data for Policy, Development and Official Statistics New York, 22 February 2013 How is Big Data different from just very large databases? 1 Traditionally,
Big Data. Fast Forward. Putting data to productive use
Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize
CSC590: Selected Topics BIG DATA & DATA MINING. Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait
CSC590: Selected Topics BIG DATA & DATA MINING Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait Agenda Introduction What is Big Data Why Big Data? Characteristics of Big Data Applications of Big Data Problems
Of all the data in recorded human history, 90 percent has been created in the last two years. - Mark van Rijmenam, Think Bigger, 2014
What is Big Data? Of all the data in recorded human history, 90 percent has been created in the last two years. - Mark van Rijmenam, Think Bigger, 2014 Data in the Twentieth Century and before In 1663,
Analyzing Big Data: The Path to Competitive Advantage
White Paper Analyzing Big Data: The Path to Competitive Advantage by Marcia Kaplan Contents Introduction....2 How Big is Big Data?................................................................................
PLA 7 WAYS TO USE LOG DATA FOR PROACTIVE PERFORMANCE MONITORING. [ WhitePaper ]
[ WhitePaper ] PLA 7 WAYS TO USE LOG DATA FOR PROACTIVE PERFORMANCE MONITORING. Over the past decade, the value of log data for monitoring and diagnosing complex networks has become increasingly obvious.
Big Data Readiness. A QuantUniversity Whitepaper. 5 things to know before embarking on your first Big Data project
A QuantUniversity Whitepaper Big Data Readiness 5 things to know before embarking on your first Big Data project By, Sri Krishnamurthy, CFA, CAP Founder www.quantuniversity.com Summary: Interest in Big
The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics
The Data Engineer Mike Tamir Chief Science Officer Galvanize Steven Miller Global Leader Academic Programs IBM Analytics Alessandro Gagliardi Lead Faculty Galvanize Businesses are quickly realizing that
COULD VS. SHOULD: BALANCING BIG DATA AND ANALYTICS TECHNOLOGY WITH PRACTICAL OUTCOMES
COULD VS. SHOULD: BALANCING BIG DATA AND ANALYTICS TECHNOLOGY The business world is abuzz with the potential of data. In fact, most businesses have so much data that it is difficult for them to process
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A SURVEY ON BIG DATA ISSUES AMRINDER KAUR Assistant Professor, Department of Computer
Why your business decisions still rely more on gut feel than data driven insights.
Why your business decisions still rely more on gut feel than data driven insights. THERE ARE BIG PROMISES FROM BIG DATA, BUT FEW ARE CONNECTING INSIGHTS TO HIGH CONFIDENCE DECISION-MAKING 85% of Business
At a recent industry conference, global
Harnessing Big Data to Improve Customer Service By Marty Tibbitts The goal is to apply analytics methods that move beyond customer satisfaction to nurturing customer loyalty by more deeply understanding
What happens when Big Data and Master Data come together?
What happens when Big Data and Master Data come together? Jeremy Pritchard Master Data Management fgdd 1 What is Master Data? Master data is data that is shared by multiple computer systems. The Information
Information Visualization WS 2013/14 11 Visual Analytics
1 11.1 Definitions and Motivation Lot of research and papers in this emerging field: Visual Analytics: Scope and Challenges of Keim et al. Illuminating the path of Thomas and Cook 2 11.1 Definitions and
BIG DATA I N B A N K I N G
$ BIG DATA IN BANKING Table of contents What is Big Data?... How data science creates value in Banking... Best practices for Banking. Case studies... 3 7 10 1. Fraud detection... 2. Contact center efficiency
Generating the Business Value of Big Data:
Leveraging People, Processes, and Technology Generating the Business Value of Big Data: Analyzing Data to Make Better Decisions Authors: Rajesh Ramasubramanian, MBA, PMP, Program Manager, Catapult Technology
The big data revolution
The big data revolution Friso van Vollenhoven (Xebia) Enterprise NoSQL Recently, there has been a lot of buzz about the NoSQL movement, a collection of related technologies mostly concerned with storing
W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
How Big Is Big Data Adoption? Survey Results. Survey Results... 4. Big Data Company Strategy... 6
Survey Results Table of Contents Survey Results... 4 Big Data Company Strategy... 6 Big Data Business Drivers and Benefits Received... 8 Big Data Integration... 10 Big Data Implementation Challenges...
A Future Without Secrets. A NetPay Whitepaper. www.netpay.co.uk www.netpay.ie. more for your money
A Future Without Secrets A NetPay Whitepaper A Future Without Secrets The new business buzz word is Big Data - everyone who is anyone in business is talking about it, but is this terminology just another
International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology
International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 BIG DATA: A New Technology Farah DeebaHasan Student, M.Tech.(IT) Anshul Kumar Sharma Student, M.Tech.(IT)
The Promise of Industrial Big Data
The Promise of Industrial Big Data Big Data Real Time Analytics Katherine Butler 1 st Annual Digital Economy Congress San Diego, CA Nov 14 th 15 th, 2013 Individual vs. Ecosystem What Happened When 1B
Business white paper. Lower risk and cost with proactive information governance
Business white paper Lower risk and cost with proactive information governance Table of contents 3 Executive summary 4 Information governance: the new business imperative 4 A perfect storm of information
The 3 questions to ask yourself about BIG DATA
The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.
Anuradha Bhatia, Faculty, Computer Technology Department, Mumbai, India
Volume 3, Issue 9, September 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Real Time
Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014
Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools
The Directors Cut. The power of data: What directors need to know about Big Data, analytics and the evolution of information. www.pwc.
www.pwc.com/ca/acconnect The Directors Cut The power of data: What directors need to know about Big Data, analytics and the evolution of information December 201 This newsletter is brought to you by PwC
Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum
Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms
BUY BIG DATA IN RETAIL
BUY BIG DATA IN RETAIL Table of contents What is Big Data?... How Data Science creates value in Retail... Best practices for Retail. Case studies... 3 7 11 1. Social listening... 2. Cross-selling... 3.
Taming Big Data. 1010data ACCELERATES INSIGHT
Taming Big Data 1010data ACCELERATES INSIGHT Lightning-fast and transparent, 1010data analytics gives you instant access to all your data, without technical expertise or expensive infrastructure. TAMING
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
Why Modern B2B Marketers Need Predictive Marketing
Why Modern B2B Marketers Need Predictive Marketing Sponsored by www.raabassociatesinc.com [email protected] www.mintigo.com [email protected] Introduction Marketers have used predictive modeling
Big Data a threat or a chance?
Big Data a threat or a chance? Helwig Hauser University of Bergen, Dept. of Informatics Big Data What is Big Data? well, lots of data, right? we come back to this in a moment. certainly, a buzz-word but
ANALYTICS BUILT FOR INTERNET OF THINGS
ANALYTICS BUILT FOR INTERNET OF THINGS Big Data Reporting is Out, Actionable Insights are In In recent years, it has become clear that data in itself has little relevance, it is the analysis of it that
Big Data Mining: Challenges and Opportunities to Forecast Future Scenario
Big Data Mining: Challenges and Opportunities to Forecast Future Scenario Poonam G. Sawant, Dr. B.L.Desai Assist. Professor, Dept. of MCA, SIMCA, Savitribai Phule Pune University, Pune, Maharashtra, India
Airline Applications of Business Intelligence Systems
Airline Applications of Business Intelligence Systems Mihai ANDRONIE* *Corresponding author Spiru Haret University Str. Ion Ghica 13, Bucharest 030045, Romania [email protected] DOI: 10.13111/2066-8201.2015.7.3.14
CORRALLING THE WILD, WILD WEST OF SOCIAL MEDIA INTELLIGENCE
CORRALLING THE WILD, WILD WEST OF SOCIAL MEDIA INTELLIGENCE Michael Diederich, Microsoft CMG Research & Insights Introduction The rise of social media platforms like Facebook and Twitter has created new
The 4 Pillars of Technosoft s Big Data Practice
beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed
BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS
BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS Megha Joshi Assistant Professor, ASM s Institute of Computer Studies, Pune, India Abstract: Industry is struggling to handle voluminous, complex, unstructured
A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH
205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology
Big Data Strategies Creating Customer Value In Utilities
Big Data Strategies Creating Customer Value In Utilities National Conference ICT For Energy And Utilities Sofia, October 2013 Valery Peykov Country CIO Bulgaria Veolia Environnement 17.10.2013 г. One Core
White Paper: What You Need To Know About Hadoop
CTOlabs.com White Paper: What You Need To Know About Hadoop June 2011 A White Paper providing succinct information for the enterprise technologist. Inside: What is Hadoop, really? Issues the Hadoop stack
Big Data and Open Data
Big Data and Open Data Bebo White SLAC National Accelerator Laboratory/ Stanford University!! [email protected] dekabytes hectobytes Big Data IS a buzzword! The Data Deluge From the beginning of
Unlocking The Value of the Deep Web. Harvesting Big Data that Google Doesn t Reach
Unlocking The Value of the Deep Web Harvesting Big Data that Google Doesn t Reach Introduction Every day, untold millions search the web with Google, Bing and other search engines. The volumes truly are
Big Data how it changes the way you treat data
Big Data how it changes the way you treat data Oct. 2013 Chung-Min Chen Chief Scientist Info. Analysis Research & Services The views and opinions expressed in this presentation are those of the author
Turning Big Data into Big Decisions Delivering on the High Demand for Data
Turning Big Data into Big Decisions Delivering on the High Demand for Data Michael Ho, Vice President of Professional Services Digital Government Institute s Government Big Data Conference, October 31,
Understanding the impact of the connected revolution. Vodafone Power to you
Understanding the impact of the connected revolution Vodafone Power to you 02 Introduction With competitive pressures intensifying and the pace of innovation accelerating, recognising key trends, understanding
Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo
Software Engineering for Big Data CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo Big Data Big data technologies describe a new generation of technologies that aim
Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.
Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics
BIG DATA : Big Opportunity or Big Threat for Official Statistics?* Jose Ramon G. Albert, Ph.D. Secretary General, NSCB Email: [email protected].
BIG DATA : Big Opportunity or Big Threat for Official Statistics?* Jose Ramon G. Albert, Ph.D. Secretary General, NSCB Email: [email protected] 1 *Views expressed do not reflect those at NSCB Outline
MCCM: An Approach to Transform
MCCM: An Approach to Transform the Hype of Big Data into a Real Solution for Getting Better Customer Insights and Experience Muhammad Salman Sami Khan, Chief Research Analyst, Global Marketing Team, ZTEsoft
Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges
Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and
The Cloud for Insights
The Cloud for Insights A Guide for Small and Midsize Business As the volume of data grows, businesses are using the power of the cloud to gather, analyze, and visualize data from internal and external
TECHNIQUES FOR OPTIMIZING THE RELATIONSHIP BETWEEN DATA STORAGE SPACE AND DATA RETRIEVAL TIME FOR LARGE DATABASES
Techniques For Optimizing The Relationship Between Data Storage Space And Data Retrieval Time For Large Databases TECHNIQUES FOR OPTIMIZING THE RELATIONSHIP BETWEEN DATA STORAGE SPACE AND DATA RETRIEVAL
WHITE PAPER OCTOBER 2014. Unified Monitoring. A Business Perspective
WHITE PAPER OCTOBER 2014 Unified Monitoring A Business Perspective 2 WHITE PAPER: UNIFIED MONITORING ca.com Table of Contents Introduction 3 Section 1: Today s Emerging Computing Environments 4 Section
Data Analytics in Organisations and Business
Data Analytics in Organisations and Business Dr. Isabelle E-mail: [email protected] 1 Data Analytics in Organisations and Business Some organisational information: Tutorship: Gian Thanei:
