The emergence of big data technology and analytics




Bernice Purcell
Holy Family University

ABSTRACT

The Internet has made new sources of vast amounts of data available to business executives. Big data comprises datasets too large to be handled by traditional database systems. To remain competitive, business executives need to adopt the new technologies and techniques emerging due to big data. Big data includes structured, semistructured, and unstructured data. Structured data are those data formatted for use in a database management system. Semistructured and unstructured data include all types of unformatted data, including multimedia and social media content. Big data are also provided by myriad hardware objects, including sensors and actuators embedded in physical objects, which are termed the Internet of Things. Data storage techniques used for big data include multiple clustered network-attached storage (NAS) and object-based storage. Clustered NAS employs storage devices attached to a network. Groups of storage devices attached to different networks are then clustered together. Object-based storage systems distribute sets of objects over a distributed storage system. Hadoop, used to process unstructured and semistructured big data, uses the map-reduce paradigm to locate all relevant data and then select only the data directly answering the query. NoSQL, MongoDB, and TerraStore process structured big data. NoSQL data is characterized by being basically available, soft state (changeable), and eventually consistent. MongoDB and TerraStore are both NoSQL-related products used for document-oriented applications. The advent of the age of big data poses opportunities and challenges for businesses. Previously unavailable forms of data can now be saved, retrieved, and processed. However, changes to hardware, software, and data processing techniques are necessary to employ this new paradigm.

Keywords: big data, scale-out network attached storage, data analytics, Hadoop, NoSQL

Copyright statement: Authors retain the copyright to the manuscripts published in AABRI journals. Please see the AABRI Copyright Policy at http://www.aabri.com/copyright.html.

The emergence of big data, page 1

BIG DATA IMPACTS BUSINESS ENTERPRISES

Data are generated in a growing number of ways. Use of traditional transactional databases has been supplemented by multimedia content, social media, and myriad types of sensors (Manyika et al., 2011). Advances in information technology allow users to capture, communicate, aggregate, store, and analyze enormous pools of data, known as big data (Manyika et al., 2011). However, the new data collection methodologies pose a dilemma for businesses that have depended upon database technology to store and process data.

Big data derives its name from the fact that the datasets are large enough that typical database systems are unable to capture, save, and analyze them (Manyika et al., 2011). The actual size of big data varies by business sector, software tools available in the sector, and average dataset sizes within the sector (Manyika et al., 2011). Best estimates of size range from a few dozen terabytes to many petabytes (Manyika et al., 2011).

In order to benefit from big data, new storage technologies and analysis methods need to be adopted. Business executives must determine the new technologies and methodologies best suited to their information needs. Business executives ignoring the growing field of big data will eventually become non-competitive.

TYPES AND SOURCES OF BIG DATA

Executives need to be cognizant of the types of data they need to deal with. There are three main types of data, regardless of whether or not a company is using big data: unstructured data, structured data, and semistructured data. Unstructured data are data in the format in which they were collected; no formatting is used (Coronel, Morris, & Rob, 2013). Some examples of unstructured data are PDFs, e-mails, and documents (Baltzan, 2012). Structured data are formatted to allow storage, use, and generation of information (Coronel, Morris, & Rob, 2013). Traditional transactional databases store structured data (Manyika et al., 2011).
Semistructured data have been processed to some extent (Coronel, Morris, & Rob, 2013). XML or HTML-tagged text are examples of semistructured data (Manyika et al., 2011). Business executives with traditional database management systems need to broaden their data horizons to include collection, storage, and processing of unstructured and semistructured data.

Data collection of unstructured and semistructured data is done through several Internet-based technologies. Chui, Löffler, and Roberts (2010) describe sensors providing big data as being part of the Internet of Things. The Internet of Things is described as sensors and actuators that are embedded in physical objects and that provide data through wired and wireless networks (Chui, Löffler, & Roberts, 2010). Some industries that are creating and using big data are those that have recently begun digitization of their data content; these industries include entertainment, healthcare, life sciences, video surveillance, transportation, logistics, retail, utilities, and telecommunications (Chui, Löffler, & Roberts, 2010). Devices generating data in these industries include IPTV cameras, GPS transceivers, RFID tag readers, smart meters, and cell phones (Chui, Löffler, & Roberts, 2010).

BIG DATA STORAGE TECHNOLOGIES

The ability to store massive amounts of data is a necessity for business executives to use big data. Two major means of storing big data are clustered network-attached storage (NAS), also called scale-out NAS, and object-based storage systems (Sliwa, 2011). Without a change to data storage technology, executives will not be able to collect big data.

Scale-out NAS is built upon a traditional NAS system. NAS is a storage device that is based on a computer with no keyboard or mouse; this computer only serves as a device to retrieve data for users (White, 2011). To support the demands of big data, several NAS devices are connected, or clustered, and each NAS device can search through devices attached to the other NAS devices. As indicated in Figure 1 (Appendix), each NAS is attached to several storage devices, which the NAS is able to search. In turn, this NAS pod is connected by a switch to another NAS pod which performs the same function. Because the pods are connected through the switch, both pods can be searched for data by any client. Clients may be directly connected on a local network, a VPN, or somewhere on the cloud attached through a network.

In object-based storage systems, users deal not with files but with sets of objects which are distributed over several devices (Wang, Brandt, Miller, & Long, 2004). Object-based storage systems provide high capacity and throughput as well as reliability and scalability, which are all needed for big data storage (Wang, Brandt, Miller, & Long, 2004). It is the layout of the objects themselves that provides the efficiency of storage and searching, rather than the configuration of the storage system as in scale-out NAS.

BIG DATA ANALYTICS

Storing big data is only part of the picture. Special techniques are needed to analyze big data.
Executives need to become familiar with the big data methodologies, adopt the technology appropriate for their business, and ensure that employees develop skill with the technology. Data analysis techniques differ depending on whether the data are unstructured or structured. Unstructured and semistructured data can be analyzed using software like Hadoop. Users analyzing structured big data can use software such as NoSQL, MongoDB, and TerraStore.

Hadoop is based on a programming paradigm called MapReduce, as discussed in Google's 2004 paper on MapReduce (Eaton, Deroos, Deutsch, Lapis, & Zikopoulos, 2012). The name MapReduce comes from the two distinct tasks that the Hadoop program will perform using key-value pairs when a query is made (Eaton, Deroos, Deutsch, Lapis, & Zikopoulos, 2012). The mapping task is given a piece of data known as a key to search on, finds relevant values based on this key, and converts the key and values into another dataset (Eaton, Deroos, Deutsch, Lapis, & Zikopoulos, 2012). The reducing task takes the final resultant output (the key and value combinations) from the mapping and reduces the output into a small dataset which answers the query (Eaton, Deroos, Deutsch, Lapis, & Zikopoulos, 2012).

Hadoop works well in a scale-out NAS environment. The mapping task will search all possible datasets for the data being queried. Due to the size of the environment, this will produce a huge dataset for the output. The reduce task will analyze the dataset output from mapping and check that only data that directly answers the query is returned. For example, if the user queries the system for the highest sales amount for each of four salespeople, the map task will search the system for all sales for the four salespeople, and the reduce task will limit the output to the highest sales amount for each salesperson. Researchers from Techaisle found that 73% of businesses in their study preferred using Hadoop because of its capability to process large volumes of big data (Business & Finance Week editors, 2012).

Due to the volume of data stored, structured data can also be considered big data depending upon how it is stored (scale-out NAS or object-based storage). There are several different software options commonly used to analyze structured big data. NoSQL, which can mean either "no SQL" or "not only SQL", is characterized by data that is Basically Available, Soft state, and Eventually consistent (BASE), rather than the traditional database data characteristics of Atomicity, Consistency, Isolation, and Durability (ACID) (Oracle, 2011). Data analyzed using NoSQL, therefore, is at times in a state of transition and may not be directly available; the data is in flux rather than set as in traditional database environments.
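
To make the earlier description of the map and reduce tasks more concrete, the sales query from the Hadoop discussion can be sketched in a few lines of Python. This is only an illustrative, single-machine sketch of the paradigm and not Hadoop's actual API: the function names (`map_sales`, `shuffle`, `reduce_max`, `highest_sales`), the salesperson names, and the sales figures are all assumptions invented for the example, and the explicit grouping step stands in for work that a framework such as Hadoop would normally perform between the two tasks.

```python
from collections import defaultdict

# Hypothetical sales records as (salesperson, amount) pairs. The names and
# figures are invented for illustration and do not come from the paper.
SALES = [
    ("Avery", 1200.0),
    ("Blake", 450.0),
    ("Avery", 3100.0),
    ("Casey", 990.0),
    ("Dana", 2750.0),
    ("Blake", 1800.0),
    ("Casey", 640.0),
    ("Erin", 5000.0),  # not one of the four salespeople being queried
]


def map_sales(records, wanted):
    """Map task: emit a (key, value) pair for every sale by a queried salesperson."""
    for person, amount in records:
        if person in wanted:
            yield person, amount


def shuffle(pairs):
    """Group the mapped values by key (done by the framework in a real cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_max(grouped):
    """Reduce task: collapse each key's values to the single highest amount."""
    return {key: max(values) for key, values in grouped.items()}


def highest_sales(records, wanted):
    """Run map, group, and reduce in sequence for the given salespeople."""
    return reduce_max(shuffle(map_sales(records, wanted)))


if __name__ == "__main__":
    print(highest_sales(SALES, {"Avery", "Blake", "Casey", "Dana"}))
```

In this sketch the map step produces the large intermediate dataset (every sale for the four salespeople), and the reduce step shrinks it to one value per key. In a clustered environment the map calls would run in parallel against many storage devices, which is why the intermediate output can be very large.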
MongoDB and TerraStore are both NoSQL-related products that are used for document-oriented applications such as storage and searching of whole invoices rather than the individual data fields from the invoice (Sasirekha, 2011).

THE IMPORTANCE OF BIG DATA TO THE BUSINESS WORLD

The importance of big data to business executives is derived from the data collected. Previously, executives relied solely on structured data collected and stored in a traditional database. Data collected from social media and the Internet of Things provides unstructured data that is constantly updated (Chui, Löffler, & Roberts, 2010). Analysis of these data will provide new information for executives that will enable them to maintain a competitive stance in their business environment. Thirty-four percent of business executives currently using business intelligence plan to employ big data analytics (Business & Finance Week editors, 2012).

Manyika et al. (2011) propose five major contributions big data can make to businesses: 1) transparency creation, 2) performance improvement, 3) population segmentation, 4) decision making support, and 5) innovative business models, products, and services. Creating data transparency within a business enables data to be shared more easily among departments. For example, data from research and development, engineering, and manufacturing units within a business can be integrated to enable concurrent product engineering, reducing time to market and improving quality (Manyika et al., 2011). Big data can provide more accurate and detailed performance data in real-time or near real-time, allowing managers to analyze performance variability and understand causes of the variability (Manyika et al., 2011). While market segmentation has been used for years, big data can provide highly specific segmentations enabling production of tailored products and services (Manyika et al., 2011). Increasingly sophisticated analytics can be employed using big data to support decision makers in minimizing risks and finding new insights, thus improving the decision making process (Manyika et al., 2011). New products, services, and even business models can emerge from analysis of big data (Manyika et al., 2011). One example is use of real-time location-based data enabling property and casualty insurance adjusters to price policies based on where and how people drive (Manyika et al., 2011).

CONCLUSION

Big data poses opportunities and challenges for businesses. Previously untapped sources of data are able to be stored and processed. Unstructured data previously available, such as invoice data, can be stored in a new, more convenient and meaningful format, and can employ text searching techniques. Data analytics will supplant the use of only structured queries of relational database management systems. Benefits of big data use to business executives include enhanced data sharing through transparency, improved performance through analysis, augmented market segmentation, increased decision support through advanced analytics, and greater ability to innovate products, services, and business models. Business owners need to follow trends in big data carefully to make the decision that fits their businesses.

REFERENCES

Baltzan, P. (2012). Business driven information systems (3rd ed.). New York: McGraw-Hill.
Business & Finance Week Editors. (2012, 12 May). Data analytics: 34 percent of mid-market businesses using business intelligence are planning to adopt big data analytics; Lack of expertise among SMBs is main barrier.
Business & Finance Week. Retrieved from http://search.proquest.com.proxy1.ncu.edu/docview/1010520318?accountid=28180
Chui, M., Löffler, M., & Roberts, R. (2010, March). The Internet of things. McKinsey Quarterly. Retrieved from https://www.mckinseyquarterly.com/The_Internet_of_Things_2538
Coronel, C., Morris, S., & Rob, P. (2013). Database systems: Design, implementation, and management (10th ed.). Boston: Cengage Learning.
Eaton, Deroos, Deutsch, Lapis, & Zikopoulos. (2012). Understanding big data: Analytics for enterprise class Hadoop and streaming data. New York: McGraw-Hill.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011, June). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute. Retrieved from http://www.mckinsey.com/insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation
Oracle. (2011, September). Oracle NoSQL Database. Redwood Shores, CA: Oracle Corporation. Retrieved from http://www.oracle.com/webapps/dialogue/ns/dlgwelcome.jsp?p_ext=y&p_dlg_id=11739928&src=7328025&act=24&sckw=wwmK11054205MPP001.GCM.8318.1020
Sasirekha, R. (2011). NoSQL, the database for the cloud. New York: Tata Consultancy Service. Retrieved from http://www.tcs.com/sitecollectiondocuments/white%20papers/Consulting_Whitepaper_No-SQL-Database-For-The-Cloud_04_2011.pdf
Sliwa, C. (2011, June 16). Scale-out NAS, object storage, cloud gateways replacing traditional NAS. Retrieved from http://searchstorage.techtarget.com/feature/scale-out-nas-objectstorage-cloud-gateways-replacing-file-storage
Wang, Brandt, Miller, & Long. (2004, April). OBFS: A file system for object-based storage devices. Design (2004), 283-300. Retrieved from http://www.mendeley.com/research/obfs-a-file-system-for-objectbased-storage-devices/
White, C. (2011). Data communications and computer networks: A business user's approach (6th ed.). Boston: Cengage Learning.

APPENDIX

Figure 1