Ojulari Moshood
Cameron University - IT4444 Capstone 2013

"BIG DATA A PROLIFIC USE OF INFORMATION"

Abstract: The idea of big data is to make better use of the information generated by individuals to remake and improve our businesses, security, health care, and economy. While some people consider big data a meme and a marketing term, it also opens the door to a new approach to understanding the world and making decisions. Big data presents vast new opportunities. Giving individuals the right to their data enables data to become an asset that people own, which in turn gives them the ability to trade it for services. This creates an environment in which people treat their data the way they treat their money. It also enables the next generation of interactive data analysis with real-time answers. The goal of this paper is to enlighten readers about Big Data and its benefits.

Keywords: Big Data, Privacy, Security, Big Data Analytics, Data Mining, Data Warehousing, Hadoop.

Introduction: Big data is data too big, too fast, or too complex for existing tools to capture, manage, store, and process. People often refer to big data as data from social media or search engines, but that is not big data; the real big data is data from credit cards, photographs, mobile phones, GPS logs, web-browsing trails, network data, sensors, and so on. These are sources we tend to neglect, yet they reveal trends in people's behavior, which is one of the major keys to big data. The promise of big data is to help re-engineer the systems our society has now so that they work more efficiently, using information obtained from data analysis. That analysis brings nonnegotiable facts into the mix, enabling managers to base vital decisions on solid information accumulated from a rich variety of sources and delivered in real time.
Big data faces challenges because current technology cannot yet handle the velocity, volume, and variety of the data or the algorithms needed to analyze such massive amounts of it. There is also considerable concern about privacy and about who will be using the information. Section 2 discusses big data analytics and warehousing and gives examples of how data analytics can help a business maximize profit. Section 3 discusses big data mining and its importance, Section 4 discusses big data security, and Section 5 discusses big data and the privacy issues that arise when individuals' information is shared by multiple sources around the world. Section 6 gives insight into the future of big data and how it will help engineer a more information-driven society.

Big Data Analytics & Warehousing: Big data analytics is a fast-growing and influential practice. It examines various data types (video, GPS tracking, web-browser trails, sensors, social media) using advanced analytic techniques to process, clean, and transform acquired structured and unstructured data in order to unearth patterns, hidden correlations, and other useful information. Such information can be used in decision-making and to improve business performance, security, and so on.

Big data analytics can be done using Apache Hadoop, an open-source software framework that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale from a single server to thousands of machines with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software's ability to detect and handle failures at the application layer. Hadoop provides two basic services: MapReduce and the Hadoop Distributed File System (HDFS). MapReduce is a programming model used to simplify data processing across large datasets, while HDFS is a distributed file system with high fault tolerance that is designed to be deployed on low-cost hardware and provides high-throughput access to application data for large data sets. Retail companies like Wal-Mart and Kohl's use data analytics on sales, pricing, demographic, and weather data to tailor product selections at particular stores and to determine the timing of price markdowns.

A data warehouse is a database used for storage, reporting, and data analysis. It stores current as well as historical data, which are used to create trend reports for decision-making or for predicting the future. It is usually a central repository of data, built by combining data from one or more sources and organized to facilitate management decision-making. Data warehouses are constructed through data cleaning, data integration, data transformation, data loading, and periodic data refreshing.
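As a rough illustration of the MapReduce programming model described in this section, the following sketch counts web page requests per client. It is plain Python that imitates the map, shuffle, and reduce phases on one machine; it does not use the Hadoop API, and the log format, the sample log lines, and all function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical web-server log lines in the form "<client_ip> <requested_path>".
LOG_LINES = [
    "10.0.0.1 /index.html",
    "10.0.0.2 /index.html",
    "10.0.0.1 /cart",
    "10.0.0.1 /checkout",
    "10.0.0.3 /index.html",
]


def map_phase(line):
    """Map step: emit one (client_ip, 1) pair for each request line."""
    client_ip, _path = line.split(maxsplit=1)
    yield client_ip, 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key (a real framework does
    this across machines between the map and reduce steps)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one client."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on a single machine."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


request_counts = run_job(LOG_LINES)
print(request_counts)
```

In a real cluster the map and reduce steps would run in parallel on many machines, with the input split into blocks stored in a distributed file system; the sketch only shows how a problem is expressed as key and value pairs.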
Image source: http://whatsthebigdata.com/2012/12/06/the-future-of-big-data-infographic/

Big Data Mining: Big data mining is a sophisticated technique that analyzes a large variety and volume of data to determine patterns and relationships, using advanced statistical analysis and modeling techniques. The main objective is to find relationships and patterns that can be leveraged to improve the business. It helps unveil puzzling but useful associations and better understand known ones. For example, a retailer discovered that almost half of the customers who bought cigarettes on Friday also bought cologne, an association that led the retailer to display the two products side by side. Findings like this help retailers determine the most effective sales floor layout. Another example of such an association is that when it rains, people tend to sign in to various social networks.

Big Data Security: Big data security analytics is simply a collection of security data sets so large and complex that it becomes difficult (or impossible) to process using on-hand database management tools or traditional security data processing applications. It can help lower cybersecurity risks by analyzing large amounts of behavioral data to distinguish between legitimate and malicious activity. In-depth analysis of forensics and network traffic, fraud detection reports or logs, and customer data helps identify both the known and the unknown, so that new security systems can be built or current ones enhanced. Big data analytics can also help detect DDoS attacks by means of a MapReduce-based detection algorithm in Hadoop. Such an algorithm can simply count the total number of web page requests from a client, or it can calculate the time spent and the byte count for each URL request and compare the access sequence and time spent with those of other clients accessing the same server, to detect whether a client is infected. For example, such analysis could show that one person, James, used six IP addresses, five user IDs, and 14 different accounts. With big data security analytics techniques, security experts will be able to make the most accurate security decisions from information extracted through real-time analysis of various data sets, or to create an entirely new set of security capabilities.

Big Data & Privacy: Protecting data privacy becomes harder as information is shared widely among different parties around the world. Protecting data that is stored in distributed systems or shared is very important, because there can be serious consequences if such data is released without the data owner's knowledge. As more information about individuals' health, finances, location, and online activity spreads, concerns arise about profiling, tracking, discrimination, exclusion, government surveillance, and loss of control over such information. Big data challenges some of the most fundamental concepts of privacy law, including the definition of personally identifiable information, the role of individual control, and the principles of data minimization and purpose limitation.

Future of Big Data: The future of big data lies in its potential to re-engineer the marketplace, security, schools, businesses, public health, and society at large through its power of prediction, which helps reveal obscured trends in structured and unstructured data sets in real time. This helps businesses match customers' online activity with the right promotions and marketing campaigns, track sales transactions, and predict system failures beforehand. It can also help the government in areas such as security, economic advancement, public schools, the census, and much more, fostering a better and more information-driven society.
For more information, see http://datadrivendetroit.org/projects/. Challenges in big data will help foster new career fields and technologies, such as data scientists, data analysts, new analytic platforms, products, systems, and many more. Putting Big Data to work to drive innovation, or to reform the current innovation process, will not be trouble-free, but if appropriate investment is made, I believe that Big Data will usher in a new surge of technological advancement that will help change the world.

Conclusion: As the world becomes more information driven, Big Data will continue to grow at a fast pace and will help engineer a new universe driven by data. Big Data holds the potential for enormous advances in many scientific disciplines and for expanding the profitability and success of many small and large enterprises. Some companies are already using some of the power of Big Data analytics in crucial decision-making to gain a competitive advantage. The challenge of Big Data is not just the scale of the data but also a diverse range of other issues, such as lack of structure, algorithms, privacy, provenance, timeliness, and visualization. Big Data will help us uncover knowledge that no one has discovered before. I encourage everyone to participate in this great journey to a world where data means vivacity.

References
1. Liyakasa, Kelly. "Big Data Analytics Can Help Improve Information Security." CRM Magazine (2012): 11. Academic Search Premier. Web. 1 Feb.
2. Johnson, Jeanne E. "Big Data + Big Analytics = Big Opportunity." Financial Executive 28.6 (2012): Business Source Premier. Web. 1 Feb.
3. Huwe, Terence K. "Big Data, Big Future." Computers in Libraries 32.5 (2012): Academic Search Premier. Web. 1 Feb.
4. Ritchey, Diane. "Big Data, Big Security." Security: Solutions for Enterprise Security Leaders 49.7 (2012): Business Source Premier. Web. 1 Feb.
5. Russom, Philip. "Big Data Analytics." Tdwi.org. Tdwi Research, Web. 15 Feb.
6. "Challenges and Opportunities with Big Data." N.p., n.d. Web. 09 Feb <http://www.cra.org/ccc/docs/init/bigdatawhitepaper.pdf>.
7. Lohr, Steve. "The Age of Big Data." New York Times, Feb 11.
8. "What Is Hadoop?" IBM. N.p., n.d. Web. 25 Mar <http://www-01.ibm.com/software/data/infosphere/Hadoop/>.
9. Apache Hadoop.
10. Han, Jiawei, and Micheline Kamber. Data Mining: Concepts and Techniques. Boston, Mass: Elsevier, Web. 1 Apr.
11. Khan, Arshad. Data Warehousing 101: Concepts and Implementation. San Jose, Calif: Khan Consulting and Publishing, Web. 1 Apr.
12. Kamath, Chandrika. "Large Scale Data Mining and Pattern Recognition: Overview." Lawrence Livermore National Laboratory, 30 Aug Web. 01 Apr <https://computation.llnl.gov/casc/sapphire/overview/overview.html>.
13. Oltsik, Jon. "Defining Big Data Security Analytics." Network World. Network World, Inc., 01 Apr Web. 02 Apr <http://www.networkworld.com/community/node/82758>.
14. Lee, Y., and Y. Lee. "Detecting DDoS Attacks with Hadoop." ACM CoNEXT Student Workshop, Dec.
15. Tene, Omer. "Big Data for All: Privacy and User Control in the Age of Analytics." Center for Internet and Society. N.p., 20 Sept Web. 16 Mar <http://cyberlaw.stanford.edu/blog/2012/09/big-data-all-privacy-and-user-control-age-analytics>.
16. Wong, R. C. "Big Data Privacy." J Inform Tech Softw Eng 2 (2012): e114.