Understanding Big Data & DV 2 law


A. V. N. S. Jyothirmayee 1, Dr. G. Sreenivasula Reddy 2, K. Akbar 3
1 Student, M.Tech CSE 1st year, 2 Professor, Vignana Bharathi Institute of Technology, Proddatur, 3 Asst. Professor, Chaitanya Bharathi Institute of Technology, Proddatur

Abstract--Big data is a rapidly expanding research area spanning multidisciplinary fields. As data grows faster than expected, its accessibility must keep pace with that growth. This growth leads us to the concept of Big Data, which challenges research scholars and scientists to handle such large data. The term "large" may refer to data in several fields, such as the biological sector, finance, computer science, knowledge engineering, and so on. Our paper presents an insight into several big data issues, along with a proposal of new Big Data characteristics.

Keywords-- Big Data, DV 2 law, levity, accuracy, worthfulness

I. INTRODUCTION

Data collection has increased drastically in recent years: the ability to collect data from various sources and in different formats has grown. This data flood challenges our capability to collect data from multiple sources and to store, process, analyze, and understand such large datasets. Consider Internet data. In 1998, Google had indexed around one million web pages; by 2000 the number had reached one billion, and in 2008 it exceeded one trillion. This rapid expansion was accelerated by the huge data volumes generated on the web by social networking applications such as Weibo, Twitter, and Facebook. A simple real-time example is the mobile phone, which carries vast volumes of real-time data from different people.
Every entity in this world is connected in some way or another. Men and machines are all connected, and the trillions of such connections together generate a huge data flood. Such data can be called Big Data. The Big Data challenge is one of the most exciting opportunities of the years to come.

II. BIG DATA

The term Big Data originated from the fact that huge amounts of data are being created. Every day, about 2.5 quintillion bytes of data are created, so much that 90% of the data in the world today has been created in the past two years alone. Gartner [9] summarizes the definition of Big Data as high volume, velocity, and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. According to IBM [14], 80% of the data captured today, from sources such as sensors, social media posts, digital pictures, videos, and cell phone GPS signals, is unstructured. All of this unstructured data is what is called Big Data.

III. BIG DATA CHARACTERISTICS

In a 2001 research report, META Group (now Gartner) analyst Doug Laney [10] was the first to describe big data as three-dimensional, i.e. the 3 V's: increasing volume, variety, and velocity in Big Data management. Later, in 2012, Gartner updated its definition to high volume, high velocity, and high variety. The following figure gives an idea of big data characterization.

Fig: Characterization of Big Data with 3 V's

Volume: The sheer volume of data is enormous, and it is expanding faster than ever before. Its quantity keeps rising beyond expectations.

As almost 90% of all data ever created was created in the past two years, by 2020 we can expect 50 times the amount of data that we had in [...].

Fig: Gartner's Big Data Characteristics

Variety: Data today comes in different formats: structured, semi-structured, unstructured, and even complex structured data. This wide variety includes text, multimedia, audio, digital pictures, video, sensor data, graph data, and many more.

Velocity: Velocity refers to the speed at which data is created, stored, analyzed, and visualized. In the big data era, data is created in real time or near real time, at a speed that is almost unimaginable.

Later, four more V's came into existence, which organizations need to keep in mind in order to develop a big data strategy. They are veracity, variability, visualization, and value.

Veracity: Veracity refers to the correctness of data. Having lots of data in high volumes and at high speed is worthless if that data is incorrect. Organizations therefore need to ensure that both the data and the analyses performed on it are correct.

Variability: Variability differs from variety. For example, consider a hotel that serves 15 different items. That is variety. Now consider that the same item is ordered on 6 consecutive days and tastes different on each of the 6 days. That is variability. Variability means that the meaning is changing rapidly. The data being created today changes rapidly: the same word in a tweet can have different meanings in different contexts, according to different sentiment analyses. Performing a proper sentiment analysis and determining the exact meaning of a word in its context is still very difficult.

Visualization: Visualization is the hard part of Big Data. Visualization here does not mean ordinary graphs or pie charts, but complex graphs that include several variables while remaining understandable and readable. Visualization may not be the most difficult part from a technological perspective, but it is the most challenging one.

Value: The data itself is not valuable. The value lies in the analyses done on the data and in how the data is turned into useful information and eventually into knowledge.

IV. PROPOSED CHARACTERISTICS: DV2 LAW

DV 2 Law: Big Data can be considered as an accumulation of data with Diverse dimensions, increasing Velocity, and expanding Volume of relationships, which provide levity, accuracy, and worthfulness. These characteristics make it clear that big data is boundless and has a tremendous future. Big Data provides levity (changeability) of relationships with accuracy and worthfulness.

Accumulation of Diverse dimensionality data: Huge data with a diversity of dimensions is gathered and stored. Such a collection depends on the different users of the data. Several formats, types, and structures of data can be collected, based on the nature and requirements of the information collectors. Just as no single model is sufficient to satisfy all customers, no single dimension of data satisfies all information collectors at once. For example, consider a human being. The same human being is a child to his or her parents, a sibling to his or her brother or sister, a student at school, an employee at work, a patient at a hospital, and a stranger to those who do not know him or her. The same person is represented or identified in a variety of dimensions under different user perspectives. In the same way, data today is expected to be available with diverse dimensionalities.

Increasing Velocity: Since users expect data to be accessible at very fast rates, data collection must also be done with greater velocity. Data and its accessibility keep growing over time.
Decentralization of data also contributes to data growth.

In major big data applications such as Google, Facebook, Twitter, Yahoo, and LinkedIn, data grows more rapidly than normal. To keep pace with this growth, speed is one of the most important factors. Data is also expected to grow further, at high velocity, in the coming days.

Expanding Volume of relationships: As the amount of generated data grows, so do the relationships among the data. Consider social networking media such as Facebook and Twitter: when a new user registers, that user can connect to several existing users through friend requests, messages, shared posts, and tweets. Any user in any corner of the world can become related to any other user anywhere else on the globe. This is an example of how virtual relationships grow in large volumes. The same process holds for other kinds of data too. The expanding volume of relations among the data also helps in discovering and analyzing useful patterns.

V. UN GLOBAL PULSE: BIG DATA FOR DEVELOPMENT

UN Global Pulse, a United Nations initiative launched in 2009, works on Big Data to improve lives in developing countries. It presents its work as Big Data for a Better World. It pursues a strategy with three parts: (1) research on innovative methods and techniques for analyzing real-time digital data in order to detect emerging vulnerabilities early; (2) assembling a free and open source technology toolkit for analyzing real-time data and sharing hypotheses; (3) establishing an integrated, global network of Pulse Labs to pilot the approach at country level. UN Global Pulse describes the main opportunities that Big Data offers to developing countries in its white paper, Big Data for Development: Challenges & Opportunities [13].
They are as follows:
Early Warning: Earlier detection of anomalies, trends, and events allows an earlier response.
Real Time Awareness: A more accurate representation of reality informs the design, planning, and implementation of programs and policies.
Real Time Feedback: Understanding sooner which real-time needs are changing or are not being met allows rapid and adaptive course correction.

It is estimated that, of the over 5 billion mobile phones in the world, 80% are in developing countries. The fact that mobile phones are spreading much faster in developing countries than in developed countries shows that the Big Data revolution is not restricted to the industrialized world.

VI. BIG DATA OPEN SOURCE TOOLS

The phenomenon of Big Data is intrinsically related to open source software. Large organizations such as Facebook, Twitter, Google, Yahoo, and LinkedIn both benefit from and contribute to open source projects. What is new is how much bigger the data is, how quickly it is growing, and how complex it is. To improve their processes and performance, enterprises need tools that allow them to collect, store, and analyze the data. Interestingly, many of the best and best-known big data tools available today are open source. The best known of these is Hadoop. The big data tools fall under different sections, which include [19]:
1. Big Data Analysis Platforms & Tools
2. Databases / Data Warehouses
3. Business Intelligence
4. Data Mining
5. File Systems
6. Programming Languages
7. Big Data Search
8. Data Aggregation & Transfer
9. Miscellaneous Big Data Tools

Several big data tools exist under these sections. Some of them are listed below:
1. Hadoop
2. MapReduce
3. Storm
4. Cassandra
5. HBase
6. Mahout
7. Weka
8. Gluster
9. Pig Latin
10. R; and many more [19]

Hadoop [2]: Hadoop inevitably comes up in any discussion of big data. Hadoop and its related technologies are supported by numerous vendors.

HDFS is the primary storage system for Hadoop. Hadoop allows writing applications that rapidly process huge amounts of data in parallel on large clusters of compute nodes. It is able to run application processing on about 120 different physical machines, with 123 nodes at a time. A MapReduce job divides the input dataset into several independent subsets, which are processed by the map tasks concurrently. The reduce tasks follow; they use the output of the maps to produce the final result.

Storm [6]: Open-sourced by Twitter, Storm offers distributed real-time computation capabilities. It works with nearly all programming languages and provides high scalability, robustness, and fault tolerance.

Cassandra [5]: Cassandra was originally developed by Facebook; it is now a project of the Apache Software Foundation. Cassandra is an open source distributed database management system. Many organizations, including Netflix, Twitter, Urban Airship, Reddit, Cisco, Constant Contact, and Digg, use Cassandra with large and active datasets. Netflix uses Cassandra as the back-end database for its streaming services.

HBase [4]: HBase is the non-relational, distributed NoSQL data store for Hadoop. An Apache project, HBase is designed to run on top of the Hadoop Distributed File System (HDFS).

Mahout [3]: Mahout is an OS-independent Apache project. It offers various clustering and classification algorithms that run on top of Hadoop. The main goal of Mahout is to build scalable machine learning libraries.

R [19]: R was developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. R is an open source programming language for statistical analysis of very large data sets and for visualization.

VII. HADOOP

Hadoop is a top-level Apache project written in Java. It is a computing environment built on top of a distributed clustered file system and designed specifically for very large scale data applications.
Hadoop was inspired by Google's Google File System (GFS), a distributed file system, and by the MapReduce programming paradigm. Hadoop is designed to analyze large data sets and to produce results from a highly scalable, distributed batch processing system.

7.1 Hadoop Components [15]

Hadoop is not just a single tool. It is a combination of two components: the Hadoop Distributed File System (HDFS) and Hadoop MapReduce. One key property to consider is redundancy. When data is stored across large clusters, it is stored redundantly. This redundancy gives the Hadoop cluster its fault tolerance and lets it heal itself from faults. It also allows Hadoop to handle different workloads across huge clusters and thereby to tackle big data problems.

HDFS

HDFS is the Hadoop component that quickly replicates data onto several nodes of a cluster, which is intended to give applications reliable and fast performance. HDFS breaks the data in the cluster into smaller pieces (called blocks) and distributes them throughout the cluster. The map and reduce functions are then executed on these smaller data subsets, which provides the high scalability required for processing Big Data. The logic for arranging all the data in a Hadoop cluster is handled by a server called the NameNode. The NameNode keeps track of all the data files in HDFS, and its information is stored in memory. When either creating a new file or accessing an existing application, Hadoop first communicates with the NameNode. It then either stores the data and replicates it over the cluster (when creating a new file) or sends the application to run locally on the nodes (when accessing an existing application).
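The block splitting, replication, and NameNode bookkeeping described above can be illustrated with a small in-memory sketch. It is not Hadoop code: the class `ToyNameNode`, the function `split_into_blocks`, the node names, and the round-robin placement rule are all assumptions made for illustration, and the actual HDFS placement policy is more involved.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Break a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ToyNameNode:
    """Hypothetical in-memory index of which nodes hold each block of each file."""

    def __init__(self, nodes, block_size, replication):
        if replication > len(nodes):
            raise ValueError("replication factor exceeds number of nodes")
        self.nodes = list(nodes)
        self.block_size = block_size
        self.replication = replication
        self.files = {}  # file name -> {block index -> [node names]}

    def create_file(self, name, data):
        """Split the data into blocks and record where each replica is placed."""
        blocks = split_into_blocks(data, self.block_size)
        placement = {}
        for b in range(len(blocks)):
            # Assumed rule: replicas go to consecutive nodes, round-robin.
            placement[b] = [
                self.nodes[(b + r) % len(self.nodes)]
                for r in range(self.replication)
            ]
        self.files[name] = placement
        return blocks

    def locate(self, name, block_index, failed=()):
        """Return the nodes that still hold a block, skipping failed nodes."""
        return [n for n in self.files[name][block_index] if n not in failed]
```

With four nodes and a replication factor of three, every block remains readable after any single node fails, because `locate` still returns at least two holders. This is the redundancy-based fault tolerance the section describes.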
Using the NameNode in HDFS improves the scalability of the solution.

MapReduce

MapReduce is considered the heart of Hadoop. It is the Hadoop component that provides an effective programming model for efficient distributed computing. In a Hadoop cluster, a MapReduce program is referred to as a job. A job is executed by breaking it into pieces called tasks. MapReduce refers to the two distinct tasks that a Hadoop program performs: the map and reduce functions. The map job is performed first, followed by the reduce job. The map function converts one set of data into another set of data in which individual elements are broken down into key-value pairs called tuples. The output of the map is the input to the reduce function, which combines those data tuples into a smaller set of tuples. All MapReduce programs that run natively on Hadoop are written in Java.
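As a minimal sketch of the map and reduce steps just described, the following plain Python word count mimics the data flow on a single machine. It does not use Hadoop or its Java API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical, and the tasks run sequentially here rather than concurrently across a cluster. The intermediate grouping step (`shuffle`) is not named in the text above; it stands for the grouping of map output by key that happens between the two phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one input record into (key, value) tuples, here (word, 1)."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values that share a key, so each key is reduced once."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single smaller tuple."""
    return key, sum(values)


def run_job(lines):
    """Run map over every record, group by key, then reduce each group."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `run_job(["big data big", "data flood"])` counts each word across both input lines. Because every call to `map_phase` depends only on its own line, the map work could be split across machines, which is the property that lets the model scale.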

7.2 Hadoop Characteristics

Hadoop is not just a single tool or a particular program; it is a framework of tools. Hadoop is distributed under the Apache license. Hadoop does not run on one big, powerful computer; it runs on a number of small, low-cost computers, that is, on a distributed model.

7.3 Hadoop Benefits

The following are some of the benefits of Hadoop.
1. Hadoop has a flat scalability curve
2. Performs log analysis
3. Cost effective
4. Low failure risks

VIII. BIG DATA APPLICATIONS

It is important to know your data. Big Data for Development is certainly not perfect data, but its value is tremendous if it is both properly understood and analyzed. The promise of Big Data for Development is, and will be, best fulfilled when its limitations, biases, and features are adequately understood and taken into account when interpreting the data. Properly analysed, Big Data offers the opportunity for an improved understanding of human behaviour that can support the field of global development through 1) early warning, 2) real-time awareness, and 3) real-time feedback.

Fig: Big Data Processing Framework

The ways Big Data is used today to change our world can be summarized in the following list [21].

Understanding customers: Big Data is used to better understand customer preferences. Several organizations understand their customers' behaviour by keeping track of traditional data sets together with social network sites and browser logs.

Understanding & optimizing business processes: Big Data is also increasingly used to expand businesses and optimize business processes, by performing big data analytics on measurements of organizational culture and of staff engagement in the use of big data tools.

Optimizing performance of machines: Big Data analytics helps devices and machines improve their performance and become smarter and more autonomous.
Example: the use of big data tools in the implementation of Google's self-driving car.

Personal issues: Big Data is used not only by large organizations but also to support individuals' personal development.

Enhancements in health care: Big Data can be used for DNA mining, which in turn is used to discover, analyse, monitor, and improve the health care of every individual.

Improving sports: Most sports teams track their athletes outside the sporting environment in order to monitor their emotional well-being.

Improvement in science & research: Science and research is one of the largest areas of development. Big Data analyses can be used in this area, since there are currently several petabytes of data to be analysed and to be shared among research fields.

Improvement of security: As Big Data concentrates on the relationships among several data types, maintaining, managing, and providing security is also a task that must be taken care of.

Improvement & optimization of cities and countries: Big Data can also be used to improve several aspects of cities and countries, such as weather data and automatic signalling systems (for example, traffic).

Financial trading: Big Data can also be used to analyse financial trading information, as most things today involve money.

Some more applications can be listed as follows: multi-channel sales, log analysis, telecom, search quality, manufacturing, fraud and risk, retail, and technology.

IX. BIG DATA IN SOCIAL NETWORKING

As data keeps increasing dramatically, the usage of and search queries about several types of data on the internet also increase. Such an increase strongly affects the usage of social networking sites. Big Data can also be described as a cultural, technological, and scholarly phenomenon that rests on the interplay of technology, analysis, and mythology. This becomes clear from the statistics of Google, Twitter, and Facebook. The average number of searches per day on Google in 1998 (Google's official first year) was about 9,800, whereas in 2013 it was more than 5 billion searches per day [22]. The average number of tweets per day on Twitter is about 58 million [22]. Every 20 minutes, 1 million links are shared, 2 million friend requests are sent, and 3 million messages are sent on Facebook. Dealing with all such large data requires several new tools and algorithms.

For example, consider one of the top social network applications, Twitter. Twitter is an online social networking website and microblogging service that follows a data stream model and allows users to post and read text-based messages of up to 140 characters, known as tweets. Launched in July 2006 by Jack Dorsey, Twitter is now among the top 10 most visited internet sites. Twitter data arrives at a very high velocity. The Twitter data stream that supports real-time message passing was made available to developers in [...]. This data poses new challenges and knowledge discovery issues. As of April 2010, Twitter had 106 million registered users and 108 million unique users per month.
Now, the total number of active registered Twitter users is 645,750,000; 135,000 new users sign up every day; the site has 190 million unique visitors every month; an average of 58 million tweets are sent per day; the Twitter search engine answers an average of 2.1 billion queries every day; and 9,100 tweets are sent every second. This alone shows the high velocity at which data emerges in social networks. To handle all such emerging data, new algorithms and procedures need to be developed. In addition, sentiment analysis based on the emoticons or smileys in messages can classify messages into two types, depending on whether they convey optimistic or pessimistic feelings. A survey of sentiment analysis on data that includes emoticons is given in [23].

Another interesting application is NELL, which stands for Never Ending Language Learning. NELL is a system developed by Tom Mitchell's group [24] at Carnegie Mellon University, with the main goal of building a never-ending machine learning system that can extract structured information from unstructured web pages. NELL runs 24/7/365 to perform two main tasks: extracting new instances every day, and learning to provide better data instances than the day before.

X. CONTROVERSIES

With the rapid emergence of big data, how to handle it is the first critical question to answer. Even though significant studies involving big data are being done, several critical questions still have to be answered, such as what big data means, what type of users can access what type of data, and many more. The following are some of the controversial and critical issues to be taken into consideration:
1. Changing nature of data: As the data grows rapidly, it will never be small again. In this case, it is not necessary to distinguish big data analytics from data analytics.
2. Hadoop is not always the better tool for big data: Hadoop is not always the apt tool. Vendors of data management systems try to sell Hadoop-based systems, which may not always be the better platform to program on. Medium-sized companies are an example.
3. Claims to objectivity are misleading: Real-time work on big data is still subjective, and what it quantifies does not necessarily have a closer claim on objective truth, especially when considering messages from social networking websites.
4. Bigger data is not always better data: Big data and whole data are not always the same. Maintaining large data sets without knowing whether the data is fit for use does not make the data ready to use. The data to be used must also be free of noise and easy to understand and analyze.

5. Big data loses its meaning when taken out of context: Since large data sets can be modeled, data is often reduced so that it fits the selected model. Yet, taken out of context, the data loses its meaning and may even lose its value.
6. Accessibility does not make the data ethical: Accessing data without understanding it does not make it meaningful.
7. Limited access creates new digital divides: The present ecosystem around big data is creating a new kind of digital divide, between the Big Data rich and the Big Data poor. Such divides can make access to big data uncertain, whether or not organizations extract knowledge with the involvement of big data.

XI. RESULTS

While the term Big Data is literally concerned with big volumes of data, our current proposal, the DV 2 LAW, represents the specialized characteristics of big data as: (1) accumulation of data with diverse dimensions, (2) increasing velocity, and (3) expanding volume of relationships. This combination of characteristics also helps data to be collected and stored carefully, without collisions among relationships and their dimensions, and helps to cope with increasing data velocities. The notion of expanding volume of relationships also helps data collectors recognize the various relationships among different dimensions of data that have been growing at very high velocity. Data in such a context may also be better understood and better used, in line with user expectations.

XII. UPCOMING

Big data is similar to small data, but bigger, and it aims to solve new problems, and even old problems in a new way. The era of Big Data has only just begun, but it is important that we have already started questioning the assumptions, values, and biases of this new wave of research. Many important future challenges are waiting for researchers to handle.
These challenges arise from the nature of the data: huge, heterogeneous, and evolving.

XIII. CONCLUSION

The era of Big Data has only just begun. But it is already important that we have started questioning the assumptions, values, criticalities, and future of this new wave of research. Our current paper presents an insight into Big Data and its main concerns, along with the proposed theorem for Big Data characteristics. Mining Big Data provides us with new knowledge never discovered before.

REFERENCES

[1] Wei Fan, Albert Bifet. Mining Big Data: Current Status, and Forecast to the Future. SIGKDD Explorations, Volume 14, Issue 2.
[2] Apache Hadoop.
[3] Apache Mahout.
[4] Apache HBase.
[5] Apache Cassandra.
[6] Storm.
[7] D. Boyd and K. Crawford. Critical questions for Big Data. Information, Communication and Society, 15(5):662-679, 2012.
[8] F. Diebold. On the Origin(s) and Development of the Term "Big Data". PIER working paper archive, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania.
[9] Gartner.
[10] D. Laney. 3-D Data Management: Controlling Data Volume, Velocity and Variety. META Group Research Note, February 6.
[11] V. Gopalkrishnan, D. Steier, H. Lewis, and J. Guszcza. Big data, big business: bridging the gap. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine '12, pages 7-11, New York, NY, USA. ACM.
[12] Intel. Big Thinkers on Big Data.
[13] E. Letouze. Big Data for Development: Opportunities & Challenges. May.
[14] C. Parker. Unexpected challenges in large scale machine learning. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine '12, pages 1-6, New York, NY, USA. ACM.
[15] P. Zikopoulos, C. Eaton, D. deRoos, T. Deutsch, and G. Lapis. IBM Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill Companies, Incorporated, 2011.

[16] UN Global Pulse.
[17] Xindong Wu, Xingquan Zhu, Gong-Qing Wu, Wei Ding. Data Mining with Big Data.
[18] Mrs. Deepali Kishor Jadhav. Big Data: The New Challenges in Data Mining. International Journal of Innovative Research in Computer Science & Technology (IJIRCST), Volume-1, Issue-2, September 2013.
[19] Big Data-Open source tools.
[20] 3 V's of Big Data.
[21] The awesome ways big data is used to change our world.
[22]
[23] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1-135.
[24] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. H. Jr., and T. M. Mitchell. Toward an architecture for never-ending language learning. In Proceedings of the Twenty-Fourth Conference on Artificial Intelligence (AAAI 2010).


More information

BIG DATA TOOLS. Top 10 open source technologies for Big Data

BIG DATA TOOLS. Top 10 open source technologies for Big Data BIG DATA TOOLS Top 10 open source technologies for Big Data We are in an ever expanding marketplace!!! With shorter product lifecycles, evolving customer behavior and an economy that travels at the speed

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON BIG DATA MANAGEMENT AND ITS SECURITY PRUTHVIKA S. KADU 1, DR. H. R.

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics February 11, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

ANALYTICS BUILT FOR INTERNET OF THINGS

ANALYTICS BUILT FOR INTERNET OF THINGS ANALYTICS BUILT FOR INTERNET OF THINGS Big Data Reporting is Out, Actionable Insights are In In recent years, it has become clear that data in itself has little relevance, it is the analysis of it that

More information

International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6

International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6 International Journal of Engineering Research ISSN: 2348-4039 & Management Technology Email: editor@ijermt.org November-2015 Volume 2, Issue-6 www.ijermt.org Modeling Big Data Characteristics for Discovering

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Research Note What is Big Data?

Research Note What is Big Data? Research Note What is Big Data? By: Devin Luco Copyright 2012, ASA Institute for Risk & Innovation Keywords: Big Data, Database Management, Data Variety, Data Velocity, Data Volume, Structured Data, Unstructured

More information

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image

More information

Sunnie Chung. Cleveland State University

Sunnie Chung. Cleveland State University Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

The Next Wave of Data Management. Is Big Data The New Normal?

The Next Wave of Data Management. Is Big Data The New Normal? The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management

More information

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the

More information

Hadoop Big Data for Processing Data and Performing Workload

Hadoop Big Data for Processing Data and Performing Workload Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014

Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014 White Paper Big Data Executive Overview WP-BD-10312014-01 By Jafar Shunnar & Dan Raver Page 1 Last Updated 11-10-2014 Table of Contents Section 01 Big Data Facts Page 3-4 Section 02 What is Big Data? Page

More information

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Transitioning

More information

Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics

Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics Finding Insights & Hadoop Cluster Performance Analysis over Census Dataset Using Big-Data Analytics Dharmendra Agawane 1, Rohit Pawar 2, Pavankumar Purohit 3, Gangadhar Agre 4 Guide: Prof. P B Jawade 2

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Chapter 6. Foundations of Business Intelligence: Databases and Information Management

Chapter 6. Foundations of Business Intelligence: Databases and Information Management Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Big Data: Study in Structured and Unstructured Data

Big Data: Study in Structured and Unstructured Data Big Data: Study in Structured and Unstructured Data Motashim Rasool 1, Wasim Khan 2 mail2motashim@gmail.com, khanwasim051@gmail.com Abstract With the overlay of digital world, Information is available

More information

The big data revolution

The big data revolution The big data revolution Friso van Vollenhoven (Xebia) Enterprise NoSQL Recently, there has been a lot of buzz about the NoSQL movement, a collection of related technologies mostly concerned with storing

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Data Mining in the Swamp

Data Mining in the Swamp WHITE PAPER Page 1 of 8 Data Mining in the Swamp Taming Unruly Data with Cloud Computing By John Brothers Business Intelligence is all about making better decisions from the data you have. However, all

More information

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411

More information

USING BIG DATA FOR INTELLIGENT BUSINESSES

USING BIG DATA FOR INTELLIGENT BUSINESSES HENRI COANDA AIR FORCE ACADEMY ROMANIA INTERNATIONAL CONFERENCE of SCIENTIFIC PAPER AFASES 2015 Brasov, 28-30 May 2015 GENERAL M.R. STEFANIK ARMED FORCES ACADEMY SLOVAK REPUBLIC USING BIG DATA FOR INTELLIGENT

More information

Mining Big Data to Predicting Future

Mining Big Data to Predicting Future 27 Mining Big Data to Predicting Future Department of Computer Science and Engineering, Pondicherry Engineering College, Puducherry-605014, INDIA amitkrtyagi025@gmail.com Abstract Due to technological

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

Outline. What is Big data and where they come from? How we deal with Big data?

Outline. What is Big data and where they come from? How we deal with Big data? What is Big Data Outline What is Big data and where they come from? How we deal with Big data? Big Data Everywhere! As a human, we generate a lot of data during our everyday activity. When you buy something,

More information

Generic Log Analyzer Using Hadoop Mapreduce Framework

Generic Log Analyzer Using Hadoop Mapreduce Framework Generic Log Analyzer Using Hadoop Mapreduce Framework Milind Bhandare 1, Prof. Kuntal Barua 2, Vikas Nagare 3, Dynaneshwar Ekhande 4, Rahul Pawar 5 1 M.Tech(Appeare), 2 Asst. Prof., LNCT, Indore 3 ME,

More information

Manifest for Big Data Pig, Hive & Jaql

Manifest for Big Data Pig, Hive & Jaql Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,

More information

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Dr. Liangxiu Han Future Networks and Distributed Systems Group (FUNDS) School of Computing, Mathematics and Digital Technology,

More information

Big Data and Analytics: Challenges and Opportunities

Big Data and Analytics: Challenges and Opportunities Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

How Big Is Big Data Adoption? Survey Results. Survey Results... 4. Big Data Company Strategy... 6

How Big Is Big Data Adoption? Survey Results. Survey Results... 4. Big Data Company Strategy... 6 Survey Results Table of Contents Survey Results... 4 Big Data Company Strategy... 6 Big Data Business Drivers and Benefits Received... 8 Big Data Integration... 10 Big Data Implementation Challenges...

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem: Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY

DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY Big Data Analytics DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY Tom Haughey InfoModel, LLC 868 Woodfield Road Franklin Lakes, NJ 07417 201 755 3350 tom.haughey@infomodelusa.com

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

BIG DATA IS MESSY PARTNER WITH SCALABLE

BIG DATA IS MESSY PARTNER WITH SCALABLE BIG DATA IS MESSY PARTNER WITH SCALABLE SCALABLE SYSTEMS HADOOP SOLUTION WHAT IS BIG DATA? Each day human beings create 2.5 quintillion bytes of data. In the last two years alone over 90% of the data on

More information

Big Data Driven Knowledge Discovery for Autonomic Future Internet

Big Data Driven Knowledge Discovery for Autonomic Future Internet Big Data Driven Knowledge Discovery for Autonomic Future Internet Professor Geyong Min Chair in High Performance Computing and Networking Department of Mathematics and Computer Science College of Engineering,

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A COMPREHENSIVE VIEW OF HADOOP ER. AMRINDER KAUR Assistant Professor, Department

More information

Approaches for parallel data loading and data querying

Approaches for parallel data loading and data querying 78 Approaches for parallel data loading and data querying Approaches for parallel data loading and data querying Vlad DIACONITA The Bucharest Academy of Economic Studies diaconita.vlad@ie.ase.ro This paper

More information

ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MINING

ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MINING ISSUES, CHALLENGES, AND SOLUTIONS: BIG DATA MINING Jaseena K.U. 1 and Julie M. David 2 1,2 Department of Computer Applications, M.E.S College, Marampally, Aluva, Cochin, India 1 jaseena.mes@gmail.com,

More information

Big Data: Tools and Technologies in Big Data

Big Data: Tools and Technologies in Big Data Big Data: Tools and Technologies in Big Data Jaskaran Singh Student Lovely Professional University, Punjab Varun Singla Assistant Professor Lovely Professional University, Punjab ABSTRACT Big data can

More information

L1: Introduction to Hadoop

L1: Introduction to Hadoop L1: Introduction to Hadoop Feng Li feng.li@cufe.edu.cn School of Statistics and Mathematics Central University of Finance and Economics Revision: December 1, 2014 Today we are going to learn... 1 General

More information

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage If every image made and every word written from the earliest stirring of civilization

More information

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS

BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS Megha Joshi Assistant Professor, ASM s Institute of Computer Studies, Pune, India Abstract: Industry is struggling to handle voluminous, complex, unstructured

More information

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current

More information

Introduction to Big Data the four V's

Introduction to Big Data the four V's Chapter 1: Introduction to Big Data the four V's This chapter is mainly based on the Big Data script by Donald Kossmann and Nesime Tatbul (ETH Zürich) Big Data Management and Analytics 15 Goal of Today

More information

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

Apache Hadoop: The Big Data Refinery

Apache Hadoop: The Big Data Refinery Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

More information

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo Software Engineering for Big Data CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo Big Data Big data technologies describe a new generation of technologies that aim

More information

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools

More information

Popularity Analysis on Social Network: A Big Data Analysis

Popularity Analysis on Social Network: A Big Data Analysis Popularity Analysis on Social Network: A Big Data Analysis Sufal Das Brandon Victor Syiem Hemanta Kumar Kalita ABSTRACT A social network is a social structure made up of a set of social actors. These actors

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Big Data Storage Architecture Design in Cloud Computing

Big Data Storage Architecture Design in Cloud Computing Big Data Storage Architecture Design in Cloud Computing Xuebin Chen 1, Shi Wang 1( ), Yanyan Dong 1, and Xu Wang 2 1 College of Science, North China University of Science and Technology, Tangshan, Hebei,

More information

BIG DATA FUNDAMENTALS

BIG DATA FUNDAMENTALS BIG DATA FUNDAMENTALS Timeframe Minimum of 30 hours Use the concepts of volume, velocity, variety, veracity and value to define big data Learning outcomes Critically evaluate the need for big data management

More information

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to

More information

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data

More information

Big Data Integration: A Buyer's Guide

Big Data Integration: A Buyer's Guide SEPTEMBER 2013 Buyer s Guide to Big Data Integration Sponsored by Contents Introduction 1 Challenges of Big Data Integration: New and Old 1 What You Need for Big Data Integration 3 Preferred Technology

More information

Statistical Challenges with Big Data in Management Science

Statistical Challenges with Big Data in Management Science Statistical Challenges with Big Data in Management Science Arnab Kumar Laha Indian Institute of Management Ahmedabad Analytics vs Reporting Competitive Advantage Reporting Prescriptive Analytics (Decision

More information

BIG DATA IN BUSINESS ENVIRONMENT

BIG DATA IN BUSINESS ENVIRONMENT Scientific Bulletin Economic Sciences, Volume 14/ Issue 1 BIG DATA IN BUSINESS ENVIRONMENT Logica BANICA 1, Alina HAGIU 2 1 Faculty of Economics, University of Pitesti, Romania olga.banica@upit.ro 2 Faculty

More information

Volume 3, Issue 8, August 2015 International Journal of Advance Research in Computer Science and Management Studies

Volume 3, Issue 8, August 2015 International Journal of Advance Research in Computer Science and Management Studies Volume 3, Issue 8, August 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com An

More information

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required. What is this course about? This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics April 10, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

REVIEW PAPER ON BIG DATA USING HADOOP

REVIEW PAPER ON BIG DATA USING HADOOP International Journal of Computer Engineering & Technology (IJCET) Volume 6, Issue 12, Dec 2015, pp. 65-71, Article ID: IJCET_06_12_008 Available online at http://www.iaeme.com/ijcet/issues.asp?jtype=ijcet&vtype=6&itype=12

More information

Big Data Explained. An introduction to Big Data Science.

Big Data Explained. An introduction to Big Data Science. Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of

More information

Big Data Analytic and Mining with Machine Learning Algorithm

Big Data Analytic and Mining with Machine Learning Algorithm International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 1 (2014), pp. 33-40 International Research Publications House http://www. irphouse.com /ijict.htm Big Data

More information

Big Data. Fast Forward. Putting data to productive use

Big Data. Fast Forward. Putting data to productive use Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize

More information

A Survey on Big Data Concepts and Tools

A Survey on Big Data Concepts and Tools A Survey on Big Data Concepts and Tools D. Rajasekar 1, C. Dhanamani 2, S. K. Sandhya 3 1,3 PG Scholar, 2 Assistant Professor, Department of Computer Science and Engineering, Sri Krishna College of Engineering

More information

Big Data and Hadoop for the Executive A Reference Guide

Big Data and Hadoop for the Executive A Reference Guide Big Data and Hadoop for the Executive A Reference Guide Overview The amount of information being collected by companies today is incredible. Wal- Mart has 460 terabytes of data, which, according to the

More information

Big Data a threat or a chance?

Big Data a threat or a chance? Big Data a threat or a chance? Helwig Hauser University of Bergen, Dept. of Informatics Big Data What is Big Data? well, lots of data, right? we come back to this in a moment. certainly, a buzz-word but

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

Navigating Big Data business analytics

Navigating Big Data business analytics mwd a d v i s o r s Navigating Big Data business analytics Helena Schwenk A special report prepared for Actuate May 2013 This report is the third in a series and focuses principally on explaining what

More information

Internals of Hadoop Application Framework and Distributed File System

Internals of Hadoop Application Framework and Distributed File System International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A SURVEY ON BIG DATA ISSUES AMRINDER KAUR Assistant Professor, Department of Computer

More information

MINING BIG DATA: STUDY OF TOOLS OF OPEN SOURCE REVOLUTION

MINING BIG DATA: STUDY OF TOOLS OF OPEN SOURCE REVOLUTION MINING BIG DATA: STUDY OF TOOLS OF OPEN SOURCE REVOLUTION Ms. Palak Vaish 1, Dr. Saurabh Srivastava 2 1 Research Scholar, Mewar University, Chittorgarh, Rajasthan, (India) 2 Department of Mathematical

More information