Recent Issues and Challenges on Big Data in Cloud Computing


Dr. Jangala Sasi Kiran (1), M. Sravanthi (2), K. Preethi (3), M. Anusha (4)
(1,2,3,4) Dept. of CSE, Vidya Vikas Institute of Technology, Chevella, R.R. Dist., Telangana, India

International Journal of Computer Science and Technology (IJCST), Vol. 6, Issue 2, April - June 2015

Abstract
We live in an on-demand, on-command digital universe in which institutions, individuals, and tools generate data at a very high rate. This data is categorized as Big Data because of its sheer Volume, Variety, Velocity, and Veracity. Most of the data is unstructured or semi-structured, and it is heterogeneous in nature. Because of this nature, Big Data is stored in distributed file system architectures. Apache Hadoop and HDFS are widely used for storing and managing Big Data. Analyzing it is a challenging task, as it involves large distributed file systems that should be fault tolerant, flexible, and scalable. Cloud computing plays a vital role in protecting the data, the applications, and the related infrastructure with the help of policies, new technologies, controls, and big data tools. Moreover, cloud computing, the applications of Big Data, and its advantages are likely to represent the most promising new frontiers in science. Technology issues such as storage and data transport seem solvable in the near term, but they also represent long-term challenges that require research and new paradigms. Analyzing the issues and challenges comes first as we begin a collaborative research program into methodologies for big data analysis and design.

Keywords
Big Data, Cloud Computing, MapReduce

I. Introduction
The term Big Data first appeared in 1998, in a Silicon Graphics (SGI) slide deck by John Mashey titled "Big Data and the Next Wave of InfraStress". It denotes data sets so large and complicated that they become difficult to process using traditional data management tools or processing applications.
The term Big Data reflects the fact that we create a huge amount of data every day. At the KDD BigMine '12 Workshop, Usama Fayyad presented striking figures on internet usage in his invited talk: Google handles more than 1 billion queries per day, Twitter more than 250 million tweets per day, Facebook more than 800 million updates per day, and YouTube more than 4 billion views per day.

Big Data is a heterogeneous mix of structured data (traditional datasets in rows and columns, such as DBMS tables, CSV files, and XLS files) and unstructured data such as PDF documents, attachments, images, manuals, medical records such as X-rays, ECG and MRI images, forms, contacts, and rich media such as graphics, video, and audio. Businesses are primarily concerned with managing unstructured data, because about 80 percent of enterprise data is unstructured [1].

Google introduced the MapReduce [2] framework for processing large amounts of data on commodity hardware. Apache's Hadoop Distributed File System (HDFS) is emerging as a superior software component for cloud computing, together with integrated parts such as MapReduce. Hadoop, an open-source implementation of Google's MapReduce that includes a distributed file system, gives the application programmer the map and reduce abstractions. MapReduce by itself is capable of analyzing large distributed data sets, but the heterogeneity, velocity, and volume of Big Data make it a challenge for traditional data analysis and management tools [3][4]. For the analysis of Big Data, database integration and cleaning are much harder than in traditional mining approaches [5]. Parallel processing and distributed computing, which are nearly non-existent in RDBMS, are becoming standard procedure.

A. Importance of Big Data
The government's emphasis is on how big data creates value both within and across disciplines and domains. Value arises from the ability to analyze the data to develop actionable information. A survey of the technical literature [6] suggests five generic ways in which big data can support value creation for organizations:
1. Creating transparency by making big data openly available for business and functional analysis (quality, lower costs, reduced time to market, etc.).
2. Supporting experimental analysis in individual locations that can test decisions or approaches, such as specific market programs.
3. Assisting, based on customer information, in defining market segmentation at narrower levels.
4. Supporting real-time analysis and decisions based on sophisticated analytics applied to data sets from customers and embedded sensors.
5. Facilitating computer-assisted innovation in products based on embedded product sensors indicating customer responses.

B. Big Data Characteristics
One view, espoused by Gartner's Doug Laney, describes Big Data as having three dimensions: volume, variety, and velocity. IDC defined it accordingly: "Big data technologies describe a new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis." [7] Two other characteristics seem relevant: value and complexity. We summarize these characteristics below.

1. Data Volume
Data volume measures the amount of data available to an organization, which does not necessarily have to own all of it as long as it can access it. As data volume increases, the value of different data records will decrease in proportion to age, type, richness, and quantity, among other factors.

2. Data Velocity
Data velocity measures the speed of data creation, streaming, and aggregation.
E-commerce has rapidly increased the speed and richness of data used for different business transactions (for example, web-site clicks).

3. Data Variety
Data variety is a measure of the richness of the data representation: text, images, video, audio, etc.

4. Data Value
Data value measures the usefulness of data in making decisions. It has been noted that the purpose of computing is insight, not numbers. Data science is exploratory and useful in getting to know the data, but analytic science encompasses the predictive power of big data.

5. Complexity
Complexity measures the degree of interconnectedness (possibly very large) and interdependence in big data structures, such that a small change (or combination of small changes) in one or a few elements can yield very large changes, a small change that ripples across or cascades through the system and substantially affects its behavior, or no change at all.

In addition to the big data challenges induced by traditional data generation, consumption, and analytics at a much larger scale, newly emerged characteristics of big data show important trends in data mobility, faster data access and consumption, and ecosystem capabilities [8].

In this paper, we study a system that can scale to handle a large number of sites and process massive amounts of data. However, state-of-the-art systems built on HDFS and MapReduce are not sufficient, because they do not provide the security measures required to protect sensitive data. Moreover, the Hadoop framework is used to solve problems and manage data conveniently by using different techniques.

C. Types of Big Data and Sources
There are two types of big data: structured and unstructured.

1. Structured Data
Structured data are numbers and words that can be easily categorized and analyzed. These data are generated by things like network sensors embedded in electronic devices, smartphones, and global positioning system (GPS) devices. Structured data also include things like sales figures, account balances, and transaction data.

2. Unstructured Data
Unstructured data include more complex information, such as customer reviews from commercial websites, photos and other multimedia, and comments on social networking sites.
These data cannot easily be separated into categories or analyzed numerically. The explosive growth of the Internet in recent years means that the variety and amount of big data continue to grow. Much of that growth comes from unstructured data.

Fig. 1: Sources of Big Data

II. Security and Challenges
In certain domains, such as social media and health information, as more data is accumulated about individuals, there is a fear that certain organizations will know too much about them. Developing algorithms that randomize personal data within a large data set well enough to ensure privacy is a key research problem. Perhaps the biggest threat to personal security is the unregulated accumulation of data by numerous social media companies. This data represents a severe security concern, especially when many individuals so willingly surrender such information. Questions of accuracy, dissemination, expiration, and access abound.

Clearly, some big data must be secured with respect to privacy and security laws and regulations. International Data Corporation suggested five levels of increasing security [7]: privacy, compliance-driven, custodial, confidential, and lockdown. Further research is required to clearly define these security levels and map them against both current law and current analytics. For example, on Facebook, one can restrict pages to friends. But if Facebook runs an analytic over its databases to extract all the friends' linkages in an expanding graph, at what security level should that analytic operate? For instance, how many of an individual's friends should be revealed by such an analytic at a given level if the individual (has the ability to and) has marked those friends at certain security levels?

With the increasing use of big data in business, many companies are wrestling with privacy issues. Data privacy is a liability, so companies must be on the privacy defensive.
But unlike security, privacy should be considered an asset; it therefore becomes a selling point for both customers and other stakeholders. There should be a balance between data privacy and national security.

Meeting the challenges presented by big data will be difficult. The variety of data being generated is also expanding, and organizations' capability to capture and process this data is limited. Current technology, architecture, management, and analysis approaches are unable to cope with the flood of data, and organizations will need to change the way they think about, plan, govern, manage, process, and report on data to realize the potential of big data.

In the distributed systems world, Big Data started to become a major issue in the late 1990s due to the impact of the World Wide Web and the resulting need to index and query its rapidly mushrooming content. Database technology (including parallel databases) was considered for the task, but was found to be neither well-suited nor cost-effective [9] for those purposes. Google's technical response to the challenges of Web-scale data management and analysis was simple by database standards, but it kicked off what has become the modern Big Data revolution in the systems world [10]. To handle the challenge of Web-scale storage, the Google File System (GFS) was created [11]. To handle the challenge of processing the data in such large files, Google pioneered its MapReduce programming model and platform [3][11]. This model, characterized by some as "parallel programming for dummies", enabled Google developers to process large collections of data by writing two user-defined functions, map and reduce. The MapReduce framework applies map to individual instances and reduce to sorted groups of instances that share a common key, similar to the partitioned parallelism used in shared-nothing parallel query processing.
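The division of labor between the two user-defined functions and the framework can be illustrated with a small word count. This is a minimal single-process sketch, not Google's or Hadoop's implementation: `run_mapreduce` is a hypothetical stand-in for the framework, and `map_fn` and `reduce_fn` are illustrative names for the user-supplied functions.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(record):
    """User-defined map: emit a (word, 1) pair for every word in one record."""
    for word in record.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """User-defined reduce: combine all values that share one key."""
    return key, sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Toy stand-in for the framework (hypothetical helper, single process)."""
    # Map phase: apply map_fn independently to every input record.
    pairs = [pair for record in records for pair in map_fn(record)]
    # Shuffle/sort phase: bring pairs with the same key next to each other.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: call reduce_fn once per key with that key's values.
    return [
        reduce_fn(key, [value for _, value in group])
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


word_counts = dict(
    run_mapreduce(["big data in the cloud", "big data tools"], map_fn, reduce_fn)
)
```

In a real deployment the map calls and the per-key reduce calls would run on different machines; the sort step here plays the role of the framework's grouping of instances by common key.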
Taking Google's GFS and MapReduce papers as rough technical specifications, open-source equivalents were developed, and the Apache Hadoop MapReduce platform and its underlying file system (HDFS, the Hadoop Distributed File System) were born [3][12]. Popular higher-level languages for the Hadoop platform include Pig from Yahoo! [13], Jaql from IBM [14], and Hive from Facebook [13]. Microsoft's technologies include a parallel runtime system called Dryad and two higher-level programming models, DryadLINQ and the SQL-like SCOPE language [15], which uses Dryad under the covers. Interestingly, Microsoft has also recently announced that its future Big Data strategy includes support for Hadoop [16].

The challenges of security in cloud computing environments can be categorized into network level, user authentication level, data level, and generic issues.

A. Network Level
The challenges categorized under the network level deal with network protocols and network security, such as distributed nodes, distributed data, and internode communication.

B. Authentication Level
The challenges categorized under the user authentication level deal with encryption/decryption techniques and with authentication methods, such as administrative rights for nodes, authentication of applications and nodes, and logging.

C. Data Level
The challenges categorized under the data level deal with data integrity and availability, such as data protection and distributed data.

D. Generic Types
The challenges categorized under the general level concern traditional security tools and the use of different technologies.

III. Progress of Big Data and Forecast to the Future
Cloud computing, as an important application environment for big data, has attracted tremendous attention from the research community. Remarkable progress in big data networking has also been reported in this area. In this section, we study the following topics: cloud resource management for big data and performance optimization of big data in cloud computing.

A. Overview and Resource Management
Agrawal et al. [17] focused on systems for supporting update-heavy applications and on ad hoc analytics and decision support. A multi-tenant system model with different levels of resource sharing is shown in Fig. 2, which depicts representative forms of the challenging multi-tenant model and the trade-offs associated with different forms of sharing. Since the models share resources at different levels of abstraction, isolation guarantees are achieved differently in each.

Resource management plays a fundamental role in big data applications in the cloud. We next review important progress in this regard. A general introduction to resource management and allocation in multi-cluster clouds was given in [18].
[19] introduces virtualization planning and cloud computing methods in IBM data center networking. Key operational challenges, such as support for cost-saving technologies, rapid deployment, support for mobile and pervasive access, and the development of enterprise-grade network design, are discussed extensively. Lu, Sifei et al. [20] presented a framework for cloud-based large-scale data analytics and visualization, together with a case study on climate data of various scales. Specifically for reducing the cooling energy cost of a big data analytics cloud, a data-centric approach was introduced in [21]. Instead of relying on thermal-aware computational job placement/migration, the method in [21] takes a data-centric approach, which is now popular in big data applications.

In sum, pervasive computing of big data in the cloud, the management of computational resources and data complexity, and the management of energy consumption for big data in the cloud are fundamentally important aspects. The studied works have made reasonable progress in system design and implementation, but much remains to be done to validate these systems in larger, real-world applications.

Fig. 2: Multi-Tenant Model, Left to Right: Shared Table, Shared Database, Shared OS, and Shared Hardware.

B. Performance Optimization
Performance optimization is another classic and important topic in cloud computing, because appropriate optimization techniques provide a better application experience with comparable or even lower system resource consumption than the non-optimized case. HiTune, a dataflow-based performance analysis tool for big data clouds, was presented in [22]. HiTune is shown to be effective in assisting users with Hadoop performance analysis and system parameter tuning. A few interesting case studies on big data processing in cloud computing environments were described in [23].
The efforts of the Fujitsu laboratory are based on data stores and complex event processing, as well as workflow description in distributed data processing. A recent online cost-minimization approach was described in [24]. Its two online algorithms achieved competitive cost reduction ratios. The algorithms need to be further evaluated at larger and more competitive scales, e.g., data streaming applications with larger topologies.

In sum, HiTune and the Fujitsu laboratory approaches focus on improving the user experience by using fundamental big data techniques such as event processing and workflow description. Tools and case studies like these are informative and offer more choices to users. Moreover, online cost minimization, another promising direction, has proved effective in big data applications. We expect many more scalable and efficient algorithms to be proposed in the near future.

C. Future Challenges
There are many important future challenges in Big Data management and analytics that arise from the nature of the data: large, diverse, and evolving. These are some of the challenges that researchers and practitioners will have to deal with in the coming years:

1. Analytics Architecture
It is not yet clear what an optimal architecture for an analytics system should look like in order to deal with historic data and real-time data at the same time. An interesting proposal is the Lambda Architecture of Nathan Marz. The Lambda Architecture solves the problem of computing arbitrary functions on arbitrary data in real time by decomposing the problem into three layers: the batch layer, the serving layer, and the speed layer. It combines, in the same system, Hadoop for the batch layer and Storm for the speed layer.
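The three-layer decomposition can be pictured with a toy page-view counter. This is only a sketch under simplifying assumptions: plain in-memory Python objects stand in for Hadoop (batch layer) and Storm (speed layer), and the class and method names (`LambdaCounter`, `ingest`, `run_batch`, `query`) are hypothetical, chosen for illustration.

```python
from collections import Counter


class LambdaCounter:
    """Toy page-view counter split into batch, serving, and speed layers."""

    def __init__(self):
        self.master_log = []            # append-only event log read by the batch layer
        self.batch_view = Counter()     # precomputed view exposed by the serving layer
        self.realtime_view = Counter()  # incremental view maintained by the speed layer

    def ingest(self, page):
        """New events go to both the master log and the speed layer."""
        self.master_log.append(page)
        self.realtime_view[page] += 1

    def run_batch(self):
        """Batch layer: recompute the view from all data, then drop the
        real-time state that the new batch view already covers."""
        self.batch_view = Counter(self.master_log)
        self.realtime_view.clear()

    def query(self, page):
        """Serving layer: merge the batch view with the real-time view."""
        return self.batch_view[page] + self.realtime_view[page]


views = LambdaCounter()
for page in ["home", "docs", "home"]:
    views.ingest(page)
views.run_batch()      # the batch view now covers the first three events
views.ingest("home")   # arrives after the batch run, visible only via the speed layer
```

Here `views.query("home")` combines two views from the batch layer with one from the speed layer, which is the essential idea: queries stay fresh even though the batch recomputation runs only occasionally.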

2. Statistical Significance
It is important to achieve statistically significant results and not be fooled by randomness. As Efron explains in his book on large-scale inference, it is easy to go wrong with huge data sets and thousands of questions to answer at once.

3. Distributed Mining
Many data mining techniques are not trivial to parallelize. To obtain distributed versions of some methods, a lot of research is needed, with practical and theoretical analysis, to provide new methods.

IV. Advancements & Conclusion
Streaming algorithms [25] represent an alternative programming model for dealing with large volumes of data with limited computational and storage resources. Stream processing is very attractive for working with time-series data (news feeds, tweets, sensor readings, etc.), which is difficult in MapReduce given its batch-oriented design. Another system worth mentioning is Pregel [26], which implements a programming model inspired by Valiant's Bulk Synchronous Parallel (BSP) model. Pig [14], which is inspired by Google [11], can be described as a data analytics platform that provides a lightweight scripting language for manipulating large datasets. Similarly, Hive [27], another open-source project, provides an abstraction on top of Hadoop that allows users to issue SQL queries against large relational datasets stored in HDFS. The system therefore provides a data analysis tool for users who are already comfortable with relational databases, while simultaneously taking advantage of Hadoop's data processing capabilities [28]. Even within the Hadoop/HDFS/MapReduce ecosystem, alternative approaches for expressing distributed computations are already being developed. For example, there can be a third merge phase after map and reduce to better support relational operations.
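As an illustration of what a merge phase after map and reduce could look like, the sketch below joins two reduced outputs on their shared keys. It is a single-process toy written for this explanation, not an API of Hadoop or of any specific system; the helper names `map_reduce` and `merge` and the sample data are assumptions.

```python
from collections import defaultdict


def map_reduce(records, key_fn, value_fn, reduce_fn):
    """Toy map + reduce: group value_fn(record) by key_fn(record),
    then reduce each group to a single value."""
    groups = defaultdict(list)
    for record in records:
        groups[key_fn(record)].append(value_fn(record))
    return {key: reduce_fn(values) for key, values in groups.items()}


def merge(left, right, merge_fn):
    """Merge phase: combine two reduced outputs on shared keys (an inner join)."""
    return {key: merge_fn(left[key], right[key]) for key in left.keys() & right.keys()}


# Hypothetical sample data: (customer, amount) and (customer, country) records.
orders = [("alice", 30), ("bob", 20), ("alice", 15)]
customers = [("alice", "IN"), ("bob", "US"), ("carol", "UK")]

spend_by_customer = map_reduce(orders, lambda r: r[0], lambda r: r[1], sum)
country_by_customer = map_reduce(customers, lambda r: r[0], lambda r: r[1], lambda v: v[0])

joined = merge(spend_by_customer, country_by_customer,
               lambda spend, country: (country, spend))
```

Keeping the join in its own phase means each input can be mapped and reduced independently before the relational operation combines them; "carol" drops out here because the sketch uses inner-join semantics.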
The join processing mentioned in the paper can also handle MapReduce tasks effectively.

Big data is the new business and social science frontier. The amount of information and knowledge that can be extracted from the digital universe continues to expand as users come up with new ways to massage and process data. Moreover, it has become clear that more data is not just more data, but that more data is different. Big data is just the beginning of the problem. Technology evolution and placement guarantee that in a few years more data will be available in a year than has been collected since the dawn of man. If Facebook and Twitter are producing, collectively, around 50 gigabytes of data per day, and tripling every year, then within a few years (perhaps 2-4) we will indeed face the challenge of big data becoming really big data.

In this work, we have reviewed in depth recent efforts dedicated to big data and big data networking. We have reviewed progress in fundamental big data technologies, important aspects of big data networking, and security in cloud computing. New challenges and opportunities, resource management, and performance optimization are also introduced and discussed from independent viewpoints. This paper initiates a collaborative research effort to begin examining big data issues and challenges. We identified some of the major issues in big data storage, management, and processing. We also identified some of the major challenges going forward that we believe must be addressed within the next decade. Our future research will concentrate on developing a more complete understanding of the issues associated with big data, and of the factors that may contribute to the need for a big data analysis and design methodology. We will begin to explore solutions to some of the issues raised in this paper through our collaborative research effort.

V. Acknowledgements
We would like to express our cordial thanks to Sri CA. Basha Mohiuddin, Chairman, Smt. Rizwana Begum, Secretary, Sri Touseef Ahmed, Vice Chairman, and Dr. M. Anwarullah, Principal, Vidya Group of Institutions, Hyderabad, for providing moral support, encouragement, and advanced research facilities. The authors would like to thank the anonymous reviewers for their valuable comments. They would also like to thank Dr. V. Vijaya Kumar, Anurag Group of Institutions, for his invaluable suggestions and constant encouragement, which helped improve the presentation quality of this paper.

References
[1] Agrawal, Amr El Abbadi et al., "Big data and cloud computing: current state and future opportunities", Proceedings of the 14th International Conference on Extending Database Technology, ACM, 2011.
[2] Apache Hive, [Online] Available:
[3] Brad Brown, Michael Chui, James Manyika, "Are you ready for the era of big data", McKinsey Quarterly, McKinsey Global Institute, October
[4] Carlos Ordonez, "Algorithms and Optimizations for Big Data Analytics: Cubes", Tech Talks, University of Houston, USA.
[5] Cisco White Paper, "Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update", [Online] Available: /ekits/cisco_vni_Global_Mobile_Data_Traffic_Forecast_2010_2015.pdf.
[6] Dai, Jinquan, et al., "Hitune: dataflow-based performance analysis for big data cloud", Proc. of the 2011 USENIX ATC (2011), pp. [Online] Available: https://www.usenix.org/legacy/event/atc11/tech/final_files/dai.pdf.
[7] Dryad, Microsoft Research, [Online] Available: research.microsoft.com/en-us/projects/dryad
[8] Dunren Che, Mejdl Safran, Zhiyong Peng, "From Big Data to Big Data Mining: Challenges, Issues, and Opportunities", DASFAA Workshops 2013, LNCS 7827, pp. 1-15,
[9] Dunren Che, Mejdl Safran, Zhiyong Peng, "From Big Data to Big Data Mining: Challenges, Issues, and Opportunities", DASFAA Workshops 2013, LNCS 7827, pp. 1-15,
[10] Girola, Michele, et al., "IBM Data Center Networking: Planning for virtualization and cloud computing", GOOGLE/IP.COM/IBM Redbooks (2011). [Online] Available:
[11] Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski, "Pregel: A System for Large-Scale Graph Processing", SIGMOD '10, June 6-11, 2010, pp.
[12] IBM, "What is Jaql", [Online] Available: software/data/infosphere/hadoop/jaql.
[13] Information System & Management, ISM Book, 1st Edition 2010, EMC2, Wiley Publishing.
[14] J. Dean, S. Ghemawat, "MapReduce: Simplified data processing on large clusters", In USENIX Symposium on Operating Systems Design and Implementation, San Francisco, CA, Dec. 2004, pp.
[15] Jeffrey Dean, Sanjay Ghemawat, "MapReduce: A Flexible Data Processing Tool", Communications of the ACM, Vol. 53, Issue 1, January 2010, pp.
[16] Jeffrey Dean, Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", Communications of the ACM, Vol. 51, pp. ,
[17] Ji, Changqing, et al., "Big data processing in cloud computing environments", Pervasive Systems, Algorithms and Networks (ISPAN), th International Symposium on, IEEE,
[18] Kaushik, Rini T., Klara Nahrstedt, "T: A data-centric cooling energy costs reduction approach for big data analytics cloud", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, IEEE Computer Society Press, 2012. [Online] Available: pdf
[19] Lakew, Ewnetu Bayuh, "Managing Resource Usage and Allocations in Multi-Cluster Clouds", 2013, [Online] Available:
[20] Lu, Sifei, et al., "A framework for cloud-based large-scale data analytics and visualization: Case study on multiscale climate data", Cloud Computing Technology and Science (CloudCom), 2011 IEEE Third International Conference, IEEE, 2011.
[21] Marcin Jedyk, "Making Big Data, Small: Using distributed systems for processing, analysing and managing large huge data sets", Software Professional's Network, Cheshire Data Systems Ltd.
[22] PIG Tutorial, Yahoo Inc., [Online] Available: yahoo.com/hadoop/tutorial/pigtutorial.html.
[23] Ren, Yulong, Wen Tang, "A Service Integrity Assurance Framework For Cloud Computing Based On MapReduce", Proceedings of IEEE CCIS 2012, Hangzhou, 2012, pp. , Oct Nov
[24] S. Ghemawat, H. Gobioff, S. Leung, "The Google File System", In ACM Symposium on Operating Systems Principles, Lake George, NY, Oct 2003, pp.
[25] Stephen Kaisler, F. Armour, J. Alberto Espinosa, William Money, "Big Data: Issues and Challenges Moving Forward", 46th HICSS, US /12,
[26] Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, John Gerth, Justin Talbot, Khaled Elmeleegy, Russell Sears, "Online Aggregation and Continuous Query support in MapReduce", SIGMOD '10, June 6-11, 2010, Indianapolis, Indiana, USA.
[27] Windows Azure Storage. [Online] Available: microsoft.com/windowsazure/features/storage.
[28] Zhang, Linquan, et al., "Moving Big Data to The Cloud: An Online Cost-Minimizing Approach", IEEE Journal on Selected Areas in Communications (2013): [Online] Available: papers/info13-lq-.pdf.

Dr. J. Sasi Kiran graduated with a B.Tech [EIE] from JNTU Hyderabad. He received his M.Tech [CSE] from JNT University, Hyderabad, and his Ph.D. in Computer Science from the University of Mysore, Mysore. At present he is working as Professor in CSE and Dean Administration at Vidya Vikas Institute of Technology, Chevella, R.R. Dist., Telangana State, India. His research interests include Image Processing, Data Mining, and Network Security. He has published 39 research papers to date in various national and international conferences, proceedings, and journals. He has twice received the Best Teacher award from Vidya Group, the Significant Contribution award from the Computer Society of India, and the Passionate Researcher Trophy from Sri Ramanujan Research Forum, GIET, Rajahmundry, A.P., India.

Ms. M. Sravanthi graduated with a B.Tech [CSE] from JNTU Hyderabad. She received her M.Tech [CSE] from JNTU Hyderabad. Her areas of interest are Cloud Computing, Image Processing, and Networking. Currently, she is working as an Associate Professor in

Ms. K. Preethi graduated with a B.Tech [CSE] from JNTU Hyderabad. She received her M.Tech [CSE] from JNTU Hyderabad. Her areas of interest are Cloud Computing, Distributed Systems, and Data Mining. Currently, she is working as an Assistant Professor in

Mrs. M. Anusha graduated with a B.Tech [CSE] from JNTU Hyderabad. She received her M.Tech from JNTU Hyderabad. Her areas of interest are Wireless Sensor Networks, Computer Organization, Network Security, and Cryptography. Currently, she is working as an Associate Professor in She has published research papers in various national and international conferences, proceedings, and journals.


Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Manifest for Big Data Pig, Hive & Jaql

Manifest for Big Data Pig, Hive & Jaql Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,

More information

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and

More information

Indian Journal of Science The International Journal for Science ISSN 2319 7730 EISSN 2319 7749 2016 Discovery Publication. All Rights Reserved

Indian Journal of Science The International Journal for Science ISSN 2319 7730 EISSN 2319 7749 2016 Discovery Publication. All Rights Reserved Indian Journal of Science The International Journal for Science ISSN 2319 7730 EISSN 2319 7749 2016 Discovery Publication. All Rights Reserved Perspective Big Data Framework for Healthcare using Hadoop

More information

Big Data: An Introduction, Challenges & Analysis using Splunk

Big Data: An Introduction, Challenges & Analysis using Splunk pp. 464-468 Krishi Sanskriti Publications http://www.krishisanskriti.org/acsit.html Big : An Introduction, Challenges & Analysis using Splunk Satyam Gupta 1 and Rinkle Rani 2 1,2 Department of Computer

More information

Keywords: Big Data, Hadoop, cluster, heterogeneous, HDFS, MapReduce

Keywords: Big Data, Hadoop, cluster, heterogeneous, HDFS, MapReduce Volume 5, Issue 9, September 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Study of

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A SURVEY ON BIG DATA ISSUES AMRINDER KAUR Assistant Professor, Department of Computer

More information

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the

More information

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,

More information

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

The 4 Pillars of Technosoft s Big Data Practice

The 4 Pillars of Technosoft s Big Data Practice beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed

More information

Keywords Big Data Analytic Tools, Data Mining, Hadoop and MapReduce, HBase and Hive tools, User-Friendly tools.

Keywords Big Data Analytic Tools, Data Mining, Hadoop and MapReduce, HBase and Hive tools, User-Friendly tools. Volume 3, Issue 8, August 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Significant Trends

More information

CIS492 Special Topics: Cloud Computing د. منذر الطزاونة

CIS492 Special Topics: Cloud Computing د. منذر الطزاونة CIS492 Special Topics: Cloud Computing د. منذر الطزاونة Big Data Definition No single standard definition Big Data is data whose scale, diversity, and complexity require new architecture, techniques, algorithms,

More information

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS Dr. Ananthi Sheshasayee 1, J V N Lakshmi 2 1 Head Department of Computer Science & Research, Quaid-E-Millath Govt College for Women, Chennai, (India)

More information

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of

More information

BIG DATA ANALYSIS USING RHADOOP

BIG DATA ANALYSIS USING RHADOOP BIG DATA ANALYSIS USING RHADOOP HARISH D * ANUSHA M.S Dr. DAYA SAGAR K.V ECM & KLUNIVERSITY ECM & KLUNIVERSITY ECM & KLUNIVERSITY Abstract In this electronic age, increasing number of organizations are

More information

ABSTRACT I. INTRODUCTION. International Journal of Computer Science Trends and Technology (IJCST) Volume 4 Issue 1, Jan - Feb 2016

ABSTRACT I. INTRODUCTION. International Journal of Computer Science Trends and Technology (IJCST) Volume 4 Issue 1, Jan - Feb 2016 RESEARCH ARTICLE OPEN ACCESS Big Data Analytics: Challenges and Solutions Using Hadoop, Map Reduce and Big Table M. Dhavapriya, N. Yasodha Department of Computer Science NGM College, Pollachi Tamil Nadu

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment

Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment Analysing Large Files in a Hadoop Distributed Cluster Environment S Saravanan, B Uma Maheswari Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham,

More information

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are

More information

Enhancing Massive Data Analytics with the Hadoop Ecosystem

Enhancing Massive Data Analytics with the Hadoop Ecosystem www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3, Issue 11 November, 2014 Page No. 9061-9065 Enhancing Massive Data Analytics with the Hadoop Ecosystem Misha

More information

Hadoop Technology for Flow Analysis of the Internet Traffic

Hadoop Technology for Flow Analysis of the Internet Traffic Hadoop Technology for Flow Analysis of the Internet Traffic Rakshitha Kiran P PG Scholar, Dept. of C.S, Shree Devi Institute of Technology, Mangalore, Karnataka, India ABSTRACT: Flow analysis of the internet

More information

A REVIEW REPORT ON DATA MINING

A REVIEW REPORT ON DATA MINING Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,

More information

Big Data on Microsoft Platform

Big Data on Microsoft Platform Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4

More information

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Approaches for parallel data loading and data querying

Approaches for parallel data loading and data querying 78 Approaches for parallel data loading and data querying Approaches for parallel data loading and data querying Vlad DIACONITA The Bucharest Academy of Economic Studies diaconita.vlad@ie.ase.ro This paper

More information

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 BIG DATA: A New Technology Farah DeebaHasan Student, M.Tech.(IT) Anshul Kumar Sharma Student, M.Tech.(IT)

More information

Big Data: Tools and Technologies in Big Data

Big Data: Tools and Technologies in Big Data Big Data: Tools and Technologies in Big Data Jaskaran Singh Student Lovely Professional University, Punjab Varun Singla Assistant Professor Lovely Professional University, Punjab ABSTRACT Big data can

More information

Ashish R. Jagdale, Kavita V. Sonawane, Shamsuddin S. Khan

Ashish R. Jagdale, Kavita V. Sonawane, Shamsuddin S. Khan International Journal of Scientific & Engineering Research, Volume 5, Issue 7, July-2014 1156 Data Mining and Data Pre-processing for Big Data Ashish R. Jagdale, Kavita V. Sonawane, Shamsuddin S. Khan

More information

ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS

ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant

More information

Big Data and Hadoop for the Executive A Reference Guide

Big Data and Hadoop for the Executive A Reference Guide Big Data and Hadoop for the Executive A Reference Guide Overview The amount of information being collected by companies today is incredible. Wal- Mart has 460 terabytes of data, which, according to the

More information

A Study on Big-Data Approach to Data Analytics

A Study on Big-Data Approach to Data Analytics A Study on Big-Data Approach to Data Analytics Ishwinder Kaur Sandhu #1, Richa Chabbra 2 1 M.Tech Student, Department of Computer Science and Technology, NCU University, Gurgaon, Haryana, India 2 Assistant

More information

A Study on Data Analysis Process Management System in MapReduce using BPM

A Study on Data Analysis Process Management System in MapReduce using BPM A Study on Data Analysis Process Management System in MapReduce using BPM Yoon-Sik Yoo 1, Jaehak Yu 1, Hyo-Chan Bang 1, Cheong Hee Park 1 Electronics and Telecommunications Research Institute, 138 Gajeongno,

More information

UNDERSTANDING THE BIG DATA PROBLEMS AND THEIR SOLUTIONS USING HADOOP AND MAP-REDUCE

UNDERSTANDING THE BIG DATA PROBLEMS AND THEIR SOLUTIONS USING HADOOP AND MAP-REDUCE UNDERSTANDING THE BIG DATA PROBLEMS AND THEIR SOLUTIONS USING HADOOP AND MAP-REDUCE Mr. Swapnil A. Kale 1, Prof. Sangram S.Dandge 2 1 ME (CSE), First Year, Department of CSE, Prof. Ram Meghe Institute

More information

Application Development. A Paradigm Shift

Application Development. A Paradigm Shift Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the

More information

Data Mining in the Swamp

Data Mining in the Swamp WHITE PAPER Page 1 of 8 Data Mining in the Swamp Taming Unruly Data with Cloud Computing By John Brothers Business Intelligence is all about making better decisions from the data you have. However, all

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Dr. Liangxiu Han Future Networks and Distributed Systems Group (FUNDS) School of Computing, Mathematics and Digital Technology,

More information

Evaluating partitioning of big graphs

Evaluating partitioning of big graphs Evaluating partitioning of big graphs Fredrik Hallberg, Joakim Candefors, Micke Soderqvist fhallb@kth.se, candef@kth.se, mickeso@kth.se Royal Institute of Technology, Stockholm, Sweden Abstract. Distributed

More information

Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014

Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014 White Paper Big Data Executive Overview WP-BD-10312014-01 By Jafar Shunnar & Dan Raver Page 1 Last Updated 11-10-2014 Table of Contents Section 01 Big Data Facts Page 3-4 Section 02 What is Big Data? Page

More information

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current

More information

Big Data Explained. An introduction to Big Data Science.

Big Data Explained. An introduction to Big Data Science. Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of

More information

A Brief Outline on Bigdata Hadoop

A Brief Outline on Bigdata Hadoop A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is

More information

Taking Data Analytics to the Next Level

Taking Data Analytics to the Next Level Taking Data Analytics to the Next Level Implementing and Supporting Big Data Initiatives What Is Big Data and How Is It Applicable to Anti-Fraud Efforts? 2 of 20 Definition Gartner: Big data is high-volume,

More information

Big Data and Hadoop with components like Flume, Pig, Hive and Jaql

Big Data and Hadoop with components like Flume, Pig, Hive and Jaql Abstract- Today data is increasing in volume, variety and velocity. To manage this data, we have to use databases with massively parallel software running on tens, hundreds, or more than thousands of servers.

More information

Buyer s Guide to Big Data Integration

Buyer s Guide to Big Data Integration SEPTEMBER 2013 Buyer s Guide to Big Data Integration Sponsored by Contents Introduction 1 Challenges of Big Data Integration: New and Old 1 What You Need for Big Data Integration 3 Preferred Technology

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

Big Data: Study in Structured and Unstructured Data

Big Data: Study in Structured and Unstructured Data Big Data: Study in Structured and Unstructured Data Motashim Rasool 1, Wasim Khan 2 mail2motashim@gmail.com, khanwasim051@gmail.com Abstract With the overlay of digital world, Information is available

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

ISSN: 2321-7782 (Online) Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

ISSN: 2321-7782 (Online) Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image

More information

A Survey on Big Data Concepts and Tools

A Survey on Big Data Concepts and Tools A Survey on Big Data Concepts and Tools D. Rajasekar 1, C. Dhanamani 2, S. K. Sandhya 3 1,3 PG Scholar, 2 Assistant Professor, Department of Computer Science and Engineering, Sri Krishna College of Engineering

More information

ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA

ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics February 11, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov

An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research

More information

In-Memory Analytics for Big Data

In-Memory Analytics for Big Data In-Memory Analytics for Big Data Game-changing technology for faster, better insights WHITE PAPER SAS White Paper Table of Contents Introduction: A New Breed of Analytics... 1 SAS In-Memory Overview...

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

Hadoop Big Data for Processing Data and Performing Workload

Hadoop Big Data for Processing Data and Performing Workload Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer

More information

BIG DATA IN BUSINESS ENVIRONMENT

BIG DATA IN BUSINESS ENVIRONMENT Scientific Bulletin Economic Sciences, Volume 14/ Issue 1 BIG DATA IN BUSINESS ENVIRONMENT Logica BANICA 1, Alina HAGIU 2 1 Faculty of Economics, University of Pitesti, Romania olga.banica@upit.ro 2 Faculty

More information

USING BIG DATA FOR INTELLIGENT BUSINESSES

USING BIG DATA FOR INTELLIGENT BUSINESSES HENRI COANDA AIR FORCE ACADEMY ROMANIA INTERNATIONAL CONFERENCE of SCIENTIFIC PAPER AFASES 2015 Brasov, 28-30 May 2015 GENERAL M.R. STEFANIK ARMED FORCES ACADEMY SLOVAK REPUBLIC USING BIG DATA FOR INTELLIGENT

More information

ANALYTICS BUILT FOR INTERNET OF THINGS

ANALYTICS BUILT FOR INTERNET OF THINGS ANALYTICS BUILT FOR INTERNET OF THINGS Big Data Reporting is Out, Actionable Insights are In In recent years, it has become clear that data in itself has little relevance, it is the analysis of it that

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

TV INSIGHTS APPLICATION OF BIG DATA TO TELEVISION

TV INSIGHTS APPLICATION OF BIG DATA TO TELEVISION TV INSIGHTS APPLICATION OF BIG DATA TO TELEVISION AN ARRIS WHITE PAPER BY: BHAVAN GANDHI, ALFONSO MARTINEZ- SMITH, & DOUG KUHLMAN TABLE OF CONTENTS ABSTRACT... 3 INTRODUCTION INTERSECTION OF TV & BIG DATA...

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Big Application Execution on Cloud using Hadoop Distributed File System

Big Application Execution on Cloud using Hadoop Distributed File System Big Application Execution on Cloud using Hadoop Distributed File System Ashkan Vates*, Upendra, Muwafaq Rahi Ali RPIIT Campus, Bastara Karnal, Haryana, India ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

Big Data: Using ArcGIS with Apache Hadoop

Big Data: Using ArcGIS with Apache Hadoop 2013 Esri International User Conference July 8 12, 2013 San Diego, California Technical Workshop Big Data: Using ArcGIS with Apache Hadoop David Kaiser Erik Hoel Offering 1330 Esri UC2013. Technical Workshop.

More information

Framework and key technologies for big data based on manufacturing Shan Ren 1, a, Xin Zhao 2, b

Framework and key technologies for big data based on manufacturing Shan Ren 1, a, Xin Zhao 2, b International Conference on Materials Engineering and Information Technology Applications (MEITA 2015) Framework and key technologies for big data based on manufacturing Shan Ren 1, a, Xin Zhao 2, b 1

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

A Study on Workload Imbalance Issues in Data Intensive Distributed Computing

A Study on Workload Imbalance Issues in Data Intensive Distributed Computing A Study on Workload Imbalance Issues in Data Intensive Distributed Computing Sven Groot 1, Kazuo Goda 1, and Masaru Kitsuregawa 1 University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan Abstract.

More information

There s no way around it: learning about Big Data means

There s no way around it: learning about Big Data means In This Chapter Chapter 1 Introducing Big Data Beginning with Big Data Meeting MapReduce Saying hello to Hadoop Making connections between Big Data, MapReduce, and Hadoop There s no way around it: learning

More information

Efficient Analysis of Big Data Using Map Reduce Framework

Efficient Analysis of Big Data Using Map Reduce Framework Efficient Analysis of Big Data Using Map Reduce Framework Dr. Siddaraju 1, Sowmya C L 2, Rashmi K 3, Rahul M 4 1 Professor & Head of Department of Computer Science & Engineering, 2,3,4 Assistant Professor,

More information

NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE

NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE NETWORK TRAFFIC ANALYSIS: HADOOP PIG VS TYPICAL MAPREDUCE Anjali P P 1 and Binu A 2 1 Department of Information Technology, Rajagiri School of Engineering and Technology, Kochi. M G University, Kerala

More information

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research & BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research & Innovation 04-08-2011 to the EC 8 th February, Luxembourg Your Atos business Research technologists. and Innovation

More information

White Paper: What You Need To Know About Hadoop

White Paper: What You Need To Know About Hadoop CTOlabs.com White Paper: What You Need To Know About Hadoop June 2011 A White Paper providing succinct information for the enterprise technologist. Inside: What is Hadoop, really? Issues the Hadoop stack

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

Big Data Challenges. Alexandru Adrian TOLE Romanian American University, Bucharest, Romania adrian.tole@yahoo.com

Big Data Challenges. Alexandru Adrian TOLE Romanian American University, Bucharest, Romania adrian.tole@yahoo.com Database Systems Journal vol. IV, no. 3/2013 31 Big Data Challenges Alexandru Adrian TOLE Romanian American University, Bucharest, Romania adrian.tole@yahoo.com The amount of data that is traveling across

More information

Mining Big Data to Predicting Future

Mining Big Data to Predicting Future 27 Mining Big Data to Predicting Future Department of Computer Science and Engineering, Pondicherry Engineering College, Puducherry-605014, INDIA amitkrtyagi025@gmail.com Abstract Due to technological

More information

Index Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk.

Index Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk. Load Rebalancing for Distributed File Systems in Clouds. Smita Salunkhe, S. S. Sannakki Department of Computer Science and Engineering KLS Gogte Institute of Technology, Belgaum, Karnataka, India Affiliated

More information