1 IJCST Vo l. 6, Is s u e 2, Ap r i l - Ju n e 2015 ISSN : (Online) ISSN : (Print) A Survey on Challenges and Advantages in Big Data LENKA VENKATA SATYANARAYANA Dept. of CSE, Aditya Institute of Technology and Management, Tekkali, AP, India Abstract Big data, which refers to the data sets that are too big to be handled using the existing database management tools, are emerging in many important applications, such as Internet search, business informatics, social networks, social media, genomics, and meteorology. Big data presents a grand challenge for database and data analytics research. The exciting activities addressing the big data challenge. The central theme is to connect big data with people in various ways. Particularly, This paper will showcase our recent progress in user preference understanding, context-aware, on-demand data mining using crowd intelligence, summarization and explorative analysis of large data sets, and privacy preserving data sharing and analysis.the primary purpose of this paper is to provide an in-depth analysis of different platforms available for performing big data analytics. This paper surveys different hardware platforms available for big data analytics and assesses the advantages and drawbacks of Big Data Keywords Data Volume, Data Velocity, Data Variety, Data Value, Data Management Issues, Challenges in Big Data, Big Data Technologies I. Introduction Big data is defined as large amount of data which requires new technologies and architectures to make possible to extract value from it by capturing and analysis process.new sources of big data include location specific data arising from traffic management, and from the tracking of personal devices such as Smartphones. Big Data has emerged because we are living in a society which makes increasing use of data intensive technologies. Due to such large size of data it becomes very difficult to perform effective analysis using the existing traditional techniques. Since Big data is a recent upcoming technology in the market which can bring huge benefits to the business organizations, it becomes necessary that various challenges and issues associated in bringing and adapting to this technology are need to be understood. Big Data concept means a datasets which continues to grow so much that it becomes difficult to manage it using existing database management concepts & tools. The difficulties can be related to data capture, storage, search, sharing, analytics and visualization etc. Big data due to its various properties like volume, velocity, variety, variability, value and complexity put forward many challenges. The various challenges faced in large data management include scalability, unstructured data, accessibility, real time analytics, fault tolerance and many more. In addition to variations in the amount of data stored in different sectors, the types of data generated and stored i.e., encoded video, images, audio, or text/numeric information; also differ markedly from industry to industry II. Big Data Characteristics A. Data Volume The Big word in Big data itself defines the volume. At present the data existing is in petabytes (1015) and is supposed to increase to zettabytes (1021) in nearby future. Data volume measures the amount of data available to an organization, which does not necessarily have to own all of it as long as it can access it. B. Data Velocity Velocity in Big data is a concept which deals with the speed of the data coming from various sources. This characteristic is not being limited to the speed of incoming data but also speed at which the data flows and aggregated. C. Data Variety Data variety is a measure of the richness of the data representation text,images video, audio, etc. Data being produced is not of single category as it not only includes the traditional data but also the semi structured data from various resources like web Pages, Web Log Files, social media sites, , documents. D. Data Value Data value measures the usefulness of data in making decisions. Data science is exploratory and useful in getting to know the data, but analytic science encompasses the predictive power of big data. User can run certain queries against the data stored and thus can deduct important results from the filtered data obtained and can also rank it according to the dimensions they require. These reports help these people to find the business trends according to which they can change their strategies. E. Complexity Complexity measures the degree of interconnectedness (possibly very large) and interdependence in big data structures such that a small change (or combination of small changes) in one or a few elements can yield very large changes or a small change that ripple across or cascade through the system and substantially affect its behavior, or no change at all III. Issues in Big Data The issues in Big Data are some of the conceptual points that should be understood by the organization to implement the technology effectively. Big data Issues are need not be confused with problems but they are important to know and crucial to handle. Fig. 1: Big Data Architecture International Journal of Computer Science And Technology 115
2 IJCST Vo l. 6, Is s u e 2, Ap r i l - Ju n e 2015 ISSN : (Online) ISSN : (Print) Fig. 2: Explosion in size of Data A. Issues related to the Characteristics Data Volume As data volume increases, the value of different data records will decrease in proportion to age, type, richness, and quantity among other factors. The social networking sites existing are themselves producing data in order of terabytes everyday and this amount of data is definitely difficult to be handled using the existing traditional systems. Data Velocity Our traditional systems are not capable enough on performing the analytics on the data which is constantly in motion. E-Commerce has rapidly increased the speed and richness of data used for different business transactions (for example, web-site clicks). Data velocity management is much more than a bandwidth issue. Data Variety All this data is totally different consisting of raw, structured, semi structured and even unstructured data which is difficult to be handled by the existing traditional analytic systems. From an analytic perspective, it is probably the biggest obstacle to effectively using large volumes of data. Incompatible data formats, non-aligned data structures, and inconsistent data semantics represents significant challenges that can lead to analytic sprawl. Data Value As the data stored by different organizations is being used by them for data analytics. It will produce a kind of gap in between the Business leaders and the IT professionals the main concern of business leaders would be to just adding value to their business and getting more and more profit unlike the IT leaders who would have to concern with the technicalities of the storage and processing. Data Complexity One current difficulty of big data is working with it using relational databases and desktop statistics/visualization packages, requiring massively parallel software running on tens, hundreds, or even thousands of servers. It is quite an undertaking to link, match, cleanse and transform data across systems coming from various sources. It is also necessary to connect and correlate relationships, hierarchies and multiple data linkages or data can quickly spiral out of control. B. Storage and Transport Issues The quantity of data has exploded each time we have invented a new storage medium. The difference about the most recent data explosion, mainly due to social media, is that there has been no new storage medium. Moreover, data is being created by everyone and everything, (from Mobile Devices to Super Computers) not just, as here to fore, by professionals such as scientist, journalists, writers etc. Current disk technology limits are about 4 terabytes (1012) per disk. So, 1 Exabyte (1018) would require 25,000 disks. Even if an 116 International Journal of Computer Science And Technology Exabyte of data could be processed on a single computer system, it would be unable to directly attach the requisite number of disks. Access to that data would overwhelm current communication networks. Assuming that a 1 gigabyte per second network has an effective sustainable transfer rate of 80%, the sustainable bandwidth is about 100 megabytes. Thus, transferring an Exabyte would take about 2800 hours, if we assume that a sustained transfer could be maintained. It would take longer time to transmit the data from a collection or storage point to a processing point than the time required to actually process it. To handle this issue, the data should be processed in place and transmit only the resulting information. In other words, bring the code to the data, unlike the traditional method of bring the data to the code.. C. Data Management Issues Data Management will, perhaps, be the most difficult problem to address with big data. Resolving issues of access, utilization, updating, governance, and reference (in publications) have proven to be major stumbling blocks. The sources of the data are varied - by size, by format, and by method of collection. Individuals contribute digital data in mediums comfortable to them likedocuments, drawings, pictures, sound and video recordings, models, software behaviors, user interface designs, etc., with or without adequate metadata describing what, when, where, who, why and how it was collected and its provenance.unlike the collection of data by manual methods, where rigorous protocols are often followed in order to ensure accuracy and validity, Digital data collection is much more relaxed. Given the volume, it is impractical to validate every data item. New approaches to data qualification and validation are needed. The richness of digital data representation prohibits a personalized methodology for data collection. To summarize, there is no perfect big data management solution yet. This represents an important gap in the research literature on big data that needs to be filled. D. Processing Issues Assume that an Exabyte of data needs to be processed in its entirety. For simplicity, assume the data is chunked into blocks of 8 words, so 1 Exabyte = 1K petabytes. Assuming a processor expends 100 instructions on one block at 5 gigahertz, the time required for endto- end processing would be 20 nanoseconds. To process 1K petabytes would require a total end-to-end processing time of roughly 635 years. Thus, effective processing of Exabyte of data will require extensive parallel processing and new analytics algorithms in order to provide timely and actionable. IV. Challenges in Big Data The challenges in Big Data are usually the real implementation hurdles which require immediate attention. Any implementation without handling these challenges may lead to the failure of the technology implementation and some unpleasant results. A. Privacy and Security It is the most important challenges with Big data which is sensitive and includes conceptual,technical as well as legal significance. 1. The personal information (e.g. in database of a merchant or social networking website) of a person when combined with external large data sets, leads to the inference of new facts about that person and it s possible that these kinds of facts about the person are secretive and the person might not want the data owner to know or any person to know about them. 2. Information regarding the people is collected and used in
3 ISSN : (Online) ISSN : (Print) order to add value to the business of the organization. This is done by creating insights in their lives which they are unaware of. Another important consequence arising would be Social stratification where a literate person would be taking advantages of the Big data predictive analysis and on the other hand underprivileged will be easily identified and treated worse. Big Data used by law enforcement will increase the chances of certain tagged people to suffer from adverse consequences without the ability to fight back or even having knowledge that they are being discriminated. B. Data Access and Sharing of Information If the data in the companies information systems is to be used to make accurate decisions in time it becomes necessary that it should be available in accurate, complete and timely manner. This makes the data management and governance process bit complex adding the necessity to make data open and make it available to government agencies in standardized manner with standardized APIs, metadata and formats thus leading to better decision making, business intelligence and productivity improvements.expecting sharing of data between companies is awkward because of the need to get an edge in business. Sharing data about their clients and operations threatens the culture of secrecy and competitiveness. C. Analytical Challenges The main challenging questions are as: 1. What if data volume gets so large and varied and it is not known how to deal with it? 2. Does all data need to be stored? 3. Does all data need to be analyzed? 4. How to find out which data points are really important? 5. How can the data be used to best advantage? Big data brings along with it some huge analytical challenges. The type of analysis to be done on this huge amount of data which can be unstructured, semi structured or structured requires a large number of advance skills. Moreover the type of analysis which is needed to be done on the data depends highly on the results to be obtained i.e. decision making. This can be done by using one of two techniques: either incorporate massive data volumes in analysis or determine upfront which Big data is relevant. D. Human Resources and Manpower Since Big data is at its youth and an emerging technology so it needs to attract organizations and youth with diverse new skill sets. These skills should not be limited to technical ones but also should extend to research, analytical, interpretive and creative ones. These skills need to be developed in individuals hence requires training programs to be held by the organizations. Moreover the Universities need to introduce curriculum on Big data to produce skilled employees in this expertise. E. Technical Challenges 1. Fault Tolerance With the incoming of new technologies like Cloud computing and Big data it is always intended that whenever the failure occurs the damage done should be within acceptable threshold rather than beginning the whole task from the scratch. Fault-tolerant computing is extremely hard, involving intricate algorithms. IJCST Vo l. 6, Is s u e 2, Ap r i l - Ju n e 2015 It is simply not possible to devise absolutely foolproof, 100% reliable fault tolerant machines or software. Thus the main task is to reduce the probability of failure to an acceptable level. Unfortunately, the more we strive to reduce this probability, the higher the cost. Two methods which seem to increase the fault tolerance in Big data are as: First is to divide the whole computation being done into tasks and assign these tasks to different nodes for computation. Second is, one node is assigned the work of observing that these nodes are working properly. If something happens that particular task is restarted. But sometimes it s quite possible that that the whole computation can t be divided into such independent tasks. There could be some tasks which might be recursive in nature and the output of the previous computation of task is the input to the next computation. Thus restarting the whole computation becomes cumbersome process. This can be avoided by applying Checkpoints which keeps the state of the system at certain intervals of the time. In case of any failure, the computation can restart from last checkpoint maintained. 2. Scalability The scalability issue of Big data has lead towards cloud computing, which now aggregates multiple disparate workloads with varying performance goals into very large clusters. This requires a high level of sharing of resources which is expensive and also brings with it various challenges like how to run and execute various jobs so that we can meet the goal of each workload cost effectively. It also requires dealing with the system failures in an efficient manner which occurs more frequently if operating on large clusters. These factors combined put the concern on how to express the programs,even complex machine learning tasks. There has been a huge shift in the technologies being used. Hard Disk Drives (HDD) are being replaced by the solid state Drives and Phase Change technology which are not having the same performance between sequential and random data transfer. Thus, what kinds of storage devices are to be used; is again a big question for data storage. 3. Quality of Data Collection of huge amount of data and its storage comes at a cost. More data if used for decision making or for predictive analysis in business will definitely lead to better results. Business Leaders will always want more and more data storage whereas the IT Leaders will take all technical aspects in mind before storing all the data. Big data basically focuses on quality data storage rather than having very large irrelevant data so that better results and conclusions can be drawn. This further leads to various questions like how it can be ensured that which data is relevant, how much data would be enough for decision making and whether the stored data is accurate or not to draw conclusions from it etc. 4. Heterogeneous Data Unstructured data represents almost every kind of data being produced like social media interactions, to recorded meetings, to handling of PDF documents, fax transfers, to s and more. Working with unstructured data is cumbersome and of course costly too. Converting all this unstructured data into structured one is also not feasible. Structured data is always organized into highly mechanized and manageable way. It shows well integration with database but unstructured data is completely raw and unorganized. International Journal of Computer Science And Technology 117
4 IJCST Vo l. 6, Is s u e 2, Ap r i l - Ju n e 2015 ISSN : (Online) ISSN : (Print) V. Big Data Technologies Big data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times. Technologies being applied to big data include Massively Parallel Processing (MPP) databases, data mining grids, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems. Real or near-real time information delivery is one of the defining characteristics of Big Data Analytics. A wide variety of techniques and technologies has been developed and adapted to aggregate, manipulate, analyze, and visualize big data. These techniques and technologies draw from several fields including statistics, computer science, applied mathematics, and economics. This means that an organization that intends to derive value from big data has to adopt a flexible, multidisciplinary approach A. Big Data Technologies The International Data Corporation (IDC) study predicts that overall data will grow by 50 times by 2020, driven in large part by more embedded systems such as sensors in clothing, medical devices and structures like buildings and bridges. The study also determined that unstructured information - such as files, and video - will account for 90% of all data created over the next decade. But the number of IT professionals available to manage all that data will only grow by 1.5 times today s levels (World s data will grow by 50X in next decade, IDC study predicts, 2011). The digital universe is 1.8 trillion gigabytes in size and stored in 500 quadrillion files. And its size gets more than double in every two years time frame. If we compare the digital universe with our physical universe then it s nearly as many bits of information in the digital universe as stars in our physical universe Given a very large heterogeneous data set, a major challenge is to figure out what data one has and how to analyze it. Unlike the previous two sections, hybrid data sets combined from many other data sets hold more surprises than immediate answers. To analyze these data will require adapting and integrating multiple analytic techniques to see around corners, e.g., to realize that new knowledge is likely to emerge in a non-linear way. It is not clear that statistical analysis methods, as Ayres argues, are or can be the whole answer. A Big Data platform should give a solution which is designed specifically with the needs of the enterprise in the mind. The following are the basic features of a Big Data offering Comprehensive - It should offer a broad platform and address all three dimensions of the Big Data challenge -Volume, Variety and Velocity. Enterprise-ready - It should include the performance, security, usability and reliability features. Integrated - It should simplify and accelerates the introduction of Big Data technology to enterprise. It should enable integration with information supply chain including databases, data warehouses and business intelligence applications. Open source based - It should be open source technology with the enterprise-class functionality and integration. Low latency reads and updates Robust and fault-tolerant Scalability Extensible 118 International Journal of Computer Science And Technology Allows adhoc queries Minimal maintenance Table 1: Tools Available for Big Data Tool IBM InfoSphereBigInsights WX2 Kognitio Analytical ParAccel Analytic SAND Analytic Tool Description It is open source Apache Hadoop with IBM Big Sheet which is able to analyze data in its native format without imposing any schema/structure and enables fast adhoc analysis. It is a fast and scalable inmemory analytic database platform It is a columnar, massively parallel processing (MPP) analytic database platform. It has strong features for the query optimization and compilation, compression and network interconnect. It is a columnar analytic database platform that achieves linear data scalability through massively parallel processing (MPP). VI. Big Data Advantages The Big Data has numerous advantages on society, science and technology. It is unto the way that how it is used for the human beings. Some of the advantages (Marr, 2013)are described below: A. Understanding and Targeting Customers This is one of the biggest and most publicized areas of big data use today. Here, big data is used to better understand customers and their behaviors and preferences. Companies are keen to expand their traditional data sets with social media data, browser logs as well as text analytics and sensor data to get a more complete picture of their customers. The big objective, in many cases, is to create predictive models. B. Understanding and Optimizing Business Process Big data is also increasingly used to optimize business processes. Retailers are able to optimize their stock based on predictions generated from social media data, web search trends and weather forecasts. One particular business process that is seeing a lot of big data analytics is supply chain or delivery route optimization. HR business processes are also being improved using big data analytics. This includes the optimization of talent acquisition. C. Improving Science and Research Science and research is currently being transformed by the new possibilities big data brings. Take, for example, CERN, the Swiss nuclear physics lab with its Large Hadron Collider, the world s largest and most powerful particle accelerator. Experiments to unlock the secrets of our universe how it started and works - generate huge amounts of data. The CERN datacentre has 65,000 processors to analyse its 30 petabytes of data.
5 ISSN : (Online) ISSN : (Print) D. Improving Healthcare and Public Health The computing power of big data analytics enables us to decode entire DNA strings in minutes and will allow us to find new cures and better understand and predict disease patterns. Just think of what happens when all the individual data from smart watches and wearable devices can be used to apply it to millions of people and their various diseases. The clinical trials of the future won t be limited by small sample sizes but could potentially include everyone. E. Optimizing Machine and Device Performance Big data analytics help machines and devices become smarter and more autonomous. For example, big data tools are used to operate Google s self-driving car. The Toyota Prius is fitted with cameras, GPS as well as powerful computers and sensors to safely drive on the road without the intervention of human beings. Big data tools are also used to optimize energy grids using data from smart meters. We can even use big data tools to optimize the performance of computers and data warehouses F. Improving Security and Law Enforcement Big data is applied heavily in improving security and enabling law enforcement. The revelations are that the National Security Agency (NSA) in the U.S. uses big data analytics to foil terrorist plots (and maybe spy on us). Others use big data techniques to detect and prevent cyber-attacks. Police forces use big data tools to catch criminals and even predict criminal activity and credit card companies use big data use it to detect fraudulent transactions. VII. Conclusion Here some of the important issues are covered that are needed to be analyzed by the organization while estimating the significance of implementing the Big Data technology and some direct challenges to the infrastructure of the technology. The commercial impacts of the Big data have the potential to generate significant productivity growth for a number of vertical sectors. They should also create healthy demand for talented individuals who are capable to help organizations in making sense of this growing volume of raw data. In short, Big Data presents opportunity to create unprecedented business advantages and better service delivery. A regulatory framework for big data is essential. That framework must be constructed with a clear understanding of the ravages that have been wrought on personal interests by the reduction of information to data, its centralization, and its expropriation but the biggest gap is the lack of the skilled managers to make decisions based on analysis by a factor of 10x. Growing talent and building teams to make analytic-based decisions is the key to realize the value of Big Data. IJCST Vo l. 6, Is s u e 2, Ap r i l - Ju n e 2015  Kaisler, S., Armour, F., Espinosa, J. A., Money, W.,"Big Data: Issues and Challenges Moving Forward", International Confrence on System Sciences (pp ). Hawaii: IEEE Computer Soceity,  Katal, A., Wazid, M., Goudar, R. H.,"Big Data: Issues, Challenges, Tools and Good Practices", IEEE, pp ,  Marr, B. (2013, November 13),"The Awesome Ways Big Data is used Today to Change Our World", Retrieved November 14, 2013, from LinkedIn: https://www.linkedin.com/today/ post/article/ the-awesomeways-big-data-is-used-today-tochange-our-world  Patel, A. B., Birla, M., Nair, U.,"Addressing Big Data Problem Using Hadoop and Nirma University", Gujrat: Nirma University,  Singh, S., Singh, N.,"Big Data Analytics", International Conference on Communication, Information & Computing Technology (ICCICT) (pp. 1-4). Mumbai: IEEE,  The 2011 Digital Universe Study: Extracting Value from Chaos. (2011, November 30). Retrieved from EMC:  World s data will grow by 50X in next decade, IDC study predicts. (2011, June 28). Retrieved from Computer World: World_s_data_will_grow_by_50X_in_next_decade_IDC_ study_predicts  Dumbill E: What Is Big Data? An Introduction to the Big Data Landscape. In Strata 2012: Making Data Work. O Reilly, Santa Clara, CA O Reilly; References  Hansen, C., Big Data: A Scientific Visualization Perspective, SCI Institute Professor of Computer Science, University of Utah,  Emmanuel Letouzé (May 2012), Big Data for Development: Challenges & Opportunities, [Online] Available: BigDataforDevelopment-GlobalPulseMay2012.pdf.  Aveksa Inc.,"Ensuring Big Data, Security with Identity and Access Management", Waltham,MA: Aveksa,  Hewlett-Packard Development Company,"Big Security for Big Data. L.P.: Hewlett-Packard Development Company", International Journal of Computer Science And Technology 119
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A SURVEY ON BIG DATA ISSUES AMRINDER KAUR Assistant Professor, Department of Computer
AN EFFICIENT SELECTIVE DATA MINING ALGORITHM FOR BIG DATA ANALYTICS THROUGH HADOOP Asst.Prof Mr. M.I Peter Shiyam,M.E * Department of Computer Science and Engineering, DMI Engineering college, Aralvaimozhi.
Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data
Volume 5, Issue 7, July 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Configuration Tuning
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data
Nguyễn Thị Thúy Hoài, College of technology _ Danang University Abstract The threading development of IT has been bringing more challenges for administrators to collect, store and analyze massive amounts
BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,
Exploiting Data at Rest and Data in Motion with a Big Data Platform Sarah Brader, email@example.com What is Big Data? Where does it come from? 12+ TBs of tweet data every day 30 billion RFID tags
Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
White Paper Version 1.2 May 2015 RAID Incorporated Introduction The abundance of Big Data, structured, partially-structured and unstructured massive datasets, which are too large to be processed effectively
White Paper How Streaming Data Analytics Enables Real-Time Decisions Contents Introduction... 1 What Is Streaming Analytics?... 1 How Does SAS Event Stream Processing Work?... 2 Overview...2 Event Stream
ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: firstname.lastname@example.org
Piyush Chaudhary Technical Computing Solutions Data Centric Computing Revisited SPXXL/SCICOMP Summer 2013 Bottom line: It is a time of Powerful Information Data volume is on the rise Dimensions of data
A COMPARITIVE AND ANALYTICAL STUDY ON BIG DATA Rishav Shaw School of Computing Science and Engineering VIT University Vellore 632014 ABSTRACT:- In this paper, we are going to review and present the existing
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
IBM System x reference architecture solutions for big data Easy-to-implement hardware, software and services for analyzing data at rest and data in motion Highlights Accelerates time-to-value with scalable,
Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank Agenda» Overview» What is Big Data?» Accelerates advances in computer & technologies» Revolutionizes data measurement»
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this
Are You Ready for Big Data? Jim Gallo National Director, Business Analytics April 10, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?
INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are
What happens when Big Data and Master Data come together? Jeremy Pritchard Master Data Management fgdd 1 What is Master Data? Master data is data that is shared by multiple computer systems. The Information
9 8 TRENDS IN THE DEVELOPMENT OF BUSINESS INTELLIGENCE SYSTEMS Assist. Prof. Latinka Todoranova Econ Lit C 810 Information technology is a highly dynamic field of research. As part of it, business intelligence
Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,
Are You Ready for Big Data? Jim Gallo National Director, Business Analytics February 11, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?
ANALYTICS BUILT FOR INTERNET OF THINGS Big Data Reporting is Out, Actionable Insights are In In recent years, it has become clear that data in itself has little relevance, it is the analysis of it that
The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.
Big Data and Its Impact on IT Consultancy 1 Habab Mohamed, 2 Nur Nadiah Md Hasim, 3 Alfiya Zangirova, 4 Jamaludin Ibrahim 1,2,3,4 Department of Information Systems, Kulliyyah of Information and Communication
Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Transitioning
BIG Data Analytics Move to Competitive Advantage where is technology heading today Standardization Open Source Automation Scalability Cloud Computing Mobility Smartphones/ tablets Internet of Things Wireless
5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for
What is big data? Raul F. Chong Senior program manager Big data, DB2, and Cloud IM Cloud Computing Center of Competence - IBM Toronto Lab, Canada 1 2011 IBM Corporation Agenda The world is changing What
Big Data Challenges in Bioinformatics BARCELONA SUPERCOMPUTING CENTER COMPUTER SCIENCE DEPARTMENT Autonomic Systems and ebusiness Pla?orms Jordi Torres Jordi.Torres@bsc.es Talk outline! We talk about Petabyte?
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
Improving Data Processing Speed in Big Data Analytics Using HDFS Method M.R.Sundarakumar Assistant Professor, Department Of Computer Science and Engineering, R.V College of Engineering, Bangalore, India
Big Data on Cloud Computing- Security Issues K Subashini, K Srivaishnavi UG Student, Department of CSE, University College of Engineering, Kanchipuram, Tamilnadu, India ABSTRACT: Cloud computing is now
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data
BIG DATA FUNDAMENTALS Timeframe Minimum of 30 hours Use the concepts of volume, velocity, variety, veracity and value to define big data Learning outcomes Critically evaluate the need for big data management
End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,
Data sheet HP Vertica OnDemand Enterprise-class Big Data analytics in the cloud Enterprise-class Big Data analytics for any size organization Vertica OnDemand Organizations today are experiencing a greater
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
Survey Results Table of Contents Survey Results... 4 Big Data Company Strategy... 6 Big Data Business Drivers and Benefits Received... 8 Big Data Integration... 10 Big Data Implementation Challenges...
White Paper A Real-time Data Hub For Smarter City Applications Intelligent Transportation Innovation for Real-time Traffic Flow Analytics with Dynamic Congestion Management 2 Understanding traffic flow
BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big
From Internet Data Centers to Data Centers in the Cloud This case study is a short extract from a keynote address given to the Doctoral Symposium at Middleware 2009 by Lucy Cherkasova of HP Research Labs
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
The Big Data Paradigm Shift Insight Through Automation Agenda The Problem Emcien s Solution: Algorithms solve data related business problems How Does the Technology Work? Case Studies 2013 Emcien, Inc.
Big Data and Hadoop Sreedhar C, Dr. D. Kavitha, K. Asha Rani Abstract Big data has become a buzzword in the recent years. Big data is used to describe a massive volume of both structured and unstructured
Smarter Planet evolution 13/03/2012 2012 IBM Corporation Ignacio Pérez González Enterprise Architect email@example.com @ignaciopr Mike May Technologies of the Change Capabilities Tendencies Vision
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
DATA STORAGE Martin Strohbach, AGT International (R&D) THE DATA VALUE CHAIN Value Chain Data Acquisition Data Analysis Data Curation Data Storage Data Usage Structured data Unstructured data Event processing
Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer
Taming Big Data 1010data ACCELERATES INSIGHT Lightning-fast and transparent, 1010data analytics gives you instant access to all your data, without technical expertise or expensive infrastructure. TAMING
White Paper BIG DATA-AS-A-SERVICE What Big Data is about What service providers can do with Big Data What EMC can do to help EMC Solutions Group Abstract This white paper looks at what service providers
Big Application Execution on Cloud using Hadoop Distributed File System Ashkan Vates*, Upendra, Muwafaq Rahi Ali RPIIT Campus, Bastara Karnal, Haryana, India ---------------------------------------------------------------------***---------------------------------------------------------------------
DDN Solution Brief Accelerate > ISR With DDN Big Data Storage The Way to Capture and Analyze the Growing Amount of Data Created by New Technologies 2012 DataDirect Networks. All Rights Reserved. The Big
Building a Scalable Big Data Infrastructure for Dynamic Workflows INTRODUCTION Organizations of all types and sizes are looking to big data to help them make faster, more intelligent decisions. Many efforts
Next Internet Evolution: Getting Big Data insights from the Internet of Things Internet of things are fast becoming broadly accepted in the world of computing and they should be. Advances in Cloud computing,
Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
Data Intensive Scalable Computing Harnessing the Power of Cloud Computing Randal E. Bryant February, 2009 Our world is awash in data. Millions of devices generate digital data, an estimated one zettabyte
Volume 3, Issue 9, September 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Real Time
Solutions for Communications with IBM Netezza Analytics Accelerator The all-in-one network intelligence appliance for the telecommunications industry Highlights The Analytics Accelerator combines speed,
Challenges of Handling Big Data Ramesh Bhashyam Teradata Fellow Teradata Corporation firstname.lastname@example.org Trend Too much information is a storage issue, certainly, but too much information is also
Why is BIG Data Important? March 2012 1 Why is BIG Data Important? A Navint Partners White Paper May 2012 Why is BIG Data Important? March 2012 2 What is Big Data? Big data is a term that refers to data
Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4
IBM Software Hadoop in the cloud Leverage big data analytics easily and cost-effectively with IBM InfoSphere 1 2 3 4 5 Introduction Cloud and analytics: The new growth engine Enhancing Hadoop in the cloud
Foundations of Business Intelligence: Databases and Information Management Wienand Omta Fabiano Dalpiaz 1 drs. ing. Wienand Omta Learning Objectives Describe how the problems of managing data resources
Using Big Data for Smarter Decision Making Colin White, BI Research July 2011 Sponsored by IBM USING BIG DATA FOR SMARTER DECISION MAKING To increase competitiveness, 83% of CIOs have visionary plans that
SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first
CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant
Next Generation Business Performance Management Solution Why Existing Business Intelligence (BI) Products are Inadequate Changing Business Environment In the face of increased competition, complex customer
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 5, Issue. 1, January 2016,
ATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V WHITE PAPER Create the Data Center of the Future Accelerate
Deploying Big Data to the Cloud: Roadmap for Success James Kobielus Chair, CSCC Big Data in the Cloud Working Group IBM Big Data Evangelist. IBM Data Magazine, Editor-in- Chief. IBM Senior Program Director,
Your consent to our cookies if you continue to use this website.