A Survey on Data Aggregation in Big Data and Cloud Computing

Size: px
Start display at page:

Download "A Survey on Data Aggregation in Big Data and Cloud Computing"

Transcription

1 A Survey on Data Aggregation in Big Data and Cloud Computing N.Karthick 1 and X.Agnes Kalrani 2 1 Department of Computer Science, Karpagam University, Coimbatore, Tamilnadu India. 2 Department of Computer Application, Karpagam University, Coimbatore, Tamilnadu India. Abstract -- Cloud computing, rapidly emerging as a new computation concept, offers agile and scalable resource access in a utility-like fashion, particularly for the processing of big data. An important open problem here is to effectively progress the data, from various geographical locations more time, into a cloud for efficient processing. Big Data introduces to datasets whose sizes are beyond the capability of typical database software tools to capture, accumulate, maintain and examined. Big Data is not just about the size of data but also contains data variety and data velocity. Simultaneously, these three attributes known as volume, velocity and variety form the three Vs of Big Data. The application of Big Data differs across verticals since of the several challenges that bring about the various use cases. The principle is that data aggregation is the response to maintaining up with the ever improving demands of big data. Data aggregation is a kind of data and information mining progression where data is explored, collected and presented in a report-based, shortened format to accomplish specific business purposes or processes and/or perform human analysis. Such information aggregation appears with natural issues, such as provision of poor quality, incorrect, inappropriate or fraudulent information. In this survey we discuss various methods of data aggregation in big data and cloud. Keywords-- Big Data, Cloud Computing, Data Management, Data Aggregation I. INTRODUCTION Big data and big data analytics participate an important role in today's fast-paced data-driven businesses [1]. The general characteristic of reallife applications is that they frequently have to agreement with a tremendous amount of data to obtain useful information. Achieving analytics and delivering accurate query results on such large amounts of data can be computationally expensive (long time for processing) [2] and resource intensive [3]. In common, overloaded systems and high delays are unsuited with a high-quality user experience, and the early estimated answers that are accurate sufficient are often of much greater value to users than tardy exact results [4]. Big data is certainly one of the biggest buzz phrases in IT today. Combined with virtualization and cloud computing, big data is a technological capacity that will force data centers to importantly transform and develop within the next five years. Related to virtualization, big data infrastructure is exclusive and can create an architectural upheaval in the way systems, storage, and software infrastructure are associated and maintained. Unlike preceding business analytics results, the real-time capacity of novel big data solutions can offer mission important business intelligence that can transform the shape and speed of enterprise decision building forever. Therefore, the way in which IT infrastructure is connected and distributed licences a fresh and significant analysis. At the same time, cloud technology is emerging as an infrastructure appropriate for building large and complex systems. Storage and compute resources provisioned from converged infrastructure and distributed resource pools present a cost-effective different to the traditional in-house data center. The cloud offers new levels of scalability, flexibility and availability, and permits easy access to data from several locations and any device. In addition, the cloud representations a data model of objects that contain data integrated among its user-defined and system-defined metadata as a single part. Thus, the cloud is an attractive model for a recent form of scalable predetermined content applications that involve rich metadata. The Cloud computing model has happen to an established option to contribute on foundations for data storage and computation resources. Together trends, ubiquitous sensing and Cloud computing, balance each other in a normal way. Sensor networks gather information regarding the physical environment, but in general require the resources to store and process the collected data over long periods of time. Cloud computing elastically presents the missing storage and computing resources. Specifically, it permits storing, process, and accessing the gathered sensor data efficiently via Cloud-based services. To demonstrate this fact, consider the following instance: Private weather stations do not only provide a local view on current sensor readings, but additionally broadcast their measurements to forecast services running in the Cloud. The Cloud services progression and aggregates the collected sensor readings and utilizes them in a Cloud-based weather simulation that produces an accurate forecast for a particular region. This estimate is then fed back to the private ISSN: Page28

2 weather stations in direct to support its owner with an added value service. The present technologies namely grid and cloud computing have all proposed to access massive amounts of computing power with aggregating resources and providing a single system view. Between these technologies, cloud computing is attractive a powerful architecture to achieve largescale and complex computing, and has transformed the way that computing infrastructure is theoretically and utilized. Additionally, a significant goal of these technologies is to distribute computing as a result for tackling big data, such as large scale, multi-media and high dimensional data sets. Big data and cloud computing are together the rapidly-moving technologies acknowledged in Gartner Inc. s 2012 Hype Cycle for Emerging Technologies. Cloud computing is connected with new concept for the provision of computing infrastructure and big data processing technique for all types of resources. Additionally, several new cloud-based technologies include to be approved since dealing with big data for concurrent processing is complex. One important quality of cloud computing is in aggregation of resources and data into data centers on the Internet. The present cloud services (IaaS, PaaS and SaaS) take in better execution effectiveness by aggregating application execution environments at different levels involving server, OS and middleware levels for distributing them. Meanwhile, a different approach of aggregating data into clouds has also been initiated, and it is to analyze such data with the efficient computational capability of clouds. In this approach, cloud is nowadays in the part of enlarging from application aggregation and distributing to data aggregation and utilization. To formulate full utilize of data, tens of terabytes (TBs) or tens of petabytes (PBs) of data require to be switched and a new kind of technology various from ordinary information and communications technology (ICT) is necessary. Big data processing on clouds can include hundreds of units namely application servers obtaining data and this direct to the generation of a large amount of read and write requests. Intended for that reason, it is assumed to be required situate tens to hundreds of servers for data storage in place to suitably allocate the read and write load and to make sure that failure of any of huge number of servers does not discontinue the entire service. Big data transfers to the collection and subsequent study of any significantly large collection of data that may include hidden insights or intelligence (user data, sensor data, machine data). When considered properly, big data can convey new business insights, open new markets, and create competitive benefits. Differentiated to the structured data in business applications, big data (following to IBM) contains of the following three key attributes: Variety Enlarges beyond structured data and involves semi-structure or unstructured data of all categorize, such as text, audio, video, click streams, log files, and more. Volume Comes in one size: large. Organizations are awash with data, simply accumulating hundreds of terabytes and petabytes of information. Velocity Sometimes should be analyzed in real time as it is streamed to an organization to enlarge the data s business value. In this model, there is a huge input data set shared over various servers. Every server processes it s allocated of the data, and produces a local intermediate result. The collection of intermediate solutions included on all the servers is then aggregated to produce the final outcome. Often the intermediate data is huge so it is separated across many servers which achieve aggregation on a subset of the data to produce the final solution. If there are N servers in the cluster, then utilizing all N servers to achieve the aggregation offers the maximum parallelism and it is often the default selection. In several cases there is less choice. For example, choosing the top k items of a set requires that the final solutions be produced with a single server. Another example is a distributed user query, which requires the result at a single server to enable low latency responses. MapReduce-based cloud technique is well proved for aggregation for various data allocations and simultaneous data processing for large-scale framework. II. RELATED WORKS Yu-Xiang Wang.et.al [5] presents Partition- Based Online Aggregation for cloud. Online aggregation is an attractive sampling-based knowledge to response aggregation queries by estimation to the final solution with the assurance interval which is becoming harder over time. It has been constructed into a MapReduce-based cloud system for analytics of big data, which permits users to monitor the query development, and keep money by killing the computation early once enough accuracy has been acquired. Moreover, there are some restrictions that contain the performance of online aggregation produced from the gap among in progress mechanism of MapReduce concept and the fundamentals of online aggregation, namely: 1) the low efficiency of sampling because of the absence of consideration for skewed data allocation for online aggregation in MapReduce, and 2) the massive redundant I/O expenditure of online aggregation ISSN: Page29

3 produced with the independent job execution method of MapReduce. In [5] introduce an online aggregation scheme in the cloud known as OLACloud, which is modified for MapReduce structure, to progress the overall performance for running OLA in cloud. This work develops a content-aware repartition technique to improve the sampling efficiency, and current a fair-allocation block placement policy, which is appropriate for our content-aware repartition scheme, to promise the storage and computation load balancing proficiently. It obtains a probabilistic model of block allocation to consider the fault-tolerance property of OLACloud and the unique MapReduce structure, and express the availability and efficiency of our OLA-Cloud. Furthermore present the query processing method with a distributed sampling strategy in OLACloud, which demonstrates how the distributed samples are composed for multi-queries accuracy evaluation, to minimize the redundant disk I/O cost for overlapped queries. Hadassa Daltrophe, Shlomi Dolev and Zvi Lotker [6] introduces data interpolation based Aggregation. Given a large set of measurement sensor data, in direct to recognize an easy function that captures the significance of the data assembled by the sensors, we recommend for representing the data with (spatial) functions, in specific with polynomials. Given a (exampled) set of values, we interpolate the datapoints to describe a polynomial that would signify the data. The interpolation is importance, because in exercise the data is able to be noisy and even Byzantine, where the Byzantine data stand for an adversarial value that is not restricted to being close to the correct measured data. The managing of big data structure also provides interest for the distributed interpolation technique. The concept of big data happens to one of the most important tasks in the occurrence of the enormous amount of data generated by nowadays. Communicating and analyzing the whole data does not extent, even when data aggregation procedures are employed. This [6] recommends a technique to represent the distributing big data by an easy conceptual function (known as polynomial) which will direct to efficient utilize of that data. To overcome the above restriction, in [6] produce two solutions, one that expands the Welch-Berlekamp method in the case of multidimensional data, and copes with discrete noise and Byzantine data, and the next one is based on Arora and Khot methods, expanding them in the case of multidimensional noisy and Byzantine data. During the research we include illustrious two different measures for the polynomial fitting to the Byzantine noisy data difficulty: the primary being the Welsh-Berlekamp simplification for discrete-noise multidimensional data and the second being the linear-programming estimate for multivariate polynomials. Approached by the error-correcting code techniques, we have recommended a way to signify a noisy malicious input with a multivariate polynomial. This technique assumes that the noise is discrete. When the noise is unrestricted, based on Bernstein- Markov Theorem and Arora & Khot algorithm, we have recommended a technique to restructure algebraic or trigonometric polynomial that traverses ρ fraction of the noisy multidimensional data. Linquan Zhang.et.al [7] offer two online algorithms: an online lazy migration (OLM) algorithm and a randomized fixed horizon control (RFHC) algorithm, for improving at any specified time the option of the data center for data aggregation and processing, in addition to the routes for transmitting data around. In this [7] consider the detailed cost composition and recognize the performance bottleneck for moving data into the cloud, and originate an offline optimal data migration issue. The optimization calculates efficient data routing and aggregation approach at any specified time, and minimizes the overall system expenditure and data transfer delay, over an extensive run of the system. A cloud user requires making a decision (i) via which VPN connections to upload its data to the cloud, and (ii) to which data center to aggregate data, for progression by a MapReduce-like framework, such that the economic charges acquired, in addition to the latency for the data to reach the aggregation point, are jointly reduced. Tomas Knap, Jan Michelfeit [8] presents linked data aggregation model. It s explained the data aggregation algorithm which assists data consumers to aggregate large amounts of linked data. In this work, concentrates on the description of the data aggregation procedure in OpenDataCleanStore (ODCS) model; in particular we explain (1) how the individual data conflicts are resolved and (2) how the aggregate quality of the aggregated data is calculated. The impact of the data aggregation method is then established by measuring the completeness and consistency of the original data sets and the data set produced by aggregated data. Simona Rabinovici-Cohen.et.al [9] introduces Preservation DataStores Cloud has a hierarchical data representation providing independent tenants whose benefits are managed in various aggregations support on content and value. Each aggregation has a distinct preservation profile that is reconfigurable dynamically and transparently as constraints continue varying. The preserved content can be accessed using virtual machines provisioned with data objects from the storage cloud mutually by the delegated rendering software. PDS Cloud utilizes a logical data representation and identical hierarchical resource naming path for unit in a preservation storage system. Aspects of data management configuration are hidden from the ISSN: Page30

4 user, yet combined into the model. The model is logical in the sense that it is not tied to any specific implementation. It provides itself to various realizations, depending on the capacities of the cloud storage platforms being utilized. The downwards hierarchy contains of: tenants, aggregations, dockets, and objects. Tenant is an enterprise or organization that connects in storing data in the cloud. Aggregation is a configuration profile, describing the policies and capacities for maintaining the data in storage. It identifies the features of one or more cloud models (address, identifications, etc.) that are being utilized for physical storage. It also allocates different fundamentals for managing and accessing data objects, namely integrity checking process or rendering properties, as applicable for the particular use case. Every aggregation belongs to a single tenant, and its configuration is tailored to the tenant s constraint and regulations. Docket is a collection of objects corresponding to a directory in a file system. A docket name is not unique, and can be reprocessed under various aggregations. Object is the essential preserved entity. Users access the data exclusive being aware of configuration features in the aggregation. A storage service layer, namely PDS Cloud, is answerable for interpreting the aggregation profile and engaging the significant data management capability. This involves accessing the particular cloud models delegated by the aggregation and mapping the logical dockets and objects to the physical name space of each particular cloud. Changes in aggregation configuration over time affect the handling in the storage service layer, but remain transparent to the user application interface. Paolo Costa.et.al [10] presents Camdoop, a MapReduce based system processing on Cam Cube which is considered as a cluster presents and makes use of a direct-connect network topology among servers immediately connected to other servers. Camdoop develops the property that CamCube server s direct traffic to achieve in-network aggregation of data through the shuffle phase. Camdoop makes aggregation trees with the sources of the intermediate data as the children and roots at the servers accomplishing the final reduction. Camdoop pushes aggregation into the network and parallelizes the shuffle and reduce phases. To realize this, it utilizes a custom transport service that supports reliable communication, applicationspecific scheduling of packets and packet aggregation across streams. Theoretically, for every reduce task, the transport service outlines a spanning tree that joins all servers, with the root being the server running the reduce task. When a server obtains the job requirement, it locally calculates the list of the vertex-ids of the job and recognizes the subset that is mapped to itself. All through the shuffle phase, disappears of the tree (i.e., the maptaskids) just greedily transmit the sorted intermediate data to their parent identifiers, via the CamCube key based routing. Each internal vertex combines and aggregates the data arriving with its children. To carry out this effectively, per child, a small packet buffer is managed by a pointer to the next value in the packet to be aggregated. When at least one packet is processed from every child, the server begins aggregating the (key, value) pairs across the packets via the merger function. The aggregate (key,value) pairs are then transmit to the parent. At the root, the solutions are aggregated via the reduce function and the results saved. Satoshi Tsuchiya.et.al [11] proposed high speed aggregation with distributed key value store (KVS). Distributed KVS accomplishes this high performance and availability with distributing data between various servers, however distributing data produces several specific types of processing complex at the same time. A typical instance of this kind of processing is aggregation processing, in which various data are aggregated to generate results. With distributed KVS, where data are allocated between servers, a great several pieces of communication among servers are produced for aggregation, which acquires time. To address this problem [11] developing a method to recognize high-speed aggregation with distributed KVS, and realized a research model presenting about eight times higher speed than that of the previous technique. With the innovative method, several basic operations (such as rekey, map, filter and reduce) that effectively run in distributed KVS have been utilized and combined to recognize aggregation processing. The individual basic operations are considered to effectively run in parallel in distributed KVS and the entire aggregation processing is effectively run in distributed KVS. Fig.1 Effect of datsa size In Fig.1 compare the execution time for varies methods; we vary the data size from 10G to 100G ISSN: Page31

5 and evaluate the performance of five basic methods for different data distributions, which include HAS- KVS, Camdoop, PDS Cloud, Linked Data Aggregation and Partition-Based Online Aggregation. Execution time refers as Algorithm running times are recorded. The running time of algorithm is proportional to the length of data size. III. CONCLUSION This survey paper presents the theoretic study of different data aggregation techniques in big data and cloud environment. The detail explanation of the methods is summarizing and also outlines the advantages of the different techniques in big data and cloud computing environment. In this survey discussed about Partition-Based Online Aggregation, data interpolation based Aggregation, OLM, RFHC, linked data aggregation, PDS Cloud, Camdoop and high-speed aggregation with distributed KVS. Each of the above surveyed techniques demonstrates and illustrates improved in some categories and not in some other categories. At the end of this survey, conclude that efficient data aggregation method is proposed by reducing the redundant statistical computation cost. Consistency of Data, Provided by Charles University, Jun [9] Rabinovici-Cohen.S, Marberg.J, Nagin. K and Pease. D, PDS Cloud: Long Term Digital Preservation in the Cloud, IC2E '13 Proceedings IEEE International Conference on Cloud Engineering, pp.38-45, [10] COSTA. P, DONNELLY. A, ROWSTRON. A and O SHEA.G, Camdoop: exploiting in-network aggregation for big data applications, In USENIX NSDI (2012). [11] Satoshi Tsuchiya, Yoshinori Sakamoto, Yuichi Tsuchimoto and Vivian Lee, Big data processing in cloud environments, FUJITSU Sci. Tech. J., Vol. 48, No. 2, pp (April 2012). IV. REFERENCES [1] Herodotou H, Lim H, Luo G et al. Starsh: A selftuning system for big data analytics. In Proc. the 15th CIDR, Apr. 2011, pp [2] Wu S, Ooi B C, Tan K L. Continuous sampling for online aggregation over multiple queries. In Proc. the 2010 International Conference on Management of Data (SIGMOD), June 2010, pp [3] Chaudhuri S, Das G, Datar M et al. Overcoming limitations of sampling for aggregation queries. In Proc. the 17th Int.Conf. Data Engineering, Apr. 2001, pp [4] Laptev N, Zeng K, Zaniolo C. Early accurate results for advanced analytics on MapReduce. PVLDB, 2012, 5(10): [5] Yu-Xiang Wang, Jun-Zhou Luo, Ai-Bo Song, Fang Dong, Partition-Based Online Aggregation with Shared Sampling in the Cloud, Journal of Computer Science and Technology, November 2013, Volume 28, Issue 6, pp [6] Hadassa Daltrophe, Shlomi Dolev and Zvi Lotker, Data Interpolation: An Efficient Sampling Alternative for Big Data Aggregation, CoRR abs/ (2012). [7] Linquan Zhang, Chuan Wu, Zongpeng Li, Chuanxiong Guo, Minghua Chen, and Francis C.M. Lau, Moving Big Data to The Cloud: An Online Cost- Minimizing Approach, IEEE Journal On Selected Areas In Communications, VOL. 31, NO. 12, DEC [8] Tomas Knap, Jan Michelfeit, Linked Data Aggregation Algorithm: Increasing Completeness and ISSN: Page32

Big Data Processing in Cloud Environments

Big Data Processing in Cloud Environments Big Data in Cloud Environments Satoshi Tsuchiya Yoshinori Sakamoto Yuichi Tsuchimoto Vivian Lee In recent years, accompanied by lower prices of information and communications technology (ICT) equipment

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A SURVEY ON BIG DATA ISSUES AMRINDER KAUR Assistant Professor, Department of Computer

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

StreamStorage: High-throughput and Scalable Storage Technology for Streaming Data

StreamStorage: High-throughput and Scalable Storage Technology for Streaming Data : High-throughput and Scalable Storage Technology for Streaming Data Munenori Maeda Toshihiro Ozawa Real-time analytical processing (RTAP) of vast amounts of time-series data from sensors, server logs,

More information

Massive Cloud Auditing using Data Mining on Hadoop

Massive Cloud Auditing using Data Mining on Hadoop Massive Cloud Auditing using Data Mining on Hadoop Prof. Sachin Shetty CyberBAT Team, AFRL/RIGD AFRL VFRP Tennessee State University Outline Massive Cloud Auditing Traffic Characterization Distributed

More information

Big Data Interpolation: An Effcient Sampling Alternative for Sensor Data Aggregation

Big Data Interpolation: An Effcient Sampling Alternative for Sensor Data Aggregation Big Data Interpolation: An Effcient Sampling Alternative for Sensor Data Aggregation Hadassa Daltrophe, Shlomi Dolev, Zvi Lotker Ben-Gurion University Outline Introduction Motivation Problem definition

More information

Cloud Based Distributed Databases: The Future Ahead

Cloud Based Distributed Databases: The Future Ahead Cloud Based Distributed Databases: The Future Ahead Arpita Mathur Mridul Mathur Pallavi Upadhyay Abstract Fault tolerant systems are necessary to be there for distributed databases for data centers or

More information

Mobile Storage and Search Engine of Information Oriented to Food Cloud

Mobile Storage and Search Engine of Information Oriented to Food Cloud Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 ISSN: 2042-4868; e-issn: 2042-4876 Maxwell Scientific Organization, 2013 Submitted: May 29, 2013 Accepted: July 04, 2013 Published:

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Keywords: Big Data, HDFS, Map Reduce, Hadoop

Keywords: Big Data, HDFS, Map Reduce, Hadoop Volume 5, Issue 7, July 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Configuration Tuning

More information

UPS battery remote monitoring system in cloud computing

UPS battery remote monitoring system in cloud computing , pp.11-15 http://dx.doi.org/10.14257/astl.2014.53.03 UPS battery remote monitoring system in cloud computing Shiwei Li, Haiying Wang, Qi Fan School of Automation, Harbin University of Science and Technology

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,

More information

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Image

More information

Networking in the Hadoop Cluster

Networking in the Hadoop Cluster Hadoop and other distributed systems are increasingly the solution of choice for next generation data volumes. A high capacity, any to any, easily manageable networking layer is critical for peak Hadoop

More information

Survey on Load Rebalancing for Distributed File System in Cloud

Survey on Load Rebalancing for Distributed File System in Cloud Survey on Load Rebalancing for Distributed File System in Cloud Prof. Pranalini S. Ketkar Ankita Bhimrao Patkure IT Department, DCOER, PG Scholar, Computer Department DCOER, Pune University Pune university

More information

Mining Large Datasets: Case of Mining Graph Data in the Cloud

Mining Large Datasets: Case of Mining Graph Data in the Cloud Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information

More information

Scala Storage Scale-Out Clustered Storage White Paper

Scala Storage Scale-Out Clustered Storage White Paper White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current

More information

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and

More information

Load Balancing and Maintaining the Qos on Cloud Partitioning For the Public Cloud

Load Balancing and Maintaining the Qos on Cloud Partitioning For the Public Cloud Load Balancing and Maintaining the Qos on Cloud Partitioning For the Public Cloud 1 S.Karthika, 2 T.Lavanya, 3 G.Gokila, 4 A.Arunraja 5 S.Sarumathi, 6 S.Saravanakumar, 7 A.Gokilavani 1,2,3,4 Student, Department

More information

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University

More information

Network Virtualization for Large-Scale Data Centers

Network Virtualization for Large-Scale Data Centers Network Virtualization for Large-Scale Data Centers Tatsuhiro Ando Osamu Shimokuni Katsuhito Asano The growing use of cloud technology by large enterprises to support their business continuity planning

More information

Testing Big data is one of the biggest

Testing Big data is one of the biggest Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing

More information

Cloud Computing at Google. Architecture

Cloud Computing at Google. Architecture Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale

More information

Development and Runtime Platform and High-speed Processing Technology for Data Utilization

Development and Runtime Platform and High-speed Processing Technology for Data Utilization Development and Runtime Platform and High-speed Processing Technology for Data Utilization Hidetoshi Kurihara Haruyasu Ueda Yoshinori Sakamoto Masazumi Matsubara Dramatic increases in computing power and

More information

IBM Netezza High Capacity Appliance

IBM Netezza High Capacity Appliance IBM Netezza High Capacity Appliance Petascale Data Archival, Analysis and Disaster Recovery Solutions IBM Netezza High Capacity Appliance Highlights: Allows querying and analysis of deep archival data

More information

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES 1 MYOUNGJIN KIM, 2 CUI YUN, 3 SEUNGHO HAN, 4 HANKU LEE 1,2,3,4 Department of Internet & Multimedia Engineering,

More information

COST MINIMIZATION OF RUNNING MAPREDUCE ACROSS GEOGRAPHICALLY DISTRIBUTED DATA CENTERS

COST MINIMIZATION OF RUNNING MAPREDUCE ACROSS GEOGRAPHICALLY DISTRIBUTED DATA CENTERS COST MINIMIZATION OF RUNNING MAPREDUCE ACROSS GEOGRAPHICALLY DISTRIBUTED DATA CENTERS Ms. T. Cowsalya PG Scholar, SVS College of Engineering, Coimbatore, Tamilnadu, India Dr. S. Senthamarai Kannan Assistant

More information

An Active Packet can be classified as

An Active Packet can be classified as Mobile Agents for Active Network Management By Rumeel Kazi and Patricia Morreale Stevens Institute of Technology Contact: rkazi,pat@ati.stevens-tech.edu Abstract-Traditionally, network management systems

More information

International Journal of Innovative Research in Computer and Communication Engineering

International Journal of Innovative Research in Computer and Communication Engineering FP Tree Algorithm and Approaches in Big Data T.Rathika 1, J.Senthil Murugan 2 Assistant Professor, Department of CSE, SRM University, Ramapuram Campus, Chennai, Tamil Nadu,India 1 Assistant Professor,

More information

Optimal Service Pricing for a Cloud Cache

Optimal Service Pricing for a Cloud Cache Optimal Service Pricing for a Cloud Cache K.SRAVANTHI Department of Computer Science & Engineering (M.Tech.) Sindura College of Engineering and Technology Ramagundam,Telangana G.LAKSHMI Asst. Professor,

More information

An Advanced Commercial Contact Center Based on Cloud Computing

An Advanced Commercial Contact Center Based on Cloud Computing An Advanced Commercial Contact Center Based on Cloud Computing Li Pengyu, Chen Xin, Zhang Guoping, Zhang Boju, and Huang Daochao Abstract With the rapid development of cloud computing and information technology,

More information

IMPACT OF DISTRIBUTED SYSTEMS IN MANAGING CLOUD APPLICATION

IMPACT OF DISTRIBUTED SYSTEMS IN MANAGING CLOUD APPLICATION INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING AND SCIENCE IMPACT OF DISTRIBUTED SYSTEMS IN MANAGING CLOUD APPLICATION N.Vijaya Sunder Sagar 1, M.Dileep Kumar 2, M.Nagesh 3, Lunavath Gandhi

More information

Load Distribution in Large Scale Network Monitoring Infrastructures

Load Distribution in Large Scale Network Monitoring Infrastructures Load Distribution in Large Scale Network Monitoring Infrastructures Josep Sanjuàs-Cuxart, Pere Barlet-Ros, Gianluca Iannaccone, and Josep Solé-Pareta Universitat Politècnica de Catalunya (UPC) {jsanjuas,pbarlet,pareta}@ac.upc.edu

More information

Big Data Storage Architecture Design in Cloud Computing

Big Data Storage Architecture Design in Cloud Computing Big Data Storage Architecture Design in Cloud Computing Xuebin Chen 1, Shi Wang 1( ), Yanyan Dong 1, and Xu Wang 2 1 College of Science, North China University of Science and Technology, Tangshan, Hebei,

More information

Big Data on Cloud Computing- Security Issues

Big Data on Cloud Computing- Security Issues Big Data on Cloud Computing- Security Issues K Subashini, K Srivaishnavi UG Student, Department of CSE, University College of Engineering, Kanchipuram, Tamilnadu, India ABSTRACT: Cloud computing is now

More information

Overview of Next-Generation Green Data Center

Overview of Next-Generation Green Data Center Overview of Next-Generation Green Data Center Kouichi Kumon The dramatic growth in the demand for data centers is being accompanied by increases in energy consumption and operating costs. To deal with

More information

Keywords Distributed Computing, On Demand Resources, Cloud Computing, Virtualization, Server Consolidation, Load Balancing

Keywords Distributed Computing, On Demand Resources, Cloud Computing, Virtualization, Server Consolidation, Load Balancing Volume 5, Issue 1, January 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Survey on Load

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 2, Issue 9, September 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com An Experimental

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.

More information

Scaling 10Gb/s Clustering at Wire-Speed

Scaling 10Gb/s Clustering at Wire-Speed Scaling 10Gb/s Clustering at Wire-Speed InfiniBand offers cost-effective wire-speed scaling with deterministic performance Mellanox Technologies Inc. 2900 Stender Way, Santa Clara, CA 95054 Tel: 408-970-3400

More information

International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6

International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6 International Journal of Engineering Research ISSN: 2348-4039 & Management Technology Email: editor@ijermt.org November-2015 Volume 2, Issue-6 www.ijermt.org Modeling Big Data Characteristics for Discovering

More information

QUALITY OF SERVICE METRICS FOR DATA TRANSMISSION IN MESH TOPOLOGIES

QUALITY OF SERVICE METRICS FOR DATA TRANSMISSION IN MESH TOPOLOGIES QUALITY OF SERVICE METRICS FOR DATA TRANSMISSION IN MESH TOPOLOGIES SWATHI NANDURI * ZAHOOR-UL-HUQ * Master of Technology, Associate Professor, G. Pulla Reddy Engineering College, G. Pulla Reddy Engineering

More information

BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS

BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS Megha Joshi Assistant Professor, ASM s Institute of Computer Studies, Pune, India Abstract: Industry is struggling to handle voluminous, complex, unstructured

More information

Efficient Data Replication Scheme based on Hadoop Distributed File System

Efficient Data Replication Scheme based on Hadoop Distributed File System , pp. 177-186 http://dx.doi.org/10.14257/ijseia.2015.9.12.16 Efficient Data Replication Scheme based on Hadoop Distributed File System Jungha Lee 1, Jaehwa Chung 2 and Daewon Lee 3* 1 Division of Supercomputing,

More information

EFFECTIVE DATA RECOVERY FOR CONSTRUCTIVE CLOUD PLATFORM

EFFECTIVE DATA RECOVERY FOR CONSTRUCTIVE CLOUD PLATFORM INTERNATIONAL JOURNAL OF REVIEWS ON RECENT ELECTRONICS AND COMPUTER SCIENCE EFFECTIVE DATA RECOVERY FOR CONSTRUCTIVE CLOUD PLATFORM Macha Arun 1, B.Ravi Kumar 2 1 M.Tech Student, Dept of CSE, Holy Mary

More information

Accelerating Hadoop MapReduce Using an In-Memory Data Grid

Accelerating Hadoop MapReduce Using an In-Memory Data Grid Accelerating Hadoop MapReduce Using an In-Memory Data Grid By David L. Brinker and William L. Bain, ScaleOut Software, Inc. 2013 ScaleOut Software, Inc. 12/27/2012 H adoop has been widely embraced for

More information

ISSN: 2321-7782 (Online) Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

ISSN: 2321-7782 (Online) Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Enhancing MapReduce Functionality for Optimizing Workloads on Data Centers

Enhancing MapReduce Functionality for Optimizing Workloads on Data Centers Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 10, October 2013,

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015

International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015 RESEARCH ARTICLE OPEN ACCESS Ensuring Reliability and High Availability in Cloud by Employing a Fault Tolerance Enabled Load Balancing Algorithm G.Gayathri [1], N.Prabakaran [2] Department of Computer

More information

Research of Railway Wagon Flow Forecast System Based on Hadoop-Hazelcast

Research of Railway Wagon Flow Forecast System Based on Hadoop-Hazelcast International Conference on Civil, Transportation and Environment (ICCTE 2016) Research of Railway Wagon Flow Forecast System Based on Hadoop-Hazelcast Xiaodong Zhang1, a, Baotian Dong1, b, Weijia Zhang2,

More information

International Journal of Scientific & Engineering Research, Volume 6, Issue 5, May-2015 1681 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 6, Issue 5, May-2015 1681 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 6, Issue 5, May-2015 1681 Software as a Model for Security in Cloud over Virtual Environments S.Vengadesan, B.Muthulakshmi PG Student,

More information

Survey on Scheduling Algorithm in MapReduce Framework

Survey on Scheduling Algorithm in MapReduce Framework Survey on Scheduling Algorithm in MapReduce Framework Pravin P. Nimbalkar 1, Devendra P.Gadekar 2 1,2 Department of Computer Engineering, JSPM s Imperial College of Engineering and Research, Pune, India

More information

Harnessing the Power of Big Data for Real-Time IT: Sumo Logic Log Management and Analytics Service

Harnessing the Power of Big Data for Real-Time IT: Sumo Logic Log Management and Analytics Service Harnessing the Power of Big Data for Real-Time IT: Sumo Logic Log Management and Analytics Service A Sumo Logic White Paper Introduction Managing and analyzing today s huge volume of machine data has never

More information

A Game Theory Modal Based On Cloud Computing For Public Cloud

A Game Theory Modal Based On Cloud Computing For Public Cloud IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 2, Ver. XII (Mar-Apr. 2014), PP 48-53 A Game Theory Modal Based On Cloud Computing For Public Cloud

More information

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop

More information

A Survey Paper: Cloud Computing and Virtual Machine Migration

A Survey Paper: Cloud Computing and Virtual Machine Migration 577 A Survey Paper: Cloud Computing and Virtual Machine Migration 1 Yatendra Sahu, 2 Neha Agrawal 1 UIT, RGPV, Bhopal MP 462036, INDIA 2 MANIT, Bhopal MP 462051, INDIA Abstract - Cloud computing is one

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

A Comparative Study of cloud and mcloud Computing

A Comparative Study of cloud and mcloud Computing A Comparative Study of cloud and mcloud Computing Ms.S.Gowri* Ms.S.Latha* Ms.A.Nirmala Devi* * Department of Computer Science, K.S.Rangasamy College of Arts and Science, Tiruchengode. s.gowri@ksrcas.edu

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

Understanding traffic flow

Understanding traffic flow White Paper A Real-time Data Hub For Smarter City Applications Intelligent Transportation Innovation for Real-time Traffic Flow Analytics with Dynamic Congestion Management 2 Understanding traffic flow

More information

Big Data Challenges. Alexandru Adrian TOLE Romanian American University, Bucharest, Romania adrian.tole@yahoo.com

Big Data Challenges. Alexandru Adrian TOLE Romanian American University, Bucharest, Romania adrian.tole@yahoo.com Database Systems Journal vol. IV, no. 3/2013 31 Big Data Challenges Alexandru Adrian TOLE Romanian American University, Bucharest, Romania adrian.tole@yahoo.com The amount of data that is traveling across

More information

CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL

CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL CHAPTER 2 MODELLING FOR DISTRIBUTED NETWORK SYSTEMS: THE CLIENT- SERVER MODEL This chapter is to introduce the client-server model and its role in the development of distributed network systems. The chapter

More information

On-Demand Virtual System Service

On-Demand Virtual System Service On-Demand System Service Yasutaka Taniuchi Cloud computing, which enables information and communications technology (ICT) capacity to be used over the network, is entering a genuine expansion phase for

More information

A Web-based Interactive Data Visualization System for Outlier Subspace Analysis

A Web-based Interactive Data Visualization System for Outlier Subspace Analysis A Web-based Interactive Data Visualization System for Outlier Subspace Analysis Dong Liu, Qigang Gao Computer Science Dalhousie University Halifax, NS, B3H 1W5 Canada dongl@cs.dal.ca qggao@cs.dal.ca Hai

More information

Available online at www.sciencedirect.com Available online at www.sciencedirect.com. Advanced in Control Engineering and Information Science

Available online at www.sciencedirect.com Available online at www.sciencedirect.com. Advanced in Control Engineering and Information Science Available online at www.sciencedirect.com Available online at www.sciencedirect.com Procedia Procedia Engineering Engineering 00 (2011) 15 (2011) 000 000 1822 1826 Procedia Engineering www.elsevier.com/locate/procedia

More information

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are

More information

Optimizing Data Center Networks for Cloud Computing

Optimizing Data Center Networks for Cloud Computing PRAMAK 1 Optimizing Data Center Networks for Cloud Computing Data Center networks have evolved over time as the nature of computing changed. They evolved to handle the computing models based on main-frames,

More information

Figure 1. The cloud scales: Amazon EC2 growth [2].

Figure 1. The cloud scales: Amazon EC2 growth [2]. - Chung-Cheng Li and Kuochen Wang Department of Computer Science National Chiao Tung University Hsinchu, Taiwan 300 shinji10343@hotmail.com, kwang@cs.nctu.edu.tw Abstract One of the most important issues

More information

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,

More information

ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS

ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant

More information

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance. Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics

More information

Journal of Chemical and Pharmaceutical Research, 2015, 7(3):1388-1392. Research Article. E-commerce recommendation system on cloud computing

Journal of Chemical and Pharmaceutical Research, 2015, 7(3):1388-1392. Research Article. E-commerce recommendation system on cloud computing Available online www.jocpr.com Journal of Chemical and Pharmaceutical Research, 2015, 7(3):1388-1392 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 E-commerce recommendation system on cloud computing

More information

Towards Analytical Data Management for Numerical Simulations

Towards Analytical Data Management for Numerical Simulations Towards Analytical Data Management for Numerical Simulations Ramon G. Costa, Fábio Porto, Bruno Schulze {ramongc, fporto, schulze}@lncc.br National Laboratory for Scientific Computing - RJ, Brazil Abstract.

More information

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

CitusDB Architecture for Real-Time Big Data

CitusDB Architecture for Real-Time Big Data CitusDB Architecture for Real-Time Big Data CitusDB Highlights Empowers real-time Big Data using PostgreSQL Scales out PostgreSQL to support up to hundreds of terabytes of data Fast parallel processing

More information

Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems

Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems 215 IEEE International Conference on Big Data (Big Data) Computing Load Aware and Long-View Load Balancing for Cluster Storage Systems Guoxin Liu and Haiying Shen and Haoyu Wang Department of Electrical

More information

Lecture 02b Cloud Computing II

Lecture 02b Cloud Computing II Mobile Cloud Computing Lecture 02b Cloud Computing II 吳 秀 陽 Shiow-yang Wu T. Sridhar. Cloud Computing A Primer, Part 2: Infrastructure and Implementation Topics. The Internet Protocol Journal, Volume 12,

More information

Study on Architecture and Implementation of Port Logistics Information Service Platform Based on Cloud Computing 1

Study on Architecture and Implementation of Port Logistics Information Service Platform Based on Cloud Computing 1 , pp. 331-342 http://dx.doi.org/10.14257/ijfgcn.2015.8.2.27 Study on Architecture and Implementation of Port Logistics Information Service Platform Based on Cloud Computing 1 Changming Li, Jie Shen and

More information

Log Mining Based on Hadoop s Map and Reduce Technique

Log Mining Based on Hadoop s Map and Reduce Technique Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, anujapandit25@gmail.com Amruta Deshpande Department of Computer Science, amrutadeshpande1991@gmail.com

More information

Paolo Costa costa@imperial.ac.uk

Paolo Costa costa@imperial.ac.uk joint work with Ant Rowstron, Austin Donnelly, and Greg O Shea (MSR Cambridge) Hussam Abu-Libdeh, Simon Schubert (Interns) Paolo Costa costa@imperial.ac.uk Paolo Costa CamCube - Rethinking the Data Center

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

marlabs driving digital agility WHITEPAPER Big Data and Hadoop

marlabs driving digital agility WHITEPAPER Big Data and Hadoop marlabs driving digital agility WHITEPAPER Big Data and Hadoop Abstract This paper explains the significance of Hadoop, an emerging yet rapidly growing technology. The prime goal of this paper is to unveil

More information

SQL Server 2012 Performance White Paper

SQL Server 2012 Performance White Paper Published: April 2012 Applies to: SQL Server 2012 Copyright The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication.

More information

ADAPTIVE LOAD BALANCING FOR CLUSTER USING CONTENT AWARENESS WITH TRAFFIC MONITORING Archana Nigam, Tejprakash Singh, Anuj Tiwari, Ankita Singhal

ADAPTIVE LOAD BALANCING FOR CLUSTER USING CONTENT AWARENESS WITH TRAFFIC MONITORING Archana Nigam, Tejprakash Singh, Anuj Tiwari, Ankita Singhal ADAPTIVE LOAD BALANCING FOR CLUSTER USING CONTENT AWARENESS WITH TRAFFIC MONITORING Archana Nigam, Tejprakash Singh, Anuj Tiwari, Ankita Singhal Abstract With the rapid growth of both information and users

More information

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014 5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for

More information

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms

More information

ANALYTICS BUILT FOR INTERNET OF THINGS

ANALYTICS BUILT FOR INTERNET OF THINGS ANALYTICS BUILT FOR INTERNET OF THINGS Big Data Reporting is Out, Actionable Insights are In In recent years, it has become clear that data in itself has little relevance, it is the analysis of it that

More information

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage Volume 2, No.4, July August 2013 International Journal of Information Systems and Computer Sciences ISSN 2319 7595 Tejaswini S L Jayanthy et al., Available International Online Journal at http://warse.org/pdfs/ijiscs03242013.pdf

More information

Suresh Lakavath csir urdip Pune, India lsureshit@gmail.com.

Suresh Lakavath csir urdip Pune, India lsureshit@gmail.com. A Big Data Hadoop Architecture for Online Analysis. Suresh Lakavath csir urdip Pune, India lsureshit@gmail.com. Ramlal Naik L Acme Tele Power LTD Haryana, India ramlalnaik@gmail.com. Abstract Big Data

More information

Extract Transform and Load Strategy for Unstructured Data into Data Warehouse Using Map Reduce Paradigm and Big Data Analytics

Extract Transform and Load Strategy for Unstructured Data into Data Warehouse Using Map Reduce Paradigm and Big Data Analytics Extract Transform and Load Strategy for Unstructured Data into Data Warehouse Using Map Reduce Paradigm and Big Data Analytics P.Saravana kumar 1, M.Athigopal 2, S.Vetrivel 3 Assistant Professor, Dept

More information

Government Technology Trends to Watch in 2014: Big Data

Government Technology Trends to Watch in 2014: Big Data Government Technology Trends to Watch in 2014: Big Data OVERVIEW The federal government manages a wide variety of civilian, defense and intelligence programs and services, which both produce and require

More information

The Power Marketing Information System Model Based on Cloud Computing

The Power Marketing Information System Model Based on Cloud Computing 2011 International Conference on Computer Science and Information Technology (ICCSIT 2011) IPCSIT vol. 51 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V51.96 The Power Marketing Information

More information

Grid Computing Vs. Cloud Computing

Grid Computing Vs. Cloud Computing International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 6 (2013), pp. 577-582 International Research Publications House http://www. irphouse.com /ijict.htm Grid

More information

Machine Data Analytics with Sumo Logic

Machine Data Analytics with Sumo Logic Machine Data Analytics with Sumo Logic A Sumo Logic White Paper Introduction Today, organizations generate more data in ten minutes than they did during the entire year in 2003. This exponential growth

More information