CLOUD BASED PEER TO PEER NETWORK FOR ENTERPRISE DATAWAREHOUSE SHARING



Similar documents
AN EFFECTIVE PROPOSAL FOR SHARING OF DATA SERVICES FOR NETWORK APPLICATIONS

DISTRIBUTION OF DATA SERVICES FOR CORPORATE APPLICATIONS IN CLOUD SYSTEM

DIB: DATA INTEGRATION IN BIGDATA FOR EFFICIENT QUERY PROCESSING

Deep Explore in Big Data Analytics for Business Intelligence

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Efficient Cloud Management for Parallel Data Processing In Private Cloud

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

International Journal of Engineering Research ISSN: & Management Technology November-2015 Volume 2, Issue-6

Implementation of Cloud Computing Approach Based on Mobile Agents

Implementation of P2P Reputation Management Using Distributed Identities and Decentralized Recommendation Chains

Apigee Insights Increase marketing effectiveness and customer satisfaction with API-driven adaptive apps

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

ISSN Vol.04,Issue.19, June-2015, Pages:

Cloud Computing Architectures and Design Issues

A REVIEW ON EFFICIENT DATA ANALYSIS FRAMEWORK FOR INCREASING THROUGHPUT IN BIG DATA. Technology, Coimbatore. Engineering and Technology, Coimbatore.

A Quality Model for E-Learning as a Service in Cloud Computing Framework

IDENTIFYING AND OPTIMIZING DATA DUPLICATION BY EFFICIENT MEMORY ALLOCATION IN REPOSITORY BY SINGLE INSTANCE STORAGE

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS

IMPACT OF DISTRIBUTED SYSTEMS IN MANAGING CLOUD APPLICATION

Survey on Load Rebalancing for Distributed File System in Cloud

Scalable Multiple NameNodes Hadoop Cloud Storage System

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA

SOLVING LOAD REBALANCING FOR DISTRIBUTED FILE SYSTEM IN CLOUD

Sharing Of Multi Owner Data in Dynamic Groups Securely In Cloud Environment

Flash Memory Arrays Enabling the Virtualized Data Center. July 2010

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage

Automatic Annotation Wrapper Generation and Mining Web Database Search Result

Data processing goes big

Azure Scalability Prescriptive Architecture using the Enzo Multitenant Framework

CitusDB Architecture for Real-Time Big Data

An Approach Towards Customized Multi- Tenancy

Data Mining with Big Data e-health Service Using Map Reduce

Cloud Computing for Agent-based Traffic Management Systems

Applied research on data mining platform for weather forecast based on cloud storage

Cloud-dew architecture: realizing the potential of distributed database systems in unreliable networks

A Comparative Study of cloud and mcloud Computing

Secured Load Rebalancing for Distributed Files System in Cloud

Cloud Storage Solution for WSN Based on Internet Innovation Union

SCHEDULING IN CLOUD COMPUTING

Dynamic Querying In NoSQL System

ISSN: (Online) Volume 3, Issue 6, June 2015 International Journal of Advance Research in Computer Science and Management Studies

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

A UPS Framework for Providing Privacy Protection in Personalized Web Search

Data Modeling for Big Data

Big data platform for IoT Cloud Analytics. Chen Admati, Advanced Analytics, Intel

International Journal of Advanced Research in Computer Science and Software Engineering

Research and realization of Resource Cloud Encapsulation in Cloud Manufacturing

Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture

Cloud computing doesn t yet have a

Analysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms

Cluster, Grid, Cloud Concepts

INTENSIVE FIXED CHUNKING (IFC) DE-DUPLICATION FOR SPACE OPTIMIZATION IN PRIVATE CLOUD STORAGE BACKUP

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Logical Data Models for Cloud Computing Architectures

GRIDB: A SCALABLE DISTRIBUTED DATABASE SHARING SYSTEM FOR GRID ENVIRONMENTS *

AN EFFICIENT STRATEGY OF AGGREGATE SECURE DATA TRANSMISSION

NoSQL and Hadoop Technologies On Oracle Cloud

How to Enhance Traditional BI Architecture to Leverage Big Data

Big Data Storage Architecture Design in Cloud Computing

Benchmarking and Analysis of NoSQL Technologies

Optimal Service Pricing for a Cloud Cache

Big Data Analytics - Accelerated. stream-horizon.com

Index Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk.

Survey On: Nearest Neighbour Search With Keywords In Spatial Databases

ISSN: Keywords: HDFS, Replication, Map-Reduce I Introduction:

International Journal of Scientific & Engineering Research, Volume 6, Issue 5, May ISSN

MANAGEMENT OF DATA REPLICATION FOR PC CLUSTER BASED CLOUD STORAGE SYSTEM

Enhancing Data Security in Cloud Storage Auditing With Key Abstraction

Increase Agility and Reduce Costs with a Logical Data Warehouse. February 2014

Journal of Chemical and Pharmaceutical Research, 2015, 7(3): Research Article. E-commerce recommendation system on cloud computing

VOD STREAMING WITH A NETWORK CODING EQUIVALENT CONTENT DISTRIBUTION SCHEME

SECURE CLOUD STORAGE PRIVACY-PRESERVING PUBLIC AUDITING FOR DATA STORAGE SECURITY IN CLOUD

Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner

Exploring Resource Provisioning Cost Models in Cloud Computing

How To Handle Big Data With A Data Scientist

MODIFIED BITTORRENT PROTOCOL AND ITS APPLICATION IN CLOUD COMPUTING ENVIRONMENT

Cloud JPL Science Data Systems

Cloud Based Distributed Databases: The Future Ahead

A Study of Infrastructure Clouds

An Architecture Model of Sensor Information System Based on Cloud Computing

Improving Apriori Algorithm to get better performance with Cloud Computing

EFFECTIVE DATA RECOVERY FOR CONSTRUCTIVE CLOUD PLATFORM

Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices

A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems

II. OLAP(ONLINE ANALYTICAL PROCESSING)

IMPLEMENTATION OF NETWORK SECURITY MODEL IN CLOUD COMPUTING USING ENCRYPTION TECHNIQUE

Transcription:

CLOUD BASED PEER TO PEER NETWORK FOR ENTERPRISE DATAWAREHOUSE SHARING Basangouda V.K 1,Aruna M.G 2 1 PG Student, Dept of CSE, M.S Engineering College, Bangalore,basangoudavk@gmail.com 2 Associate Professor., Dept of CSE, M.S Engineering College, Bangalore, aruna_mg@yahoo.com ABSTRACT Peer to peer network is a group of computers each of which acts as a node for sharing files within the group. To form a corporate network companies simply register their sites with the best peer++ service provider. The total cost of ownership is reduced and it leads to increases the revenue since companies don t require buying any hardware and software in advance. Best peer provides economical, flexible and scalable platform for corporate network applications by integrating cloud computing, p2p and data base technologies in to one system. The efficiency of best peer++ is demonstrated by benchmarking best peer against hadoop DB. At the end performance evaluation is done on Amazon EC2 cloud platform. The major contribution of this paper is design of a system in such way that it should delivers elastic data sharing services for the corporate network application in the cloud, based on the pay-as-you go business model. Keywords: Cloud computing, peer to peer, Map Reduce, Query processing and Index. --------------------------------------------------------------------------------------------------------------------------- 1. INTRODUCTION Companies which belongs to same industry are often connected to a corporate network for association purpose each company maintains its own site and selectively shares a part of its business data with others include supply chain networks where organizations such as suppliers, manufacturers, and retailers cooperate with each others to accomplish their own business goals s uch as planning production-line, making achievement strategies and choose marketing solutions. As per technical perspective, selecting right data sharing platform for corporate network is very important is very important, a system which enables the pooled data supports capable logical queries over those data. Traditionally, data sharing was achieved by building a centralized warehousing, which regularly extracts data from the internal production system (e.g. ERP) of each company. Such warehousing solution has some deficiency in real consumptions. First the corporate network needs to scale up to support thousands of participants. In the real world, most of the companies are not ready to invest heavily on additional information system until they can clearly see the potential return on investment (ROI). Second, companies want fully customize the access control rule to determine which business partner can see which part of their shared data. Most of the data warehouse solutions fail to offer such flexibilities. Finally, to increase the revenues, companies often adjust their business process and may change their business partners. Therefore, the participants may join and leave the corporate network at resolve. The data warehouse solution has not been designed to handle such dynamicity. For decrease such problem this paper design BestPeer ++ for corporate network. As an in-time response to the ever changing business demands and the appearance of the cloud computing techniques, best peer ++ has developed into its new stage of development the cloud enabled best peer ++ system. By integrating cloud computing, p2p and database technologies, BestPeer++ achieves its query processing competence in a pay-as-you-go cloud business model. This paper shows that design of BestPeer++ system that provides inexpensive, flexible solutions for corporate network. Performance of best peer++ will be demonstrated by benchmarking best peer++ against HadoopDB. In the below mentioned architecture the Bootstrap peer is run by the Best Peer++ s ervice provider, and its main functionality to manage the bets peer++ network. Bootstrap peer consist a peer manager, access control manager, metadata manager and certification manager. The Best Peer++ system can communicate with many normal peers and each normal peer is managed and controlled by the bootstrap peer system. Each normal peer having three sub modules like query engine, Heartbeat (HB) and downloader. The results show that for simple quires, the performance of BestPeer++ is significantly better than HadoopDB.

Fig- 1: The architecture of BestPeer ++ system. The rest of the paper is organized as follows; section 2 presents Literature survey of the best peer++ system including existing system and proposed system. Then next s d escribes design of BestPeer++ core components in section 3 and in section 4 the performance evaluation of the system. Advantages of the system are explained at section 5 and related work is presented in section 6, with conclusion in section7. 2. LITERATURE SURVEY This section shows the existing system and its disadvantages then overview of the proposed system. 2.1. Existing System The original BestPeer system attempt to develop peer-to- peer (P2P) technologies for Corporate Network. BestPeer was designed to work as a scalable, sharable, and secure P2P-based Data Management system with full functionalities for building corporate networks in which a part of association controlled by different executive domains work together in order to reduce operation cost and pick up efficiency.[6] corporate network applications such as supply chain management and national healthcare network. BestPeer provides an effective and efficient way to share data belong to different association and p rovide enterprise quality query facility, without the requirement to set up a big centralized server. [4] As per changing business demands and the coming out of Cloud Computing techniques, BestPeer has developed into its new stage that is cloud- enabled BestPeer system. Such a warehousing solution has some disadvantages in real consumption. First, the corporate network needs to extent support thousands of participants, while the fitting of a large-scale centralized data warehouse system entails nontrivial costs including big hardware/software investments and high preservation cost. In the environment, most companies are not dedicated to invest deeply on additional information systems until they can clearly see the potential return on investment (ROI) [12]. Second, companies want to completely modify the access control policy to determine which business partners can see which part of their shared data. Disadvantages of Existing System: Most of the data warehouse solutions fail to present such flexibilities. Solution has not been designed to grip such dynamicity. 2.2. Proposed System The main contribution of this paper is the design of Extended BestPeer system that provides wellorganized, elastic and scalable solution for corporate network. The unique challenges pose by sharing and processing data in an inter-businesses environment and designed Extended BestPeer, a system which gives elastic data sharing services, by including cloud computing, database, and peer-to-peer technologies for corporate Network. Best Peer s product is the BestPeer Platform, which combines the powerful MapReduce processing model with the predictable P2P database technologies. Best Peer s advanced technology features a hybrid architecture that brings the parallelism of MapReduce to the latest development in RDBMS research.[4] Extended BestPeer is based on our decade's research on P2P database system, and offers an accelerate data

processing engine and a more flexible portability via the approval of MapReduce framework and Soft ware-asa-service(saas) paradigm. In compare to the Hadoop Connector approach employed by many MPP investigative database vendor, Extended BestPeer uses Hadoop as the parallelization layer to make possible its universal query processing, with each node running a database occasion.[5] consolidate predictable database query processing and MapReduce into a single platform considerably reduces TCO, eliminate performance bottleneck from both mechanism, and allows for richer analytics through expenditure of different data types. BestPeer++ is deploying as a service in the cloud. To form a corporate network, companies register with the site Extended BestPeer service provider, initiate Extended BestPeer instances in the cloud and at last export data to those instances for sharing. BestPeer++ adopt the pay-as-you-go business model popularized by cloud computing. The total cost of possession is therefore significantly summary while companies do not have to buy any hardware/software in move on. The Extended BestPeer service provider elastically grows up the running instance and makes them always available. For occasional sustained analytical tasks, we provide a border for exporting the data from Extended BestPeer to Hadoop and allow users to analyze those data using MapReduce [3]. 3. COMPONENT OF PROPOSED SYSTEM A cloud enabled evolution of BestPeer. At the last stage of its development, BestPeer is improved with distributed access control, multiple types of indexes, and pay-as-you-go query processing for deliver elastic data sharing services in the cloud.[6] The software components of Extended BestPeer are separated into two parts: core and adapter. The Architecture is shown in fig. 2. The core contains all the data sharing functionalities and is planned to be platform independent. Fig- 2: The functional diagram of Best Peer network deployed on Amazon Cloud Offering. The adapter contains one abstract adapter which defines the elastic transportation service interface and a set of tangible adapter components which implement such an interface through APIs provided by specific cloud service providers (e.g., Amazon). We have implemented an adapter for Amazon cloud platform. In what follows, we first present this adapter and then describe the core components [6]. Highlights of BestPeer ++ systems are: a) Amazon Cloud Adapter The main approach of BestPeer is to use dedicated database servers to store data for each business and arrange those database servers through P2P network for data sharing. The Amazon Cloud Adapter provides an elastic hardware infrastructure for Extended BestPeer to operate on by using Amazon Cloud services. b) The Best Peer++ Core The BestPeer++ core contains all platform-independent logic, including query processing and P2P overlay. It runs on top of adapter and consists of two software components: bootstrap peer and normal peer.

The bootstrap peer is run by the BestPeer service provider and main functionality is to manage the BestPeer ++ network of bootstrap peer. The normal peer software having five components such as schema mapping, data loader, data indexer, access control and query executor. As shown in Fig. 3, it defines two data flows inside the normal peer as an offline data flow and an online data flow. The data are extracting periodically by a data loader from the business production system to the normal peer instance in offline data flow. c) Adaptive Query Processor Fig- 3: Data Flow in Best Peer++ system. BestPeer++ employs a amalgam design for achieve high act query processing. The major workload of a corporate network is simple, low-overhead queries. [4] This queries only occupy querying a very small number of business partners and can be processed in small moment. BestPeer++ is mainly optimized for these queries. For rare time consuming analytical tasks, it provides an interface for exporting the data from BestPeer++ to Hadoop and allows users to analyze those data using MapReduce. 4. ADVANTAGES OF PROPOSED SYSTEM This system can powerfully handle characteristic workloads in a corporate network. BestPeer++ adopt the pay-as-you-go business model famous by cloud computing. As an optional, what they use of BestPeer instance s hours and storage capacity they pay for it. BestPeer++ supports the role-based access control for the natural dispersed environment of commercial networks. It employs P2P technology to retrieve data between business partners. BestPeer++ is a great solution for data sharing within corporate networks. 5. BENCHMARKING This section shows evolution of the performance and throughput of Extended BestPeer on Amazon cloud platform. For the performance benchmark, we evaluate the query latency of Best -Peer++ with HadoopDB using five queries selected from typical corporate network applications workloads.[1] For the throughput benchmark, we produce a simple supply-chain network consisting of suppliers and retailers and study the query throughput of the system.

a) Performance Benchmarking This benchmark compares the performance of Extended BestPeer with HadoopDB. We choose Hadoop DB as our benchmark target since it is an alternative promising solution for our problem and adopts architecture similar to ours. Comparing the two systems (i.e., HadoopDB and Extended BestPeer) reveal the performance gap between a general data warehousing system and a data sharing system specially designed for corporate network applications. b) Throughput Benchmarking This section studies the query throughput of BestPeer++. HadoopDB is not designed for high query throughput; therefore, we intentionally skip the results of HadoopDB and only present the results of BestPeer. We conduct two tiers of benchmark evaluation for the performance and scalability of BestPeer++, respectively. 6. CONCLUSION This paper define exclusive challenges faced by contribution and open-handed out data in an interbusinesses environment and planned BestPeer++, a system which deliver elastic data sharing services, by Containing cloud computing, database, and peer-to-peer technologies. [5] The standard conducted on Amazon EC2 cloud platform shows that our system can powerfully handle typical workloads in a corporate network. It can move near linear query throughput as the number of normal peers grows. Therefore, BestPeer++ is great solution for capable data sharing within corporate networks. REFERENCES [1] Gang Chen, Tianlei Hu, Dawei Jiang, Peng Lu, Kian- Lee Tan, Hoang Tam Vo, and Sai Wu, Extended BestPeer: A Peer-to-Peer Based Large-Scale Data Processing Platform,VOL. 26,NO. 6, JUNE 2014. [2] H.V. Jagadish, B.C. Ooi, and Q.H. Vu, BATON: A Balanced Tree Structure for Peer-to-Peer Networks, Proc. 31st Int l Conf. Very Large Data Bases (VLDB 05), pp. 661-672, 2005. [3] W.S. Ng, B.C. Ooi, K.-L. Tan, and A. Zhou, PeerDB: A P2P-Based System for Distributed Data Sharing, Proc. 19th Int l Conf. Data Eng., pp. 633-644, 2003. [4] S. Wu, S. Jiang, B.C. Ooi, and K.-L. Tan, Distributed Online Aggregation, Proc. VLDB Endowment, vol. 2, no. 1, pp. 443-454, 2009. [5] S. Wu, J. Li, B.C. Ooi, and K.-L. Tan, Just-in-Time Query Retrieval over Partially Indexed Data on Structured P2P Overlays, Proc. ACM SIGMOD Int l Conf. Management of Data (SIGMOD 08), pp. 279-290, 2008. [6] S. Wu, Q.H. Vu, J. Li, and K.-L. Tan, Adaptive Multi- Join Query Processing in PDBMS, Proc. IEEE Int l Conf. Data Eng. (ICDE 09), pp. 1239-1242, 2009. [7] Beng Chin Ooi, Yanfeng Shu, Relational Data Sharing in Peer-based Data Management Systems. Kian- Lee Tan Sigmod Record special issue on P2P, 2003. [8] B.C. Ooi, K.L. Tan, A.Y. Zhou, C.H. Goh, Y.G. Li, C.Y. Liau, B. Ling, W.S. Ng, Y.F. Shu, X.Y. Wang, M. Zhang PeerDB: Peering into Personal Databases. The 2003 ACM SIGMOD Intl. Conf. on Management of Data (Demo). (SIGMOD 2003). [9] G. Chen, H. T. Vo, S. Wu, B. C. Ooi, T. A Framework for Supporting DBMS-like Indexes in the Cloud. Ozsu VLDB 2011. [10] Sai Wu, Dawei Jiang, Beng Chin Ooi, Kun Lun Wu Efficient B+-tree Based Indexing for Cloud Data Processing VLDB 2010. [11] Heng Tao Shen, Yanfeng Shu, and Bei Yu IEEE Trans. Knowl. Efficient Semantic-Based Content Search in P2P Network. Data