Multidimensional Arrays for Warehousing Data on Clouds
Laurent d'Orazio (1) and Sandro Bimonte (2)
1 Blaise Pascal University - LIMOS, France, dorazio@isima.fr
2 Cemagref, France, sandro.bimonte@cemagref.fr

Abstract. Data warehouses and OLAP systems are business intelligence technologies. They allow decision-makers to analyze, on the fly, huge volumes of data represented according to the multidimensional model. Cloud computing, driven by ICT majors such as Google, Microsoft and Amazon, has recently attracted considerable attention. OLAP querying and data warehousing in such a context constitute a major issue. Indeed, the problems to be tackled are the basic ones of large-scale distributed OLAP systems (querying large amounts of data, semantic and structural heterogeneity), seen from a new point of view that considers the specificities of these architectures (pay-as-you-go rule, elasticity, and user-friendliness). In this paper we address the pay-as-you-go rule for warehousing data storage. We propose to use multidimensional array storage techniques on clouds. First experiments validate our proposal.

1 Introduction
Data warehouses and OLAP systems are business intelligence technologies that aim at the analysis of huge volumes of data modeled according to the multidimensional model [13]. In typical architectures, OLAP systems are deployed on relational DBMSs, which store and analyze the data. This approach is suitable for sparse data warehouses. When data are dense, the MOLAP approach can be used [28]: it stores data in a multidimensional data structure, such as a multidimensional array, in order to reduce the size of stored data. High Performance Computing architectures aim at meeting the increasing computing and storage needs of both scientific and industrial applications [6].
Among these architectures, cloud computing, driven by companies such as Google, Microsoft and Amazon, attracts particular interest due to its low costs and the fact that it offers good out-of-the-box solutions, even though its performance is below that of current parallel DBMSs [23]. Data warehouses and OLAP systems on clouds raise several problems related to storage and query computation performance. In particular, the problems to consider include the basic ones of large-scale distributed systems (querying large amounts of data, semantic and structural heterogeneities), seen from a new point of view that accounts for the specific behaviors of these architectures: the pay-as-you-go model, elasticity and user-friendliness [6].

A. Hameurlain, F. Morvan, and A. Min Tjoa (Eds.): Globe 2010, LNCS 6265, © Springer-Verlag Berlin Heidelberg 2010

Some works support complex queries such as spatial and OLAP queries on clouds [17], [25], [27]. However, to the best of our knowledge, no work defines a particular data model to store multidimensional data on clouds that respects the pay-as-you-go model. Therefore, in this paper we take the first step towards the implementation of a multidimensional array-based architecture on clouds, in order to reduce data storage costs. In particular, we present an algorithm that transforms data stored as multidimensional arrays into Pig data [17]. This allows us to perform OLAP queries using the MapReduce paradigm [9] while saving storage costs. The second contribution of this paper is a description of the open research issues raised by using cloud databases for OLAP analysis.

This paper is organized as follows. Section 2 presents the context of our work. Section 3 introduces our proposal for multidimensional array storage on clouds. Section 4 validates our approach. Section 5 lists research opportunities. Finally, Section 6 concludes this paper.

2 Context and Research Motivation
This section briefly presents a case study used as an illustration (subsection 2.1), data warehouses and OLAP (subsection 2.2) and data management in clouds (subsection 2.3), then introduces our research motivation (subsection 2.4).

2.1 Case Study
To present our work, we introduce a simulated case study concerning the OLAP analysis of the sales of the stores of a supply chain, located in each French department. It has two dimensions: a spatial dimension that groups departments into regions, and a temporal dimension (day < month < year); the measure is the profit. An example of data is shown in Table 1.

Table 1. Case study data
Year | Month | Day | Country | Region | Department | Profit
 | | | France | Auvergne | Puy-de-Dôme |
 | | | France | Auvergne | Allier |
 | | | France | Rhône-Alpes | Isère |

2.2 Data Warehouse and OLAP
Data warehouses model data according to the multidimensional model. Such a model defines the concepts of dimensions and measures. A dimension is composed of hierarchies and represents an analysis axis. A hierarchy organizes data into a hierarchical structure, allowing decision-makers to analyze measures at different granularities. Measures are numerical indicators which describe the
analysis subject. OLAP operators such as roll-up and drill-down allow decision-makers to navigate into hierarchies, aggregating data with SQL aggregation functions [13]. Other operators have been defined to select a part of the data warehouse and to permute dimensions [20]. MOLAP systems use multidimensional data structures, such as multidimensional arrays, built from the original data, which are typically stored in relational databases. MOLAP systems improve storage performance for dense data warehouses through their particular storage data model [27]. Indeed, multidimensional arrays store only measure values, since these are indexed by the positions of the dimension members. For example, in the MOLAP representation of our case study illustrated by Figure 1, the measure value at position ARRAY[2][1] is associated with the member at index 2 of the first dimension (time) and the member at index 1 of the second dimension (the Allier department).

Dimensions
  Time: Time Dim[0], Time Dim[1], ..., Time Dim[34121]
  Location: Location Dim[0] = France,Auvergne,Puy-de-Dôme; Location Dim[1] = France,Auvergne,Allier; ...; Location Dim[99] = France,Rhône-Alpes,Isère
Measures
  Facts Profit: Fact[0] = 2000; Fact[1]; ...
Fig. 1. MOLAP representation of data

To store a multidimensional array as a unidimensional array, a simple formula has been provided. Let d be the number of dimensions and N_k the number of members of the k-th dimension; the position of a measure value in the unidimensional array is then

$$p(i_1, \ldots, i_d) = \sum_{j=1}^{d} \left( i_j \prod_{k=j+1}^{d} N_k \right)$$

where i_j is the position of the member of the j-th dimension (for j = d the product is empty and equals 1).
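As a minimal sketch of the linearization formula above (not the authors' code), the Python functions below compute the position p of a cell from its member indices, and recover the indices from a position. They assume 0-based indices, as in the Dim[0] numbering of Fig. 1; the dimension sizes (34,122 time members, 100 location members) are read off Fig. 1, and the function names are hypothetical.

```python
from math import prod
from typing import List, Sequence


def linear_position(indices: Sequence[int], sizes: Sequence[int]) -> int:
    """Row-major position p(i_1, ..., i_d) = sum_j i_j * prod_{k>j} N_k.

    indices are 0-based member positions; sizes are the N_k.
    """
    if len(indices) != len(sizes):
        raise ValueError("one index per dimension is required")
    position = 0
    for j, (i_j, n_j) in enumerate(zip(indices, sizes)):
        if not 0 <= i_j < n_j:
            raise IndexError(f"index {i_j} out of range for dimension {j}")
        # prod() of an empty slice is 1, so the last index is added as is.
        position += i_j * prod(sizes[j + 1:])
    return position


def indices_from_position(position: int, sizes: Sequence[int]) -> List[int]:
    """Inverse mapping: recover the member indices from a linear position."""
    indices: List[int] = []
    for n_k in reversed(sizes):
        position, i_k = divmod(position, n_k)
        indices.append(i_k)
    if position != 0:
        raise IndexError("position out of range")
    return indices[::-1]


if __name__ == "__main__":
    sizes = (34122, 100)  # time and location members, read off Fig. 1
    print(linear_position((2, 1), sizes))  # 201
    print(indices_from_position(201, sizes))  # [2, 1]
```

With these sizes, the cell ARRAY[2][1] of the example lands at position 2 × 100 + 1 = 201 of the unidimensional measure array, which is why only the measures need to be stored: the dimension members are implied by the position.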
2.3 Data Management on Cloud
In order to build scalable and efficient database management systems able to deal with data volumes that cannot be managed by classical relational or object DBMSs (for example, Facebook manages more than three hundred million users, more than two billion uploaded pictures and more than three million events added per day), new data management architectures have been developed on clouds. Data management in clouds usually follows a layered architecture, as illustrated by Figure 2. The first level is the infrastructure tier. Typically, such a tier consists of one or several data centers used to run large data analysis processes [4] [1]. The main characteristic of this level is the associated pay-as-you-go model.
Fig. 2. Cloud data management architecture

The second tier is the storage tier. Its main objective is to provide a highly scalable and fault-tolerant system. In clouds, data are stored in files managed by such systems [11] [2]. The third tier is the execution environment tier. The best-known example of a cloud computing execution environment is probably Google MapReduce [9] and its open-source counterpart Hadoop [3]. Such an execution tier aims at providing elasticity by adjusting resources to the needs of the application. On the one hand, this property avoids the large investments required for applications to sustain occasional peaks of use, investments that lead to a global under-use of the infrastructure. On the other hand, it ensures the proper functioning of an application whose popularity was not correctly foreseen, by increasing resources if necessary. The last tier is the high-level query language tier. Such a tier aims at providing user-friendliness, transparency with respect to the other tiers of the architecture, and as much parallelism as possible. Several query languages have been proposed, such as Facebook Hive [25], Microsoft Scope [7], Google Sawzall [19] and Map-Reduce-Merge [26], which are based on particular data models such as the column-oriented model [22] or extensions of the relational model [17] [25]. In particular, the Pig Latin language [17] has been designed to offer a trade-off between the declarative style of SQL and the low-level, procedural style of MapReduce. The accompanying system, Pig, is fully implemented and compiles Pig Latin into physical plans that are executed over a parallel execution environment.

2.4 Research Motivations
According to the pay-as-you-go principle, cloud users pay only for the resources (CPU, storage, bandwidth) they use.
For example, with Microsoft Windows Azure [4], CPU costs 0.12 $ per hour of execution, storage costs 0.15 $ per GB per month, and bandwidth costs 0.10 $ per GB for upload and 0.15 $ per GB for download. Therefore, while all query languages for data in clouds support OLAP queries indirectly (since no ad-hoc operator [12] has been introduced), none of them addresses multidimensional data storage. Our
idea is to provide a dedicated organization of multidimensional data on clouds that reduces the storage and computation costs of OLAP queries while taking advantage of the characteristics of cloud data management systems: scalability and performance.

3 Multidimensional Arrays in Clouds
In this section, we give an overview of the querying process on multidimensional arrays in clouds (subsection 3.1). Then, we present storage and data processing in more detail (subsection 3.2) and our optimization of Pig OLAP queries (subsection 3.3).

3.1 Overview of Querying Multidimensional Arrays in Clouds
The querying process is composed of two steps, as illustrated by Figure 3:
1. Data are structured as arrays. This reduces the size of the stored files and, as a consequence, the price paid by clients. When a query (or a set of queries) is posed, the arrays are translated into Pig data in a temporary file, using the algorithm presented in subsection 3.2. This file is removed after the analysis.
2. OLAP queries are formulated and optimized into an efficient execution plan of Pig Latin instructions. Note that these queries can be executed in parallel using the MapReduce paradigm, which enables elasticity.
Fig. 3. Overview of the querying process

3.2 OLAP Queries Using Pig Latin and Multidimensional Arrays
This subsection presents how the multidimensional array storage we propose can be used with Pig. The Pig data model is an extension of the relational model with the following additional concepts: bags (collections of tuples), maps (key/value collections), nested tables and UDFs (User Defined Functions).
Data are stored in logical multidimensional arrays, which are physically stored as unidimensional arrays using the formula presented in subsection 2.2. Figure 4(a) illustrates the multidimensional arrays for the data of our case study. For example, the first fact (first line of the measure part) is the measure value associated with the first time member (first line of the time part) and with France,Auvergne,Puy-de-Dôme (first line of the location part).

When queries are posed, data are converted into Pig data in a temporary file. Each line represents a tuple, and the values of a tuple are separated by semicolons. The conversion of the data of our case study is shown in Figure 4(b).

(a) Multidimensional arrays
  Location: France,Auvergne,Puy-de-Dôme; France,Auvergne,Allier; ...; France,Rhône-Alpes,Isère
  Measures: 2000;500;400;...
(b) Pig data
  2000;01;01;France;Auvergne;Puy-de-Dôme;
  ;01;01;France;Auvergne;Allier;
  ;04;20;France;Rhône-Alpes;Isère;2500
Fig. 4. Data representation

Conversion from multidimensional arrays to Pig data is performed by Algorithm 1. Its inputs are the files that store the dimension and measure arrays (Figure 4(a)); its output is the file that represents the warehouse data in the Pig data model (Figure 4(b)). The idea of the algorithm is to build the Cartesian product of the first n-1 dimensions. This result is then combined, through a Cartesian product, with the n-th dimension, and the values of the measure array are appended to the generated tuples in order: the i-th tuple receives the i-th value of the measure array. For this pairing to be correct, the Cartesian product must enumerate cells in the same order as the linearization formula of subsection 2.2. When the analysis is complete, the temporary file is removed in order to save storage costs.

Algorithm 1. Algorithm for conversion of multidimensional array data to Pig data
Require: table files
Ensure: Pig file
  int i ← 1;
  int n;
  file cartprodfile; {initialized by the Cartesian product of the two last dimensions}
  file pigfile;
  file mafile;
  array dimensions; {set of dimensions}
  while i <= n-1 do
    cartprodfile ← cartproduct(dimensions[i], cartprodfile);
    i ← i+1;
  end while
  i ← 0;
  while mafile not end do
    pigfile.insert(cartproduct(dimension(n), cartprodfile) + ";" + mafile(i));
    i ← i+1;
  end while
  return pigfile;
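The conversion idea can be sketched in Python as follows. This is not a transcription of Algorithm 1: instead of materializing intermediate Cartesian-product files, it streams the product of all dimensions with itertools.product, which varies the last dimension fastest and therefore visits cells in the same order as the linearization formula of subsection 2.2. Dimensions are assumed to be in-memory lists of member tuples, and the function name, the time members and the measure values in the usage example are hypothetical, since Table 1 does not list them.

```python
from itertools import product
from typing import Iterator, List, Sequence, Tuple

Member = Tuple[str, ...]  # e.g. ("France", "Auvergne", "Allier")


def arrays_to_pig_lines(dimensions: Sequence[Sequence[Member]],
                        measures: Sequence[float],
                        sep: str = ";") -> Iterator[str]:
    """Yield one Pig tuple per cell: member fields followed by the measure.

    product() varies the last dimension fastest, so the i-th combination is
    the cell stored at linear position i of the measure array.
    """
    cells = 1
    for dimension in dimensions:
        cells *= len(dimension)
    if cells != len(measures):
        raise ValueError(f"expected {cells} measure values, got {len(measures)}")
    for members, value in zip(product(*dimensions), measures):
        fields: List[str] = [field for member in members for field in member]
        fields.append(str(value))
        yield sep.join(fields)


if __name__ == "__main__":
    time_dim = [("1990", "01", "01"), ("1990", "01", "02")]  # hypothetical
    location_dim = [("France", "Auvergne", "Puy-de-Dôme"),
                    ("France", "Auvergne", "Allier")]
    profits = [10, 20, 30, 40]  # hypothetical measure array
    for line in arrays_to_pig_lines([time_dim, location_dim], profits):
        print(line)
```

Writing these lines to the temporary file and deleting it once the analysis is over, as described above, is left out of the sketch; a generator keeps memory use independent of the number of cells.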
3.3 Optimization of Pig Latin Instructions for OLAP Queries
A classical OLAP query on these data is: "What is the total profit per region during 1990 and 1991?" This query can easily be expressed on our case study using Pig statements, as illustrated by Figure 5. Such an OLAP query is defined in Pig using three instructions: one selects the dimension members (1990 and 1991 in our example), one groups the data (by year and region), and the last one performs the aggregation (a sum in this particular use case).

    -- hypothetical path of the temporary file produced by Algorithm 1; schema assumed from Table 1
    sales = LOAD 'sales_pig_data' USING PigStorage(';') AS (year:int, month:int, day:int, country:chararray, region:chararray, department:chararray, profit:double);
    s9091 = FILTER sales BY year == 1990 OR year == 1991;
    groups = GROUP s9091 BY (year, region);
    results = FOREACH groups GENERATE FLATTEN(group) AS (year, region), SUM(s9091.profit) AS profit;
Fig. 5. OLAP query with Pig: FILTER and (GROUP, Aggregation)

Note that, unlike SQL, where the DBMS optimizer chooses the execution plan, a Pig Latin query consists of a sequence of instructions whose order is left to the user, and the Pig system only provides a logical optimizer, enabling for example logical optimizations such as projection pushdown [10]. That is why we propose a simple yet efficient optimization of OLAP queries that rewrites Pig statements. Since Pig queries consist of a sequence of instructions, OLAP queries can be formulated in two ways: (i) FILTER, then (GROUP and Aggregation) (see Figure 5), or (ii) (GROUP and Aggregation), then FILTER (see Figure 6). This ordering strongly influences response times: if the data source is large, aggregating all data before selecting the dimension members may be costly. Therefore, one intuitive optimization of OLAP queries in Pig is to use the query pattern FILTER, GROUP, Aggregation.

    -- same sales relation as in Fig. 5
    groups = GROUP sales BY (year, region);
    sumprof = FOREACH groups GENERATE FLATTEN(group) AS (year, region), SUM(sales.profit) AS profit;
    results = FILTER sumprof BY year == 1990 OR year == 1991;
Fig. 6. OLAP query with Pig: (GROUP, Aggregation) and FILTER

4 Validation
Our proposal has been validated with simulated data. All experiments were conducted on a 2.2 GHz Intel Core 2 Duo with 4 GB of RAM. The main objective of these experiments was to show that our proposals enable, on the one hand, a large reduction of storage costs with a negligible overhead and, on the other hand, a performance improvement. Subsection 4.1 focuses on storage, subsection 4.2 on the data conversion process, and subsection 4.3 studies the impact of the proposed optimization on response time.
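To illustrate why the statement order of subsection 3.3 matters, the following Python sketch (an in-memory simulation, not Pig or Hadoop) implements the GROUP and SUM steps as a map, shuffle and reduce pass and counts how many records go through the shuffle under each order. The simplified (year, region, profit) schema, the function names and the sample rows are assumptions. Both orders return the same totals here because the filter only tests year, which is one of the grouping keys; filtering first simply sends fewer records to the shuffle.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

# Simplified fact schema (an assumption): (year, region, profit)
Row = Tuple[int, str, float]
Key = Tuple[int, str]
Totals = Dict[Key, float]

YEARS = (1990, 1991)  # dimension members selected by the FILTER step


def group_sum(rows: Iterable[Row]) -> Tuple[Totals, int]:
    """Simulate GROUP BY (year, region) + SUM(profit) as map/shuffle/reduce.

    Returns the totals and the number of records that went through the shuffle.
    """
    shuffled: DefaultDict[Key, List[float]] = defaultdict(list)
    shuffled_records = 0
    for year, region, profit in rows:
        # Map: emit ((year, region), profit); shuffle: collect values per key.
        shuffled[(year, region)].append(profit)
        shuffled_records += 1
    # Reduce: sum the values collected for each key.
    totals = {key: sum(values) for key, values in shuffled.items()}
    return totals, shuffled_records


def filter_then_group(rows: List[Row]) -> Tuple[Totals, int]:
    """Pattern of Fig. 5: FILTER, then GROUP and aggregation."""
    return group_sum(row for row in rows if row[0] in YEARS)


def group_then_filter(rows: List[Row]) -> Tuple[Totals, int]:
    """Pattern of Fig. 6: GROUP and aggregation, then FILTER."""
    totals, shuffled_records = group_sum(rows)
    kept = {key: total for key, total in totals.items() if key[0] in YEARS}
    return kept, shuffled_records


if __name__ == "__main__":
    sample: List[Row] = [  # made-up rows for illustration
        (1989, "Auvergne", 100.0),
        (1990, "Auvergne", 200.0),
        (1990, "Auvergne", 50.0),
        (1991, "Rhône-Alpes", 300.0),
        (1992, "Rhône-Alpes", 400.0),
    ]
    fast, fast_count = filter_then_group(sample)
    slow, slow_count = group_then_filter(sample)
    print(fast == slow, fast_count, slow_count)  # True 3 5
```

If the filter referred to a column that is not a grouping key, the two orders would no longer be interchangeable, so a rewrite like the one in subsection 3.3 should only move filters on grouping attributes.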
4.1 Storage Consumption
Figure 7 presents the storage consumption in GB according to the data model used, that is, Pig or multidimensional arrays. The results show that multidimensional array-based storage leads to a dramatic reduction (about 90%) in the amount of storage used by data sources. We can therefore conclude that our system is cheaper in terms of storage. For example, with Amazon EC2 pricing (0.15 $ per GB per month), a one-TB data source based on a relational model would cost approximately 1850 $ per year (1024 GB × 0.15 $ × 12 months ≈ 1843 $), whereas with our approach the cost would be around 230 $.
Fig. 7. Storage consumption

4.2 Data Conversion
Figure 8 presents the mean response time of the conversion from multidimensional arrays to Pig data according to the size of the source, given in number of tuples. This experiment highlights the additional cost, in particular in CPU, induced by our proposal. The results show that the conversion takes less than one minute for a data source containing up to ten million tuples.
Fig. 8. Data conversion
As a consequence, the additional cost can be considered negligible. In fact, with Amazon EC2 pricing (0.12 $ per hour for a standard instance), this additional process would cost approximately 0.001 $ on a pro rata temporis basis for a data source containing ten million tuples.

4.3 Query Optimization
Finally, Table 2 illustrates the impact of the optimization of Pig statements for OLAP queries. It presents the response time for the evaluation of a naive query and of the corresponding optimized query, on a data source containing about half a million tuples. The results show that the optimization accelerates the evaluation process (a 30% reduction in the considered experiment).

Table 2. Impact of optimization on the response time
Query | Statement order | Mean response time (seconds)
Optimized query | FILTER, GROUP and Aggregate | 130
Naive query | GROUP, Aggregate and FILTER |

5 Research Opportunities
This section lists research opportunities that we consider particularly important for supporting OLAP queries and data warehouses in clouds. We divide them into two categories: performance optimization (subsection 5.1), and modeling and querying (subsection 5.2).

5.1 Performance Optimization
In order to improve the performance of OLAP queries, the following aspects should be considered:
1. Definition of an OLAP Pig Latin query optimizer. Indeed, as described previously, Pig only provides a logical optimizer and leaves the order of statements to the user. We therefore think that significant query improvements are possible by adapting classical database management system optimizers to the Pig data model and Pig Latin.
2. Implementation of data warehouse indexes using the MapReduce paradigm. Indexes such as bitmap indexes are used in data warehouses to optimize computationally expensive queries such as joins [15] or aggregations [24].
Parallel indexes implemented according to the MapReduce paradigm would allow us to exploit the large computing capacities offered by cloud infrastructures. Thus, we consider the extension and/or adaptation of these indexes to cloud databases to be mandatory.
3. Definition of materialized view algorithms based on the pay-as-you-go model. Materialized views are a fundamental technique for OLAP query optimization in ROLAP architectures. Several works have proposed an intelligent selection of the materialized views that should be computed [5]. These approaches
do not take into account the principles of the pay-as-you-go model. We therefore believe it is necessary to define materialization and dematerialization techniques that adapt to changes in user query patterns and to the storage costs of cloud computing providers.
4. Integration of caches to improve the quality of service and reduce costs. Caching is crucial to improve performance in many computing systems, and particularly in business intelligence systems [21]. Our objective is to provide sophisticated caching techniques, more precisely semantic caching [14], [8], to enhance the quality of service (reduced response time, increased availability) and to reduce costs. Such mechanisms would be used to cache the results of frequently posed queries and, as a consequence, save CPU and bandwidth consumption.

5.2 Modeling and Querying
Considering the modeling and querying of data warehouses in clouds, the aspects to be tackled are:
1. Integration of OLAP SQL operators as native Pig Latin operators. Indeed, one of the most important characteristics of cloud infrastructures is user-friendliness. Since, as shown in this paper, defining OLAP queries with Pig Latin is not straightforward, we plan to integrate the principles of the Cube operator [12] into Pig Latin. This will allow us to introduce some typical OLAP server functionalities directly into the cloud data management system [10], facilitating the use and definition of OLAP systems in clouds.
2. Implementation of advanced modeling properties of multidimensional models using the Pig data model. Multidimensional applications can present advanced modeling properties, such as many-to-many relations between facts and dimensions, complex measures, etc. [18], which are difficult to implement in relational DBMSs [16].
Exploring the power of the Pig data model (bags, maps and nested queries) for multidimensional modeling remains an important open research issue.

6 Conclusion
This paper presents preliminary work aiming at providing a multidimensional array-based architecture to be deployed on clouds. We have proposed to use multidimensional arrays to store data in order to reduce storage costs. Then, we have presented an algorithm to convert these structures into Pig data, so that OLAP queries can easily be performed using the Pig Latin query language. We have also presented a simple and intuitive OLAP query optimization that orders Pig Latin statements. Experiments with simulated data have shown the relevance of this solution: it reduces storage consumption and, as a consequence, enables cloud users to reduce their costs. Finally, we have listed research opportunities to consider in order to
efficiently integrate OLAP queries and data warehouses in clouds. We are currently working on introducing the MapReduce paradigm into the algorithm for converting multidimensional array data to Pig data, on introducing the Cube operator into Pig, and on further experiments on real cloud infrastructures.

Acknowledgment
Thanks to Boussad Mebarki, Ilyas Brahmia, Abdelaziz Merabet, as well as to the APIS team of the LIMOS laboratory and the COPAIN team of Cemagref, for useful discussions on data warehouses and cloud computing.

References
1. Amazon EC2
2. Amazon S3
3. Hadoop
4. Microsoft Azure
5. Aouiche, K., Darmont, J.: Data mining-based materialized view and index selection in data warehouses. Journal of Intelligent Information Systems 33(1) (2009)
6. Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R.H., Konwinski, A., Lee, G., Patterson, D.A., Rabkin, A., Stoica, I., Zaharia, M.: Above the clouds: A Berkeley view of cloud computing. Technical Report UCB/EECS, Berkeley (2009)
7. Chaiken, R., Jenkins, B., Larson, P.-Å., Ramsey, B., Shakib, D., Weaver, S., Zhou, J.: Scope: easy and efficient parallel processing of massive data sets. PVLDB 1(2) (2008)
8. Dar, S., Franklin, M.J., Jonsson, B.T., Srivastava, D., Tan, M.: Semantic data caching and replacement. In: VLDB, Bombay, India (1996)
9. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Communications of the ACM 51(1) (2008)
10. Gates, A., Natkovich, O., Chopra, S., Kamath, P., Narayanam, S., Olston, C., Reed, B., Srinivasan, S., Srivastava, U.: Building a high-level dataflow system on top of MapReduce: the Pig experience. PVLDB 2(2) (2009)
11. Ghemawat, S., Gobioff, H., Leung, S.-T.: The Google file system. In: SOSP, Bolton Landing, USA (2003)
12. Gray, J., Bosworth, A., Layman, A., Pirahesh, H.: Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-total. In: ICDE, New Orleans, USA (1996)
13. Inmon, W.: Building the Data Warehouse. Wiley, New York (1996)
14. Keller, A.M., Basu, J.: A predicate-based caching scheme for client-server database architectures. VLDB Journal 5(1) (1996)
15. Kimball, R.: The data warehouse toolkit: practical techniques for building dimensional data warehouses. John Wiley & Sons, Inc., Chichester (1996)
16. Malinowski, E., Zimányi, E.: Advanced Data Warehouse Design: From Conventional to Spatial and Temporal Applications (Data-Centric Systems and Applications). Springer, Heidelberg (2008)
17. Olston, C., Reed, B., Srivastava, U., Kumar, R., Tomkins, A.: Pig Latin: a not-so-foreign language for data processing. In: SIGMOD (2008)
18. Pedersen, T.B., Jensen, C.S., Dyreson, C.E.: A foundation for capturing and querying complex multidimensional data. Information Systems 26(5) (2001)
19. Pike, R., Dorward, S., Griesemer, R., Quinlan, S.: Interpreting the data: Parallel analysis with Sawzall. Scientific Programming 13(4) (2005)
20. Rafanelli, M.: Operators for multidimensional aggregate data. In: Multidimensional Databases: Problems and Solutions (2003)
21. Savary, L., Gardarin, G., Zeitouni, K.: GeoCache: A cache for GML geographical data. IJDWM 3(1) (2007)
22. Stonebraker, M., Abadi, D.J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O'Neil, E.J., O'Neil, P.E., Rasin, A., Tran, N., Zdonik, S.B.: C-Store: A column-oriented DBMS. In: VLDB (2008)
23. Stonebraker, M., Abadi, D.J., DeWitt, D.J., Madden, S., Paulson, E., Pavlo, A., Rasin, A.: MapReduce and parallel DBMSs: friends or foes? Communications of the ACM 53(1) (2010)
24. Tao, Y., Papadias, D.: Historical spatio-temporal aggregation. ACM Transactions on Information Systems 23(1) (2005)
25. Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., Murthy, R.: Hive: a warehousing solution over a map-reduce framework. PVLDB 2(2) (2009)
26. Yang, H.-c., Dasdan, A., Hsiao, R.-L., Parker, D.S.: Map-reduce-merge: simplified relational data processing on large clusters. In: SIGMOD, Beijing, China (2007)
27. Zhang, S., Han, J., Liu, Z., Wang, K., Feng, S.: Spatial queries evaluation with MapReduce. In: GCC (2009)
28. Zhao, Y., Deshpande, P., Naughton, J.F.: An array-based algorithm for simultaneous multidimensional aggregates. In: Peckham, J. (ed.) SIGMOD, Tucson, USA (1997)
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
More informationOLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP
Data Warehousing and End-User Access Tools OLAP and Data Mining Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities. Key
More informationReview of Query Processing Techniques of Cloud Databases Ruchi Nanda Assistant Professor, IIS University Jaipur.
Suresh Gyan Vihar University Journal of Engineering & Technology (An International Bi Annual Journal) Vol. 1, Issue 2, 2015,pp.12-16 ISSN: 2395 0196 Review of Query Processing Techniques of Cloud Databases
More informationA Comparative Study on Operational Database, Data Warehouse and Hadoop File System T.Jalaja 1, M.Shailaja 2
RESEARCH ARTICLE A Comparative Study on Operational base, Warehouse Hadoop File System T.Jalaja 1, M.Shailaja 2 1,2 (Department of Computer Science, Osmania University/Vasavi College of Engineering, Hyderabad,
More informationSecond Credit Seminar Presentation on Big Data Analytics Platforms: A Survey
Second Credit Seminar Presentation on Big Data Analytics Platforms: A Survey By, Mr. Brijesh B. Mehta Admission No.: D14CO002 Supervised By, Dr. Udai Pratap Rao Computer Engineering Department S. V. National
More informationDWEB: A Data Warehouse Engineering Benchmark
DWEB: A Data Warehouse Engineering Benchmark Jérôme Darmont, Fadila Bentayeb, and Omar Boussaïd ERIC, University of Lyon 2, 5 av. Pierre Mendès-France, 69676 Bron Cedex, France {jdarmont, boussaid, bentayeb}@eric.univ-lyon2.fr
More informationCubeView: A System for Traffic Data Visualization
CUBEVIEW: A SYSTEM FOR TRAFFIC DATA VISUALIZATION 1 CubeView: A System for Traffic Data Visualization S. Shekhar, C.T. Lu, R. Liu, C. Zhou Computer Science Department, University of Minnesota 200 Union
More informationR.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5
Distributed data processing in heterogeneous cloud environments R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5 1 uskenbaevar@gmail.com, 2 abu.kuandykov@gmail.com,
More informationInternational Journal of Advanced Research in Computer Science and Software Engineering
Volume, Issue, March 201 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com An Efficient Approach
More informationINTEROPERABILITY IN DATA WAREHOUSES
INTEROPERABILITY IN DATA WAREHOUSES Riccardo Torlone Roma Tre University http://torlone.dia.uniroma3.it/ SYNONYMS Data warehouse integration DEFINITION The term refers to the ability of combining the content
More informationEnhancing Massive Data Analytics with the Hadoop Ecosystem
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3, Issue 11 November, 2014 Page No. 9061-9065 Enhancing Massive Data Analytics with the Hadoop Ecosystem Misha
More informationBig Data. Donald Kossmann & Nesime Tatbul Systems Group ETH Zurich
Big Data Donald Kossmann & Nesime Tatbul Systems Group ETH Zurich MapReduce & Hadoop The new world of Big Data (programming model) Overview of this Lecture Module Background Google MapReduce The Hadoop
More informationCSE-E5430 Scalable Cloud Computing Lecture 2
CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing
More information11/18/15 CS 6030. q Hadoop was not designed to migrate data from traditional relational databases to its HDFS. q This is where Hive comes in.
by shatha muhi CS 6030 1 q Big Data: collections of large datasets (huge volume, high velocity, and variety of data). q Apache Hadoop framework emerged to solve big data management and processing challenges.
More informationChameleon: The Performance Tuning Tool for MapReduce Query Processing Systems
paper:38 Chameleon: The Performance Tuning Tool for MapReduce Query Processing Systems Edson Ramiro Lucsa Filho 1, Ivan Luiz Picoli 2, Eduardo Cunha de Almeida 2, Yves Le Traon 1 1 University of Luxembourg
More informationFlexPRICE: Flexible Provisioning of Resources in a Cloud Environment
FlexPRICE: Flexible Provisioning of Resources in a Cloud Environment Thomas A. Henzinger Anmol V. Singh Vasu Singh Thomas Wies Damien Zufferey IST Austria A-3400 Klosterneuburg, Austria {tah,anmol.tomar,vasu.singh,thomas.wies,damien.zufferey}@ist.ac.at
More informationA Distributed Tree Data Structure For Real-Time OLAP On Cloud Architectures
A Distributed Tree Data Structure For Real-Time OLAP On Cloud Architectures F. Dehne 1,Q.Kong 2, A. Rau-Chaplin 2, H. Zaboli 1, R. Zhou 1 1 School of Computer Science, Carleton University, Ottawa, Canada
More informationDESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVIRONMENT
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVIRONMENT Gita Shah 1, Annappa 2 and K. C. Shet 3 1,2,3 Department of Computer Science & Engineering, National Institute of Technology,
More informationA Comparison of Approaches to Large-Scale Data Analysis
A Comparison of Approaches to Large-Scale Data Analysis Sam Madden MIT CSAIL with Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel Abadi, David DeWitt, and Michael Stonebraker In SIGMOD 2009 MapReduce
More informationSARAH Statistical Analysis for Resource Allocation in Hadoop
SARAH Statistical Analysis for Resource Allocation in Hadoop Bruce Martin Cloudera, Inc. Palo Alto, California, USA bruce@cloudera.com Abstract Improving the performance of big data applications requires
More informationNetFlow Analysis with MapReduce
NetFlow Analysis with MapReduce Wonchul Kang, Yeonhee Lee, Youngseok Lee Chungnam National University {teshi85, yhlee06, lee}@cnu.ac.kr 2010.04.24(Sat) based on "An Internet Traffic Analysis Method with
More informationA Brief Tutorial on Database Queries, Data Mining, and OLAP
A Brief Tutorial on Database Queries, Data Mining, and OLAP Lutz Hamel Department of Computer Science and Statistics University of Rhode Island Tyler Hall Kingston, RI 02881 Tel: (401) 480-9499 Fax: (401)
More informationHow To Understand Cloud Computing
Cloud Computing: a Perspective Study Lizhe WANG, Gregor von LASZEWSKI, Younge ANDREW, Xi HE Service Oriented Cyberinfrastruture Lab, Rochester Inst. of Tech. Abstract The Cloud computing emerges as a new
More informationHow To Write A Paper On Bloom Join On A Distributed Database
Research Paper BLOOM JOIN FINE-TUNES DISTRIBUTED QUERY IN HADOOP ENVIRONMENT Dr. Sunita M. Mahajan 1 and Ms. Vaishali P. Jadhav 2 Address for Correspondence 1 Principal, Mumbai Education Trust, Bandra,
More informationMapReduce: A Flexible Data Processing Tool
DOI:10.1145/1629175.1629198 MapReduce advantages over parallel databases include storage-system independence and fine-grain fault tolerance for large jobs. BY JEFFREY DEAN AND SANJAY GHEMAWAT MapReduce:
More informationResearch Article An Extended Form of MATLAB To-map Reduce Frameworks in HADOOP Based Cloud Computing Environments
Research Journal of Applied Sciences, Engineering and Technology 12(9): 900-906, 2016 DOI:1019026/rjaset122807 ISSN: 2040-7459; e-issn: 2040-7467 2016 Maxwell Scientific Publication Corp Submitted: September
More informationUsing the column oriented NoSQL model for implementing big data warehouses
Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'15 469 Using the column oriented NoSQL model for implementing big data warehouses Khaled. Dehdouh 1, Fadila. Bentayeb 1, Omar. Boussaid 1, and Nadia
More informationCUBE INDEXING IMPLEMENTATION USING INTEGRATION OF SIDERA AND BERKELEY DB
CUBE INDEXING IMPLEMENTATION USING INTEGRATION OF SIDERA AND BERKELEY DB Badal K. Kothari 1, Prof. Ashok R. Patel 2 1 Research Scholar, Mewar University, Chittorgadh, Rajasthan, India 2 Department of Computer
More informationOptimal Service Pricing for a Cloud Cache
Optimal Service Pricing for a Cloud Cache K.SRAVANTHI Department of Computer Science & Engineering (M.Tech.) Sindura College of Engineering and Technology Ramagundam,Telangana G.LAKSHMI Asst. Professor,
More informationEnhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications
Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Ahmed Abdulhakim Al-Absi, Dae-Ki Kang and Myong-Jong Kim Abstract In Hadoop MapReduce distributed file system, as the input
More informationScalable Cloud Computing Solutions for Next Generation Sequencing Data
Scalable Cloud Computing Solutions for Next Generation Sequencing Data Matti Niemenmaa 1, Aleksi Kallio 2, André Schumacher 1, Petri Klemelä 2, Eija Korpelainen 2, and Keijo Heljanko 1 1 Department of
More informationReview. Data Warehousing. Today. Star schema. Star join indexes. Dimension hierarchies
Review Data Warehousing CPS 216 Advanced Database Systems Data warehousing: integrating data for OLAP OLAP versus OLTP Warehousing versus mediation Warehouse maintenance Warehouse data as materialized
More informationCSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes
CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes Final Exam Overview Open books and open notes No laptops and no other mobile devices
More informationMapReduce for Data Warehouses
MapReduce for Data Warehouses Data Warehouses: Hadoop and Relational Databases In an enterprise setting, a data warehouse serves as a vast repository of data, holding everything from sales transactions
More informationWeb Log Data Sparsity Analysis and Performance Evaluation for OLAP
Web Log Data Sparsity Analysis and Performance Evaluation for OLAP Ji-Hyun Kim, Hwan-Seung Yong Department of Computer Science and Engineering Ewha Womans University 11-1 Daehyun-dong, Seodaemun-gu, Seoul,
More informationBUSINESS INTELLIGENCE AND NOSQL DATABASES
INFORMATION SYSTEMS IN MANAGEMENT Information Systems in Management (2012) Vol. 1 (1) 25 37 BUSINESS INTELLIGENCE AND NOSQL DATABASES JERZY DUDA Department of Applied Computer Science, Faculty of Management,
More informationThe Hidden Extras. The Pricing Scheme of Cloud Computing. Stephane Rufer
The Hidden Extras The Pricing Scheme of Cloud Computing Stephane Rufer Cloud Computing Hype Cycle Definition Types Architecture Deployment Pricing/Charging in IT Economics of Cloud Computing Pricing Schemes
More informationIntroduction to Cloud Computing
Discovery 2015: Cloud Computing Workshop June 20-24, 2011 Berkeley, CA Introduction to Cloud Computing Keith R. Jackson Lawrence Berkeley National Lab What is it? NIST Definition Cloud computing is a model
More informationNew Cloud Computing Network Architecture Directed At Multimedia
2012 2 nd International Conference on Information Communication and Management (ICICM 2012) IPCSIT vol. 55 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V55.16 New Cloud Computing Network
More informationGuidelines for Selecting Hadoop Schedulers based on System Heterogeneity
Noname manuscript No. (will be inserted by the editor) Guidelines for Selecting Hadoop Schedulers based on System Heterogeneity Aysan Rasooli Douglas G. Down Received: date / Accepted: date Abstract Hadoop
More informationAdaptive Query Execution for Cloud Based Data Management
Adaptive Query Execution for Data Management in the Cloud Adrian Daniel Popescu Debabrata Dash Verena Kantere Anastasia Ailamaki School of Computer and Communication Sciences École Polytechnique Fédérale
More informationDATA WAREHOUSING - OLAP
http://www.tutorialspoint.com/dwh/dwh_olap.htm DATA WAREHOUSING - OLAP Copyright tutorialspoint.com Online Analytical Processing Server OLAP is based on the multidimensional data model. It allows managers,
More informationIndex Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk.
Load Rebalancing for Distributed File Systems in Clouds. Smita Salunkhe, S. S. Sannakki Department of Computer Science and Engineering KLS Gogte Institute of Technology, Belgaum, Karnataka, India Affiliated
More informationHorizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
More informationFacilitating Consistency Check between Specification and Implementation with MapReduce Framework
Facilitating Consistency Check between Specification and Implementation with MapReduce Framework Shigeru KUSAKABE, Yoichi OMORI, and Keijiro ARAKI Grad. School of Information Science and Electrical Engineering,
More informationInvestigating the Effects of Spatial Data Redundancy in Query Performance over Geographical Data Warehouses
Investigating the Effects of Spatial Data Redundancy in Query Performance over Geographical Data Warehouses Thiago Luís Lopes Siqueira Ricardo Rodrigues Ciferri Valéria Cesário Times Cristina Dutra de
More informationCLOUD BASED PEER TO PEER NETWORK FOR ENTERPRISE DATAWAREHOUSE SHARING
CLOUD BASED PEER TO PEER NETWORK FOR ENTERPRISE DATAWAREHOUSE SHARING Basangouda V.K 1,Aruna M.G 2 1 PG Student, Dept of CSE, M.S Engineering College, Bangalore,basangoudavk@gmail.com 2 Associate Professor.,
More informationhttp://www.paper.edu.cn
5 10 15 20 25 30 35 A platform for massive railway information data storage # SHAN Xu 1, WANG Genying 1, LIU Lin 2** (1. Key Laboratory of Communication and Information Systems, Beijing Municipal Commission
More informationRESEARCH ON THE FRAMEWORK OF SPATIO-TEMPORAL DATA WAREHOUSE
RESEARCH ON THE FRAMEWORK OF SPATIO-TEMPORAL DATA WAREHOUSE WANG Jizhou, LI Chengming Institute of GIS, Chinese Academy of Surveying and Mapping No.16, Road Beitaiping, District Haidian, Beijing, P.R.China,
More informationDaniel J. Adabi. Workshop presentation by Lukas Probst
Daniel J. Adabi Workshop presentation by Lukas Probst 3 characteristics of a cloud computing environment: 1. Compute power is elastic, but only if workload is parallelizable 2. Data is stored at an untrusted
More informationData Warehousing Systems: Foundations and Architectures
Data Warehousing Systems: Foundations and Architectures Il-Yeol Song Drexel University, http://www.ischool.drexel.edu/faculty/song/ SYNONYMS None DEFINITION A data warehouse (DW) is an integrated repository
More informationDistributed Framework for Data Mining As a Service on Private Cloud
RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &
More informationOLAP. Business Intelligence OLAP definition & application Multidimensional data representation
OLAP Business Intelligence OLAP definition & application Multidimensional data representation 1 Business Intelligence Accompanying the growth in data warehousing is an ever-increasing demand by users for
More informationPrediction System for Reducing the Cloud Bandwidth and Cost
ISSN (e): 2250 3005 Vol, 04 Issue, 8 August 2014 International Journal of Computational Engineering Research (IJCER) Prediction System for Reducing the Cloud Bandwidth and Cost 1 G Bhuvaneswari, 2 Mr.
More informationCity University of Hong Kong Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015
City University of Hong Kong Information on a Course offered by Department of Computer Science with effect from Semester A in 2014 / 2015 Part I Course Title: Data-Intensive Computing Course Code: CS4480
More informationA Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems
A Novel Cloud Computing Data Fragmentation Service Design for Distributed Systems Ismail Hababeh School of Computer Engineering and Information Technology, German-Jordanian University Amman, Jordan Abstract-
More informationDIMENSION HIERARCHIES UPDATES IN DATA WAREHOUSES A User-driven Approach
DIMENSION HIERARCHIES UPDATES IN DATA WAREHOUSES A User-driven Approach Cécile Favre, Fadila Bentayeb, Omar Boussaid ERIC Laboratory, University of Lyon, 5 av. Pierre Mendès-France, 69676 Bron Cedex, France
More informationData Warehouse Snowflake Design and Performance Considerations in Business Analytics
Journal of Advances in Information Technology Vol. 6, No. 4, November 2015 Data Warehouse Snowflake Design and Performance Considerations in Business Analytics Jiangping Wang and Janet L. Kourik Walker
More informationAN EFFICIENT STRATEGY OF THE DATA INTEGRATION BASED CLOUD
INTERNATIONAL JOURNAL OF REVIEWS ON RECENT ELECTRONICS AND COMPUTER SCIENCE AN EFFICIENT STRATEGY OF THE DATA INTEGRATION BASED CLOUD Koncha Anantha Laxmi Prasad 1, M.Yaseen Pasha 2, V.Hari Prasad 3 1
More informationAndreas Rauber and Philipp Tomsich Institute of Software Technology Vienna University of Technology, Austria {andi,phil}@ifs.tuwien.ac.
An Architecture for Modular On-Line Analytical Processing Systems: Supporting Distributed and Parallel Query Processing Using Co-operating CORBA Objects Andreas Rauber and Philipp Tomsich Institute of
More informationMRGIS: A MapReduce-Enabled High Performance Workflow System for GIS
MRGIS: A MapReduce-Enabled High Performance Workflow System for GIS Qichang Chen, Liqiang Wang Department of Computer Science University of Wyoming {qchen2, wang}@cs.uwyo.edu Zongbo Shang WyGISC and Department
More informationBig Data and Hadoop with components like Flume, Pig, Hive and Jaql
Abstract- Today data is increasing in volume, variety and velocity. To manage this data, we have to use databases with massively parallel software running on tens, hundreds, or more than thousands of servers.
More informationCleveland State University
Cleveland State University CIS 695 Big Data Processing and Data Analytics (3-0-3) 2016 Section 51 Class Nbr. 5493. Tues, Thur TBA Prerequisites: CIS 505 and CIS 530. CIS 612, CIS 660 Preferred. Instructor:
More informationTHE CLOUD AND ITS EFFECTS ON WEB DEVELOPMENT
TREX WORKSHOP 2013 THE CLOUD AND ITS EFFECTS ON WEB DEVELOPMENT Jukka Tupamäki, Relevantum Oy Software Specialist, MSc in Software Engineering (TUT) tupamaki@gmail.com / @tukkajukka 30.10.2013 1 e arrival
More informationAnalysing Large Web Log Files in a Hadoop Distributed Cluster Environment
Analysing Large Files in a Hadoop Distributed Cluster Environment S Saravanan, B Uma Maheswari Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham,
More informationHosting Transaction Based Applications on Cloud
Proc. of Int. Conf. on Multimedia Processing, Communication& Info. Tech., MPCIT Hosting Transaction Based Applications on Cloud A.N.Diggikar 1, Dr. D.H.Rao 2 1 Jain College of Engineering, Belgaum, India
More informationBUILDING OLAP TOOLS OVER LARGE DATABASES
BUILDING OLAP TOOLS OVER LARGE DATABASES Rui Oliveira, Jorge Bernardino ISEC Instituto Superior de Engenharia de Coimbra, Polytechnic Institute of Coimbra Quinta da Nora, Rua Pedro Nunes, P-3030-199 Coimbra,
More informationA DATA WAREHOUSE SOLUTION FOR E-GOVERNMENT
A DATA WAREHOUSE SOLUTION FOR E-GOVERNMENT Xiufeng Liu 1 & Xiaofeng Luo 2 1 Department of Computer Science Aalborg University, Selma Lagerlofs Vej 300, DK-9220 Aalborg, Denmark 2 Telecommunication Engineering
More informationlow-level storage structures e.g. partitions underpinning the warehouse logical table structures
DATA WAREHOUSE PHYSICAL DESIGN The physical design of a data warehouse specifies the: low-level storage structures e.g. partitions underpinning the warehouse logical table structures low-level structures
More informationIndex Selection Techniques in Data Warehouse Systems
Index Selection Techniques in Data Warehouse Systems Aliaksei Holubeu as a part of a Seminar Databases and Data Warehouses. Implementation and usage. Konstanz, June 3, 2005 2 Contents 1 DATA WAREHOUSES
More informationA Technical Review on On-Line Analytical Processing (OLAP)
A Technical Review on On-Line Analytical Processing (OLAP) K. Jayapriya 1., E. Girija 2,III-M.C.A., R.Uma. 3,M.C.A.,M.Phil., Department of computer applications, Assit.Prof,Dept of M.C.A, Dhanalakshmi
More informationEvaluation of New Technique to Secure End User Information Using Cloud Monitoring Approach
International Journal of Electronics and Computer Science Engineering 86 Available Online at www.ijecse.org ISSN- 2277-1956 Evaluation of New Technique to Secure End User Information Using Cloud Monitoring
More informationAn Efficient Checkpointing Scheme Using Price History of Spot Instances in Cloud Computing Environment
An Efficient Checkpointing Scheme Using Price History of Spot Instances in Cloud Computing Environment Daeyong Jung 1, SungHo Chin 1, KwangSik Chung 2, HeonChang Yu 1, JoonMin Gil 3 * 1 Dept. of Computer
More informationAn introduction to Tsinghua Cloud
. BRIEF REPORT. SCIENCE CHINA Information Sciences July 2010 Vol. 53 No. 7: 1481 1486 doi: 10.1007/s11432-010-4011-z An introduction to Tsinghua Cloud ZHENG WeiMin 1,2 1 Department of Computer Science
More informationSpatialHadoop: Towards Flexible and Scalable Spatial Processing using MapReduce
SpatialHadoop: Towards Flexible and Scalable Spatial Processing using MapReduce Ahmed Eldawy Expected Graduation: December 2015 Supervised by: Mohamed F. Mokbel Computer Science and Engineering Department
More information