Data Management Course Syllabus

Data Management: This course is designed to give students a broad understanding of modern storage systems, data management techniques, and how these systems are used to store, access, and analyze Big Data. Topics include data modeling; storage system design for disk arrays, network-attached storage, clusters, and data centers; relational databases and the use of MADlib techniques for data analytics; NoSQL databases and their advantages; cloud data storage and the use of clouds for Big Data; data warehouses and data mining; and the MapReduce paradigm for data analytics and the Hadoop file system. Homework assignments give students practical experience with important topics covered in the course, including cloud storage, relational databases, NoSQL databases, and Hadoop/MapReduce.

Schedule (week, topic, homework, exams)

Week 1: Introduction, data modeling
Week 2: Disk arrays, Network Attached Storage, Clusters. Homework: Homework 1
Week 3: Data Centers
Week 4: Cloud Storage Systems. Homework: Homework 1, Homework 2
Week 5: Commercial Cloud Storage Systems. Homework: Homework 2, Homework 3
Week 6: File Systems for Massive Storage. Homework: Homework 3 due
Week 7: Midterm 1 and Relational databases and analytics. Exam: Midterm 1
Week 8: Relational databases and analytics (cont.). Homework: Homework 4
Week 9: NoSQL
Week 10: NoSQL (cont.). Homework: Homework 4, Homework 5
Week 11: Distributed. Homework: Homework 5, Homework 6
Week 12: Map Reduce, Hadoop File System
Week 13: Map Reduce, Hadoop File System (cont.)
Week 14: Midterm 2, Data Warehouses. Homework: Homework 6 due. Exam: Midterm 2
Week 15: Data Warehouses (cont.)

Readings (listed in syllabus order)

- J. Gray, "Evolution of data management," Computer, 29(10):38-46.
- J. Gray, D. T. Liu, M. Nieto-Santisteban, A. Szalay, D. J. DeWitt, and G. Heber, "Scientific data management in the coming decade," ACM SIGMOD Record, vol. 34.
- "Jim Gray on eScience: a transformed scientific method," in The Fourth Paradigm: Data-Intensive Scientific Discovery, T. Hey, S. Tansley, and K. Tolle, Eds.
- J. Ullman and J. Widom, A First Course in Database Systems, Ch. 2.
- D. A. Patterson, G. Gibson, and R. H. Katz, "A case for redundant arrays of inexpensive disks (RAID)," in ACM SIGMOD International Conference on Management of Data (SIGMOD '88), ACM, 1988.
- Y. Saito, S. Frølund, A. Veitch, A. Merchant, and S. Spence, "FAB: building distributed enterprise disk arrays from commodity components," ACM SIGOPS Operating Systems Review, vol. 38, pp. 48-58.
- G. A. Gibson and R. Van Meter, "Network attached storage architecture," Communications of the ACM, vol. 43.
- Dillow, Z. Zhang, and B. W. Settlemyer, "Workload characterization of a leadership class storage cluster," in Petascale Data Storage Workshop (PDSW), 2010.
- L. A. Barroso, J. Clidaras, and U. Hölzle (Google, Inc.), The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, second edition, July 2013, 154 pages, /S00516ED2V01Y201306CAC024.
- M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, and I. Stoica, "A view of cloud computing," Communications of the ACM, vol. 53.
- C. Wang, K. Ren, W. Lou, and J. Li, "Toward publicly auditable secure cloud data storage services," IEEE Network, vol. 24.
- R. Grossman and Y. Gu, "Data mining using high performance data clouds: experimental studies using Sector and Sphere," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.
- Amazon S3, Cloud Computing Storage for Files, Images, Videos, aws.amazon.com/s3/
- Amazon SimpleDB, Amazon Web Services, aws.amazon.com/simpledb/
- Amazon Glacier, Amazon Web Services, aws.amazon.com/glacier/
- B. Welch, M. Unangst, Z. Abbasi, G. A. Gibson, B. Mueller, J. Small, J. Zelenka, and B. Zhou, "Scalable performance of the Panasas parallel file system," in FAST, 2008.
- S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google file system," ACM SIGOPS Operating Systems Review, vol. 37, no. 5, ACM.
- M.-S. Chen, J. Han, and P. S. Yu, "Data mining: an overview from a database perspective," IEEE Transactions on Knowledge and Data Engineering, vol. 8.
- Optional background reading: J. Ullman and J. Widom, A First Course in Database Systems.
- J. Cohen et al., "MAD skills: new analysis practices for big data," Proceedings of the VLDB Endowment 2(2), 2009.
- J. M. Hellerstein et al., "The MADlib analytics library: or MAD skills, the SQL," Proceedings of the VLDB Endowment 5(12), 2012.
- R. Cattell, "Scalable SQL and NoSQL data stores," ACM SIGMOD Record, vol. 39.
- F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, "Bigtable: a distributed storage system for structured data," ACM Transactions on Computer Systems (TOCS), vol. 26, p. 4.
- Lakshman and P. Malik, "Cassandra: a decentralized structured storage system," ACM SIGOPS Operating Systems Review, vol. 44.
- G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, "Dynamo: Amazon's highly available key-value store," in SOSP, 2007.
- M. Stonebraker, P. M. Aoki, W. Litwin, A. Pfeffer, A. Sah, J. Sidell, C. Staelin, and A. Yu, "Mariposa: a wide-area distributed database system," The VLDB Journal, vol. 5.
- J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, and P. Hochschild, "Spanner: Google's globally-distributed database," in Proceedings of OSDI.
- J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, vol. 51.
- K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop distributed file system," in IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010.
- D. Wegener, M. Mock, D. Adranale, and S. Wrobel, "Toolkit-based high-performance data mining of large data on MapReduce clusters," in IEEE International Conference on Data Mining Workshops (ICDMW '09), 2009.
- S. Chaudhuri and U. Dayal, "An overview of data warehousing and OLAP technology," SIGMOD Record, 26(1), 1997.
- J. C. Prather, D. F. Lobach, L. K. Goodwin, J. W. Hales, M. L. Hage, and W. E. Hammond, "Medical data mining: knowledge discovery in a clinical data warehouse," in Proceedings of the AMIA Annual Fall Symposium, 1997.
- A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu, and R. Murthy, "Hive: a petabyte scale data warehouse using Hadoop," in IEEE 26th International Conference on Data Engineering (ICDE), 2010.
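The course's Map Reduce weeks build on the paradigm described by Dean and Ghemawat. In that model, a map function emits intermediate key/value pairs, the framework groups those pairs by key, and a reduce function combines each group. Word count is the standard introductory example. The sketch below simulates the three phases sequentially in one Python process. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical illustrations. The sketch is not the Hadoop API, and it does not model partitioning, distribution, combiners, or fault tolerance.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every whitespace-separated word in one document.

    doc_id is unused here; it mirrors the (key, value) input of a map task.
    """
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key (done by the framework in a real system)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum the partial counts collected for one word."""
    return word, sum(counts)


def word_count(documents: Dict[str, str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce sequentially; keys are returned in sorted order."""
    intermediate = (
        pair for doc_id, text in documents.items() for pair in map_phase(doc_id, text)
    )
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in sorted(grouped.items()))


if __name__ == "__main__":
    docs = {"d1": "the quick brown fox", "d2": "the lazy dog the end"}
    print(word_count(docs))
    # {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Separating map and reduce lets a framework run many map tasks in parallel over different input splits. It can then run reduce tasks in parallel over different keys. This sequential version only shows the data flow between the phases.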
