DBugHelper: A Debug assistant tool for distributed systems


Journal of East China Normal University (Natural Science), No. 5, Sept. 2016

DBugHelper: A Debug assistant tool for distributed systems

ZHANG Yan-fei, ZHANG Chun-xi, LI Yu-ming, ZHANG Rong
(Institute for Data Science and Engineering, Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai, China)

Abstract: Large-scale distributed systems take a long time to develop, and debugging is one of the most important steps in the development cycle. The challenge is to find all the bugs, and the solutions that fix them, in a short time. Bug reports record bug histories and their solutions, so they offer a way to understand bug features and to find solutions for new bugs. After analyzing bug reports and their fixes, we find strong correlation and similarity among many large-scale distributed systems, so the ways bugs arise and are fixed may share similar characteristics, and existing fixes can assist in fixing new bugs. In this paper we propose DBugHelper, a debugging assistant that can speed up the development of large-scale distributed systems and provide a more effective way to fix bugs. In DBugHelper, existing bug reports are processed offline, and the latest bug report is represented as a query vector. We query the bug report history database to find similar bugs together with their solutions. In this way we expect to shorten the whole system development period.

Key words: large-scale distributed system; Debug; bug report; assistance

CLC number: TP391. Document code: A. DOI: /j.issn
Funding: 863 program (2015AA015307).
Contact: yfzhang@stu.ecnu.edu.cn; rzhang@sei.ecnu.edu.cn.

0

Examples of large-scale distributed systems: Spark, HBase, Cassandra, Hadoop; CouchDB, MongoDB.

Prior work on bugs and bug reports: Kim S [1]; Zhou J [2] and Nguyen A [3]; see also [3-4].

Bug statistics from APACHE [5]: for MapReduce, the reported bug counts are 3 898, 1 435, and 711 [5].

Further related work: [6]; [7]; Orthogonal Defect Classification (ODC) [8].

Two questions about bugs are posed, Q1 and Q2. Three steps are listed. Step (1) applies Latent Dirichlet Allocation (LDA) [9-10]; step (2) follows LDA with the Vector Space Model (VSM); step (3) concerns bugs. DBugHelper combines LDA and VSM into L-VSM.

The remainder is organized as Sections 1 to 6; Section 1 concerns bugs and Section 2 presents DBugHelper.

1 Bug

(see [2])

2 Debug DBugHelper

2.1 DBugHelper

Fig. 1 shows DBugHelper. It has two parts: 1) offline processing of existing bug reports; 2) online querying with a new bug report. Bug reports are vectorized with L-VSM.

Fig. 1 DBugHelper overview

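2.2

Bug reports are represented with L-VSM. DBugHelper processes the bug report set F.

Topics: the topic set T is made up of T_s and T_g, T = (T_s + T_g).

For the topic set T, let $\theta_{t,c}$ denote the weight of word $w_c$ in topic $t$. The topic distribution of $w_c$ is the vector $\vec{w}_c = [w_{c,1}, w_{c,2}, \ldots, w_{c,|T|}]$, where

$w_{c,t} = \dfrac{\theta_{t,c}}{\sum_{k=1}^{|T|} \theta_{k,c}}$. (1)

The uniform distribution is $\vec{w}_{uniform} = [1/|T|, 1/|T|, \ldots, 1/|T|]$, every component being $1/|T|$. The cosine similarity between $\vec{w}_c$ and $\vec{w}_{uniform}$ is

$simi_c(\vec{w}_c, \vec{w}_{uniform}) = \cos\langle \vec{w}_c, \vec{w}_{uniform} \rangle = \dfrac{\vec{w}_c \cdot \vec{w}_{uniform}}{|\vec{w}_c|\,|\vec{w}_{uniform}|}$. (2)

Vectorization: each bug report in the set D is a document d, and each word of F that occurs in D is a term t. The weight $w_{i,j}$ of a term combines term frequency (tf) and inverse document frequency (idf). In VSM, tf and idf are

$tf_{i,j} = \dfrac{n_{i,j}}{\sum_k n_{k,j}}, \quad idf_i = \log \dfrac{|D|}{|\{j : t_i \in d_j\}|}$. (3)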
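The topic weighting of Eqs. (1) and (2) can be illustrated with a short Python sketch. It is not the authors' implementation. The function names, the `threshold` value, and the choice to discard words whose topic distribution is close to uniform are assumptions made for illustration, since the surviving text does not state the filtering rule.

```python
import math

def normalize_topic_weights(theta):
    """Eq. (1): divide each per-topic weight of a word by the word's total weight."""
    total = sum(theta)
    if total == 0:
        raise ValueError("word has zero weight in every topic")
    return [t / total for t in theta]

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def uniformity(theta):
    """Eq. (2): similarity between a word's topic distribution and the uniform one."""
    w = normalize_topic_weights(theta)
    uniform = [1.0 / len(w)] * len(w)
    return cosine(w, uniform)

def filter_words(word_topics, threshold=0.9):
    """Keep words that are NOT close to uniform (assumed rule, hypothetical threshold)."""
    return [w for w, theta in word_topics.items() if uniformity(theta) < threshold]
```

A word spread evenly over four topics scores 1.0, while a word concentrated in one of four topics scores 0.5, so under the assumed rule only the second would be kept.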

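In Eq. (3), $n_{i,j}$ is the number of occurrences of term $t_i$ in document $d_j$, and the denominator sums the occurrences of all terms in $d_j$. $|D|$ is the number of documents in D, and $|\{j : t_i \in d_j\}|$ is the number of documents that contain $t_i$.

For tf, VSM can use the logarithmic variant described in [11]:

$tf_{i,j} = \log(n_{i,j}) + 1$. (4)

L-VSM uses the tf of Eq. (4), so the weight $w_{i,j}$ of term $t_i$ is

$w_{i,j} = tf_{i,j} \times idf_i = (\log(n_{i,j}) + 1) \times \log \dfrac{|D|}{|\{j : t_i \in d_j\}|}$, (5)

and document $d_j$ is represented by the vector $\vec{V}_j$:

$\vec{V}_j = (w_{1,j}, w_{2,j}, w_{3,j}, \ldots, w_{i,j})$. (6)

L-VSM runs LDA with Gibbs Sampling, applies Eq. (2) to the words of the bug reports, and then applies VSM. With Eq. (6), the set F becomes

$F = \{\vec{V}_1, \vec{V}_2, \vec{V}_3, \ldots, \vec{V}_{|D|}\}$. (7)

Clustering: bugs are clustered with K-Means [12]. F is partitioned into K ($K \le |D|$) clusters:

$S = \{F_1, F_2, F_3, \ldots, F_K\}$. (8)

K-Means minimizes

$E = \sum_{n=1}^{|F|} \sum_{k=1}^{K} r_{nk} (\vec{V}_n - \mu_k)^2$. (9)

In Eq. (9), $\mu_k$ is the center of cluster $S_k$, and $r_{nk}$ is 1 if $\vec{V}_n$ belongs to $S_k$ and 0 otherwise. Minimizing E with respect to $\mu_k$ gives

$\mu_k = \dfrac{\sum_n r_{nk} \vec{V}_n}{\sum_n r_{nk}}$. (10)

A new bug is denoted B.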
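The L-VSM term weighting of Eqs. (3) to (6) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the function name `lvsm_vectors` is hypothetical, the natural logarithm is assumed, and a term absent from a document is given weight 0 because $\log(0)$ is undefined in Eq. (4).

```python
import math
from collections import Counter

def lvsm_vectors(docs):
    """docs: list of token lists. Returns (vocabulary, one weight vector per document)."""
    vocab = sorted({t for d in docs for t in d})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for d in docs for t in set(d))
    # idf of Eq. (3): log(|D| / |{j : t_i in d_j}|).
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d)
        # Eq. (5): (log(n_ij) + 1) * idf_i; absent terms get 0 (assumption).
        vectors.append([
            (math.log(counts[t]) + 1.0) * idf[t] if counts[t] > 0 else 0.0
            for t in vocab
        ])
    return vocab, vectors

# Two toy bug reports (hypothetical tokens).
vocab, vectors = lvsm_vectors([["block", "block", "lost"], ["lost", "node"]])
```

A term that occurs in every document, such as "lost" in the toy input, gets idf 0 and therefore contributes nothing to any vector.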

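Online processing of a new bug B uses three layers (Fig. 2): the bug report database D and the bug B; the clusters $S_n$ with their centers $C_n$; and the bugs inside the selected cluster.

Fig. 2 Three layers for online processing in DBugHelper

B is compared with each cluster center $C_n$ using the cosine similarity of Eq. (2), and the most similar cluster $S_B$ is selected:

$simi(B_i, C_n) = \cos\langle B_i, C_n \rangle$. (11)

B is then compared with the bugs in $S_B$, and the most similar bugs are returned.

DBugHelper is evaluated on the projects in Tab. 1. HDFS is the distributed file system of Hadoop. MapReduce is the programming model of Hadoop. HBase is a distributed database for Hadoop that is built on HDFS and works with MapReduce. Hive is a data warehouse for Hadoop.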
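The online lookup around Eq. (11) can be sketched as a two-stage search: pick the cluster whose center is closest to the query, then rank the bugs inside it. The Python below is a minimal sketch under assumptions. The function names, the bug identifiers, and the data layout are hypothetical, and each center is computed as the mean of its members following Eq. (10).

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Eq. (10) for one cluster: component-wise mean of its member vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def top_n_similar(query, clusters, n=5):
    """clusters: dict cluster_id -> list of (bug_id, vector).

    Returns the chosen cluster id and its n best (bug_id, score) pairs.
    """
    centers = {cid: centroid([v for _, v in members])
               for cid, members in clusters.items()}
    # Eq. (11): choose the cluster whose center is most similar to the query.
    best = max(centers, key=lambda cid: cosine(query, centers[cid]))
    ranked = sorted(((bid, cosine(query, v)) for bid, v in clusters[best]),
                    key=lambda pair: pair[1], reverse=True)
    return best, ranked[:n]

# Hypothetical clusters and query vector.
clusters = {
    "io": [("bug-a", [1.0, 0.0, 0.0]), ("bug-b", [0.5, 0.5, 0.0])],
    "sched": [("bug-c", [0.0, 0.0, 1.0])],
}
best, hits = top_n_similar([1.0, 0.05, 0.0], clusters, n=2)
```

Searching only the selected cluster trades some recall for speed: a similar bug that K-Means placed in another cluster is never compared with the query.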

Tab. 1 Study projects
Project | Description | Dates
HDFS | An open source distributed file system | 17/Mar/09, 1/May/
MapReduce | An open source programming model | 22/Jun/07, 29/April/
HBase | An open source distributed database | 1/Feb/08, 5/May/
Hive | An open source distributed data warehouse | 30/Oct/08, 19/April/

The bug reports come from bug tracking systems (BugZilla, APACHE) and are the input of DBugHelper. Each bug report has two parts:
(1) Bug description fields, for example: "MapReduce job can infinitely increase number of reducer resource requests", Bug, Blocker, 2.8.0, None.
(2) Bug content related to Debug.

3.3

Q1: How accurately does DBugHelper find similar bugs? For each test bug, the Top N (N = 1, 5, 10, 15, 20) most similar bugs are returned. 100 bugs are tested, and the result is reported as Accuracy.

Q2: How does L-VSM affect the result? DBugHelper is run with L-VSM and with classic VSM on the Q1 setting.

Q3: How efficient is DBugHelper? Efficiency is measured with the time/accuracy ratio (TA).

3.4

Accuracy: measured on the bugs that DBugHelper returns in Top N (N = 1, 5, 10, ..., n).

TA (time/accuracy): relates the query time of DBugHelper to its Accuracy.

TA is defined as

$TA = \dfrac{\sum_{j=1}^{M} Time_j}{M \cdot Accuracy}$, (12)

where $Time_j$ is the time of the j-th Top N query, M is the number of Top N queries, and Accuracy is the accuracy of DBugHelper for Top N.

4

Q1: Accuracy of DBugHelper. Tab. 2 lists the Top N accuracy of DBugHelper, measured on 100 test bugs per project. Of 100 HDFS bugs, 44 are matched within Top 10, an accuracy of 44%. For Hive, 68 are matched within Top 20, an accuracy of 68%. Accuracy grows with N.

Tab. 2 DBugHelper accuracy
Columns: Top 1/%, Top 5/%, Top 10/%, Top 15/%, Top 20/%
Rows: HDFS, MapReduce, HBase, Hive

Q2: Effect of L-VSM. Tab. 3 compares classic VSM and L-VSM in DBugHelper on HDFS and MapReduce bugs under the Q1 setting, for Top N (N = 1, 5, 15, 20).

Tab. 3 Accuracy comparison between VSM and L-VSM in DBugHelper
Columns: Top 1/%, Top 5/%, Top 10/%, Top 15/%, Top 20/%
Rows: HDFS (L-VSM, Classic VSM), MapReduce (L-VSM, Classic VSM)
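The two metrics of Section 3.4 can be written down directly. The sketch below assumes that Accuracy is the fraction of test bugs matched within Top N, which agrees with the reported 44 of 100 HDFS bugs giving 44%, and that Eq. (12) divides the mean query time by Accuracy. The function names and the query times are hypothetical.

```python
def top_n_accuracy(hits, total):
    """Fraction of test bugs for which a matching bug appears within Top N."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total

def time_accuracy_ratio(times, accuracy):
    """Eq. (12): sum of the M query times divided by (M * Accuracy)."""
    if not times:
        raise ValueError("at least one query time is required")
    if accuracy <= 0:
        raise ValueError("accuracy must be positive")
    return sum(times) / (len(times) * accuracy)

# 44 of 100 HDFS test bugs matched within Top 10 (figure from the text).
accuracy = top_n_accuracy(44, 100)
# Hypothetical per-query times in seconds, for illustration only.
ta = time_accuracy_ratio([0.8, 1.0, 1.2], accuracy)
```

Read this way, TA is the time spent per unit of accuracy, so at equal accuracy a smaller TA means a faster lookup.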

Q3: Efficiency of DBugHelper. Tab. 4 reports the execution time for the Q1 setting at each Top N. For HDFS, 100 bugs are queried for Top N (N = 1, 5, 15, 20).

Tab. 4 Execution time in DBugHelper
Columns: Top 1/s, Top 5/s, Top 10/s, Top 15/s, Top 20/s
Rows: HDFS, MapReduce, HBase, Hive

TA is computed for DBugHelper with Eq. (12). Fig. 3 shows how TA changes with N, over the ranges Top 1 to Top 5 and Top 5 to Top 20.

Fig. 3 The efficiency of DBugHelper

5

Fault prediction is studied in [13]. Lukins et al. apply LDA to bug localization [14].

DBugHelper uses LDA and VSM. Rao et al. [15] compare the Unigram Model (UM), Vector Space Model (VSM), Latent Semantic Analysis Model (LSA), Latent Dirichlet Allocation Model (LDA), and Cluster Based Document Model (CBDM) for bug localization, with UM and VSM singled out.

Other work classifies bug reports [16], detects duplicate bug reports [17], and predicts faults [1].

6

DBugHelper is a Debug assistant that uses L-VSM to find bugs similar to a new bug. Experiments on 4 projects evaluate DBugHelper.

References

[1] KIM S, ZIMMERMANN T, WHITEHEAD E J, et al. Predicting faults from cached history [C]//Proceedings of the 29th International Conference on Software Engineering.
[2] ZHOU J, ZHANG H, LO D. Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports [C]//Proceedings of the 2012 International Conference on Software Engineering. 2012.
[3] NGUYEN A T, NGUYEN T T, AL-KOFAHI J, et al. A topic-based approach for narrowing the search space of buggy files from a bug report [C]//Proceedings of the IEEE/ACM International Conference on Automated Software Engineering. 2011.
[4] ZHANG J, WANG X Y, HAO D, et al. A survey on bug-report analysis [J]. Science China, 2015, 58(2).
[5] Hadoop Map/Reduce [EB/OL]. [ ].
[6] SUN X, LI B, LEUNG H, et al. MSR4SM: Using topic models to effectively mining software repositories for software maintenance tasks [J]. Information & Software Technology, 2015, 66.
[7] HUANG L, NG V, PERSING I, et al. AutoODC: Automated generation of orthogonal defect classifications [J]. Automated Software Engineering, 2015, 22(1): 3-46.
[8] THUNG F, LO D, JIANG L. Automatic defect categorization [C]//Proceedings of the th Working Conference on Reverse Engineering (WCRE). IEEE, 2012.
[9] BLEI D M, NG A Y, JORDAN M I. Latent Dirichlet allocation [J]. Journal of Machine Learning Research, 2003.
[10] THOMAS S W. Mining software repositories using topic models [C]//Proceedings of the 33rd International Conference on Software Engineering. 2011.
[11] MANNING C D, RAGHAVAN P, SCHÜTZE H. Introduction to Information Retrieval [M]. Cambridge: Cambridge University Press.
[12] KANUNGO T, MOUNT D M, NETANYAHU N S, et al. An efficient k-means clustering algorithm: Analysis and implementation [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2002, 24(7).
[13] SI X S, HU C H, ZHOU Z J. Fault prediction model based on evidential reasoning approach [J]. Science China Information Sciences, 2010, 53(10).
[14] LUKINS S K, KRAFT N A, ETZKORN L H. Bug localization using latent Dirichlet allocation [J]. Information & Software Technology, 2010, 52(9).
[15] RAO S, KAK A. Retrieval from software libraries for bug localization: A comparative study of generic and composite text models [C]//Proceedings of the International Working Conference on Mining Software Repositories. 2011.
[16] PINGCLASAI N, HATA H, MATSUMOTO K. Classifying bug reports to bugs and other requests using topic modeling [C]//Proceedings of the Asia-Pacific Software Engineering Conference. IEEE Computer Society, 2013.
[17] RUNESON P, ALEXANDERSSON M, NYHOLM O. Detection of duplicate defect reports using natural language processing [C]//Proceedings of the 29th International Conference on Software Engineering. 2007.


More information

Big Data and Analytics (Fall 2015)

Big Data and Analytics (Fall 2015) Big Data and Analytics (Fall 2015) Core/Elective: MS CS Elective MS SPM Elective Instructor: Dr. Tariq MAHMOOD Credit Hours: 3 Pre-requisite: All Core CS Courses (Knowledge of Data Mining is a Plus) Every

More information

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL

SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL SEARCH ENGINE WITH PARALLEL PROCESSING AND INCREMENTAL K-MEANS FOR FAST SEARCH AND RETRIEVAL Krishna Kiran Kattamuri 1 and Rupa Chiramdasu 2 Department of Computer Science Engineering, VVIT, Guntur, India

More information

Cloud Computing at Google. Architecture

Cloud Computing at Google. Architecture Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale

More information

NextBug: A Tool for Recommending Similar Bugs in Open-Source Systems

NextBug: A Tool for Recommending Similar Bugs in Open-Source Systems NextBug: A Tool for Recommending Similar Bugs in Open-Source Systems Henrique S. C. Rocha 1, Guilherme A. de Oliveira 2, Humberto T. Marques-Neto 2, Marco Túlio O. Valente 1 1 Department of Computer Science

More information

How To Solve The Kd Cup 2010 Challenge

How To Solve The Kd Cup 2010 Challenge A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn

More information

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Binomol George, Ambily Balaram Abstract To analyze data efficiently, data mining systems are widely using datasets

More information

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA J.RAVI RAJESH PG Scholar Rajalakshmi engineering college Thandalam, Chennai. ravirajesh.j.2013.mecse@rajalakshmi.edu.in Mrs.

More information

BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS

BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS WHAT IS BIG DATA? describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information

More information

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop

More information

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)

More information

Data Modeling for Big Data

Data Modeling for Big Data Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes

More information

Big Data Research in the AMPLab: BDAS and Beyond

Big Data Research in the AMPLab: BDAS and Beyond Big Data Research in the AMPLab: BDAS and Beyond Michael Franklin UC Berkeley 1 st Spark Summit December 2, 2013 UC BERKELEY AMPLab: Collaborative Big Data Research Launched: January 2011, 6 year planned

More information

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect IT Insight podcast This podcast belongs to the IT Insight series You can subscribe to the podcast through

More information

Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division

Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division In this talk Big data storage: Current trends Issues with current storage options Evolution of storage to support big

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

More information

Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard

Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard Hadoop and Relational base The Best of Both Worlds for Analytics Greg Battas Hewlett Packard The Evolution of Analytics Mainframe EDW Proprietary MPP Unix SMP MPP Appliance Hadoop? Questions Is Hadoop

More information

The basic data mining algorithms introduced may be enhanced in a number of ways.

The basic data mining algorithms introduced may be enhanced in a number of ways. DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,

More information

E6893 Big Data Analytics Lecture 2: Big Data Analytics Platforms

E6893 Big Data Analytics Lecture 2: Big Data Analytics Platforms E6893 Big Data Analytics Lecture 2: Big Data Analytics Platforms Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and Big Data

More information

Open Access Research on Database Massive Data Processing and Mining Method based on Hadoop Cloud Platform

Open Access Research on Database Massive Data Processing and Mining Method based on Hadoop Cloud Platform Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2014, 6, 1463-1467 1463 Open Access Research on Database Massive Data Processing and Mining Method

More information

On a Hadoop-based Analytics Service System

On a Hadoop-based Analytics Service System Int. J. Advance Soft Compu. Appl, Vol. 7, No. 1, March 2015 ISSN 2074-8523 On a Hadoop-based Analytics Service System Mikyoung Lee, Hanmin Jung, and Minhee Cho Korea Institute of Science and Technology

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

Big Data. Lyle Ungar, University of Pennsylvania

Big Data. Lyle Ungar, University of Pennsylvania Big Data Big data will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. McKinsey Data Scientist: The Sexiest Job of the 21st Century -

More information

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

TE's Analytics on Hadoop and SAP HANA Using SAP Vora TE's Analytics on Hadoop and SAP HANA Using SAP Vora Naveen Narra Senior Manager TE Connectivity Santha Kumar Rajendran Enterprise Data Architect TE Balaji Krishna - Director, SAP HANA Product Mgmt. -

More information

This Symposium brought to you by www.ttcus.com

This Symposium brought to you by www.ttcus.com This Symposium brought to you by www.ttcus.com Linkedin/Group: Technology Training Corporation @Techtrain Technology Training Corporation www.ttcus.com Big Data Analytics as a Service (BDAaaS) Big Data

More information

Accelerating and Simplifying Apache

Accelerating and Simplifying Apache Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly

More information

A STUDY OF ADOPTING BIG DATA TO CLOUD COMPUTING

A STUDY OF ADOPTING BIG DATA TO CLOUD COMPUTING A STUDY OF ADOPTING BIG DATA TO CLOUD COMPUTING ASMAA IBRAHIM Technology Innovation and Entrepreneurship Center, Egypt aelrehim@itida.gov.eg MOHAMED EL NAWAWY Technology Innovation and Entrepreneurship

More information

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010 System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached

More information

CS 564: DATABASE MANAGEMENT SYSTEMS

CS 564: DATABASE MANAGEMENT SYSTEMS Fall 2013 CS 564: DATABASE MANAGEMENT SYSTEMS 9/4/13 CS 564: Database Management Systems, Jignesh M. Patel 1 Teaching Staff Instructor: Jignesh Patel, jignesh@cs.wisc.edu Office Hours: Mon, Wed 1:30-2:30

More information

Big Data. Value, use cases and architectures. Petar Torre Lead Architect Service Provider Group. Dubrovnik, Croatia, South East Europe 20-22 May, 2013

Big Data. Value, use cases and architectures. Petar Torre Lead Architect Service Provider Group. Dubrovnik, Croatia, South East Europe 20-22 May, 2013 Dubrovnik, Croatia, South East Europe 20-22 May, 2013 Big Data Value, use cases and architectures Petar Torre Lead Architect Service Provider Group 2011 2013 Cisco and/or its affiliates. All rights reserved.

More information

Distributed Framework for Data Mining As a Service on Private Cloud

Distributed Framework for Data Mining As a Service on Private Cloud RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &

More information

Approaches for parallel data loading and data querying

Approaches for parallel data loading and data querying 78 Approaches for parallel data loading and data querying Approaches for parallel data loading and data querying Vlad DIACONITA The Bucharest Academy of Economic Studies diaconita.vlad@ie.ase.ro This paper

More information

Cloud Scale Distributed Data Storage. Jürmo Mehine

Cloud Scale Distributed Data Storage. Jürmo Mehine Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented

More information

Federated Cloud-based Big Data Platform in Telecommunications

Federated Cloud-based Big Data Platform in Telecommunications Federated Cloud-based Big Data Platform in Telecommunications Chao Deng dengchao@chinamobilecom Yujian Du duyujian@chinamobilecom Ling Qian qianling@chinamobilecom Zhiguo Luo luozhiguo@chinamobilecom Meng

More information

Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution

Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution , pp. 93-102 http://dx.doi.org/10.14257/ijseia.2015.9.7.10 Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution Mi-Jin Kim and Yun-Sik

More information

Systems Engineering II. Pramod Bhatotia TU Dresden pramod.bhatotia@tu- dresden.de

Systems Engineering II. Pramod Bhatotia TU Dresden pramod.bhatotia@tu- dresden.de Systems Engineering II Pramod Bhatotia TU Dresden pramod.bhatotia@tu- dresden.de About me! Since May 2015 2015 2012 Research Group Leader cfaed, TU Dresden PhD Student MPI- SWS Research Intern Microsoft

More information

On Big Data Benchmarking

On Big Data Benchmarking On Big Data Benchmarking 1 Rui Han and 2 Xiaoyi Lu 1 Department of Computing, Imperial College London 2 Ohio State University r.han10@imperial.ac.uk, luxi@cse.ohio-state.edu Abstract Big data systems address

More information

Data Analytics Infrastructure

Data Analytics Infrastructure Data Analytics Infrastructure Data Science SG Nov 2015 Meetup Le Nguyen The Dat @lenguyenthedat Backgrounds ZALORA Group (2013 2014) o Biggest online fashion retails in South East Asia o Data Infrastructure

More information