DBugHelper: A Debug assistant tool for distributed systems
Journal of East China Normal University (Natural Science), No. 5, Sept. 2016

ZHANG Yan-fei, ZHANG Chun-xi, LI Yu-ming, ZHANG Rong
(Institute for Data Science and Engineering, Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai, China)

CLC number: TP391    Document code: A    DOI: /j.issn
Foundation item: 863 Program (2015AA015307)
E-mail: yfzhang@stu.ecnu.edu.cn; rzhang@sei.ecnu.edu.cn

Abstract: Large-scale distributed systems take a long time to develop, and debugging is one of the most important steps in the development cycle. The challenge is to find the bugs, and the fixes for them, in a short time. Bug reports record the history of bugs and their solutions, so they help in understanding bug characteristics and in finding solutions for new bugs. Our analysis of bug reports and their fixes shows strong correlation and similarity among many large-scale distributed systems, so the way bugs arise and are fixed in these systems may share similar characteristics. Existing fixes can therefore assist in fixing new bugs. In this paper we propose DBugHelper, a debugging assistant that can speed up the development of large-scale distributed systems and provide a more effective way to fix bugs. DBugHelper processes the existing bug reports offline and represents the latest bug report as a query vector. It then queries the bug report history database to find similar bugs together with their solutions. In this way we expect to shorten the overall system development period.

Key words: large-scale distributed system; Debug; bug report; assistance

0 Introduction

The large-scale distributed systems considered here include Spark, HBase, Cassandra and Hadoop, as well as CouchDB and MongoDB. Kim et al. [1] predict faults from cached history, while Zhou et al. [2] and Nguyen et al. [3] locate bugs from bug reports; see also [3-4]. Bug statistics are taken from the Apache issue tracker [5]; for MapReduce, the bug counts cited from [5] are 3 898, 1 435 and 711.
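Earlier studies mine software repositories with topic models [6], generate defect classifications automatically [7], and categorize defects under Orthogonal Defect Classification (ODC) [8].

This raises two questions about bugs, Q1 and Q2. The approach has three steps:
(1) topic modeling of bug reports with Latent Dirichlet Allocation (LDA) [9-10];
(2) vectorization of the LDA output with the Vector Space Model (VSM);
(3) querying for similar bugs.
LDA and VSM each have limits when applied to bug reports on their own, so DBugHelper combines them into a single model, L-VSM, and uses it to match bugs.

The paper has six sections: Section 1 discusses bugs, Section 2 presents DBugHelper, Section 3 describes the experimental setup, Section 4 reports the results, Section 5 reviews related work, and Section 6 concludes.

1 Bug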
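DBugHelper relates a new bug to historical bugs, in line with the use of bug reports for locating bugs in [2].

2 Debug assistant tool DBugHelper

2.1 DBugHelper overview

Fig. 1 shows the structure of the Debug assistant tool DBugHelper. It has two parts: 1) offline processing, which applies L-VSM to the historical bug reports; 2) online processing, which matches a new bug against the historical bugs.

Fig. 1 DBugHelper overview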
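2.2 Offline processing

In the offline stage, DBugHelper applies L-VSM to the set of historical bug reports F. LDA yields two topic sets, T_s and T_g, and the full topic set is T = T_s + T_g. For every word, θ gives the weight of that word under each topic in T. For a word c, the topic-weight vector is \vec{w}_c = [w_{c,1}, w_{c,2}, ..., w_{c,|T|}], where |T| is the number of topics and

$$w_{c,t} = \frac{\theta_{t,c}}{\sum_{k=1}^{|T|} \theta_{k,c}}. \tag{1}$$

The uniform vector is \vec{w}_{uniform} = [1/|T|, 1/|T|, ..., 1/|T|], with every component equal to 1/|T|. The cosine similarity between \vec{w}_c and \vec{w}_{uniform} is

$$simi_c(\vec{w}_c, \vec{w}_{uniform}) = \cos\langle \vec{w}_c, \vec{w}_{uniform} \rangle = \frac{\vec{w}_c \cdot \vec{w}_{uniform}}{|\vec{w}_c|\,|\vec{w}_{uniform}|}. \tag{2}$$

This value measures how close the topic distribution of word c is to uniform, and it is used to filter the words of the bug reports in F. The result is the document set D.

Each bug in D is a document d, and each word kept in D is a term t. The weight w_{i,j} of a term combines its term frequency (tf) and its inverse document frequency (idf). In the classic VSM, tf and idf are

$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}, \qquad idf_i = \log\frac{|D|}{|\{j : t_i \in d_j\}|}, \tag{3}$$

where n_{i,j} is the number of occurrences of term t_i in document d_j, the denominator of tf is the total number of term occurrences in d_j, |D| is the number of documents in D, and |{j : t_i ∈ d_j}| is the number of documents that contain t_i. Following the logarithmic variant of tf described in [11], L-VSM uses

$$tf_{i,j} = \log(n_{i,j}) + 1. \tag{4}$$

With the tf of Eq. (4), the weight w_{i,j} of term t_i in d_j is

$$w_{i,j} = tf_{i,j} \cdot idf_i = (\log(n_{i,j}) + 1) \cdot \log\frac{|D|}{|\{j : t_i \in d_j\}|}, \tag{5}$$

and document d_j is represented by the vector

$$\vec{V}_j = (w_{1,j}, w_{2,j}, w_{3,j}, \ldots, w_{i,j}). \tag{6}$$

L-VSM therefore runs LDA with Gibbs Sampling, filters the words of the bug reports with Eq. (2), and then applies the VSM. With Eq. (6), F becomes a set of vectors:

$$F = \{\vec{V}_1, \vec{V}_2, \vec{V}_3, \ldots, \vec{V}_{|D|}\}. \tag{7}$$

The bug vectors are then clustered with K-Means [12], so that a new bug only has to be compared with the bugs of one cluster. F is divided into K (K ≤ |D|) clusters:

$$S = \{F_1, F_2, F_3, \ldots, F_K\}. \tag{8}$$

K-Means minimizes the objective

$$E = \sum_{n=1}^{|F|} \sum_{k=1}^{K} r_{nk}\, \lVert \vec{V}_n - \vec{\mu}_k \rVert^2. \tag{9}$$

In Eq. (9), \vec{\mu}_k is the center of the k-th cluster, and r_{nk} is 1 if \vec{V}_n belongs to that cluster and 0 otherwise. E is minimized when each center is the mean of the vectors assigned to it:

$$\vec{\mu}_k = \frac{\sum_n r_{nk} \vec{V}_n}{\sum_n r_{nk}}. \tag{10}$$

A new bug B is represented as a vector \vec{B} in the same way.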
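As an illustration only, Eqs. (1), (2), (4) and (5) can be sketched in Python as below. The function names, the toy topic weights and the toy bug reports are hypothetical and are not taken from the paper; the LDA step that would produce the topic weights is assumed to have run already.

```python
import math
from collections import Counter


def topic_uniformity(theta_c):
    """Eq. (1)-(2): normalize one word's topic weights, then return
    the cosine between that vector and the uniform vector."""
    total = sum(theta_c)
    w = [x / total for x in theta_c]            # Eq. (1)
    uniform = [1.0 / len(w)] * len(w)
    dot = sum(a * b for a, b in zip(w, uniform))
    norm = math.sqrt(sum(a * a for a in w)) * math.sqrt(sum(b * b for b in uniform))
    return dot / norm                           # Eq. (2)


def lvsm_weights(docs):
    """Eq. (4)-(5): one {term: weight} dict per tokenized document,
    with tf = log(n) + 1 and idf = log(|D| / document frequency)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))                  # count each term once per document
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vectors.append({
            term: (math.log(n) + 1.0) * math.log(n_docs / df[term])
            for term, n in counts.items()
        })
    return vectors


# Hypothetical toy data, for illustration only.
peaked = topic_uniformity([1.0, 0.0, 0.0, 0.0])   # word tied to a single topic
flat = topic_uniformity([2.0, 2.0, 2.0, 2.0])     # word spread evenly over topics
reports = [
    ["namenode", "crash", "crash", "restart"],
    ["reducer", "request", "crash"],
    ["namenode", "timeout"],
]
vectors = lvsm_weights(reports)
```

A word spread evenly over all topics scores close to 1 against the uniform vector, while a word concentrated on one topic scores lower. The threshold at which a word would be dropped is not recoverable from the text, so it is left out of the sketch.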
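Online processing uses three layers, as shown in Fig. 2: the document vector \vec{B} of the new bug; the clusters S_n, each with its center \vec{C}_n; and the individual bugs.

Fig. 2 Three layers for online processing in DBugHelper

\vec{B} is first compared with each center \vec{C}_n to find the cluster S_B that B belongs to, using the cosine measure of Eq. (2):

$$simi(\vec{B}_i, \vec{C}_n) = \cos\langle \vec{B}_i, \vec{C}_n \rangle. \tag{11}$$

The cluster whose center \vec{C}_n is most similar to \vec{B} is taken as S_B. B is then compared with the bugs in S_B rather than with every bug in F, and the most similar bugs are returned by DBugHelper.

The experiments use bug reports from four systems (Tab. 1). HDFS is the distributed file system of Hadoop. MapReduce is the programming model of Hadoop. HBase is a distributed database on Hadoop that is built on HDFS and works with MapReduce. Hive is a data warehouse on Hadoop.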
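The online lookup around Eq. (11) can be sketched as follows. This is a minimal illustration under assumptions: the function names, the bug identifiers and the toy vectors are hypothetical, and ranking the bugs inside the chosen cluster by the same cosine measure is an assumption rather than something the text confirms.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def query_top_n(query, centroids, clusters, n):
    """Pick the cluster whose center is closest to the query (Eq. 11),
    then rank only that cluster's bugs by cosine similarity.

    clusters[k] is a list of (bug_id, vector) pairs for centroids[k].
    """
    best = max(range(len(centroids)), key=lambda k: cosine(query, centroids[k]))
    ranked = sorted(clusters[best], key=lambda item: cosine(query, item[1]), reverse=True)
    return [bug_id for bug_id, _ in ranked[:n]]


# Hypothetical toy data: two clusters in a three-term vector space.
centroids = [[1.0, 0.1, 0.0], [0.0, 0.2, 1.0]]
clusters = [
    [("HDFS-1", [0.9, 0.0, 0.1]), ("HDFS-2", [1.0, 0.3, 0.0])],
    [("HDFS-3", [0.0, 0.1, 1.0])],
]
top = query_top_n([1.0, 0.2, 0.0], centroids, clusters, n=2)
```

Comparing the query with K centers first and then with one cluster keeps the number of similarity computations well below a full scan of F, which is the point of the clustering step.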
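Tab. 1 Study project
HDFS        An open source distributed file system       17/Mar/09   1/May/
MapReduce   An open source programming model             22/Jun/07   29/April/
HBase       An open source distributed database          1/Feb/08    5/May/
Hive        An open source distributed data warehouse    30/Oct/08   19/April/

The bug reports are collected from the bug tracking systems (BugZilla, Apache), and DBugHelper preprocesses them in two steps, (1) and (2). Step (1) reduces each bug to a record such as (MapReduce job can infinitely increase number of reducer resource requests, Bug, Blocker, 2.8.0, None).

3.3 Research questions

Q1: How accurately does DBugHelper find related bugs? For a new bug, DBugHelper returns the Top N (N = 1, 5, 10, 15, 20) most similar historical bugs. The test uses 100 bugs, and the share of them with a correct match is the Accuracy.

Q2: Does L-VSM find related bugs better than VSM? DBugHelper is run with L-VSM and with the classic VSM under the setup of Q1.

Q3: How efficient is DBugHelper? Efficiency is measured with the time/accuracy ratio (TA).

3.4 Evaluation metrics

Accuracy is the percentage of test bugs for which the Top N (N = 1, 5, 10, ..., n) results of DBugHelper contain a related bug. TA (time/accuracy) relates the query time to the accuracy, and is used to judge the efficiency of DBugHelper.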
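TA is defined as

$$TA = \frac{\sum_{j=1}^{M} Time_j}{M \cdot Accuracy}, \tag{12}$$

where Time_j is the time taken to retrieve the Top N bugs for the j-th test bug, M is the number of test bugs queried for Top N, and Accuracy is the accuracy of DBugHelper at Top N.

4 Experimental results

Q1: How accurately does DBugHelper find related bugs?

Tab. 2 gives the accuracy of DBugHelper at each Top N, measured on 100 test bugs per system. For the 100 HDFS bugs, Top 10 finds a related bug for 44 of them, an accuracy of 44%. For Hive, Top 20 finds a related bug for 68 of them, an accuracy of 68%. Accuracy rises as N grows.

Tab. 2 DBugHelper accuracy (columns: Top 1/%, Top 5/%, Top 10/%, Top 15/%, Top 20/%; rows: HDFS, MapReduce, HBase, Hive)

Q2: Does L-VSM find related bugs better than VSM?

Tab. 3 compares DBugHelper with the classic VSM and with L-VSM on the HDFS and MapReduce bugs, under the setup of Q1. At Top N (N = 1, 5, 15, 20), L-VSM finds more related bugs than VSM on both HDFS and MapReduce.

Tab. 3 Accuracy comparison between VSM and L-VSM in DBugHelper (columns: Top 1/%, Top 5/%, Top 10/%, Top 15/%, Top 20/%; rows: L-VSM and Classic VSM for HDFS, L-VSM and Classic VSM for MapReduce)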
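The Top N accuracy reported above (for example, 44 of 100 HDFS bugs at Top 10 gives 44%) can be computed with a few lines of Python. This is a sketch under one assumption: a test bug counts as a hit when at least one known related bug appears among its Top N results. The function name and the toy identifiers are hypothetical.

```python
def top_n_accuracy(ranked_lists, relevant, n):
    """Percentage of test bugs whose top-n results contain
    at least one bug from the matching set in `relevant`."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_lists, relevant)
        if any(bug in truth for bug in ranked[:n])
    )
    return 100.0 * hits / len(ranked_lists)


# Hypothetical toy data: four test bugs, three ranked results each.
ranked_lists = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"], ["j", "k", "l"]]
relevant = [{"a"}, {"f"}, {"x"}, {"k"}]

acc_top1 = top_n_accuracy(ranked_lists, relevant, 1)
acc_top2 = top_n_accuracy(ranked_lists, relevant, 2)
acc_top3 = top_n_accuracy(ranked_lists, relevant, 3)
```

Because the Top N list for a larger N always contains the list for a smaller N, this measure can only stay the same or grow as N increases.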
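Q3: How efficient is DBugHelper?

Tab. 4 gives the execution time for each Top N under the setup of Q1. For HDFS, the 100 test bugs are each queried for their Top N (N = 1, 5, 15, 20) bugs.

Tab. 4 Execution time in DBugHelper (columns: Top 1/s, Top 5/s, Top 10/s, Top 15/s, Top 20/s; rows: HDFS, MapReduce, HBase, Hive)

With Eq. (12), TA is computed for DBugHelper. Fig. 3 plots TA against N; the curve behaves differently from Top 1 to Top 5 than from Top 5 to Top 20. TA is used to choose a suitable N for DBugHelper.

In summary, DBugHelper with L-VSM finds related bugs, and L-VSM does so better than VSM, which supports the design of the Debug assistant tool DBugHelper.

Fig. 3 The efficiency of DBugHelper

5 Related work

Bug-related research includes fault prediction [13]. Lukins et al. use LDA to locate bugs [14]. DBugHelper also applies LDA to bugs, but it works on bug reports and returns historical bugs together with their solutions.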
DBugHelper uses LDA and VSM together on bug reports. Rao and Kak [15] compared the Unigram Model (UM), Vector Space Model (VSM), Latent Semantic Analysis Model (LSA), Latent Dirichlet Allocation Model (LDA) and Cluster Based Document Model (CBDM) for bug localization, and the simpler UM and VSM stood out in that comparison. Other work on bug reports classifies reports into bugs and other requests [16], detects duplicate reports [17], and predicts faults [1].

6 Conclusion

This paper presented the Debug assistant tool DBugHelper. It uses L-VSM to match a new bug against historical bugs and their solutions. Experiments on 4 systems show that DBugHelper finds related bugs.

References
[1] KIM S, ZIMMERMANN T, WHITEHEAD E J, et al. Predicting faults from cached history [C]//Proceedings of the 29th International Conference on Software Engineering.
[2] ZHOU J, ZHANG H, LO D. Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports [C]//Proceedings of the 2012 International Conference on Software Engineering. 2012.
[3] NGUYEN A T, NGUYEN T T, AL-KOFAHI J, et al. A topic-based approach for narrowing the search space of buggy files from a bug report [C]//Proceedings of the IEEE/ACM International Conference on Automated Software Engineering. 2011.
[4] ZHANG J, WANG X Y, HAO D, et al. A survey on bug-report analysis [J]. Science China, 2015, 58(2).
[5] Hadoop Map/Reduce [EB/OL].
[6] SUN X, LI B, LEUNG H, et al. MSR4SM: Using topic models to effectively mining software repositories for software maintenance tasks [J]. Information & Software Technology, 2015, 66.
[7] HUANG L, NG V, PERSING I, et al. AutoODC: Automated generation of orthogonal defect classifications [J]. Automated Software Engineering, 2015, 22(1): 3-46.
[8] THUNG F, LO D, JIANG L. Automatic defect categorization [C]//Proceedings of the Working Conference on Reverse Engineering (WCRE). IEEE, 2012.
[9] BLEI D M, NG A Y, JORDAN M I. Latent Dirichlet allocation [J]. Journal of Machine Learning Research, 2003.
[10] THOMAS S W. Mining software repositories using topic models [C]//Proceedings of the 33rd International Conference on Software Engineering. 2011.
[11] MANNING C D, RAGHAVAN P, SCHÜTZE H. Introduction to Information Retrieval [M]. Cambridge: Cambridge University Press.
[12] KANUNGO T, MOUNT D M, NETANYAHU N S, et al. An efficient k-means clustering algorithm: Analysis and implementation [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2002, 24(7).
[13] SI X S, HU C H, ZHOU Z J. Fault prediction model based on evidential reasoning approach [J]. Science China Information Sciences, 2010, 53(10).
[14] LUKINS S K, KRAFT N A, ETZKORN L H. Bug localization using latent Dirichlet allocation [J]. Information & Software Technology, 2010, 52(9).
[15] RAO S, KAK A. Retrieval from software libraries for bug localization: A comparative study of generic and composite text models [C]//Proceedings of the International Working Conference on Mining Software Repositories. 2011.
[16] PINGCLASAI N, HATA H, MATSUMOTO K. Classifying bug reports to bugs and other requests using topic modeling [C]//Proceedings of the Asia-Pacific Software Engineering Conference. IEEE Computer Society, 2013.
[17] RUNESON P, ALEXANDERSSON M, NYHOLM O. Detection of duplicate defect reports using natural language processing [C]//Proceedings of the 29th International Conference on Software Engineering. 2007.
More informationHadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard
Hadoop and Relational base The Best of Both Worlds for Analytics Greg Battas Hewlett Packard The Evolution of Analytics Mainframe EDW Proprietary MPP Unix SMP MPP Appliance Hadoop? Questions Is Hadoop
More informationThe basic data mining algorithms introduced may be enhanced in a number of ways.
DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,
More informationE6893 Big Data Analytics Lecture 2: Big Data Analytics Platforms
E6893 Big Data Analytics Lecture 2: Big Data Analytics Platforms Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and Big Data
More informationOpen Access Research on Database Massive Data Processing and Mining Method based on Hadoop Cloud Platform
Send Orders for Reprints to reprints@benthamscience.ae The Open Automation and Control Systems Journal, 2014, 6, 1463-1467 1463 Open Access Research on Database Massive Data Processing and Mining Method
More informationOn a Hadoop-based Analytics Service System
Int. J. Advance Soft Compu. Appl, Vol. 7, No. 1, March 2015 ISSN 2074-8523 On a Hadoop-based Analytics Service System Mikyoung Lee, Hanmin Jung, and Minhee Cho Korea Institute of Science and Technology
More informationNative Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
More informationHadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
More informationBIG DATA TECHNOLOGY. Hadoop Ecosystem
BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big
More informationBig Data. Lyle Ungar, University of Pennsylvania
Big Data Big data will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. McKinsey Data Scientist: The Sexiest Job of the 21st Century -
More informationTE's Analytics on Hadoop and SAP HANA Using SAP Vora
TE's Analytics on Hadoop and SAP HANA Using SAP Vora Naveen Narra Senior Manager TE Connectivity Santha Kumar Rajendran Enterprise Data Architect TE Balaji Krishna - Director, SAP HANA Product Mgmt. -
More informationThis Symposium brought to you by www.ttcus.com
This Symposium brought to you by www.ttcus.com Linkedin/Group: Technology Training Corporation @Techtrain Technology Training Corporation www.ttcus.com Big Data Analytics as a Service (BDAaaS) Big Data
More informationAccelerating and Simplifying Apache
Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly
More informationA STUDY OF ADOPTING BIG DATA TO CLOUD COMPUTING
A STUDY OF ADOPTING BIG DATA TO CLOUD COMPUTING ASMAA IBRAHIM Technology Innovation and Entrepreneurship Center, Egypt aelrehim@itida.gov.eg MOHAMED EL NAWAWY Technology Innovation and Entrepreneurship
More informationextensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010
System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached
More informationCS 564: DATABASE MANAGEMENT SYSTEMS
Fall 2013 CS 564: DATABASE MANAGEMENT SYSTEMS 9/4/13 CS 564: Database Management Systems, Jignesh M. Patel 1 Teaching Staff Instructor: Jignesh Patel, jignesh@cs.wisc.edu Office Hours: Mon, Wed 1:30-2:30
More informationBig Data. Value, use cases and architectures. Petar Torre Lead Architect Service Provider Group. Dubrovnik, Croatia, South East Europe 20-22 May, 2013
Dubrovnik, Croatia, South East Europe 20-22 May, 2013 Big Data Value, use cases and architectures Petar Torre Lead Architect Service Provider Group 2011 2013 Cisco and/or its affiliates. All rights reserved.
More informationDistributed Framework for Data Mining As a Service on Private Cloud
RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &
More informationApproaches for parallel data loading and data querying
78 Approaches for parallel data loading and data querying Approaches for parallel data loading and data querying Vlad DIACONITA The Bucharest Academy of Economic Studies diaconita.vlad@ie.ase.ro This paper
More informationCloud Scale Distributed Data Storage. Jürmo Mehine
Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented
More informationFederated Cloud-based Big Data Platform in Telecommunications
Federated Cloud-based Big Data Platform in Telecommunications Chao Deng dengchao@chinamobilecom Yujian Du duyujian@chinamobilecom Ling Qian qianling@chinamobilecom Zhiguo Luo luozhiguo@chinamobilecom Meng
More informationDevelopment of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution
, pp. 93-102 http://dx.doi.org/10.14257/ijseia.2015.9.7.10 Development of Real-time Big Data Analysis System and a Case Study on the Application of Information in a Medical Institution Mi-Jin Kim and Yun-Sik
More informationSystems Engineering II. Pramod Bhatotia TU Dresden pramod.bhatotia@tu- dresden.de
Systems Engineering II Pramod Bhatotia TU Dresden pramod.bhatotia@tu- dresden.de About me! Since May 2015 2015 2012 Research Group Leader cfaed, TU Dresden PhD Student MPI- SWS Research Intern Microsoft
More informationOn Big Data Benchmarking
On Big Data Benchmarking 1 Rui Han and 2 Xiaoyi Lu 1 Department of Computing, Imperial College London 2 Ohio State University r.han10@imperial.ac.uk, luxi@cse.ohio-state.edu Abstract Big data systems address
More informationData Analytics Infrastructure
Data Analytics Infrastructure Data Science SG Nov 2015 Meetup Le Nguyen The Dat @lenguyenthedat Backgrounds ZALORA Group (2013 2014) o Biggest online fashion retails in South East Asia o Data Infrastructure
More information