WHAT DEVELOPERS ARE TALKING ABOUT?


WHAT DEVELOPERS ARE TALKING ABOUT? AN ANALYSIS OF STACK OVERFLOW DATA

1. Abstract

We implemented a methodology to analyze the textual content of Stack Overflow discussions. We used latent Dirichlet allocation (LDA), a statistical topic modeling technique, to automatically discover the main topics present in developer discussions. We analyzed the discovered topics, as well as their relationships and trends over time, to gain insights into the development community.

2. Topic Modelling

2.1 Topic Model - LDA

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, we can expect particular words to appear in the document more or less frequently. Latent Dirichlet allocation (LDA) is currently one of the most common topic models in use. LDA is a generative model in which each document is treated as a mixture of unobserved (latent) topics and each topic as a distribution over words; these latent topics explain why some parts of the data are similar.

2.2 MALLET

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. It includes sophisticated tools for document classification: efficient routines for converting text to "features", a wide variety of algorithms (including Naïve Bayes, Maximum Entropy, and Decision Trees), and code for evaluating classifier performance using several commonly used metrics.

3. Stack Overflow Data Set

In this section, we discuss the relevance of Stack Overflow data and the organization of the data dump.

3.1 Stack Overflow

In recent years, stackoverflow.com has become a major source of information for the developer community. This Q&A site is popular among software developers, and the discussions on it reflect the current usage and popularity of technologies.
3.2 Data Set

Stack Overflow data is publicly available under a Creative Commons license. The dataset is organised into five XML documents: badges.xml, comments.xml, posts.xml, users.xml and votes.xml. We were particularly interested in posts.xml, which contains post information (questions and answers with tags). We analysed a data set spanning three years, from July 2008 to June 2011. The posts.xml file for these three years was around 10 GB, which made parsing and processing it a challenge.

4. Method Overview

Figure 1 depicts the phases of data processing discussed in this section.

Figure 1: Overall working of the project

4.1 Data extraction and pre-processing

posts.xml was parsed using a SAX XML parser in Python, and the content (title, tags and body) of each post was written to plain text files. The majority of posts are coding-related discussions and hence contain code snippets. We removed all code snippets (if, while, etc.) from the posts and used the remaining text. The post content is stored as HTML, so HTML tags were also removed to obtain the actual text of each post.

4.2 Topic Modeling

The text files generated by data extraction and pre-processing were then fed to the topic modeling component of MALLET. By default this package takes a single input text file and performs topic modeling over it; we modified the package to process a large number of files and to discover files automatically, given a directory name. The stop word list was also extended incrementally with technical stop words to reduce noise.

4.3 Post processing

Topic modeling was performed over quarterly data to generate the trends discussed in section 6.3, and over the posts related to the most popular topics to generate the trends discussed in section 5.

5. Research Question 1 - Does a question in one topic trigger answers in another?

5.1 Motivation

We investigate whether some topics are related to other topics in terms of questions and answers. This can help us identify closely coupled topics, where questions in one topic tend to generate answers in seemingly unrelated topics. It can also point out cross-cutting areas of concern for developers: problems so common that they span multiple domains. For instance, if many questions about both mobile application development and web development generate answers related to user interfaces, this hints that user interface development is a cross-cutting concern faced by developers on two different platforms.

5.2 Solution

The steps involved are:

1. Find the top K topics.
2. Generate mappings (topic to question posts, and question posts to answer posts).
3. Get all answer posts for each of the top K topics.
4. Run topic modeling for each topic over the answer posts from the previous step.
5. Present the collected data in a comprehensible manner using data visualization.

5.3 Results

The entire space represents Stack Overflow. Each outer circle represents a topic, and the size of each circle is proportional to the popularity of that topic. Each topic in turn contains nested circles representing the topics triggered from it; again, the size of each nested circle is proportional to the popularity of the triggered topic. We applied some post-processing to remove a few obvious topics in each category that are not of interest to this research question. For instance, the topic Java did generate topics such as data structures and library usage, but any language is bound to trigger activity in such areas. Hence, we added this filtering step to the post-processing stage.

5.4 Analysis

A few of the results shown in Figure 2 are surprising and very informative. Consider the most trending topic, Java. Java has triggered activity in areas such as Hibernate (an ORM tool) and SQL. This information would be useful to a business analyst who wants to work out statistics such as the most sought-after ORM tool used with Java or the most used backend database with Java. One surprising result is the growth of GitHub discussion driven by Ruby on Rails. GitHub is known to have been gaining popularity in recent times, but this analysis shows that many of those interactions were triggered by Ruby on Rails. Thus, this analysis gives us a holistic view of the relations between topics and insight into the activity triggered in cross-cutting areas of concern.
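The mapping steps listed in section 5.2 (find the top K topics, map topics to questions and questions to answers, then gather the answers per topic) can be sketched roughly as follows. This is only an illustration under assumptions: the function names, the dictionary layout and the toy posts are hypothetical and not taken from the project's code, and the per-topic answer collections it produces would still have to be passed to the topic modeling tool separately.

```python
from collections import Counter, defaultdict


def top_k_topics(question_topic, k):
    """Return the k topic labels that have the most question posts."""
    counts = Counter(question_topic.values())
    return [topic for topic, _ in counts.most_common(k)]


def answers_per_topic(question_topic, answers_by_question, topics):
    """Group answer texts under the topic of the question they reply to.

    question_topic:      dict of question id -> topic label (hypothetical layout)
    answers_by_question: dict of question id -> list of answer texts
    topics:              topic labels to keep (e.g. the top K topics)
    """
    keep = set(topics)
    grouped = defaultdict(list)
    for question_id, topic in question_topic.items():
        if topic in keep:
            grouped[topic].extend(answers_by_question.get(question_id, []))
    return dict(grouped)


# Hypothetical toy data standing in for the parsed posts.
question_topic = {
    1: "java", 2: "java", 3: "java",
    4: "ruby-on-rails", 5: "ruby-on-rails",
    6: "css",
}
answers_by_question = {
    1: ["Map the entity with Hibernate."],
    2: ["Write the SQL join explicitly."],
    4: ["Push the branch to GitHub first."],
    6: ["Use a float to position the box."],
}

popular = top_k_topics(question_topic, k=2)
answer_corpus = answers_per_topic(question_topic, answers_by_question, popular)
```

Each value in `answer_corpus` is the set of answer texts that a separate topic modeling run would consume for that question topic.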

Figure 2: Visualization of the results

6. How does developer interest change over time?

6.1 Motivation and Research Question

By analyzing the rise and fall of interest in different topics, product developers can assess the relative popularity of their products. This also helps in identifying marketing and research opportunities and trends. For example, if interest in the .NET Framework topic is rising while interest in the Java topic is dropping, then companies, book publishers, and researchers might want to direct their attention to .NET problems and challenges. Trend analysis also helps in reasoning about the rise or fall of certain topics in developer discussions.

6.2 Solution

We divided the dataset into chunks of fixed time frames, with each chunk covering posts over 3 months. This gave 12 partitions over the 3 years of data. Topic modeling was performed for each chunk separately to find the trending topics. We also wish to analyze the temporal trends of topics. To do so, we define the impact of a topic $z_k$ in period $m$ as

\[ \mathrm{impact}(z_k, m) = \frac{1}{|D(m)|} \sum_{d_i \in D(m)} \theta(d_i, z_k) \]

where $D(m)$ is the set of all posts in the 3-month period under consideration and $\theta(d_i, z_k)$ is the topic score of $z_k$ for the document $d_i$. The impact metric measures the relative proportion of posts related to that topic compared to the other topics in that time frame.

All the statistics collected are plotted in a 2-dimensional space showing the impact of a topic against time. We grouped the topics into meaningful categories to make the comparisons more comprehensible, which gave four comparisons:

Category 1 - Programming Languages: Java, C++, Python
Category 2 - Web Technologies: JavaScript, PHP, Ruby on Rails, Django and HTML/CSS
Category 3 - Application Development: iPhone application development and Android application development
Category 4 - General Trend: Web Technologies, Server-side Technologies and Mobile application development

The last category is a more general comparison in which we combined several topics to give a holistic idea of which layer of the stack is trending most among developers. Server-side technologies include the .NET Framework and MySQL; web technologies include PHP, JavaScript, Ruby on Rails, HTML/CSS and Django; and mobile technologies include iPhone and Android application development. This analysis gives an overall picture of whether developers are more interested in server-side development, web development or mobile application development.
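As a rough illustration of the quarterly partitioning and the impact metric described in section 6.2, the sketch below groups posts by calendar quarter and averages each topic's score over the posts in that quarter. It is a minimal sketch under assumptions: the input layout (a creation date plus a dictionary of per-topic scores for each post), the function names and the toy values are hypothetical, and in practice the scores would come from the document-topic proportions produced by the topic modeling run.

```python
from collections import defaultdict
from datetime import date


def quarter_key(day):
    """Map a date to a (year, quarter) pair, with quarter in 1..4."""
    return (day.year, (day.month - 1) // 3 + 1)


def impact_by_quarter(posts):
    """Compute the impact of every topic in every quarter.

    posts: iterable of (creation_date, {topic: score}) pairs (assumed layout).
    Returns {(year, quarter): {topic: impact}}, where impact is the sum of the
    topic's scores divided by the number of posts in that quarter.
    """
    totals = defaultdict(lambda: defaultdict(float))
    post_counts = defaultdict(int)
    for created, scores in posts:
        key = quarter_key(created)
        post_counts[key] += 1
        for topic, score in scores.items():
            totals[key][topic] += score
    return {
        key: {topic: total / post_counts[key] for topic, total in topics.items()}
        for key, topics in totals.items()
    }


# Hypothetical toy posts: two in July-September 2008, one in October-December 2008.
posts = [
    (date(2008, 7, 15), {"java": 0.8, "python": 0.2}),
    (date(2008, 9, 1), {"java": 0.4, "python": 0.6}),
    (date(2008, 10, 3), {"python": 1.0}),
]
impact = impact_by_quarter(posts)
```

A topic that does not appear in a post contributes a score of zero for that post, so dividing by the number of posts in the quarter matches the $1/|D(m)|$ factor in the definition.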

[Figure 3: Languages (Java, C++, Python)]

[Figure 6: Technology Domains (Web Technologies, Mobile Technologies, Server-side Technologies)]

6.3 Results

The graph above (Figure 6) compares technology domains over the 3-year time frame, with each point representing a 3-month period. It shows that web technologies are clearly the winner among the technology domains compared, remaining the top domain in most quarters. The analysis thus gives a good comparison of the popularity of various technologies among developers. It also helps us reason about the highs and lows of a particular technology, as explained in the next section.

6.4 Real time events

During the trend analysis, some technologies surfaced as trending at particular points in time, which made us curious about the reasons behind the sudden increase in their popularity. The following trends surfaced, together with the real-world events associated with them:

1. The iPhone OS 2.0 SDK was released in March 2008, which led to iPhone application development trending in Apr-June.
2. Rails version 2.3 (with major changes) was released in March 2009, leading to Ruby on Rails surfacing in the trends in Apr-June.
3. An Adobe Flex version was released in March 2010, and it started trending in Apr-June.

7. Challenges faced and Future Work

One challenge was the size of the data: the posts.xml file was 10 GB, which took a lot of processing time. Another challenge was that MALLET does not remove technical stop words from the data. These are technical words that are quite general in nature and do not help topic modeling. We removed such technical stop words with explicit code. A further challenge was wrongly tagged questions. On Stack Overflow, the person asking a question has to tag it with keywords related to the question, and questions can be tagged wrongly. Wrongly tagged questions create noise that is hard to eliminate. MALLET gives us only the set of keywords related to a topic, not a name for the topic, so we had to go through the keywords of each topic manually and name it accordingly. This process was arduous and time-consuming. Also, a few topics had keywords that were general in nature, which made them difficult to name. As future work, we would like to extend our work to compare trends of specific technologies and to study how interest in related or competing technologies differs over time.

8. Conclusion

In this project, we implemented a methodology to discover and quantify the topics and trends in Stack Overflow, a popular Q&A website with millions of active users. Our methodology is based on LDA, a widely applied statistical topic model, which discovers topics from the textual content of Stack Overflow. We use various metrics to quantify the topics and their changes over time, which gives us insight into the discussions on Stack Overflow. Our analysis provides an approximation of the wants and needs of the contemporary developer. It can also be used by the Stack Overflow team to better understand the content generated by its users. Knowing which topics are present, and which are popular at any given time, could help in the moderation of the website.

9. Source Code

References

[1] Anton Barua, Stephen W. Thomas, and Ahmed E. Hassan, "What are developers talking about? An analysis of topics and trends in Stack Overflow", Empirical Software Engineering, 2012.
[2] MALLET

What Are Developers Talking About? An Analysis of Topics and Trends in Stack Overflow

What Are Developers Talking About? An Analysis of Topics and Trends in Stack Overflow Empirical Software Engineering manuscript No. (will be inserted by the editor) What Are Developers Talking About? An Analysis of Topics and Trends in Stack Overflow Anton Barua Stephen W. Thomas Ahmed

More information

A Manual Categorization of Android App Development Issues on Stack Overflow

A Manual Categorization of Android App Development Issues on Stack Overflow 2014 IEEE International Conference on Software Maintenance and Evolution A Manual Categorization of Android App Development Issues on Stack Overflow Stefanie Beyer Software Engineering Research Group University

More information

Deposit Identification Utility and Visualization Tool

Deposit Identification Utility and Visualization Tool Deposit Identification Utility and Visualization Tool Colorado School of Mines Field Session Summer 2014 David Alexander Jeremy Kerr Luke McPherson Introduction Newmont Mining Corporation was founded in

More information

IT services for analyses of various data samples

IT services for analyses of various data samples IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical

More information

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 UNDER THE GUIDANCE Dr. N.P. DHAVALE, DGM, INFINET Department SUBMITTED TO INSTITUTE FOR DEVELOPMENT AND RESEARCH IN BANKING TECHNOLOGY

More information

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Visualization of Semantic Windows with SciDB Integration

Visualization of Semantic Windows with SciDB Integration Visualization of Semantic Windows with SciDB Integration Hasan Tuna Icingir Department of Computer Science Brown University Providence, RI 02912 hti@cs.brown.edu February 6, 2013 Abstract Interactive Data

More information

10CS73:Web Programming

10CS73:Web Programming 10CS73:Web Programming Question Bank Fundamentals of Web: 1.What is WWW? 2. What are domain names? Explain domain name conversion with diagram 3.What are the difference between web browser and web server

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION Brian Lao - bjlao Karthik Jagadeesh - kjag Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND There is a large need for improved access to legal help. For example,

More information

Data Mining Yelp Data - Predicting rating stars from review text

Data Mining Yelp Data - Predicting rating stars from review text Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University rchada@cs.stonybrook.edu Chetan Naik Stony Brook University cnaik@cs.stonybrook.edu ABSTRACT The majority

More information

Web Frameworks. web development done right. Course of Web Technologies A.A. 2010/2011 Valerio Maggio, PhD Student Prof.

Web Frameworks. web development done right. Course of Web Technologies A.A. 2010/2011 Valerio Maggio, PhD Student Prof. Web Frameworks web development done right Course of Web Technologies A.A. 2010/2011 Valerio Maggio, PhD Student Prof.ssa Anna Corazza Outline 2 Web technologies evolution Web frameworks Design Principles

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

More information

Active Learning SVM for Blogs recommendation

Active Learning SVM for Blogs recommendation Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the

More information

Senior Business Intelligence/Engineering Analyst

Senior Business Intelligence/Engineering Analyst We are very interested in urgently hiring 3-4 current or recently graduated Computer Science graduate and/or undergraduate students and/or double majors. NetworkofOne is an online video content fund. We

More information

Visualization methods for patent data

Visualization methods for patent data Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes

More information

Google Analytics for Robust Website Analytics. Deepika Verma, Depanwita Seal, Atul Pandey

Google Analytics for Robust Website Analytics. Deepika Verma, Depanwita Seal, Atul Pandey 1 Google Analytics for Robust Website Analytics Deepika Verma, Depanwita Seal, Atul Pandey 2 Table of Contents I. INTRODUCTION...3 II. Method for obtaining data for web analysis...3 III. Types of metrics

More information

Web 2.0 Technology Overview. Lecture 8 GSL Peru 2014

Web 2.0 Technology Overview. Lecture 8 GSL Peru 2014 Web 2.0 Technology Overview Lecture 8 GSL Peru 2014 Overview What is Web 2.0? Sites use technologies beyond static pages of earlier websites. Users interact and collaborate with one another Rich user experience

More information

2. Distributed Handwriting Recognition. Abstract. 1. Introduction

2. Distributed Handwriting Recognition. Abstract. 1. Introduction XPEN: An XML Based Format for Distributed Online Handwriting Recognition A.P.Lenaghan, R.R.Malyan, School of Computing and Information Systems, Kingston University, UK {a.lenaghan,r.malyan}@kingston.ac.uk

More information

Education. Relevant Courses

Education. Relevant Courses and s and s COMM/CS GPA: topsecret Developed application and designed logo: https://play.google.com/- store/apps/details?id=com.teamhex. colorbird Permanent Address 759 East 221 Street Apt. Website: 1B

More information

Braindumps.C2150-810.50 questions

Braindumps.C2150-810.50 questions Braindumps.C2150-810.50 questions Number: C2150-810 Passing Score: 800 Time Limit: 120 min File Version: 5.3 http://www.gratisexam.com/ -810 IBM Security AppScan Source Edition Implementation This is the

More information

Powerful. Flexible. Intelligent

Powerful. Flexible. Intelligent Powerful. Flexible. Intelligent The Highland Business Research Quick Guide to new features in Released 20 th October 2009 Google has just announced a range of new features available to Google Analytics

More information

idashboards FOR SOLUTION PROVIDERS

idashboards FOR SOLUTION PROVIDERS idashboards FOR SOLUTION PROVIDERS The idashboards team was very flexible, investing considerable time working with our technical staff to come up with the perfect solution for us. Scott W. Ream, President,

More information

Operationalise Predictive Analytics

Operationalise Predictive Analytics Operationalise Predictive Analytics Publish SPSS, Excel and R reports online Predict online using SPSS and R models Access models and reports via Android app Organise people and content into projects Monitor

More information

Hexaware E-book on Predictive Analytics

Hexaware E-book on Predictive Analytics Hexaware E-book on Predictive Analytics Business Intelligence & Analytics Actionable Intelligence Enabled Published on : Feb 7, 2012 Hexaware E-book on Predictive Analytics What is Data mining? Data mining,

More information

ADHAWK WORKS ADVERTISING ANALTICS ON A DASHBOARD

ADHAWK WORKS ADVERTISING ANALTICS ON A DASHBOARD ADHAWK WORKS ADVERTISING ANALTICS ON A DASHBOARD Mrs. Vijayalaxmi M. 1, Anagha Kelkar 2, Neha Puthran 2, Sailee Devne 2 Vice Principal 1, B.E. Students 2, Department of Information Technology V.E.S Institute

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing wei.heng@intel.com January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

DATA EXPERTS MINE ANALYZE VISUALIZE. We accelerate research and transform data to help you create actionable insights

DATA EXPERTS MINE ANALYZE VISUALIZE. We accelerate research and transform data to help you create actionable insights DATA EXPERTS We accelerate research and transform data to help you create actionable insights WE MINE WE ANALYZE WE VISUALIZE Domains Data Mining Mining longitudinal and linked datasets from web and other

More information

Syllabus INFO-GB-3322. Design and Development of Web and Mobile Applications (Especially for Start Ups)

Syllabus INFO-GB-3322. Design and Development of Web and Mobile Applications (Especially for Start Ups) Syllabus INFO-GB-3322 Design and Development of Web and Mobile Applications (Especially for Start Ups) Spring 2015 Stern School of Business Norman White, KMEC 8-88 Email: nwhite@stern.nyu.edu Phone: 212-998

More information

Trollhättan, Sweden. http://keryx.se/ http://twitter.com/itpastorn/ http://itpastorn.blogspot.com/

Trollhättan, Sweden. http://keryx.se/ http://twitter.com/itpastorn/ http://itpastorn.blogspot.com/ Trollhättan, Sweden Lars Gunther is a web developer, computer science teacher and a pastor, who lives in Trollhättan, Sweden. He is the lead editor of several courses for WaSP Interact and invited expert

More information

Crossreader. Open Positions

Crossreader. Open Positions Open Positions Crossreader CrossReader develops a Revolutionary product to enhance the mobile web experience by enabling content discovery and search in tablets and ereaders Job Title Team leader for application

More information

Pentaho Data Mining Last Modified on January 22, 2007

Pentaho Data Mining Last Modified on January 22, 2007 Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org

More information

A Study of Web Log Analysis Using Clustering Techniques

A Study of Web Log Analysis Using Clustering Techniques A Study of Web Log Analysis Using Clustering Techniques Hemanshu Rana 1, Mayank Patel 2 Assistant Professor, Dept of CSE, M.G Institute of Technical Education, Gujarat India 1 Assistant Professor, Dept

More information

Whitepapers at Amikelive.com

Whitepapers at Amikelive.com Brief Overview view on Web Scripting Languages A. Web Scripting Languages This document will review popular web scripting languages[1,2,12] by evaluating its history and current trends. Scripting languages

More information

MENDIX FOR MOBILE APP DEVELOPMENT WHITE PAPER

MENDIX FOR MOBILE APP DEVELOPMENT WHITE PAPER MENDIX FOR MOBILE APP DEVELOPMENT WHITE PAPER TABLE OF CONTENTS Market Demand for Enterprise Mobile Mobile App Development Approaches Native Apps Mobile Web Apps Hybrid Apps Mendix Vision for Mobile App

More information

Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

More information

FIVE STEPS FOR DELIVERING SELF-SERVICE BUSINESS INTELLIGENCE TO EVERYONE CONTENTS

FIVE STEPS FOR DELIVERING SELF-SERVICE BUSINESS INTELLIGENCE TO EVERYONE CONTENTS FIVE STEPS FOR DELIVERING SELF-SERVICE BUSINESS INTELLIGENCE TO EVERYONE Wayne Eckerson CONTENTS Know Your Business Users Create a Taxonomy of Information Requirements Map Users to Requirements Map User

More information

BEST WEB PROGRAMMING LANGUAGES TO LEARN ON YOUR OWN TIME

BEST WEB PROGRAMMING LANGUAGES TO LEARN ON YOUR OWN TIME BEST WEB PROGRAMMING LANGUAGES TO LEARN ON YOUR OWN TIME System Analysis and Design S.Mohammad Taheri S.Hamed Moghimi Fall 92 1 CHOOSE A PROGRAMMING LANGUAGE FOR THE PROJECT 2 CHOOSE A PROGRAMMING LANGUAGE

More information

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT

More information

Your Own Web Page: Quick and Dirty

Your Own Web Page: Quick and Dirty Your Own Web Page: Quick and Dirty A Special Language for the Web In the early 1990 s web pages were mostly described using a special purpose language, called Hyper- Text Markup Language, HTML HTML provides

More information

Bazaarvoice SEO implementation guide

Bazaarvoice SEO implementation guide Bazaarvoice SEO implementation guide TOC Contents Bazaarvoice SEO...3 The content you see is not what search engines see...3 SEO best practices for your review pages...3 Implement Bazaarvoice SEO...4 Verify

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Automatic Text Analysis Using Drupal

Automatic Text Analysis Using Drupal Automatic Text Analysis Using Drupal By Herman Chai Computer Engineering California Polytechnic State University, San Luis Obispo Advised by Dr. Foaad Khosmood June 14, 2013 Abstract Natural language processing

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

More information

DATA SCIENCE CURRICULUM WEEK 1 ONLINE PRE-WORK INSTALLING PACKAGES COMMAND LINE CODE EDITOR PYTHON STATISTICS PROJECT O5 PROJECT O3 PROJECT O2
