Project 2: Term Clouds (HOF) Implementation Report. Members: Nicole Sparks (project leader), Charlie Greenbacker


CS-889 Spring 2011

Abstract: This report describes the methods used in our implementation of a solution to the term clouds task. When our system receives a request via the REST interface, we begin by fetching the HTML file specified by the query URL. The HTML file is processed to extract the primary content of the page and to separate that content into several groups according to HTML tags. Lists of unigrams, bigrams, and trigrams are built from the document, with initial weights assigned based on their frequency distribution and PMI scores. Cues from HTML formatting are used to boost the weighting of ngrams appearing in certain HTML tags. Finally, these lists are combined and the weights are balanced to produce the final set of terms and weights, which is returned via the REST service and rendered as a term cloud.

REST Interface: Our system provides a REST interface to respond to queries from the project evaluation platform. To build the REST interface, we used the Django framework, which enables rapid development of Python web applications. Django also provides a simple web server to host such applications. We created a Django app that listens for GET requests conforming to the project specifications. When a request is received that is prefaced by the cloud prefix, the Django server initializes the web app we built. This app retrieves the URL from the GET request string and passes it to a wrapper function that serves as the entry point into the business logic of our system. The wrapper function ultimately returns an appropriately formatted string containing the terms and weights extracted from the HTML page located at the query URL. This return string is delivered as an HTTP response through the REST interface back to the project evaluation platform, where it is rendered as a term cloud.

HTML Parsing: First, we strip the single quotes from the URL parameter submitted via the GET request, and add the prefix if it is missing. We retrieve the HTML file located at the URL and use the BeautifulSoup package to parse two versions of the HTML. The first parse tree is based on the original, complete HTML file; this version is used to extract terms from the HTML head elements. The second parse tree built by BeautifulSoup is based on the main textual content of the web page, as identified by the ArticleExtractor feature of the BoilerPipe web API. This parse is used to extract terms from the body of the HTML file.
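As an illustration of the first parse (the one used for the head elements), the following is a minimal sketch rather than our actual code. It uses the Python standard library's html.parser in place of BeautifulSoup so that it runs without third-party packages, and it is written for Python 3. The names HeadTermParser and extract_head_groups are hypothetical.

```python
from html.parser import HTMLParser


class HeadTermParser(HTMLParser):
    """Collect the title text and the description/keywords meta content."""

    def __init__(self):
        super().__init__()
        self.groups = {"title": "", "description": "", "keywords": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            # Meta names are matched case-insensitively (e.g. "Keywords").
            name = (attr.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.groups[name] = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that sits inside <title> is accumulated here.
        if self._in_title:
            self.groups["title"] += data


def extract_head_groups(html):
    """Return the three head-derived text groups for one HTML document."""
    parser = HeadTermParser()
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.groups.items()}
```

Calling extract_head_groups on the full page source returns a dictionary with the title, description, and keywords text, which would then go through the preprocessing steps described in the next section.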

Text Preprocessing: We organize the text of the web page into eight groupings from which we extract ngrams. These groups are based on the text contained in the original HTML tags. The eight groupings are:
- Title of the web page
- Description in meta tags
- Keywords in meta tags
- Headings in <H[1-6]> tags
- Text in hyperlinks
- Bold text in <B> tags
- Italicized text in <I> tags
- All text in the body of the web page (which contains all text from the other groupings except for the description and keywords)

A series of preprocessing steps is performed on each group:
1. All remaining HTML tags are removed.
2. Newline characters are stripped out.
3. Hyphens appearing outside of hyphenated words are eliminated.
4. All instances of the HTML double quote entity (&quot;) are removed.
5. All commas are stripped out, unless used as thousands separators in large numbers.
6. All other non-alphanumeric characters are removed, except those mentioned above and spaces, slashes, and dollar signs.
7. Tokens appearing in nltk.corpus.stopwords.words('english') are removed, along with common contractions and other unwanted words (e.g., 'whose', 'said').

The remaining raw text in each group is converted into a list of strings and passed to the ngram extraction module for further processing.

Extracting Ngrams/Initial Weighting: Text from the Keywords, Headings, and Body groups is combined into one master list. This list is processed using NLTK's frequency distribution, which produces each unigram and its count. Dividing the count of each unigram by the number of items in the master list gives the unigram's TF value. NLTK's bigram and trigram collocation finders are then used on the master list to produce lists of bigrams and trigrams, which are scored using the collocation module's PMI bigram measure and PMI trigram measure, respectively. Initially, weighting was based on TF/IDF, where TF was calculated as described above and IDF was obtained using Microsoft Ngram Service. Surprisingly, simple TF scores alone provided more reasonable results (once stopwords were removed) than TF/IDF based on ngram probabilities. It seems likely that including the IDF from Microsoft Ngram Service diluted the score, because its statistics are drawn from documents of potentially many classes. Thus, we removed the IDF portion of the weighting, and with it the use of MS Ngram Service, from our final system.
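The TF and PMI calculations can be illustrated with a short standard-library sketch. This is a stand-in for the NLTK frequency distribution and collocation finder named above, not a call into NLTK itself, and the function names unigram_tf and bigram_pmi are hypothetical. PMI is estimated from raw counts as log2(count(x, y) * N / (count(x) * count(y))), where N is the number of tokens in the master list.

```python
import math
from collections import Counter


def unigram_tf(tokens):
    """TF = count of each unigram / number of items in the master list."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def bigram_pmi(tokens):
    """Score each adjacent token pair by pointwise mutual information."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (x, y), n_xy in bigrams.items():
        # log2( p(x, y) / (p(x) * p(y)) ), with probabilities estimated from counts
        scores[(x, y)] = math.log2(n_xy * n / (unigrams[x] * unigrams[y]))
    return scores
```

For the master list ['tag', 'cloud', 'tag', 'cloud', 'data', 'word'], the TF of 'tag' is 2/6, and the PMI of ('tag', 'cloud') is log2(2 * 6 / (2 * 2)) = log2(3).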

Re-weighting Ngrams (HTML Tags): Each of the three lists (one each for unigrams, bigrams, and trigrams) is compared against the eight groupings based on the original HTML tags. For ngrams found in the Title, Heading, Description, or Keywords groups, a multiplier of 3 is applied to the TF value. A multiplier of 2 is applied to ngrams found in the Links, Bolds, and Italics groups. All other ngrams' TF values are unchanged. For any ngram, only one of the two multipliers is applied, and preference is given to the larger. We found these multipliers to be an accurate representation of the importance of text within the HTML markup tags.

Normalizing TF: To normalize the weights across unigrams, bigrams, and trigrams, another multiplier is applied. TF values for unigrams are multiplied by 1000, while TF values for bigrams are multiplied by 3. These two multipliers were found by trial and error on the test set and did the best job of balancing the scores. They result in unigrams and bigrams being normalized to the trigrams' TF values.

Subsuming Ngrams: Any unigram that appears fully within a bigram or trigram is subsumed by that bigram or trigram. Bigrams, however, are considered subsumed by a trigram if the entire bigram appears within the trigram or if either token of the bigram appears anywhere in the trigram. The subsuming ngram's weights are adjusted as follows:
- 1/10th of the subsumed bigram's TF value is added to the subsuming trigram's TF value.
- 1/20th of the subsumed unigram's TF value is added to the subsuming trigram's TF value.
- 1/10th of the subsumed unigram's TF value is added to the subsuming bigram's TF value.

Although the TF values are modified, no ngrams are removed from the lists at this time. In our first implementation, we only considered a bigram to be subsumed by a trigram if the entire bigram appeared in that trigram. In testing that design, we noticed our results had too much overlap, which limited the diversity of information our word cloud provided. Our current implementation, which also treats a bigram as subsumed when only one of its tokens appears in a trigram, results in a broader representation of the document.

Combining Ngrams: Our final word cloud consists of 15 ngrams with the approximate distribution of 47% unigrams (7), 33% bigrams (5), and 20% trigrams (3).
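The re-weighting, normalization, and subsumption rules described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not our actual module: ngrams are represented as tuples of tokens, each HTML group is assumed to be a set of space-joined ngram strings (the report does not specify how group membership is tested), and subsumption contributions are assumed to be computed from the weights as they stood before any adjustment. All function and variable names are hypothetical.

```python
STRONG_GROUPS = ("title", "headings", "description", "keywords")  # multiplier 3
MEDIUM_GROUPS = ("links", "bolds", "italics")                     # multiplier 2
NORMALIZERS = {1: 1000, 2: 3, 3: 1}  # unigram, bigram, trigram scale factors
SHARES = {(1, 2): 0.10, (1, 3): 0.05, (2, 3): 0.10}  # (subsumed size, subsuming size)


def tag_multiplier(ngram_text, groups):
    """Return 3, 2, or 1; when both multipliers qualify, the larger wins."""
    if any(ngram_text in groups.get(g, ()) for g in STRONG_GROUPS):
        return 3
    if any(ngram_text in groups.get(g, ()) for g in MEDIUM_GROUPS):
        return 2
    return 1


def adjusted_weight(tokens, score, groups):
    """Apply the HTML-tag boost and the per-size normalization to one ngram."""
    boost = tag_multiplier(" ".join(tokens), groups)
    return score * boost * NORMALIZERS[len(tokens)]


def apply_subsumption(weights):
    """Add a share of each subsumed ngram's weight to every ngram subsuming it.

    An ngram counts as subsumed by a larger one when they share any token,
    which covers both the unigram rule and the relaxed bigram rule.
    No ngram is removed; a new dict of adjusted weights is returned.
    """
    adjusted = dict(weights)
    for small, w_small in weights.items():  # original weights (assumption)
        for large in weights:
            share = SHARES.get((len(small), len(large)))
            if share is not None and set(small) & set(large):
                adjusted[large] += share * w_small
    return adjusted
```

For example, with weights of 100 for ('tag',), 50 for ('tag', 'cloud'), and 40 for ('tag', 'cloud', 'edit'), the bigram rises to 50 + 10 = 60 and the trigram to 40 + 5 + 5 = 50, while the unigram keeps its weight of 100.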

The top three trigrams are identified first. The process selects the three trigrams with the highest TF values. These three trigrams are compared against each other to check for overlap. If any two of these trigrams share an exact token, the trigram with the lower TF score is removed from the list and half of its TF value is added to the trigram that contains the same token. The trigram with the next highest TF value from the complete list of trigrams is then added to the remaining top two trigrams for consideration, and the process is repeated. The resulting list contains the top three trigrams, based on TF scores, that do not have any matching tokens.

The entire list of bigrams is then iterated over, removing any bigram that is subsumed by one of the top trigrams (following the subsuming rules described in the section above). The top five bigrams are then identified using the same technique used to identify the top trigrams. Again, the resulting bigram list contains five bigrams, based on TF scores, that each have unique tokens.

The entire list of unigrams is then iterated over, removing any unigram subsumed by either a top trigram or a top bigram. The remaining top seven unigrams, based on TF score, are selected.

The resulting 15 ngrams are sorted from maximum TF value to minimum TF value. This sorted list represents our final list of ngram terms and associated weights.

Our first implementation did not consider repeated tokens within ngrams of the same size. As discussed in the Subsuming Ngrams section, this resulted in some cases where all three trigrams contained the same word. Since this limited the amount of information our word cloud would convey, we imposed the rules described above in our current implementation. The combination of not allowing repeated words within ngrams of the same size and removing lower-order ngrams that share one or more tokens with a higher-order ngram ensures that our final word cloud is a better representation of the webpage content.

Final Output: The final list of ngram terms and weights identified by the ngram extraction module is returned to the wrapper function, which constructs an appropriately formatted return string from this list. This return string is then delivered to the evaluation platform as an HTTP response via the REST interface, and is subsequently rendered as a term cloud by the Google TermCloud Visualization API.

Sample Output: (using three example webpages)

1) clouds centering tags image hosting 58.4 tag cloud edit social software 58.3 used display non-tag blog aggregator 58.0 visual appearance 63.0 data 37.4 coupland microserfs 58.9 type 29.1 word 27.7 size 20.8 search 19.4 flickr 13.8 collocate 12.4

Boosted by HTML tags: [clouds, tag, cloud, edit, visual, appearance, coupland, microserfs, image, hosting, social, software, blog, aggregator, data, word, search, flickr, collocate]
Subsumed ngrams: [tag cloud, clouds centering, centering tags, tag, cloud, clouds, centering, edit, appearance, visual, history, coupland, hosting, image, microserfs, blog aggregator, social, software]
Overlapping ngrams: [tag clouds centering]

2) campaign launched university annualgiving udel edu faculty 90.3 staff encouraged participate 78.2 exam schedule 76.6 located online 76.0 diamonds society 65.0 library final 50.6 delaware 33.4 udid employee 30.0 year 23.4 gift 16.7 make 13.3 means 13.3 programs 13.3
Boosted by HTML tags: [university, faculty, staff, encouraged, participate, annualgiving, campaign, launched, delaware, diamonds, society, udel, edu]
Subsumed ngrams: [encouraged participate, annualgiving udel, launched university, campaign launched, udel edu, annual giving]
Overlapping ngrams: [staff campaign launched, faculty staff campaign, annual giving campaign, annualgiving udel, launched university faculty, university faculty staff, annual giving, staff campaign]

3) thailand believes trucks popular food truck maze infighting feel los realization street-food culture grilled cheese 78.0 becoming mainstream 76.5 hits kogi 65.1 scene 25.0 politics 15.0 business 8.3 choi 8.3 hiller 8.3 city 6.6 hot 6.6
Boosted by HTML tags: [trucks, food, truck, feel, los, maze, infighting, becoming, mainstream, grilled, cheese, hits, kogi, scene, culture, politics]
Subsumed ngrams: [food truck, believes trucks, thailand believes, popular food, truck, food, trucks, believe, believes, popular, thailand, kogi, becoming, cheese, grilled, infighting, mainstream, maze, feel, fighting, grill, hits, los]
Overlapping ngrams: [debate flashy trucks, launching trucks scene, food trucks scene, roadstoves launching trucks, trucks also placed, new-wave food trucks, flashy trucks generated, circus food trucks, food trucks also, generated food trucks, food trucks generated, trucks generated food, trucks la two, angeles food truck, food truck scene, food truck culture, food truck cultures, trucks scene infighting politics, mainstream maze]
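The selection procedure from the Combining Ngrams section, which produces lists like the ones above, can be sketched as follows. This is one reading of the procedure and not our actual code: candidates are visited from highest to lowest weight, a candidate that shares a token with an already selected ngram of the same size is dropped and half of its weight is added to that ngram, and selection stops once enough token-distinct ngrams have been found. The names pick_distinct and build_cloud are hypothetical, ngrams are assumed to be tuples of tokens, and when a candidate overlaps several selected ngrams the first match receives the bonus (an assumption).

```python
def pick_distinct(weights, k):
    """Keep the k highest-weighted ngrams that share no token with each other."""
    chosen = []  # [ngram, weight] pairs, in order of selection
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    for ngram, weight in ranked:
        if len(chosen) == k:
            break
        rival = next((c for c in chosen if set(c[0]) & set(ngram)), None)
        if rival is None:
            chosen.append([ngram, weight])
        else:
            rival[1] += weight / 2  # the dropped ngram donates half its weight
    return {ngram: weight for ngram, weight in chosen}


def build_cloud(unigrams, bigrams, trigrams, sizes=(7, 5, 3)):
    """Combine the three weighted lists into one list sorted by weight."""
    n_uni, n_bi, n_tri = sizes
    top_tri = pick_distinct(trigrams, n_tri)
    used = {tok for ngram in top_tri for tok in ngram}
    # Drop bigrams subsumed by a top trigram (any shared token).
    free_bi = {ng: w for ng, w in bigrams.items() if not used & set(ng)}
    top_bi = pick_distinct(free_bi, n_bi)
    used |= {tok for ngram in top_bi for tok in ngram}
    # Drop unigrams subsumed by a top trigram or top bigram.
    free_uni = {ng: w for ng, w in unigrams.items() if not used & set(ng)}
    top_uni = sorted(free_uni.items(), key=lambda kv: kv[1], reverse=True)[:n_uni]
    cloud = list(top_tri.items()) + list(top_bi.items()) + top_uni
    return sorted(cloud, key=lambda kv: kv[1], reverse=True)
```

With the default sizes, build_cloud returns at most 15 (ngram, weight) pairs: three trigrams, five bigrams, and seven unigrams, ordered from the highest weight to the lowest.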


More information
