2010 Exceptional Web Experience. TECH-B21: Search Engine Optimization and WebSphere Portal - Best Practices. Andreas Prokoph, Lead Architect, Search in Portal and WCM, Portal Development (pkp@de.ibm.com). IBM Portal Excellence Conference, July 19-22, 2010, Chicago, Illinois. © 2010 IBM Corporation
Agenda
- Introduction
- User patterns using Internet search engines
- What makes a webpage 'relevant'?
- What features does WebSphere Portal provide to support SEO?
- Reference materials
- Questions answered...
The place to be...
"Why aren't our pages on the top in Google?" asks your boss
- You did your best, you thought...
- Even you were wondering what was going on
- Keywords are there
- Metadata polished
- Beautiful URLs
- What did I miss?
Search marketing versus organic search
- Paid placement: paid search is advertising
- Organic search result: organic search is the result of editorial efforts
Search and your Internet presence
Typically two steps are involved to attract traffic and to keep or win users and customers:
- First step: attract users and get them to your website (Internet search)
- Second step: once they are at your site, they might already have found what they were looking for, or they might look for more (site search)
- Google attracts users to a website
- If it is interesting enough, they are likely to search for more information at that website
User patterns
The golden triangle, from an eye-tracking study
[Figure: aggregate map of all consumer search activity. Red is most viewed; black is unviewed. Source: Enquiro]
Ideally this is where you want your page to score: in the Golden Triangle
- Visitors view their results for an average of 6.3 seconds before clicking on a link
- That is just enough time to scan the first three to five results and the top one or two ads
- Chances are: if your page doesn't rank at the top, it probably won't be seen
What makes a webpage a good webpage?
- Good content and information
- Good content and information
- Focused content and information
- And finally: interesting enough for others (external) to reference from their webpage(s)
Why do search engines keep their ranking metrics a secret?
- If the metrics were published, too many people would take advantage of them to try to boost their pages
- In the end, Internet search engines would be worthless to the broad community!
- A bad example from the past: AltaVista put emphasis on keywords in titles and metadata
- Within 1 year AltaVista was dead because the quality of its search results was declining rapidly
- Imagine you were Google: would you take chances and ruin your $150 billion company? Or live up to users' expectations?
What does not work... though some SEO consultants might say otherwise
- Metadata usage: stuffing metadata fields like Keywords and Description with all kinds of (unrelated) keywords
- Alt tag stuffing: the alt attribute is meant to describe what a certain image is about. Example of stuffing: <img alt="windows, ABC consulting, windows, developer, tutorials, ABC consulting, developer, windows, tutorial, tutorial, tutorials, resources, windows, tutorials, developer" />
- And there is also: 'search engine friendly URLs', a widespread misconception as to how search engines work. For a crawler, if it gets a return code '200', then that URL is OK
A word about metadata usage by Internet search engines
- Title and description information is important for the initial presentation of a webpage in the search result list
- However: these two, and any other metadata, are not relevant in any way for determining the webpage's relevancy
- One could still argue about Title. But again: in the past that was one of the metrics documented by AltaVista, and we know how that went...
- For an example of 'official' webmaster recommendations, see: http://googlewebmastercentral.blogspot.com/2007/12/answering-more-popular-picks-meta-tags.html
User-friendly URLs: why would you think they are relevant?
- When looking at the presentation of a Google search result, the assumption is that Google highlights whatever has contributed to a webpage's relevance ranking
- Truth is: the highlighting is straightforward. It simply marks in bold everything that matches one of the keywords, assuming the user will then be better able to judge a page's relevance for themselves
How search engines work: having discussed what DOESN'T work, we'll take a look at how relevancy IS determined
The following are the main metrics that get applied
- Page or document relevancy: term frequency times inverse document frequency (tf x idf)
- To some degree, Hypertext-Matching Analysis: analyzes the full content of a page and also the content of linked local pages
- PageRank: link popularity, i.e. how important other Internet users think a specific page is
- Note that Google specifically has a set of more than 100 rules that can potentially get applied (for various purposes)
- Note further: if one of the rules is seen to be misused, it is very soon gone or gets corrected
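To make the tf x idf idea concrete, here is a minimal Python sketch. The toy documents, the whitespace tokenization, and the logarithmic idf variant are assumptions chosen for illustration; real search engines use more elaborate, unpublished weighting.

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """Score `term` for one document: term frequency times inverse document frequency."""
    words = doc.lower().split()
    if not words:
        return 0.0
    # tf: share of the document's words that are the term
    tf = Counter(words)[term] / len(words)
    # idf: terms found in fewer documents carry more weight
    containing = sum(1 for d in corpus if term in d.lower().split())
    if containing == 0:
        return 0.0
    idf = math.log(len(corpus) / containing)
    return tf * idf

# Hypothetical three-document corpus
corpus = [
    "portal search portal crawler",
    "search engine ranking",
    "search results page",
]

print(tf_idf("portal", corpus[0], corpus))  # rare term: positive score
print(tf_idf("search", corpus[0], corpus))  # appears in every document: score 0.0
```

A term that occurs in every document gets an idf of zero, which is why repeating a common keyword does little for relevancy.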
Basic relevance: about precision and recall
[Figure: relationship between recall and precision. Comparison between two search engines: one is real world (good), the other the ideal one (optimal).]
The Golden Rule
PageRank: the obvious
Which webpage would you think is the more important one?
[Figure: three linked webpages A, B and C]
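As a reminder of the two terms: precision is the fraction of returned results that are relevant, and recall is the fraction of all relevant documents that were returned. A minimal sketch, with hypothetical page identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: 4 pages returned, 3 pages actually relevant, 2 in common
p, r = precision_recall(["p1", "p2", "p3", "p4"], ["p1", "p3", "p7"])
print(p, r)
```

Returning more results tends to raise recall and lower precision, which is the trade-off the recall/precision curve describes.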
How PageRank is calculated
The PageRank formula, put simply:
$$PR(A) = \frac{1-d}{N} + d \left( \frac{PR(T_1)}{C(T_1)} + \dots + \frac{PR(T_n)}{C(T_n)} \right)$$
Definition:
- $PR(A)$ is the PageRank of document A
- $PR(T_i)$ is the PageRank of document $T_i$, which includes a link to document A
- $N$ is the count of qualifying documents
- $C(T_i)$ is the total count of links on page $T_i$
- $d$ is a confidence value (commonly called the damping factor), where $0 \le d \le 1$; it is defined as 0.85 (constant)
PageRank calculation for the three pages A, B and C. Note that this worked example uses the variant of the formula in which the first term is $(1-d)$ rather than $(1-d)/N$, so the scores sum to $N = 3$ instead of 1:
$$PR(A) = 0.15 + 0.85 \cdot PR(C)$$
$$PR(B) = 0.15 + 0.85 \cdot \frac{PR(A)}{2}$$
$$PR(C) = 0.15 + 0.85 \cdot \left( PR(B) + \frac{PR(A)}{2} \right)$$
Result: $PR(A) \approx 1.16$, $PR(B) \approx 0.64$, $PR(C) \approx 1.19$
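The three-page calculation can be reproduced with a short iterative sketch. The link structure (A links to B and C, B links to C, C links to A) is inferred from the three equations, and the code uses the $(1-d)$ variant that the worked example uses. The function name and iteration count are choices made for this illustration only.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)) over all pages q linking to p."""
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_ranks = {}
        for page in links:
            # Each linking page passes on its rank divided by its number of links
            incoming = sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - d) + d * incoming
        ranks = new_ranks
    return ranks

# Link structure inferred from the equations in the worked example
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items()):
    print(page, round(score, 2))
```

After enough iterations the scores settle at roughly 1.16, 0.64 and 1.19, matching the slide.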
Can I influence my PageRank?
- If this page had a PageRank score of 2 and also 100 links on it, then a referenced (linked) page would receive only 2/100 = 0.02 from it (before the factor d is applied), which is just about nothing
- With this in mind, again: which is the more important webpage?
[Figure: webpages A, B and C]
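The arithmetic behind this point, as a small sketch. The function name is hypothetical; the formula is the per-link term $d \cdot PR(T)/C(T)$ from the PageRank definition.

```python
def rank_passed_per_link(page_rank, link_count, d=0.85):
    """PageRank contribution that one linked page receives: d * PR(T) / C(T)."""
    return d * page_rank / link_count

crowded = rank_passed_per_link(2.0, 100)  # page with 100 outgoing links
focused = rank_passed_per_link(2.0, 5)    # same score, only 5 outgoing links
print(crowded, focused)
```

A link from a page that links to almost everything passes on very little, whatever that page's own score is.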
OK, so what can I do? Part 1: ensure proper crawling of your website
- Redirects only if required! Don't even think of redirecting only crawlers
- What the crawler gets must be the same as what the user gets!
- No JavaScript to generate content or URLs
- Have good navigation, e.g. crawlers like a Sitemap!
- Sitemaps 0.90 protocol support; see also the WebSphere Portal Search Sitemap Utility portlet, available through the Portal Catalog
OK, so what can I do? Part 2: publish appropriate content
- Not too little, not too much information on a web page
- Note that Flash objects and images might hide essential information from the crawlers
- Focus on what you want to tell your users or customers first!
- Then think about what keywords users (not you!) might choose to find similar content or information elsewhere
- In case of a mismatch, reconsider and adjust the keywords on your web pages accordingly
What MORE can I do to improve my PageRanks? Let's face it: not much!
- Seek relationships with trusted web sites and share information with one another
- Register your web site with Yahoo!
- Make your web pages easy to cross-link
- Note that many web sites, even more so those based on a platform like WebSphere Portal, have URLs which somehow reference the user's history (e.g. session object/id)
- Consider making use of 'user friendly URLs' for users to pick up and use on their websites
- In the end: if the page is good, then others will refer to it; if not, they won't
A brief word on the Sitemaps 0.90 protocol
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2019-06-09</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
No, it doesn't influence the PageRank score of that page
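A sitemap file of this shape can be generated from a plain list of pages. The following is a minimal sketch using Python's standard library; the function name and the dictionary layout of the entries are assumptions for illustration, and it is not the output path of the Portal Sitemap portlet.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Serialize a list of page entries as a Sitemaps 0.90 <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is required by the protocol; the other child elements are optional
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")

xml_text = build_sitemap([
    {
        "loc": "http://www.example.com/",
        "lastmod": "2019-06-09",
        "changefreq": "weekly",
        "priority": 0.8,
    },
])
print(xml_text)
```

The same approach works for a link list of dynamic content: each generated link becomes one `<url>` entry.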
How do I promote my pages?
Question: how will users know what information there is at my web site if they don't find it easily (at the top) through Google?
As mentioned in the beginning, this is also a two-phase approach:
- Get the high-level pages in good shape with all the important keywords you have selected
- Once users get to your homepage, they might be inspired to look for more information
- How: does your site have good search? If so, they'll most likely find the information if it's there!
- If what they find is good, they might consider pointing to those pages from their web sites
WebSphere Portal and SEO enablement
- Search engine crawler awareness since Portal V6.0: Portal Server will externalize normalized URLs for the crawler, i.e. URLs which do not maintain e.g. navigational state
- Sitemaps 0.90 protocol support: www.sitemaps.org
- developerWorks article: http://www.ibm.com/developerworks/library/x-sitemaps/index.html
- Search Sitemap Utility portlet download: https://greenhouse.lotus.com/plugins/plugincatalog.nsf/assetdetails.xsp?action=editdocument&documentid=a1ff51d2c2e82cbe852576ab006ed590
- Remains: bookmark support for users to pick up normalized URLs
Almost forgot... for all those still insisting on getting 'search engine friendly URLs'
User-friendly URLs (not 'search engine friendly' URLs)
- Friendly URLs result in human-readable URL prefixes that lead to portal pages
- Each content node might have a friendly name assigned
- The friendly URL is a hierarchical path constructed from these names, based on the content topology (see URL mappings)
- Every URL that is generated by WP APIs will contain the friendly path automatically
- It is even guaranteed that every URL that leads to a particular page will start with the page's friendly-path info
Content nodes and their friendly URLs (root > home > shop > shoes):
- home: /wps/portal/home
- shop: /wps/portal/home/shop
- shoes: /wps/portal/home/shop/shoes
- shoes, with additional state appended: /wps/portal/home/shop/shoes/!ut/p/04_sb8k8xllm9ms...
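The way a friendly path follows from the content topology can be sketched in a few lines. This is an illustration only, not the WebSphere Portal API: the helper name, the `/wps/portal` default, and the state suffix are assumptions taken from or modeled on the example paths.

```python
def friendly_path(node_names, context_root="/wps/portal"):
    """Join the friendly names along the path from the root node to a page."""
    return "/".join([context_root, *node_names])

shoes = friendly_path(["home", "shop", "shoes"])
print(shoes)  # /wps/portal/home/shop/shoes

# A URL that carries extra state still starts with the page's friendly path.
state_url = shoes + "/!ut/p/EXAMPLE-STATE"  # hypothetical state suffix
print(state_url.startswith(shoes))  # True
```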
Crawling the Portal
Public Portal pages and how Internet search engines work
- Portal Server recognizes the crawler and triggers URLs to be normalized
[Figure: web crawlers, homepage or Sitemap, search indexes]
The crawler:
- follows hrefs
- issues only 'GET' requests
- does not interpret JavaScript
- has an agent identifier
- uses no metadata!
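What "follows hrefs, no JavaScript interpreted" means for a page can be shown with a toy link extractor. This is a simplified sketch of crawler behavior as described here, not the code of any real crawler; the class name and the sample markup are made up for the example.

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collect plain href targets from <a> tags, the way a simple crawler would."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip javascript: pseudo-links; nothing is executed
            if name == "href" and value and not value.lower().startswith("javascript:"):
                self.hrefs.append(value)

# Hypothetical page: one plain link, one JavaScript link, one link written by a script
page = """<a href="/wps/portal/home/shop">Shop</a>
<a href="javascript:openShoes()">Shoes</a>
<script>document.write('<a href="/wps/portal/hidden">Hidden</a>');</script>"""

collector = HrefCollector()
collector.feed(page)
print(collector.hrefs)
```

Only the plain link is found. The other two pages stay invisible to such a crawler unless they are linked elsewhere or listed in a sitemap.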
The fundamental problem for web crawlers!
Portal encodes in URLs additional information about the navigational state of the user, such as which page they come from and how they left it (e.g. a portlet was maximized).
[Figure: Welcome page, Page A, Page B and Page C, connected by links URL-A to URL-E]
Information encoded within URLs:
- URL-A: target Page A, coming from Welcome page
- URL-B: target Page B, coming from Welcome page
- URL-C: target Page C, coming from Page A
- URL-D: target Page A, coming from Page C
- URL-E: target Page B, coming from Page C
A crawler would want to assume:
- URL-A and URL-D to be identical
- URL-B and URL-E to be identical
A thing of the past: how Internet search engines had seen a Portal site
[Figure: web crawlers and search indexes. One set of pages represents the structure of the Portal site; the other is the set of pages the crawler retrieves and assumes to be unique, based on the link structure of the site.]
Result:
- A few thousand pages will grow into hundreds of thousands
- The crawler might have to give up: no end of the site is seen
- Few or no pages will be indexed
- Cross-references from other sites are doable, but of no advantage (PageRank!)
WebSphere Portal V6 crawlability enablement!
- Portal Server recognizes the crawler and triggers URLs to be normalized
- Normalized URL = all navigational state information is discarded from the URL
Result:
- No more duplicate pages
- All linked and public Portal pages are crawled and indexed
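The effect of normalization can be illustrated with a toy model. This is not how Portal implements it internally: the sketch simply assumes that navigational state is the path suffix starting at `/!ut/p/`, as in the friendly-URL example, and the page paths and state strings are hypothetical.

```python
STATE_MARKER = "/!ut/p/"  # assumption: encoded state follows this marker

def normalize(url):
    """Discard the (assumed) navigational state suffix from a URL."""
    head, _, _ = url.partition(STATE_MARKER)
    return head

# Hypothetical URLs: same target page, reached from two different pages
url_a = "/wps/portal/home/pageA/!ut/p/from-welcome"
url_d = "/wps/portal/home/pageA/!ut/p/from-pageC"

print(url_a == url_d)                        # False: the crawler sees two pages
print(normalize(url_a) == normalize(url_d))  # True: one page after normalization
```

Once state is discarded, two URLs that lead to the same page collapse into one, so the crawler no longer sees duplicates.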
Crawlability enablement for Search Engines
- Crawler awareness: the Portal Server will recognize a crawler by its web agent identifier. A default list is already available, covering the 50 most popular crawlers (via pattern matching, thus potentially more are enabled).
- The Portal will then transform all URLs that are output on the pages into so-called normalized URLs, thus making them unique.
- In addition, action URLs are nullified, thus not allowing crawlers to execute actions such as 'delete document' or 'login'.
- A Sitemap portlet that a crawler can be pointed at. For efficiency reasons it might be advisable to also place appropriate robot directives into the theme, to ensure that the crawler only follows the links available in the Sitemap and thus does not have to re-evaluate links found on each of the pages.
- The Search Engine Utility portlet provides support for the Sitemaps 0.90 protocol, which is supported by Google, Yahoo! and Microsoft Live Search.
In summary: Portal will provide the means to allow for a complete crawl of a portal site (public pages) and the tools to allow for adequate linking of portal pages from an external site to support PageRanks.
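Recognizing a crawler by pattern matching on its agent identifier can be sketched as follows. The three patterns are illustrative examples of well-known crawler tokens, not Portal's default list, and the function name is hypothetical.

```python
import re

# Illustrative patterns only; Portal ships its own default list
CRAWLER_PATTERNS = [r"googlebot", r"slurp", r"msnbot"]
_CRAWLER_RE = re.compile("|".join(CRAWLER_PATTERNS), re.IGNORECASE)

def is_crawler(user_agent):
    """Return True if the User-Agent header matches a known crawler pattern."""
    return bool(_CRAWLER_RE.search(user_agent or ""))

print(is_crawler("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))  # True
print(is_crawler("Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"))  # False
```

Because the match is a substring pattern and not an exact comparison, new versions of a listed crawler are recognized without changing the list.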
WebSphere Portal Search Engine Utility portlet
See also http://www.ibm.com/developerworks/library/x-sitemaps/index.html
- Added feature: Export to Google Sitemap
- Export as Google Sitemap: exports the Sitemap to a Google Sitemap XML file
- Click on the Browse button to specify where in the filesystem to store the output XML sitemap file
Search Engine Utility portlet configuration/editing option
Default values:
- Update frequency: Weekly
- Priority: 0.5
See Google Sitemap for details
Dynamic and personalized content: what crawlers will not get hold of
- Crawlers try not to fetch dynamic or personalized content: there might be spoofing involved!
- In the past this was the main reason for truncation of URLs after the first '?'
What can be done:
- Have the Web content management system generate a link list of the non-default (dynamic) content
- Append it to, or reference it via, the website's sitemap
- Better: feed it to the crawler using the Sitemaps 0.90 protocol
Summary
- WebSphere Portal allows for safe and efficient crawling of Portal sites
- Efficiency is increased through support for the Sitemaps 0.90 protocol
- Good pages are determined by their contents; link popularity is the additional boost factor
- Consult the 'webmaster guides' that the search engine sites publish
- If an SEO consultant suggests metadata 'spamming' or 'pretty URLs', ask them for proof: either webpages applying such techniques (before and after), or official documentation by Google et al.
- The claim that 'dynamic Portal URLs' prevent adequate ranking does not hold: as long as the crawler and a (disguised) user get the same page, it is OK
Excellent book on Search Engine Marketing!
- Search Engine Marketing, Inc.: Driving Search Traffic to Your Company's Web Site, Mike Moran, Bill Hunt, IBM Press. http://www.amazon.de/search-marketing-driving-traffic-Companys/dp/0131852922/ref=sr_1_1?ie=UTF8&s=books-intl-de&qid=1202128301&sr=1-1
- Acknowledgement: overview slides taken from Mike's SEO presentation
- developerWorks articles, Basics on SEO, Part 1-4: http://www.ibm.com/developerworks/search/searchresults.jsp?searchtype=1&pagelang=&displaysearchscope=dw&searchsite=dw&lastuserquery1=search+engine+optimization&lastuserquery2=&lastuserquery3=&lastuserquery4=&query=search+engine+optimization+basics&searchscope=dw&go.x=0&go.y=0
For More Information (1)
- WebSphere Portal IBM Site: http://www-3.ibm.com/software/genservers/portal/
- WebSphere Portal Information Center: http://www.ibm.com/developerworks/websphere/zones/portal/proddoc.html
- WebSphere Portal Business Solutions Catalog (on Lotus Greenhouse): https://greenhouse.lotus.com/catalog/home_full.xsp?fproduct=websphere%20portal
- WebSphere and Lotus Web Content Management Portal Open Beta: https://www14.software.ibm.com/iwm/web/cc/earlyprograms/lotus/portalopenbeta/
- WebSphere Portal Blog: https://www.ibm.com/developerworks/mydeveloperworks/blogs/websphereportal/
For More Information (2)
- IBM Lotus Connections: http://www.ibm.com/software/lotus/products/connections
- IBM Lotus Forms: http://www.ibm.com/software/lotus/forms
- IBM Lotus Quickr: http://www.ibm.com/lotus/quickr
- IBM Lotus Sametime: http://www.ibm.com/lotus/sametime
- WebSphere Commerce: http://www.ibm.com/websphere/commerce
- WebSphere Process Server and Business Process Automation: http://www.ibm.com/software/integration/wps
We Value Your Feedback! Please complete the session survey for this session: TECH-B21. Session speaker: Andreas Prokoph
© IBM Corporation 2010. All Rights Reserved. The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM's current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM's sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. IBM, the IBM logo, Lotus, Lotus Notes, Notes, Domino, Quickr, Sametime, WebSphere, UC2, PartnerWorld are trademarks of International Business Machines Corporation in the United States, other countries, or both.
Unyte is a trademark of WebDialogs, Inc., in the United States, other countries, or both. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries.