Integrating the Google Search Appliance with WebSphere Portal and Lotus Web Content Management


Dave Hay
Portal and Collaboration Architect, IBM Software Services for Lotus (ISSL)
david_hay@uk.ibm.com
+44 7802 918423

About Me
- With IBM since 1992
- Experienced with hardware, software and now services
- AS/400 and iSeries
- Network Station
- WebSphere and Lotus software
- Linux advocate
- Collaboration evangelist
- Infrastructure Architect
- With ISSL since 2009

Introduction: The Project
- Major UK financial institution
- Internal and external websites
- Content held in Lotus Web Content Management
- Existing intranet and internet sites using Google Search Appliance
- IBM team and Google partner engaged
- Solution adoption programme

Requirements
- To deliver access to unsecured AND secured content
- To maintain security of content within search results
- To present content in context via search results
- To deliver personalized results with variance and relevance
- To integrate with WebSphere Portal
- To maintain access to existing search facilities
- To perform in line with non-functional requirements

Lotus Web Content Management
- Role-based content management system
- Built upon WebSphere Portal
- Workflow-driven authoring, approval and publishing process
- Content accessible via portlets, standalone websites, API, feeds etc.
- Content stored in standards-based Java Content Repository (JCR) database

Google Search Appliance
- Search in a box
- Self-contained appliance
- Only requires power and data
- Different models for different requirements
- Client uses GB-7007
- Can index 10,000,000 content items / documents
- GSAs can be scaled to meet non-functional requirements

Challenges
- Preserve existing search functionality
- Integrate with client's custom security solution
- Need to maintain segregation: GSA should never interact with WCM directly
- WCM supports standard Seedlist format
- GSA supports Google Feeds format
- User experience: what and where

Terminology
- Crawling: the process that the GSA goes through to build its on-box search index (known as the default collection)
- Serving: the GSA provides search request form and search results to users
- Searching: the process that the users go through
- Collections: provide views into the default collection based upon URL patterns
- Front-Ends: define the user experience IN and OUT of GSA
- XSLT: Extensible Stylesheet Language Transformations, used to drive the user experience

Seedlists and Feeds
- Google Feeds is the format that the GSA uses when crawling, and what our solution needed to produce
- WCM automatically produces a Seedlist, albeit on-demand
- Seedlist can also be scheduled and, perhaps, persisted
  - Question about where seedlist would be persisted, e.g. file system, database
- Both are XML structures
- What are the differences? IBM Seedlist format has features that Google Feeds doesn't offer:
  - Pre-filtering by user groups stored in meta-data in the index
  - Post-filtering at run-time
  - Pagination, useful for large content stores
  - Embedded seedlists (seedlists within seedlists)
  - Incremental indexing (what has changed since the last crawl)
- Long-term objective is for standardization around the Seedlist format

The Solution
- IBM team developed Crawling Proxy (CP) solution
- CP is based upon an established Google pattern, so not First Of A Kind (FOAK)
- CP is a standard JEE application deployed onto WebSphere Application Server 6.1
- CP acts as broker between GSA and WCM
- GSA never connects to WCM directly
- CP can be scaled across clustered WebSphere environment to meet non-functional requirements

System Context Diagram
[Figure: system context diagram. Actors: CWS Administrator, Content Author, Portal Administrator, Database Administrator, GSA Administrator, Insite Administrator, End User. Components: Core Web Security, Content Authoring Server, WCM Database, Web Server, Portal/Content Delivery Cluster, Portal Databases, Google Search Appliance, Existing Content (Insite). Legend: admin flow, user data flow, security flow.]

Crawling Process
- GSA makes a crawl request to Crawling Proxy via a specific URL
- CP requests Seedlist from WCM
- CP generates Jump Page: an HTML page of links, paged as needed
- GSA crawls Jump Page, requesting each URL from CP
- CP returns content and meta-data to GSA
  - Injected into GSA using Google Feeds format
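The Jump Page step above can be sketched roughly as follows. This is a minimal illustration, not the project's Crawling Proxy code: the class name `JumpPage`, the `proxyBase` and `jumpPath` parameters, and the `?page=` query parameter are all assumptions made for the example. It takes item paths that would have been extracted from the WCM Seedlist and renders one page of links, plus a "next" link when more pages remain.

```java
import java.util.List;

/** Hypothetical helper: renders one page of a Jump Page from Seedlist item paths. */
final class JumpPage {
    private JumpPage() {}

    /**
     * @param itemPaths paths taken from the Seedlist (assumed already parsed)
     * @param page      zero-based page index
     * @param pageSize  number of links per page
     * @param proxyBase assumed URL prefix under which the proxy serves content
     * @param jumpPath  assumed URL of the Jump Page itself, used for paging
     */
    static String render(List<String> itemPaths, int page, int pageSize,
                         String proxyBase, String jumpPath) {
        if (page < 0 || pageSize < 1) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize >= 1");
        }
        int from = Math.min(page * pageSize, itemPaths.size());
        int to = Math.min(from + pageSize, itemPaths.size());

        StringBuilder html = new StringBuilder("<html><body>\n");
        for (String path : itemPaths.subList(from, to)) {
            // Every link points back at the proxy, never at WCM itself.
            html.append("<a href=\"").append(escape(proxyBase + path)).append("\">")
                .append(escape(path)).append("</a><br/>\n");
        }
        if (to < itemPaths.size()) {
            // Paging link so the crawler can follow on to the next Jump Page.
            html.append("<a href=\"").append(escape(jumpPath + "?page=" + (page + 1)))
                .append("\">next</a>\n");
        }
        return html.append("</body></html>\n").toString();
    }

    /** Minimal HTML escaping for attribute values and text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Because every link is prefixed with the proxy's own base URL, the crawler only ever sees proxy URLs, which is consistent with the requirement that the GSA never connects to WCM directly.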

Crawling Process Jump Page

Crawling Process - Feeds

Delivering Secured Search
- Content is secured in WCM using user groups
- Crawling Proxy injects groups into GSA as meta-data via Feed process
- GSA needs to get the user groups to perform search across ACL-secured content in index
- How does the GSA know the identity and groups of the user?
- GSA can use LDAP, but the client doesn't use it; a custom authentication mechanism is used instead
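As an illustration of groups travelling into the GSA as meta-data, here is a minimal sketch of building a feed document in Java. The element names (`gsafeed`, `header`, `datasource`, `feedtype`, `group`, `record`, `metadata`, `meta`) follow the published Google Feeds XML format as best recalled and should be checked against the GSA feeds documentation. The class name `FeedBuilder`, the meta name `groups`, the one-meta-per-group layout and the feed type passed in are assumptions; the DOCTYPE declaration is omitted for brevity.

```java
import java.util.List;

/** Hypothetical sketch: builds a Google Feeds style XML document with group meta-data. */
final class FeedBuilder {
    private FeedBuilder() {}

    /** One record; each group becomes its own meta element (the "groups" name is assumed). */
    static String record(String url, String mimeType, List<String> groups) {
        StringBuilder xml = new StringBuilder();
        xml.append("    <record url=\"").append(escape(url))
           .append("\" mimetype=\"").append(escape(mimeType)).append("\">\n")
           .append("      <metadata>\n");
        for (String group : groups) {
            xml.append("        <meta name=\"groups\" content=\"")
               .append(escape(group)).append("\"/>\n");
        }
        return xml.append("      </metadata>\n    </record>\n").toString();
    }

    /** Wraps already-rendered records in the feed envelope. */
    static String feed(String dataSource, String feedType, List<String> records) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<gsafeed>\n  <header>\n")
           .append("    <datasource>").append(escape(dataSource)).append("</datasource>\n")
           .append("    <feedtype>").append(escape(feedType)).append("</feedtype>\n")
           .append("  </header>\n  <group>\n");
        for (String record : records) {
            xml.append(record);
        }
        return xml.append("  </group>\n</gsafeed>\n").toString();
    }

    /** Minimal XML escaping for attribute values and text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

A production proxy would more likely use an XML library than string concatenation; the point here is only the shape of the document and where the group names sit.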

The Cookie Cracker
- Like the Crawling Proxy, this is another pattern that GSA supports
- Cookie Cracker is used to decrypt and validate user's security token
- Then returns user ID and groups to GSA
- GSA can then perform search across ACL-secured content in index
- Also need a Redirect URL to force user to authenticate if anonymous or expired session
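The client's actual token format and security solution are not described in this deck, so the following is only a sketch of the validate-then-return-identity idea under stated assumptions. It swaps in an HMAC-SHA256 signature check in place of decryption, and uses a made-up token layout of `base64url(userId|group1,group2|expiryEpochSeconds).base64url(signature)`. The class `CookieCracker`, its `issue` helper (present only so the example can be exercised) and the `Identity` type are hypothetical. How the result is handed back to the GSA is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.Base64;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical cookie cracker: validates a signed token and extracts user ID and groups. */
final class CookieCracker {
    static final class Identity {
        final String userId;
        final List<String> groups;
        Identity(String userId, List<String> groups) {
            this.userId = userId;
            this.groups = groups;
        }
    }

    private final byte[] key;

    CookieCracker(byte[] key) {
        this.key = key.clone();
    }

    /** Demo-only helper that creates a token in the assumed format. */
    String issue(String userId, List<String> groups, long expiresAtEpochSec) {
        byte[] payload = (userId + "|" + String.join(",", groups) + "|" + expiresAtEpochSec)
            .getBytes(StandardCharsets.UTF_8);
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        return enc.encodeToString(payload) + "." + enc.encodeToString(sign(payload));
    }

    /** Returns the identity if the signature matches and the token has not expired. */
    Optional<Identity> crack(String token, long nowEpochSec) {
        int dot = token == null ? -1 : token.indexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        try {
            byte[] payload = Base64.getUrlDecoder().decode(token.substring(0, dot));
            byte[] signature = Base64.getUrlDecoder().decode(token.substring(dot + 1));
            // Constant-time comparison of the presented and expected signatures.
            if (!MessageDigest.isEqual(signature, sign(payload))) {
                return Optional.empty();
            }
            String[] parts = new String(payload, StandardCharsets.UTF_8).split("\\|", -1);
            if (parts.length != 3 || Long.parseLong(parts[2]) <= nowEpochSec) {
                return Optional.empty();
            }
            List<String> groups = parts[1].isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(parts[1].split(","));
            return Optional.of(new Identity(parts[0], groups));
        } catch (IllegalArgumentException e) {
            // Covers malformed Base64 and a non-numeric expiry field.
            return Optional.empty();
        }
    }

    private byte[] sign(byte[] payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(payload);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC unavailable", e);
        }
    }
}
```

An empty result corresponds to the "anonymous or expired session" case on the slide, where the user would be sent to the Redirect URL to authenticate.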

Serving Process
- User initiates search request
  - Either by accessing GSA directly or via portal
- User indicates whether secured or unsecured search is required
- If unsecured, then GSA searches as usual
- If secured, GSA redirects user request to Cookie Cracker
- If no valid token, GSA redirects user request to Redirect URL to force logon
- Once valid token, Cookie Cracker returns user ID and groups to GSA
- GSA performs search across ACL-secured content
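The branching in the serving process can be written down as a small decision function. This is only a restatement of the steps above in code; the names `ServingFlow` and `Step` are invented for the example, and in the real deployment these decisions are made by the GSA and the Cookie Cracker between them rather than by a single method.

```java
/** Hypothetical summary of the serving decisions described on the slide. */
final class ServingFlow {
    enum Step { SEARCH_UNSECURED, REDIRECT_TO_LOGON, SEARCH_SECURED }

    private ServingFlow() {}

    /**
     * @param securedRequested whether the user asked for a secured search
     * @param tokenValid       whether the Cookie Cracker accepted the user's token
     */
    static Step next(boolean securedRequested, boolean tokenValid) {
        if (!securedRequested) {
            return Step.SEARCH_UNSECURED;   // GSA searches as usual
        }
        return tokenValid
            ? Step.SEARCH_SECURED           // user ID and groups are available
            : Step.REDIRECT_TO_LOGON;       // force authentication via Redirect URL
    }
}
```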

The Multiple GSA Scenario
- May be needed for performance and/or resilience
- Multiple patterns including Active/Active Crawl, Active/Active Search, Active/Passive Crawl etc.
- Option to use mirroring to keep passive GSA in sync with active GSA
- Crawling Proxy needs to be designed to know which GSA is making a request
- Crawling Proxy also needs to persist timestamp of last Seedlist request
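One way to picture the last two points is a per-appliance checkpoint: each GSA gets its own "last Seedlist request" timestamp, so an incremental request from one appliance is not affected by another. The sketch below keeps the timestamps in memory only, which is an assumption made to keep it short; a real Crawling Proxy would need durable, cluster-visible storage. The class name `SeedlistCheckpoints` and the idea of a string `gsaId` (derived, say, from a request parameter or source address) are hypothetical.

```java
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory record of the last Seedlist request made on behalf of each GSA. */
final class SeedlistCheckpoints {
    private final ConcurrentMap<String, Instant> lastRequest = new ConcurrentHashMap<>();

    /**
     * Records a Seedlist request for the given appliance and returns the time of
     * its previous request, or Instant.EPOCH if this is the first one seen.
     */
    Instant recordRequest(String gsaId, Instant now) {
        Instant previous = lastRequest.put(gsaId, now);
        return previous == null ? Instant.EPOCH : previous;
    }
}
```

The returned timestamp is what a caller would use as the "changed since" boundary when asking WCM for an incremental Seedlist.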

GSA Security
- GSA can use security mechanisms such as NTLM and Form-Based Authentication to control crawler access
  - We chose to use NTLM
- GSA also supports solutions such as Kerberos and SAML for client authentication, essential for secure serving
  - We chose to use Cookie Cracking
- We also needed to consider other aspects:
  - Using HTTPS to encrypt access from GSA to Crawling Proxy
  - Using IP whitelist and network ACLs to control access to GSA ports such as Feeds and Admin
  - Using HTTPS to encrypt data being fed into the Feed port
  - Using on-box user accounts (administrator, manager) rather than LDAP

End-user Experience
- Options to deliver UX from portal or from GSA
- GSA experience driven by front-end
- Front-end provides search request and search results
- Option to have multiple front-ends, each with different theme/style
- Front-ends delivered using Extensible Stylesheet Language Transformations (XSLT)
- Re-use existing styles e.g. CSS files, icons, logos etc.
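To make the XSLT point concrete, the sketch below applies a tiny stylesheet to a results document using the standard `javax.xml.transform` API. On the appliance the transformation is performed by the GSA itself; this only shows the kind of mapping a front-end stylesheet expresses. The input element names (`results`, `result`, `url`, `title`) are invented for the example and are not the GSA's real results schema, and `FrontEndDemo` is a hypothetical class name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: turns a simplified results XML into an HTML list via XSLT. */
final class FrontEndDemo {
    // Stylesheet over an assumed, simplified results format.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\"/>"
      + "<xsl:template match=\"/results\">"
      + "<ul class=\"results\">"
      + "<xsl:for-each select=\"result\">"
      + "<li><a href=\"{url}\"><xsl:value-of select=\"title\"/></a></li>"
      + "</xsl:for-each>"
      + "</ul>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    private FrontEndDemo() {}

    /** Applies the stylesheet to the given results XML and returns the HTML. */
    static String render(String resultsXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(resultsXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("front-end transform failed", e);
        }
    }
}
```

The `class="results"` attribute is where existing CSS could be hooked in, in line with the re-use of styles mentioned above.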

Examples of UX

Component Design

Skills
- Client had previous experience with GSA
  - Needed to acquire additional GSA administration experience
- Crawling Proxy, Cookie Cracker and Redirect URL applications realized in JEE
- XSLT skills needed to customize front-ends
  - GSA has on-box front-end tooling
  - XSLT expertise needed to modify over and above

Project Lifecycle
- Conduct requirements gathering exercise
  - We started with a baseline requirement for secured search in Portal
  - Equates to an agile project; we knew where we wanted to get to, but the way-points on the journey changed along the way
- Work with Google partner to understand art of the possible
  - Patterns such as Crawling Proxy and Cookie Cracking came this way
- Identify dependencies
  - Need GSA 6.8 software level to support content-level ACLs
  - Needed additional fix for SSL support
- Develop and functionally test, iteratively
- Plan for non-functional testing, to build capacity model
  - Using Crawling Proxy against WCM was a known unknown
- Plan to upgrade production GSAs to 6.8
- Plan for administrator and developer training

The Future
- Client plans to make this Search Solution a standard part of all future Portal/WCM deployments
  - This includes internal AND external web sites
- Option to re-use all/part of solution (esp. Crawling Proxy) for Collaboration project with Lotus Connections
- Extend solution to offer Personalization (variance and relevance) using meta-data
- Consider scheduling Seedlist generation, and caching across clusters
- Look at options to standardize XSLT across organization
- Consider search on mobile devices e.g. iPad, Android

Lessons Learned
- Need complete set of skills
  - Portal/WCM
  - GSA
  - XSLT
  - Security infrastructure
  - Networking
- Project spans infrastructure, application and security disciplines
- Decide on UX as soon as possible
- Focus on requirements, requirements, requirements

Any questions?

How to contact me
- Lotus Sametime
- 07802 918423
- Lotus Notes