Session 3: Leveraging Taxonomy Term Store for SharePoint: Defining a Multi-Taxonomy Structure for Content Management
Welcome
  Don Miller, VP Business Development, donm@conceptsearching.com
  Jill Hannemann, Principal Consultant, jhannemann@ppc.com
About Concept Searching
  Company founded in 2002; product launched in 2003
  Focus on management of structured and unstructured information
  Technology
    Delivered as a web service
    Automatic concept identification, content tagging, auto-classification, taxonomy management
    Only statistical vendor that can extract conceptual metadata
  2009 and 2010 "100 Companies that Matter in KM" (KM World Magazine) and Trend-Setting Product of 2009 and 2010
  Authority to Operate: enterprise-wide USAF and enterprise-wide NETCON US Army
  Locations: US, UK, and South Africa
  Client base: Fortune 500/1000 organizations
  Managed Partner under the Microsoft global ISV Program: go-to partner for Microsoft for auto-classification and taxonomy management
  Microsoft Enterprise Search ISV, FAST Partner
  Product Suite: conceptsearch, concepttaxonomymanager, conceptclassifier, conceptclassifier for SharePoint, contenttypeupdater for SharePoint
About PPC
  Energy/Environment
    Green strategies for government and industry:
      Air quality and climate change
      Greenhouse gas reduction
      Carbon management
      Environmental risk mitigation
      Environmental impacts of transport
      Information and data management
  Infrastructure
    Systems Engineering and Technical Assistance (SETA)
    Capability Maturity Model Integration (CMMI)
    Earned Value Management
    Configuration Management
    Technical and Advisory Support
    Independent Verification & Validation (IV&V)
  1,200-person multi-disciplinary team of scientific and technical experts
    Scientific subject matter experts
    Systems engineers and architects
    Policy and regulatory specialists
    Project management professionals
    Certified information technology experts
    Security professionals
  Enterprise Solutions
    Master Data Management and Data Governance
    Business Intelligence
    Adaptive Data Warehousing
    Enterprise Architecture
    Infrastructure Systems Engineering
    Knowledge Management
    Portal Solutions
    Enterprise Content Management
    IT Optimization/Virtualization
Agenda
  About Managed Metadata Service
  Designing Taxonomy and Metadata for Term Sets
  Using Keywords as Folksonomy
  Basic Governance Principles
About Managed Metadata Service
About Managed Metadata Service
  Managed Metadata Service
    Metadata taxonomies and terms can be shared across multiple SharePoint site collections
    Multiple managed metadata services can be created
    Enables search filtering
  Two types of terms:
    Managed terms: pre-defined by an enterprise administrator and may be hierarchical. Surfaced in the "managed metadata" column type
    Managed keywords: non-hierarchical words or phrases that have been added to SharePoint 2010 items by users (folksonomy)
Understanding SharePoint 2010 Elements
  SharePoint 2010 Element           Comments
  Site Collection/Site Structure    Can be organized by a hierarchical taxonomy structure
  Document Library Structure        Can be organized by a hierarchical taxonomy structure
  Columns                           Where terms are applied to content in Document Libraries and Lists
  Term                              A metadata value
  Term Set                          Hierarchical metadata with values
  Managed Metadata                  SP 2010's ability to manage terms and term sets outside of columns
  Keywords                          Allows users to add metadata from Term Sets or create a new keyword
  Content Types                     Ability to manage metadata associated with particular types of content
About Taxonomy Term Store
  Term sets
    Behave as metadata facets
    Design a multi-taxonomy structure
    A facet/term set can have a hierarchical structure beneath it, following the behavior of a taxonomy
  Keyword term set
    Flat structure of enterprise keywords; can be leveraged as folksonomy
  How to categorize content in a way that is intuitive for users?
Taxonomy and Metadata for Term Sets
Faceted Browsing and Searching
  Faceted classification
    Assignment of multiple classifications to an object, enabling the classifications to be ordered in multiple ways, rather than in a single, pre-determined, taxonomic order
  Faceted browsing and searching
    Uses faceted classification to allow users to explore and find information
  Examples:
    Products by price ranges, brands, and vendors
    Corporate documents by department and location
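The "corporate documents by department and location" example can be sketched in a few lines of Python. This is a minimal illustration of faceted filtering in general, not SharePoint's implementation; the sample documents and the function names `filter_by_facets` and `facet_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical sample data: each document carries one value per facet.
DOCUMENTS = [
    {"title": "Q3 Budget", "department": "Accounting", "location": "Boston"},
    {"title": "Hiring Plan", "department": "Human Resources", "location": "Boston"},
    {"title": "Q3 Forecast", "department": "Accounting", "location": "Denver"},
]


def filter_by_facets(documents, **selected):
    """Return the documents that match every selected facet value."""
    return [
        doc for doc in documents
        if all(doc.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(documents, facet):
    """Count how many documents fall under each value of one facet."""
    return Counter(doc[facet] for doc in documents if facet in doc)


# Narrow by one facet, then show what the other facet still offers.
accounting = filter_by_facets(DOCUMENTS, department="Accounting")
print([doc["title"] for doc in accounting])
print(dict(facet_counts(accounting, "location")))
```

Because the facets are independent, a user can start from either department or location and reach the same documents, which is the point of ordering classifications "in multiple ways."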
Faceted Browsing and Searching
  Example: find computers by category, weight, screen size, etc.
Term Set Structure
  Structure:
    Group
      Term Set
        Parent Term 1
          Child Term A
          Child Term B
        Parent Term 2
  Example:
    Business Unit
      Document Types
        Proposal Document
          Pink Team
          Red Team
        Resume
        Project Description
        Forms
        Presentations
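The group, term set, parent term, and child term nesting can be modeled as a simple tree. The sketch below is illustrative only: the `Term` class and `term_paths` helper are hypothetical names, and the hierarchy encodes one reading of the slide's example (Pink Team and Red Team as children of Proposal Document).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Term:
    """One node in the hierarchy; children make the structure hierarchical."""
    name: str
    children: List["Term"] = field(default_factory=list)


def term_paths(term, prefix=()):
    """Yield the full path from the top of the hierarchy to every node."""
    path = prefix + (term.name,)
    yield " > ".join(path)
    for child in term.children:
        yield from term_paths(child, path)


# Assumed reading of the slide: group > term set > parent terms > child terms.
business_unit = Term("Business Unit", [
    Term("Document Types", [
        Term("Proposal Document", [Term("Pink Team"), Term("Red Team")]),
        Term("Resume"),
        Term("Project Description"),
        Term("Forms"),
        Term("Presentations"),
    ]),
])

for path in term_paths(business_unit):
    print(path)
```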
Editing Term Sets and Terms
  [Screenshot with callouts: Group, Term Set, Parent Term, Child Term]
Editing Term Sets
Initial Planning: Discovering All Your Groups
  Determine key Term Sets
  Define the Term Sets for document libraries
  Think about audience, business needs, content types
  Define the values/terms for each metadata field/term set
Categorization Schemas
  Method: Subject-oriented
    Definition: Information categorized by subject or topic
      Instantive: each child category is an instance of the parent category
      Partitive: each child category is a part of the parent category
    Examples: water pollution, soil pollution, air pollution
  Method: Functional
    Definition: Information categorized by the process to which it relates
    Examples: employment, staffing, training
  Method: Organizational
    Definition: Information categorized by corporate departments or business entities
    Examples: Human Resources, Marketing, Accounting, Research
  Method: Document Type
    Definition: Information categorized by the type of document
    Examples: presentations, expense reports, press releases
  Method: Location
    Definition: Information categorized by the location where it originated or was conceived
    Examples: US State, Office locations
  Method: Product/Customer
    Definition: Information categorized by the product or customer it was developed for
    Examples: Electronics > TVs, DVD Players, Computers
Using Keywords as Folksonomy
Keywords as Folksonomy
  Folk + taxonomy is a user-generated taxonomy.
  Folksonomy = Collaborative Tagging = Social Classification = Social Indexing = Social Tagging
  The practice of collaboratively creating and managing tags to categorize content
  As opposed to traditional indexing, metadata is generated not only by experts but also by creators and consumers of the content
  Usually, freely chosen keywords are used instead of a controlled vocabulary
  Tag clouds are used to visualize the tags of a folksonomy
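A tag cloud is essentially a frequency count mapped to display sizes. The following Python sketch shows one common way to do that, assuming linear scaling between a minimum and maximum size; the function name `tag_cloud`, the size range, and the sample tags are all hypothetical.

```python
from collections import Counter


def tag_cloud(tagged_items, min_size=1, max_size=5):
    """Map each tag to a display size that grows with how often it was used."""
    # Normalize case and whitespace so "Red Team" and "red team" count together.
    counts = Counter(
        tag.strip().lower() for tags in tagged_items.values() for tag in tags
    )
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts are equal
    return {
        tag: min_size + round((n - low) * (max_size - min_size) / span)
        for tag, n in counts.items()
    }


# Hypothetical user-applied keywords per document.
TAGGED = {
    "proposal.docx": ["Proposal", "red team"],
    "resume.docx": ["resume"],
    "pricing.xlsx": ["proposal", "pricing"],
    "review.pptx": ["proposal", "Red Team"],
}

print(tag_cloud(TAGGED))
```

Normalizing case is a design choice made here because freely chosen keywords tend to vary in spelling; a controlled vocabulary would not need it.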
Faceted Classification vs. Folksonomy
  Faceted Classification
    It requires someone to make a decision about which facets to record in the term store and, often, which values will be permitted.
    The information in each of the term sets can be organized into a hierarchy.
  Folksonomy
    Folksonomies are properties of social tagging systems in which individuals apply "tags," usually without much control or coordination.
    No mechanism to indicate hierarchical relationships among tags.
Defining Governance
Taxonomy Governance
  Term set evolution
    Creating a new Group
    Term set creation/deletion
    Term creation/deletion/modification
  Governance committee representing
    Term Steering Committee
    Content Managers
    Content Contributors
  Ensure that the classification distribution and meaning of the Groups and Term Sets is not compromised
  Considering the volume of content that will be tagged using a Term Set schema, changes to terms should be formally presented as a business case and reviewed by the Steering Committee
Four Pillars of Governance
  In the Governance Plan, it is important to define and establish:
    1. A vision
    2. Roles and responsibilities
    3. Policies and procedures
    4. Communications, education, and marketing
Governance Structure
  Roles and Responsibilities: Most Common Roles
  Roles                   Expectations
  Steering Committee      Strategic
  Taxonomy Team           Tactical, controls Term Store, applies changes and modifications
  Content Managers        Content management, tag application, subject matter expertise
  Content Contributors    Adding content to system
Policies and Procedures
  Procedures
    All primary actions related to the Term Store and the application of terms to content in libraries/lists should be driven by policies and procedures
    Guidelines will empower Content Managers and Taxonomy Team members to manage the Term Store appropriately
    Intuitive procedures will ensure the system continues to be developed consistently
    Simple rules and workflows will reduce bottlenecks
  Determine:
    Who will have access to the Term Store?
    Who can apply tags to content in libraries?
    Who can create keywords when tagging content?
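The three "Determine" questions can be written down as an explicit policy table so the answers are unambiguous. The Python sketch below is illustrative only: the `Action` names, the `is_allowed` helper, and the specific role-to-permission assignments are assumptions, and each organization would decide its own answers.

```python
from enum import Enum, auto


class Action(Enum):
    EDIT_TERM_STORE = auto()   # Who will have access to the Term Store?
    APPLY_TAGS = auto()        # Who can apply tags to content in libraries?
    CREATE_KEYWORDS = auto()   # Who can create keywords when tagging content?


# Illustrative assignments only; not a recommendation from the presentation.
POLICY = {
    "Taxonomy Team": {Action.EDIT_TERM_STORE, Action.APPLY_TAGS, Action.CREATE_KEYWORDS},
    "Content Managers": {Action.APPLY_TAGS, Action.CREATE_KEYWORDS},
    "Content Contributors": {Action.APPLY_TAGS},
}


def is_allowed(role, action):
    """Return True when the policy grants the action to the role."""
    return action in POLICY.get(role, set())


print(is_allowed("Content Contributors", Action.CREATE_KEYWORDS))
print(is_allowed("Taxonomy Team", Action.EDIT_TERM_STORE))
```

Keeping the policy in one table makes the "loose vs. tight" decision visible: loosening governance means adding an action to a role, nothing more.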
Policies and Procedures
  Procedures
    Basic workflow approval processes should be outlined for all aspects of Term Store governance
Keys to Governance
  Loose vs. Tight
In Closing
Key Issues with Information Management
  Common issues related to managing content in SharePoint:
    Localized terminology
    Complex data sets
    Consistency of content tags
    Multiple site collections
    Manual subjective content tagging
Best Practices
  Define your Use Case
    Understand how and why you will be using taxonomy and metadata
  Start Small
    Select a business unit to begin classification within SharePoint
    Manage Scope
  Keep your Audience in Mind
    Recognize that users may think about and look for information in different ways
  Define Governance
    Roles, responsibilities, policies, and procedures
  Control Depth and Breadth
    A flat taxonomy ensures that users can find information quickly
    A focused taxonomy ensures that users can easily digest the scope of information
  Make a Long-Term Investment
    Taxonomy development is an iterative and ongoing effort
Agenda
  About conceptclassifier
  Product Screen Shots
  Demo
About conceptclassifier
Common Issues and Solutions
  Pre Migration
    Problem
      60% of stored documents are obsolete
      50% of documents are duplicates
      Requires resources to identify what should/should not be migrated
    Solution
      Eliminate duplicate documents
      Identify privacy data exposures
      Identify and declare records that were not previously identified
      Identify high value content
      Migrate required content to a structure
    Benefit
      Reduces migration costs
      Ensures compliance and protection of content assets
  Search
    Problem
      It's not about better search
      Less than 50% of content is correctly indexed, meta tagged or efficiently searchable
      85% of relevant documents are never retrieved in search
    Solution
      Eliminate manual tagging and replace with automatic identification of multi-word concepts
      Provide guided navigation via the taxonomy structure (i.e. concepts)
      Go beyond dynamic clustering with conceptual clustering based on the taxonomies
    Benefit
      Taxonomy navigation is 36% - 48% faster
      Savings of 2.5 hours per user per day
  Records Management
    Problem
      67% of data loss in Records Management is due to end user error
      It costs an organization $180 per document to recreate it when it is not tagged correctly and cannot be found
    Solution
      Eliminate inconsistent end user tagging
      Automatically declare documents of record based on vocabulary and retention codes
      Automatically change the Content Type and route to the Records Management repository
    Benefit
      Savings of $4.00 - $7.04 per record by eliminating manual tagging
      Ensures compliance and reduces potential litigation exposures
  Data Privacy Protection
    Problem
      Average cost per exposed record is $197 and ranges from $90-$305 per record
      70% of breaches are due to a mistake or malicious intent by an organization's own staff
    Solution
      Identify any type of organizationally defined privacy data
      Combines pattern matching with associated vocabulary
      Automatic Content Type updating enabling workflows and rights management
    Benefit
      Average cost runs from $225K to $35M
SharePoint 2010 Taxonomy & EMM Basics
  How do ConceptClassifier and TaxonomyManager fit into SharePoint 2010?
  Let's review the basics again: what is available out of the box for taxonomy management, and how does it work?
Definitions
  Introducing EMM, the Term Store, and Term Store Management
  [Diagram labels]
    conceptclassifier for SharePoint 2010: Term Store Management, Auto Classification, Content Type Updating
    SharePoint 2010 Enterprise Managed Metadata Service: Subscription Service, Content Type Hub, Term Store
    SharePoint 2010 Farm: Site Collection, Records Library
The Managed Metadata Service
  Enterprise Managed Metadata Service
    30,000 Terms per Term Set (1 Taxonomy)
    1,000 Term Sets
    Tested to 1,000,000 Preferred Terms
  Managed Metadata Service
    Manages Enterprise Content Types via the Content Type Hub
    Manages the Term Store
    Term Sets (taxonomies) and terms can be shared across multiple SharePoint site collections
    Multiple managed metadata services can be created
    Enables search filtering
  Two types of terms:
    Managed terms: pre-defined by an enterprise administrator and may be hierarchical. Surfaced in the "managed metadata" column type
    Managed keywords: non-hierarchical words or phrases that have been added to SharePoint 2010 items by users (folksonomy)
The Managed Metadata Service
  conceptclassifier for SharePoint is the only native Term Store Management tool for 2010.
  [Screenshot callouts: Term Set, Parent Term, Child Term, Grand Child Term]
  Build term sets/taxonomies here in SharePoint 2010 EMM. Plan for 30,000 values.
  A content type can contain one or many taxonomies, based on specific business user requirements.
  The values can be shown as columns or can be hidden from users for administrative or governance purposes only.
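When planning term sets against the figures quoted in these slides (30,000 terms per term set and 1,000 term sets per term store), a quick capacity check can flag designs that are approaching the ceiling. This Python sketch is a planning aid only, not a SharePoint API; the function name, the 80% warning threshold, and the sample counts are hypothetical.

```python
# Limits as quoted in the slides for the SharePoint 2010 Term Store.
MAX_TERMS_PER_TERM_SET = 30_000
MAX_TERM_SETS_PER_STORE = 1_000


def capacity_warnings(term_counts, warn_ratio=0.8):
    """Return warnings for a plan given as {term set name: number of terms}.

    warn_ratio is an assumed planning threshold, not a product setting.
    """
    warnings = []
    if len(term_counts) > MAX_TERM_SETS_PER_STORE:
        warnings.append(
            f"{len(term_counts)} term sets exceeds the limit of {MAX_TERM_SETS_PER_STORE}"
        )
    for name, count in term_counts.items():
        if count > MAX_TERMS_PER_TERM_SET:
            warnings.append(f"{name}: {count} terms exceeds the limit of {MAX_TERMS_PER_TERM_SET}")
        elif count >= warn_ratio * MAX_TERMS_PER_TERM_SET:
            warnings.append(f"{name}: {count} terms is close to the limit of {MAX_TERMS_PER_TERM_SET}")
    return warnings


# Hypothetical plan with one healthy, one near-limit, and one oversized term set.
plan = {"Document Types": 120, "Products": 26_000, "Customers": 31_500}
for message in capacity_warnings(plan):
    print(message)
```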
The Managed Metadata Service
  Traditional manual approach is subjective, cumbersome, and overwhelming.
  End user must select values from multiple term sets.
  Up to 30,000 values per term set and 1,000 term sets per term store.
  Manual approach is impractical.
conceptclassifier for SharePoint 2010
  An automated solution for applying metadata and providing term store management to enhance SharePoint 2010 capabilities for Records Management, Governance Policies, Rights Management, Sensitive Information Removal, and Findability.
  Taxonomy Manager can also be used to accelerate taxonomy creation.
Manual Metadata Approach
  A Manual Metadata Approach Will Fail 95%+ of the Time
  Issue: Inconsistent
    Organizational impact: Less than 50% of content is correctly indexed, meta-tagged or efficiently searchable, rendering it unusable to the organization (IDC)
  Issue: Subjective
    Organizational impact: Highly trained Information Specialists will agree on meta tags between 33% - 50% of the time (C. Cleverdon)
  Issue: Cumbersome - Expensive
    Organizational impact: Average cost of manually tagging one item runs from $4 - $7 per document and does not factor in the accuracy of the meta tags nor the repercussions from mis-tagged content (Hoovers)
  Issue: Malicious Compliance
    Organizational impact: End users select first value in list (Perspectives on Metadata, Sarah Courier)
  Issue: No perceived value for end user
    Organizational impact: "What's in it for me?" End user creates document, does not see value for organization nor risks associated with litigation and non-conformance to policies
  Issue: What have you seen
    Organizational impact: Metadata will continue to be a problem due to inconsistent human behavior
  The answer to consistent metadata is an automated approach that can extract the meaning from content, eliminating manual metadata generation yet still providing the ability to manage knowledge assets in alignment with the unique corporate knowledge infrastructure.
Automated Metadata Approach
  conceptclassifier for SharePoint 2010 provides an automated metadata approach for an immediate ROI and drives business value
  Create enterprise automated metadata framework/model
    Average return on investment is a minimum of 38% and runs as high as 600% (IDC)
  Apply consistent, meaningful metadata to enterprise content
    Incorrect meta tags cost an organization $2,500 per user per year, in addition to potential costs for noncompliance (IDC)
  Guide users to relevant content with taxonomy navigation
    Savings of $8,965 per year per user based on an $80K salary (Chen & Dumais)
    100% recall of content, 35% faster access to content
    Precision
  Use automatic conceptual metadata generation to improve Records Management
    Eliminate inconsistent end user tagging at $4-$7 per record (Hoovers)
    Improve compliance processes, eliminate potential privacy exposures
  [Cycle diagram: 1. Model and Validate; 2. Automate Tagging; 3. Findability; 4. Business Processes; 5. Records Management and PII; 6. Life Cycle Management]
Native Integration
  conceptclassifier provides a native integration into the Term Store
    Native integration into Term Store
    No custom property types
  Why do we work with the Term Store natively?
    Easy upgrade: no Service Pack updates, no custom code.
    Every item is synchronized with the Term Store and is a part of the managed metadata service.
    All search features work natively, as they should. No custom search property values that require custom code updates and additional custom search controls.
    It is the natural place to store metadata if you are driving economies of scale by leveraging the Microsoft stack. That is Microsoft's road map for metadata management.
    If you want to go back to a pure manual application, there is no code rewrite. You just unplug and you are back to native.
Automated Multi-Word Term Suggestions for Term Store
  Concept Searching's unique statistical concept identification underpins all technologies.
  Multi-word suggestion is explicitly more valuable than single-term suggestion algorithms.
  Concept Searching provides Automatic Concept Term Extraction
  Single words are ambiguous:
    Triple: Baseball, Three
    Heart: Organ, Center
    Bypass: Highway, Avoid
  conceptclassifier will generate conceptual metadata by extracting multi-word terms, identifying "triple heart bypass" as a concept as opposed to single keywords.
  Metadata can be used by any search engine index or any application/process that uses metadata.
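To make the single-word versus multi-word distinction concrete, here is a minimal Python sketch that suggests phrases recurring across documents. It uses plain n-gram document frequency with a small stop-word list, which is a generic textbook technique and not Concept Searching's statistical algorithm; the function names, stop words, and sample sentences are all hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "was", "is", "after"}


def candidate_phrases(text, max_words=3):
    """Yield phrases of 2..max_words consecutive words that contain no stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    for size in range(2, max_words + 1):
        for start in range(len(words) - size + 1):
            phrase = words[start:start + size]
            if not any(word in STOP_WORDS for word in phrase):
                yield " ".join(phrase)


def suggest_terms(documents, min_count=2):
    """Return phrases that appear in at least min_count documents, sorted."""
    counts = Counter()
    for text in documents:
        counts.update(set(candidate_phrases(text)))  # count each phrase once per document
    return sorted(phrase for phrase, n in counts.items() if n >= min_count)


DOCS = [
    "The patient was scheduled for a triple heart bypass.",
    "Recovery after triple heart bypass surgery takes weeks.",
    "The highway bypass opened in May.",
]

print(suggest_terms(DOCS))
```

On these sample sentences, "bypass" alone appears in all three documents, including the unrelated highway one, while the multi-word suggestions separate the medical concept from the road.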
Immediate Value
  conceptclassifier for SharePoint 2010 drives immediate value for end users for Search, Records Management, and Sensitive Information Removal
  conceptclassifier for SharePoint 2010:
    Automatically applies metadata
    Automatically applies Content Types
    Automatically applies Retention Code Policies
    Automatically applies Windows Rights Management Policies
    Automatic Term Boosting for FAST
    Pulls hierarchy directly from Term Store; therefore, updates are immediate and accurate for guided taxonomy navigation in FAST
    Can also classify file shares and other repositories
Enterprise Taxonomy Management
  Enterprise Taxonomy Management and Auto-classification
    Multi-user distributed branch and term support for the enterprise
    Native Term Store integration for SharePoint 2010
    Accelerate building out taxonomies by 75% with automatic Term/Clue Suggestion
    Enables information architects to build, model, and validate
    Automatic Term Boosting for FAST/search platforms
    Pragmatic ontology features for subject matter experts
      Broad to narrow
      Preferred term
      Non-preferred terms
      Polyhierarchies not supported in Term Store
      Relations not supported in Term Store
conceptclassifier for FAST Search
  Improves search outcomes by placing conceptual metadata in the FAST Search index to increase relevancy of search results
  Enables import of FAST Entities into the conceptclassifier taxonomy manager to fine-tune them with metadata generated from your own content and nomenclature
  Runs natively as a FAST Pipeline Stage, eliminating integration and customization issues
  Eliminates vocabulary normalization issues across global boundaries through controlled vocabularies
  Improves faceted search results, as facets are based on concepts aligned with the taxonomy
  Provides taxonomy browse capabilities based on the nodes within the corporate taxonomy(s)
  Provides accurate metadata filters such as numeric range searching and wildcard alphanumeric matching
  Removes documents from search results that are confidential/sensitive through automatic Content Type updating and routing to a secure server
  Automatically tags content with both vocabulary and retention codes and respects SharePoint security that could prevent access to the document once it has been declared a record
Product Screen Shots
Manual Metadata Approach
  Traditional manual approach is subjective, cumbersome, and ineffective
  End user must select values from multiple term sets.
  Up to 30,000 values per term set and 1,000 term sets per term store.
  Manual approach is impractical.
Automated Metadata Approach
  An automated approach ensures accurate Records Management, Sensitive Information Removal, and improved Search/Findability
  Metadata is automatically applied to content by ConceptClassifier via TaxonomyManager.
  contenttypeupdater can take it a step further and modify the content type to redirect a document/object to a different content type, or migrate it to another site collection or document library.
  In this example, the documents are being changed from the document content type to the PII or Records Center Content Type.
Term Store Management
  Term Store Management is provided by Taxonomy Manager and conceptclassifier
  Deep capabilities to build out rule-based classification approaches, including: standard term, phonetics, metadata, class ID, language, case sensitive, regular expression, and boosting.
  TaxonomyManager is an intuitive and elegant tool to manage how and when term sets are applied within SharePoint 2010 and what new terms to add to the term store.
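Three of the rule types named above (standard term, case sensitive, and regular expression) combined with boosting can be sketched generically in Python. This illustrates rule-based scoring in general and is not TaxonomyManager's actual behavior or API; the `Clue` class, the `classify` function, the threshold, and the sample rules (including the SSN-style pattern) are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Clue:
    """A hypothetical classification rule that votes for one term."""
    term: str                     # term to apply when the clue matches
    pattern: str                  # plain phrase, or a regex when is_regex is True
    is_regex: bool = False
    case_sensitive: bool = False
    boost: float = 1.0            # weight added per match

    def count_matches(self, text):
        flags = 0 if self.case_sensitive else re.IGNORECASE
        # Plain phrases are escaped and matched on word boundaries.
        pattern = self.pattern if self.is_regex else r"\b" + re.escape(self.pattern) + r"\b"
        return len(re.findall(pattern, text, flags))


def classify(text, clues, threshold=1.0):
    """Score each term by its boosted clue matches; keep terms at or above threshold."""
    scores = {}
    for clue in clues:
        hits = clue.count_matches(text)
        if hits:
            scores[clue.term] = scores.get(clue.term, 0.0) + hits * clue.boost
    return {term: score for term, score in scores.items() if score >= threshold}


CLUES = [
    Clue("Proposal Document", "proposal"),
    Clue("Proposal Document", "red team", boost=2.0),
    Clue("PII", r"\b\d{3}-\d{2}-\d{4}\b", is_regex=True, boost=5.0),  # SSN-like pattern
    Clue("Resume", "CV", case_sensitive=True),
]

print(classify("Red Team review of the proposal. Contact SSN 123-45-6789.", CLUES))
```

Boosting lets a strong signal, such as an identifier pattern, outweigh several weak keyword matches, which is why the PII term scores highest in this sample.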
Automated Metadata Approach
  An automated approach ensures accurate Records Management, Sensitive Information Removal, and improved Search/Findability
  The documents with 10 in front of them have had their content types updated.
  In this example, the documents are being changed from the document content type to the PII or Records Center Content Type.
  They could also have been moved to a different folder if that was the desired outcome.
Intuitive Guided Navigation
  conceptclassifier for FAST and SharePoint 2010 Search
  The conceptclassifier for 2010 Product Suite provides intuitive guided navigation for FAST
  Multi-value select within a term set is the single fastest approach you can provide for end users to get access to the correct content.
  It is just like picking values when you are on Best Buy or Amazon, but with your personalized corporate term set vocabulary.
Demo
  How to automate the process of applying metadata in a SharePoint 2010 native term store environment to improve Findability and Records Management
Contact
  Don Miller, VP Business Development, donm@conceptsearching.com
  Jill Hannemann, Principal of Information Management, jhannemann@ppc.com
Upcoming Webinars in the Series
  SharePoint Governance: Managing Content Sprawl
    September 14th, 11:30am-12:30pm EST
    Once deployed within your company, SharePoint's popularity has the potential to become viral. This session will focus on how to apply a governance strategy to SharePoint sites and objects, and how best to manage user expectations for leveraging SharePoint within your company. We also look at how, using Concept Searching's Concept Classifier for SharePoint, you might automate much of the process designed to deliver a consistent user experience at retrieval time using taxonomy and automatic content tagging. Furthermore, we explore using the tool to apply your governance strategy to identify sensitive information such as PII and lock it down so it is not published on uncontrolled portals.
  De-mystifying Content Types: Four Key Content Types to Leverage
    October 19th, 11:30am-12:30pm EST
    Content types are a powerful feature of SharePoint 2010 and are largely under-utilized. Learn more about content types, what they can do, and how to implement them across your SharePoint environment. PPC will also share four key content types to implement that span multiple industries. We also review Concept Searching's Content Type Updater, an automatic content tagging solution that can apply content types based upon vocabulary and metadata. The solution, fully integrated with SharePoint 2010 and the Term Store, can then route specific types of content through workflows based upon policy and guidelines, addressing such business issues as preservation and disposition, risk, and governance.