Enterprise Content Management (ECM)
Taxonomy

AIIM ECM Certificate programme
- ECM Strategy
- ECM Practitioner
- ECM Specialist
- Case Study
ECM Practitioner Course Outline
Sections: Foundations; Tools & Instruments; Futures
1. Introduction
2. Technologies & Functionality
3. Information Architecture
4. Create & Capture
5. Metadata
6. Taxonomy
7. Security & Control
8. Process & Automation
9. Findability
10. Delivery & Presentation
11. Trends & Directions

Agenda
- Defining taxonomies and classification
- Subject-based classification
  - Taxonomies
  - Folksonomies
  - Ontologies
  - Thesaurus and Semantic networks
- Business case for classification
- Standards and guidelines
- Classification challenges
Defining taxonomy (1)
- Taxonomy is the science of classifying information
- A taxonomy is a law for classifying information
- Taxonomies are nearly ubiquitous, but poorly understood
Source: Dictionary.com
Defining taxonomy (2)
"In recent years, the business world has fallen in love with the term taxonomies. We use it specifically to refer to a hierarchical arrangement of categories within the user interface of a website or intranet."
Source: Information Architecture for the World Wide Web (Louis Rosenfeld and Peter Morville, 2002)

AIIM website
[Screenshot: AIIM website, annotated 1 to 4]
Understanding taxonomies
- A taxonomy is a classification scheme
  - Such as the way that an individual classifies the content of their e-mail inbox, a personal CD collection, or the contents of an iPod
- A taxonomy is a knowledge map
  - Reflects how its owner conceives a given body of content (a knowledge domain), for purposes of browsing, navigating, discovering, and sharing that information
- A taxonomy is semantic
  - Indicates the relationships between concepts, such as the relationship between a car and a steering wheel, in that the steering wheel is a part of a car
Source: Organising Knowledge (Patrick Lambe, 2007)

Category perspectives
- Business function
- Geo-political
- Company focus vs. industry focus
- Product or service
- Business issues, conditions, events
- Type/Source of content
Representations of taxonomies (1)
- Lists
- Trees
- Hierarchies
- Polyhierarchies
- Matrices
- Facets
- System Maps
Source: Organising Knowledge (Patrick Lambe, 2007)

Representations of taxonomies (2)
Lists
- Simple collection of related things. The relationship is defined by the purpose of the list.
- Good when the domain is simple and the amount of content is small.
- Basic building blocks of all other taxonomical representations
- Examples: Country codes, types of diseases
Source: Organising Knowledge (Patrick Lambe, 2007)
Source: Wikipedia
Representations of taxonomies (3)
Trees
- Represent a transition from general to more specific relationships, or from whole to part.
- Good when a list gets to be too long and naturally breaks into subcategories.
- Examples: Yellow pages (phone directories)
Source: Organising Knowledge (Patrick Lambe, 2007)
Source: CoreFiling.com

Representations of taxonomies (4)
Hierarchies
- A specific tree structure that has inclusiveness and consistency, and maintains the same type of relationship at each level.
- The child inherits all of the characteristics of the parent, and each child can only belong in one place in the taxonomy
- Works best with mature, formal, logical schemes
- Examples: Military rank, Biological, Family Genealogy
Source: Organising Knowledge (Patrick Lambe, 2007)
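The single-parent rule for hierarchies can be sketched in a few lines of Python. This is a minimal illustration, not part of the course material: the category names and the `PARENT` and `path_to_root` identifiers are assumptions chosen for the example.

```python
# Hypothetical strict hierarchy: every category has exactly one parent,
# so a child can only belong in one place in the taxonomy.
PARENT = {
    "Animal": None,      # root
    "Bird": "Animal",
    "Crow": "Bird",
    "Mammal": "Animal",
}


def path_to_root(category: str) -> list[str]:
    """Return the chain of categories from the root down to `category`."""
    chain = []
    node = category
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return list(reversed(chain))


print(path_to_root("Crow"))  # ['Animal', 'Bird', 'Crow']
```

Because each key maps to one parent, the path to the root is unique, which is what lets a child inherit the characteristics of every ancestor.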
Representations of taxonomies (5)
Polyhierarchies
- Used when an item belongs in more than one place in the real world, and multiple organising principles are required.
- Provides virtual linking between hierarchies.
- Example: a single collection of content concerning diseases can be organised/taxonomised via affected body part and causes
Source: Organising Knowledge (Patrick Lambe, 2007)
Source: Rosenfeld, Morville (2006)

Representations of taxonomies (6)
Matrices
- Provide a 2 or 3-dimensional cross-linking of taxonomies, and an ability to provide differing views into the same body of content.
- Example: The same content could be located based on project manager, project initiation, and/or affected standards
Source: Organising Knowledge (Patrick Lambe, 2007)
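The disease example for polyhierarchies can be sketched as a mapping from each item to several categories. This is an illustrative sketch only: the item names, category labels, and the `PLACEMENTS` and `build_index` identifiers are assumptions, not taken from the source.

```python
from collections import defaultdict

# Hypothetical content items, each placed under two organising principles:
# the affected body part and the cause.
PLACEMENTS = {
    "Viral pneumonia": ["Body part/Lungs", "Cause/Virus"],
    "Bacterial pneumonia": ["Body part/Lungs", "Cause/Bacteria"],
    "Viral hepatitis": ["Body part/Liver", "Cause/Virus"],
}


def build_index(placements):
    """Invert the placements so each category lists the items filed under it."""
    index = defaultdict(set)
    for item, categories in placements.items():
        for category in categories:
            index[category].add(item)
    return index


INDEX = build_index(PLACEMENTS)
print(sorted(INDEX["Body part/Lungs"]))
print(sorted(INDEX["Cause/Virus"]))
```

The same item appears under more than one branch without being duplicated, which is the "virtual linking" the slide describes.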
Representations of taxonomies (7)
Facets
- A multi-dimensional taxonomy composed of multiple tags, each tag representing an individual taxonomy; the content is thus categorised in multiple ways within a single interface.
- Example: selecting wines based on characteristics such as type, price, varietals, regions, and appellations.
Source: Organising Knowledge (Patrick Lambe, 2007)
Source: wine.com

Representations of taxonomies (8)
System maps
- Visual representations of a domain of knowledge
- Labels represent taxonomy categories
- Example: A collection of medical content relating to the human nervous system is accessible via a diagram of the human body. Each component of that system is illustrated in context, and labelled appropriately.
Source: Organising Knowledge (Patrick Lambe, 2007)
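The wine example given for facets can be sketched as filtering on independent attributes. This is a rough sketch under stated assumptions: the wine records and the `filter_by_facets` and `facet_counts` helpers are hypothetical and do not describe how any real site implements faceted navigation.

```python
# Hypothetical catalogue: each key is one facet (an independent taxonomy).
WINES = [
    {"name": "Wine A", "type": "red", "region": "Bordeaux", "price": 25},
    {"name": "Wine B", "type": "white", "region": "Loire", "price": 15},
    {"name": "Wine C", "type": "red", "region": "Rioja", "price": 12},
]


def filter_by_facets(items, **facets):
    """Keep the items whose value matches every requested facet."""
    return [
        item for item in items
        if all(item.get(key) == value for key, value in facets.items())
    ]


def facet_counts(items, facet):
    """Count how many items fall under each value of one facet."""
    counts = {}
    for item in items:
        counts[item[facet]] = counts.get(item[facet], 0) + 1
    return counts


reds = filter_by_facets(WINES, type="red")
print([wine["name"] for wine in reds])
print(facet_counts(WINES, "type"))
```

Facets can be combined in any order, for instance `filter_by_facets(WINES, type="red", region="Rioja")`, which is what distinguishes them from a single fixed hierarchy.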
Defining classification
Classification: "The systematic identification and arrangement of business activities and/or records into categories according to logically structured conventions, methods and procedural rules represented in a classification system"
Source: ISO 15489

What is classification?
- In simple terms, it's just grouping information together
- Common examples of classification:
  - Cars: by make, model, performance
  - Food: tinned/fresh, type (meat, vegetable, grain)
  - TV programmes: comedy, thriller, quiz show
  - Clothes: adult/child, expensive/cheap, winter/summer
Dewey Decimal system
- Used to classify information throughout the western world
- Very Euro-centric
- Main classes:
  000 General & Bibliography
  100 Philosophy & Psychology
  200 Religion
  300 Social Science
  400 Languages & Linguistics
  500 Sciences
  600 Technology
  700 Arts & Recreation
  800 Literature
  900 Geography & History

Chinese library classification
- 43,600 categories. Constantly expanding to meet the needs of a rapidly changing nation
- Political considerations drive some of the organisation
  1) Marxism, Leninism, Maoism & Deng Xiaoping Theory
  2) Philosophy and Religion
  3) Social Sciences
  4) Politics and Law
  5) Military Science
  6) Economics
  7) Culture, Science, Education, and Sports
  8) Languages and Linguistics
  9) Literature
  10) Art
  11) History and Geography
  12) Natural Science
  13) Mathematics, Physics and Chemistry
  14) Astronomy and Geoscience
  15) Life Sciences
  16) Medicine and Health Sciences
  17) Agricultural Sciences
  18) Industrial Technology
  19) Transportation
  20) Aviation and Aerospace
  21) Environmental Science
US Library of Congress
- Used to categorise books published in the United States
- Expanded categories emphasise USA-specific history and interests
  A) General Works
  B) Philosophy, Psychology, Religion
  C) History: Auxiliary Sciences
  D) History: General and Old World
  E) History: United States
  F) History: Western Hemisphere
  G) Geography, Anthropology, Recreation
  H) Social Science
  J) Political Science
  K) Law
  L) Education
  M) Music
  N) Fine Arts
  P) Literature & Languages
  Q) Science
  R) Medicine
  S) Agriculture
  T) Technology
  U) Military Science
  V) Naval Science
  Z) Bibliography & Library Science

What are classification schemes?
- A classification scheme
  - Is the structure an organisation uses for organising, accessing/retrieving, storing and managing its information
  - Can be used to classify records
- A Business Classification Scheme (BCS) is a classification scheme based on an organisation's business functions and activities
  - These are predominantly used for Records Management purposes
Classification schemes: Types
[Diagram: scheme types grouped under "Principles of classification" and "Deployment", with a "Generally preferred" marker]
- Keyword / thesaurus-based
- Functional
- Subject / thematic
- Organisational
- Hierarchical / tree style

Hierarchical / tree style BCSs: Key
- C = Class
- F = File
- R = Record
- D = Document
Schematic example: Hierarchical / tree BCS
[Diagram: a tree of classes (C) with files (F) at the lowest level]

Populated example: Hierarchical / tree BCS
- Innovation, Knowledge Transfer and Technical Infrastructure (super function)
- Innovation (function)
- Knowledge Transfer (function)
- Technical Infrastructure (function)
- Standards and Accreditation (sub function)
- Policy Management (activity)
- Infrastructure Support (activity)
- National Measurement System (sub function)
- Policy Management (activity)
- Civil Space Activity (sub function)
- Space Regulation (activity)
[Diagram: levels of the tree labelled Super Function, Function, Sub Function and Activity]
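The populated example can be sketched as a nested structure in which every class is identified by its full path. This is a sketch under an assumption: the slide does not state which function owns the three sub functions, so placing them under Technical Infrastructure is a guess made for illustration. The `BCS` and `class_paths` names are likewise hypothetical.

```python
# Nested dict: each key is a class, each value holds its child classes.
# ASSUMPTION: the sub functions are placed under "Technical Infrastructure".
BCS = {
    "Innovation, Knowledge Transfer and Technical Infrastructure": {
        "Innovation": {},
        "Knowledge Transfer": {},
        "Technical Infrastructure": {
            "Standards and Accreditation": {
                "Policy Management": {},
                "Infrastructure Support": {},
            },
            "National Measurement System": {"Policy Management": {}},
            "Civil Space Activity": {"Space Regulation": {}},
        },
    }
}


def class_paths(tree, prefix=()):
    """Yield the full path of every class in the scheme, depth first."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path
        yield from class_paths(children, path)


PATHS = [" / ".join(path) for path in class_paths(BCS)]
for line in PATHS:
    print(line)
```

"Policy Management" occurs twice in the example, and only the full path tells the two activities apart, which is one reason tree-style schemes file records by path rather than by label alone.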
Toward subject-based classification
- It's often valuable to create multiple classifications
  - Users: Intended audience
  - Content: Inherent subject matter
  - Context: Temporal, organisational or political drivers
- User-understood terms are critical
  - Especially important for e-commerce
  - People search Google for "cheap flights" 75x more than "low fares" (Source: Gerry McGovern)
- Who are the users? Scientists? Consumers?
- Context matters
  - Why this user with this content?
Source: Louis Rosenfeld LLC
Taxonomies in context
[Screenshot] Source: Yahoo!

Hierarchies as implicit semantics
- Divide the information space into categories and subcategories, relating broader and narrower concepts via parent-child relationships
- Generic (class-species): Species B (crow) is a member of Class A (bird) and inherits the characteristics of its parent
- Whole-part: B is a part of A (e.g., Index Finger is part of Hand)
- Instance: B is an instance of A (e.g., Indian Ocean is an Ocean)
[Diagram: A and B]
Differing views
- Simple truth: People see (and label!) the world differently
- Sand trap, or bunker?

Personal taxonomy
- Personal classification of information
  - E-mail folders: the most common manifestation
- Can improve relevance and findability for an individual
  - Some approaches enable personal classification in addition to the authorised taxonomy
  - Gmail and some other systems employ faceted classification as well
- From an enterprise perspective, personal taxonomies can be quite problematic
  - No interoperability, linguistic chaos
  - Impossible to establish enterprise-wide standards and vocabularies
- When combined with those of peers, can become a folksonomy
Folksonomy
- Collaborative tagging of content with minimal controls
- Relevance between metadata and content may be determined by users in a democratic fashion
- Clusters emerge, and communities typically self-organise around them ("Wisdom of the crowd")
- Typically arise in Web-based communities where individuals share content, then create and use tags
- Best used when there is a critical mass of taggers
- Can be a useful bottom-up approach to developing taxonomies

Folksonomy example
[Screenshot] Source: flickr.com
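How clusters emerge from uncontrolled tags can be sketched by counting how many distinct users applied each tag to an item. This is a simplified illustration: the user names, items, tags, the `min_users` threshold and the `popular_tags` function are all assumptions, and real tagging services are considerably more involved.

```python
from collections import Counter

# Hypothetical tagging events: (user, item, tag).
TAGGINGS = [
    ("alice", "photo1", "sunset"),
    ("bob", "photo1", "sunset"),
    ("bob", "photo1", "beach"),
    ("carol", "photo1", "Sunset "),  # same tag, different case and spacing
    ("carol", "photo2", "beach"),
]


def popular_tags(taggings, item, min_users=2):
    """Return the tags on `item` applied by at least `min_users` distinct users."""
    users_per_tag = {}
    for user, tagged_item, tag in taggings:
        if tagged_item == item:
            normalised = tag.strip().lower()
            users_per_tag.setdefault(normalised, set()).add(user)
    counts = Counter({tag: len(users) for tag, users in users_per_tag.items()})
    return [tag for tag, n in counts.most_common() if n >= min_users]


print(popular_tags(TAGGINGS, "photo1"))  # ['sunset']
```

The threshold reflects the point about critical mass: with too few taggers, no tag clears it and no usable vocabulary emerges.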
What is an ontology?
- Explicit specification or conceptualisation of a domain
- Often subsume thesauri, but employ richer semantic relationships among terms and attributes
- Apply rigid rules specifying terms and relationships
- Do more than just control vocabulary; they are a knowledge representation
- Semantic technologies are typically centred around ontologies
- An ontology for salad would contain the structure for how it relates to everything, from ingredients to growers to the rodents that might eat it, and how a salad is different in Japan vs. Italy

Why develop an ontology?
- To improve knowledge sharing and reuse, and make software more adaptable to an environment
  - Share common understanding of the structure of information among people or software agents
  - Enable reuse of domain knowledge
  - Make domain assumptions explicit
  - Separate domain knowledge from operational knowledge
  - Analyse domain knowledge
Source: http://www.alphaworks.ibm.com/contentnr/introsemantics
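The salad example can be sketched as a small set of subject-predicate-object statements. This is a toy illustration, not a real ontology language such as OWL: the predicate names (`has_ingredient`, `grown_by`, `may_eat`) and the `objects` and `subjects` helpers are assumptions made for the example.

```python
# Hypothetical statements about the salad domain, as (subject, predicate, object).
TRIPLES = {
    ("salad", "has_ingredient", "lettuce"),
    ("salad", "has_ingredient", "tomato"),
    ("lettuce", "is_a", "vegetable"),
    ("lettuce", "grown_by", "grower"),
    ("rodent", "may_eat", "lettuce"),
}


def objects(subject, predicate):
    """Everything that `subject` relates to via `predicate`."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """Everything that relates to `obj` via `predicate`."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


print(sorted(objects("salad", "has_ingredient")))
print(sorted(subjects("may_eat", "lettuce")))
```

Unlike a taxonomy, which would only record that lettuce is a vegetable, the named relationships make other domain assumptions explicit and queryable.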
The challenge of meaning
- Meaning is a hard problem for machines and humans alike
  - The same term can have multiple meanings
  - Multiple terms can have the same meaning
  - Ultimately, meaning is contextual
- Dublin Core is designed to disambiguate at a fundamental level
  - E.g., it distinguishes definitively among Creator, Contributor and Publisher
- But in the wild, it is much harder to achieve semantic agreement

Controlled vocabularies
- Supporting tools based on collections of terms used to tag, track and describe content
- For example, users may wish to organise content according to:
  - business sector
  - geographical location
  - product type
  - organisation type
  - policy topic
- Allow content to be described using only 'official terms'
Controlled vocabularies: Types
- Simple lists
  - Lists of terms allowed to be used to describe an information resource
- Synonym rings
  - A 'ring' of connected terms, all treated as equivalent for searching
  - Synonym rings can be used to link acronyms, variant spellings or scientific / popular terms
- Thesaurus
  - Hierarchical arrangement of broader and narrower meanings

Simple lists and synonym rings
- Simple list of bovine diseases
  - Anaplasmosis
  - Babesiosis
  - Bovine spongiform encephalopathy (BSE)
  - Cysticercosis
- Synonym ring for BSE
  - Bovine spongiform encephalopathy
  - BSE
  - Mad cow disease
  - Prion disease
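The BSE synonym ring can be sketched as query expansion at search time. This is a minimal sketch: the `SYNONYM_RINGS` structure and the `expand_query` function are hypothetical names, and the lower-casing is an assumed normalisation step.

```python
# Each ring is a set of terms treated as equivalent for searching.
SYNONYM_RINGS = [
    {
        "bovine spongiform encephalopathy",
        "bse",
        "mad cow disease",
        "prion disease",
    },
]


def expand_query(term):
    """Return every term in the same ring, or the term itself if it has no ring."""
    key = term.strip().lower()
    for ring in SYNONYM_RINGS:
        if key in ring:
            return set(ring)
    return {key}


print(sorted(expand_query("BSE")))
print(sorted(expand_query("Anaplasmosis")))
```

A ring has no preferred term: a search on any member retrieves content tagged with any other, which is the difference from a thesaurus entry.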
Thesaurus
- A networked collection of controlled vocabulary terms, using associative relationships
- Used to manage and identify the relationships among and between terms
  - E.g. Equal to, Related to, Opposite of
- Some examples from a hypothetical domain
  - Lettuce = Frisée (a.k.a. a "synonym ring")
  - Lettuce is a narrower type of Greens
  - Coriander is related to Cilantro, but they are not equal
- Useful to reconcile different lexicons across business units or functional groups

Sample thesaurus
[Figure: sample thesaurus]
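The lettuce examples can be sketched with the conventional thesaurus relationship tags: BT (broader term), NT (narrower term), RT (related term) and USE/UF (use / used for). The `THESAURUS` and `USE` structures and the two helper functions are assumptions made for illustration, built only from the hypothetical domain in the slide.

```python
# Preferred terms and their relationships.
THESAURUS = {
    "greens": {"NT": ["lettuce"]},
    "lettuce": {"BT": ["greens"], "UF": ["frisée"]},
    "coriander": {"RT": ["cilantro"]},
    "cilantro": {"RT": ["coriander"]},
}

# Non-preferred terms point at the preferred term to use instead.
USE = {"frisée": "lettuce"}


def preferred(term):
    """Map a term to its preferred form (the 'equal to' relationship)."""
    key = term.strip().lower()
    return USE.get(key, key)


def broader_terms(term):
    """Collect all broader terms by following BT links upward."""
    result = []
    queue = list(THESAURUS.get(preferred(term), {}).get("BT", []))
    while queue:
        current = queue.pop(0)
        if current not in result:
            result.append(current)
            queue.extend(THESAURUS.get(current, {}).get("BT", []))
    return result


print(preferred("Frisée"))
print(broader_terms("Frisée"))
print(THESAURUS["coriander"]["RT"])
```

Related terms stay distinct entries linked by RT, so a search tool can suggest Cilantro to someone looking for Coriander without treating the two as interchangeable.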
Ontologies and taxonomies and thesauri
- How does this relate to Taxonomies and Thesauri?
  - We have all agreed to call this thing "lettuce".
  - Lettuce is a vegetable.
- There is a much larger potential pool of semantic information that a taxonomy may or may not contain:
  - Lettuce grows in the ground.
  - Rabbits are a hazard to lettuce growers.
  - Tomatoes and cucumbers are often eaten with lettuce, and the three of these things together make what is called a "salad".
  - But a salad is not only defined by the collection of these three things: in Japan, a mixture of seaweed and sesame seeds is a salad. In the Midwestern United States, a collection of Jell-O and radishes is called a salad, and there is no lettuce involved.
Benefits of classifying records (1)
- Providing linkages between individual records which accumulate to provide a continuous record of activity
- Ensuring records are named in a consistent manner over time
- Assisting in the retrieval of all records relating to a particular function or activity
- Determining security protection and access appropriate for sets of records
- Allocating user permissions for access to, or action on, particular groups of records

Benefits of classifying records (2)
- Distributing responsibility for management of particular sets of records
- Distributing records for action
- Determining appropriate retention periods and disposition actions for records
Standards and guidelines
- ISO 15489: the international standard for records management
- MoReq2: the Model Requirements for the Management of Electronic Records
- DIRKS: the Design and Implementation of Record-Keeping Systems methodology
- ISO 2788: Guidelines for the Establishment of Monolingual Thesauri
Classification challenges (1)
- Laborious and difficult to develop
- Tendency to over-analyse
- May need more than one
- Classification of content into a categorisation scheme is ongoing work
Classification challenges (2)
- Categories need ongoing care and feeding (including thesauri, taxonomies, controlled vocabularies and ontologies)
  - Content changes
  - Context changes
  - Vocabularies change
  - Experience may breed new perspectives

What you have learned
- How to leverage classification in general, and taxonomies in particular, as part of an ECM strategy
- Different approaches to subject-based organisation schemes:
  - Taxonomies
  - Thesauri
  - Semantic networks
  - Ontologies
  - Folksonomies
- Managing classification challenges