714: Metadata Encoding Records in XML: RDF, METS
Margaret E.I. Kipp - kipp@uwm.edu
https://pantherfile.uwm.edu/kipp/public/courses/714
Encoding Metadata in XML
Metadata encoding is the process of expressing metadata records in an encoding scheme such as XML.
Metadata may be stored in two principal forms: internal storage or external storage.
  Internal storage: metadata is embedded in the object itself (e.g. metadata in an HTML header).
  External storage: metadata is stored separately from the object (e.g. a surrogate record, as in many digital libraries).
Internal vs External Storage
Advantages of internal storage:
  the metadata is always with the item, so there is no need for an extra surrogate record and a link between the two
Advantages of external storage:
  no need to modify the item to accept the metadata record
  allows for situations where the item is not available electronically
Expressing Metadata in HTML/XML
HTML
  metadata is embedded in the HTML document
  uses <meta name="name" content="content"> and <link rel="property" href="uri"> tags
  may also use the lang attribute
XML
  uses an XML schema to define the namespace, i.e. the element names (in HTML these are already defined)
  XML files are well suited to storage in databases
  databases are also designed with E-R models
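As a rough illustration of internal storage in HTML, the sketch below embeds a few Dublin Core elements in the head of an XHTML page using the meta and link tags described above. The page title and the metadata values are made up for illustration; the "DC." prefix convention assumes the Dublin Core element set at http://purl.org/dc/elements/1.1/.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Course notes (hypothetical page)</title>
    <!-- Declares that the "DC." prefix used below refers to Dublin Core -->
    <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
    <!-- Each meta tag carries one metadata element as a name/content pair -->
    <meta name="DC.title" content="Encoding Metadata in XML" />
    <meta name="DC.creator" content="Kipp, Margaret E.I." />
    <meta name="DC.language" content="en" />
  </head>
  <body>
    <p>Page content goes here.</p>
  </body>
</html>
```

Because the metadata travels inside the page itself, no separate surrogate record is needed.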
Expressing Metadata in XML/RDF
RDF (Resource Description Framework) is a standard designed for encoding web metadata for the semantic web.
RDF uses the URI (often a URL) as the mechanism for identifying objects.
Objects may be URIs or constant values (e.g. a date, time or language).
RDF also provides information about, and relationships between, web resources and real-world things such as people, places and concepts.
RDF Model
Every statement can be structured as a triple consisting of a subject, a predicate and an object.
Information on the web has no obvious structure to a computer, but structured statements can be encoded in RDF, making them machine readable.
  e.g. one could mark the subject, verb and object of a sentence
Given only the text "Alice lives in Florida", a computer would not know that Alice is a person who lives in Florida (a US state).
Structured Statements
Alice (person) [subject] -- lives in [verb] -- Florida (US State) [object]
In this simple encoding format:
  round brackets () give information about a term (clarifications)
  square brackets [] indicate the part of the sentence in a grammatical sense
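The Alice statement could be written in RDF/XML roughly as follows. This is only a sketch: the example.org URIs, the ex: namespace, the livesIn property and the Person and USState classes are hypothetical names invented for illustration, not part of any published vocabulary.

```xml
<?xml version="1.0"?>
<!-- The ex: namespace, livesIn, Person and USState are hypothetical names. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms/">
  <!-- Subject: Alice, typed as a person -->
  <rdf:Description rdf:about="http://example.org/people/alice">
    <rdf:type rdf:resource="http://example.org/terms/Person"/>
    <!-- Predicate: lives in; object: Florida -->
    <ex:livesIn rdf:resource="http://example.org/places/florida"/>
  </rdf:Description>
  <!-- Florida is itself described as a US state -->
  <rdf:Description rdf:about="http://example.org/places/florida">
    <rdf:type rdf:resource="http://example.org/terms/USState"/>
  </rdf:Description>
</rdf:RDF>
```

The round-bracket clarifications become rdf:type statements, and the verb becomes the predicate linking the two resources.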
RDF Example
Wisconsin (subject) -- has the postal abbreviation (predicate) -- WI (object)
    <?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:terms="http://purl.org/dc/terms/">
      <rdf:Description rdf:about="urn:xstates:wisconsin">
        <terms:alternative>WI</terms:alternative>
      </rdf:Description>
    </rdf:RDF>
RDA/DC in RDF Example
    <?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.0/"
             xmlns:dcq="http://purl.org/dc/qualifiers/1.0/">
      <rdf:Description rdf:about="155192398X (pb)">
        <dc:title>Harry Potter and the philosopher's stone /</dc:title>
        <dc:creator>Rowling, J. K.</dc:creator>
        <dc:format>223 p. : 20 cm.</dc:format>
        <dc:publisher>Raincoast</dc:publisher>
        <dc:publisher>Vancouver, BC</dc:publisher>
        <dc:date>2000</dc:date>
        <dc:identifier>155192398X (pb)</dc:identifier>
        <dc:identifier>9781551923987 (pb)</dc:identifier>
        <dc:language>eng</dc:language>
        <dc:subject>
          <rdf:Description>
            <dcq:subjectqualifier>namepersonal</dcq:subjectqualifier>
            <rdf:value>Potter, Harry (Fictitious character)--Juvenile fiction.</rdf:value>
          </rdf:Description>
        </dc:subject>
        <dc:subject>
          <rdf:Description>
            <dcq:subjectqualifier>topical</dcq:subjectqualifier>
            <rdf:value>Hogwarts School of Witchcraft and Wizardry (Fictitious place)--Juvenile fiction.</rdf:value>
          </rdf:Description>
        </dc:subject>
        <dc:subject>
          <rdf:Description>
            <dcq:subjectqualifier>topical</dcq:subjectqualifier>
            <rdf:value>Wizards--Juvenile fiction.</rdf:value>
          </rdf:Description>
        </dc:subject>
        <dc:relation>Harry Potter ; bk. 2</dc:relation>
        <dc:type>Text</dc:type>
      </rdf:Description>
    </rdf:RDF>
Combining Metadata Descriptions
Modules (chunks) of metadata records can be combined into a single structure for transmission to other systems.
METS (Metadata Encoding and Transmission Standard) provides a framework for incorporating components from various metadata schemes within one structure.
METS can package descriptive, administrative and structural metadata into one XML document for exchange with other repositories.
METS
METS is an XML Schema which expresses:
  1) the hierarchical structure of digital library objects
  2) the names and locations of the files that constitute those objects
  3) all associated metadata (Zeng and Qin 2008, p. 200)
Each part of the metadata record may itself be another record, to which the METS record points.
http://www.loc.gov/standards/mets/
METS Records
  METS header (req)
  descriptive metadata
  administrative metadata
  file section
  structural map (required)
  structural links
  behaviour
http://www.dlib.org/dlib/june06/zeng/06zeng.html
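A minimal skeleton showing how several of these sections sit inside one METS document might look like the following. It is a sketch only: the IDs, the file location and the title are invented, and the administrative metadata, structural links and behaviour sections are left out to keep it short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Skeleton only: IDs, the file URL and the title are made up for illustration. -->
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- METS header: information about the METS record itself -->
  <mets:metsHdr CREATEDATE="2009-09-30T12:00:00"/>
  <!-- Descriptive metadata, here wrapped inline as simple Dublin Core -->
  <mets:dmdSec ID="DMD1">
    <mets:mdWrap MDTYPE="DC">
      <mets:xmlData>
        <dc:title>Example object</dc:title>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <!-- File section: the files that make up the object -->
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="FILE1" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/objects/1/page1.jpg"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <!-- Structural map: arranges the files in a hierarchy and points back to the description -->
  <mets:structMap>
    <mets:div TYPE="page" DMDID="DMD1">
      <mets:fptr FILEID="FILE1"/>
    </mets:div>
  </mets:structMap>
</mets:mets>
```

The ID, DMDID and FILEID attributes are what tie the sections together: the structural map refers to the descriptive metadata and to the file by their IDs.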
METS Examples
    <mets:mets>
      <mets:dmdSec ID="MODS1">
        <mets:mdWrap MDTYPE="MODS">
          <mets:xmlData>
            <mods:mods version="3.3">
              <mods:titleInfo>
                <mods:title>Great conversations: the pianists</mods:title>
              </mods:titleInfo>...
          </mets:xmlData>
        </mets:mdWrap>
      </mets:dmdSec>
    </mets:mets>
http://www.loc.gov/standards/mets/mets-examples.html
Multiple Schemas in a Namespace
It is common practice to add elements to DC via another schema, e.g.:
    <record xmlns="http://example.org/learningapp/"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://example.org/learningapp/ http://example.org/learningapp/schema.xsd"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:ims="http://www.imsglobal.org/xsd/imsmd_v1p2">
http://dublincore.org/documents/dc-xml-guidelines/
Multiple Schemas using RDF
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:js="http://js.org/meta/">
      <rdf:Description rdf:about="http://js.org/doc/1">
        <dc:title>Metadata sharing and XML</dc:title>
        <dc:creator>John Smith</dc:creator>
        <js:rating>3</js:rating>
      </rdf:Description>
    </rdf:RDF>
http://www.ukoln.ac.uk/interop-focus/gpg/metadata/
Multiple Schemas using METS
METS can be used to store multiple metadata schemas for the same object.
e.g. this LC METS record contains both MODS and MARCXML:
  http://lcweb2.loc.gov/diglib/ihas/loc.natlib.gottlieb.00011/full.html
  http://lcweb2.loc.gov/diglib/ihas/loc.natlib.gottlieb.00011/mets.xml
Parallel Metadata
Used for handling multilingual materials in a digital collection.
Two approaches:
  inline parallel metadata: one metadata record includes the terms in each language
  external parallel metadata: a separate metadata record for each language
e.g.
    <dc:subject scheme="lcsh" xml:lang="en">United States History</dc:subject>
    <dc:subject scheme="rvm" xml:lang="fr">Histoire des Etats-Unis</dc:subject>
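For comparison, the external approach could be sketched as two separate records, one per language, that describe the same item. The records and record wrapper elements and the example.org identifier are hypothetical; in practice each record would normally be stored on its own, and they are shown together here only so the fragment is a single well-formed document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- <records> and <record> are hypothetical wrapper elements for illustration. -->
<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- English-language record -->
  <record xml:lang="en">
    <dc:identifier>http://example.org/items/42</dc:identifier>
    <dc:subject scheme="lcsh">United States History</dc:subject>
  </record>
  <!-- French-language record for the same item -->
  <record xml:lang="fr">
    <dc:identifier>http://example.org/items/42</dc:identifier>
    <dc:subject scheme="rvm">Histoire des Etats-Unis</dc:subject>
  </record>
</records>
```

The shared identifier is what lets a system recognise that the two records are parallel descriptions of one object.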
In-Class Exercise: Record Creation Create an RDF record for the course textbook or create a METS record containing two different record formats (you can use records you have previously created).
XML Schema Definitions
DTD (Document Type Definition)
The original style for defining the elements in an XML schema.
Defines:
  the elements
  whether each is repeatable
  whether each is required
Not written in XML.
DTD Tutorial: http://www.w3schools.com/dtd/default.asp
Example External DTD
    <!ELEMENT books (book+)>
    <!ELEMENT book (authors,title)>
    <!ELEMENT authors (author+)>   <!-- + means one or more -->
    <!ELEMENT author (#PCDATA)>
    <!ELEMENT title (#PCDATA)>
Specifies a books object which can contain multiple book objects.
Each book has at least one author and a title.
Example Internal DTD
    <?xml version="1.0"?>
    <!DOCTYPE books [
      <!ELEMENT books (book+)>
      <!ELEMENT book (title, authors)>
      <!ELEMENT authors (author+)>
      <!ELEMENT author (#PCDATA)>
      <!ELEMENT title (#PCDATA)>
    ]>
    <books>
      <book>
        <title>Metadata</title>
        <authors><author>Zeng</author><author>Qin</author></authors>
      </book>
    </books>
Components of a DTD
All XML documents (and HTML and XHTML documents) are made up of the following building blocks:
  Elements - the elements named in your schema
  Attributes - these refine the elements (e.g. the href attribute in the <a> or anchor tag for URLs)
  Entities - special characters (e.g. &lt; for <)
  PCDATA - parsed character data: text which will be parsed for entities and markup
  CDATA - character data: text which will not be parsed
Declaring Elements in a DTD
Declaring an element:
    <!ELEMENT element-name (#PCDATA)>
    e.g. <!ELEMENT title (#PCDATA)>
Elements which contain other elements:
    <!ELEMENT element-name category> or <!ELEMENT element-name (element-content)>
    e.g. <!ELEMENT book (title, author)>
  this element has two subelements, which must appear in the document in the order listed here
Declaring Elements in a DTD 2
Number of occurrences of an element:
  Once: <!ELEMENT book (author)>
  One or more: <!ELEMENT book (author+)>
  Zero or more: <!ELEMENT book (author*)>
  Zero or one: <!ELEMENT book (author?)>
Choice between two elements:
    <!ELEMENT book (title,author,publisher,(isbn|url))>
  a book can contain either isbn or url
Tutorial: http://www.w3schools.com/dtd/dtd_elements.asp
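The occurrence indicators and the choice group can be combined in a single content model. The sketch below is one possible book DTD with a document that conforms to it; the ISBN is a placeholder, and the second book, its publisher and its URL are made up.

```xml
<?xml version="1.0"?>
<!DOCTYPE books [
  <!ELEMENT books (book+)>
  <!-- one title, one or more authors, an optional publisher, then either isbn or url -->
  <!ELEMENT book (title, author+, publisher?, (isbn | url))>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT publisher (#PCDATA)>
  <!ELEMENT isbn (#PCDATA)>
  <!ELEMENT url (#PCDATA)>
]>
<books>
  <!-- two authors, no publisher, identified by isbn (placeholder value) -->
  <book>
    <title>Metadata</title>
    <author>Zeng</author>
    <author>Qin</author>
    <isbn>0000000000</isbn>
  </book>
  <!-- one author, a publisher, identified by url instead of isbn -->
  <book>
    <title>Course notes</title>
    <author>Kipp</author>
    <publisher>Example Press</publisher>
    <url>http://www.example.com/notes</url>
  </book>
</books>
```

A book with both an isbn and a url, or with the author placed before the title, would fail validation against this DTD.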
Declaring Attributes in a DTD
    <!ATTLIST element-name attribute-name attribute-type default-value>
DTD example: <!ATTLIST unit type CDATA "metric">
XML example: <unit type="metric" />
You can also specify a list of permitted values as the attribute-type:
DTD example: <!ATTLIST unit type (metric|imperial) "metric">
Instead of specifying a default-value you can also specify #REQUIRED, #IMPLIED (optional) or #FIXED (the value is fixed).
http://www.w3schools.com/dtd/dtd_attributes.asp
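Put together in one document, attribute declarations of this kind might look like the sketch below. The measurements element and the id and note attributes are hypothetical additions used to show #REQUIRED and #IMPLIED next to a defaulted, enumerated attribute.

```xml
<?xml version="1.0"?>
<!DOCTYPE measurements [
  <!ELEMENT measurements (unit+)>
  <!ELEMENT unit (#PCDATA)>
  <!-- type is limited to two values and defaults to metric;
       id must always be given; note may be left out -->
  <!ATTLIST unit
    type (metric | imperial) "metric"
    id   ID    #REQUIRED
    note CDATA #IMPLIED>
]>
<measurements>
  <!-- type is omitted here, so a validating parser supplies the default "metric" -->
  <unit id="u1">20 cm</unit>
  <unit id="u2" type="imperial" note="converted">7.9 in</unit>
</measurements>
```

A unit with type="nautical", or one without an id, would be rejected by a validating parser.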
Choosing Elements or Attributes
XML does not enforce the choice between elements and attributes.
Normally, attributes should be used for data which is specific to a single metadata element (for example, the language of the summary).
Data which refers to the entire object being described would go in an element.
Example Elements or Attributes
Using attributes:
    <book author="Marcia Zeng" title="Metadata" />
Using subelements:
    <book>
      <author>Marcia Zeng</author>
      <title>Metadata</title>
    </book>
Using subsubelements:
    <book>
      <author>
        <firstname>Marcia</firstname>
        <lastname>Zeng</lastname>
      </author>
    </book>
Defining Entities
Entities are special characters or special information; in programming languages these are called constants.
e.g. you could define the base URL for your site and insert it using an entity; if the URL changes you then only need to update one declaration
Declarations:
  Internal declaration: <!ENTITY entity-name "entity-value">
  External declaration: <!ENTITY entity-name SYSTEM "URI/URL">
Example declaration: <!ENTITY baseurl "http://www.example.com/">
XML example: <url>&baseurl;</url>
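A complete document using the baseurl entity could be sketched as follows. The links wrapper element and the two paths appended to the base URL are made up for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE links [
  <!ELEMENT links (url+)>
  <!ELEMENT url (#PCDATA)>
  <!-- declared once; every reference below picks up this value -->
  <!ENTITY baseurl "http://www.example.com/">
]>
<links>
  <!-- the parser replaces each entity reference with the declared value -->
  <url>&baseurl;index.html</url>
  <url>&baseurl;courses/714/</url>
</links>
```

If the site moves, only the ENTITY declaration has to change; both url elements then resolve to the new location.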
Verifying XML and DTDs
The following URL provides a set of validators for checking whether your XML is correct. It will also allow you to validate a DTD.
http://www.w3schools.com/xml/xml_validator.asp
More DTD examples: http://www.w3schools.com/dtd/dtd_examples.asp
XML Schemas (XSD)
A language for defining XML elements and structures.
"the semantic and structural definition of metadata elements and the relationships between the elements" [Zeng and Qin, 131]
The structure and elements in an XML document can be defined by a DTD (Document Type Definition) or an XSD (XML Schema).
XSD is itself encoded in XML.
Simple DTD: Book
    <!ELEMENT books (book+)>
    <!ELEMENT book (author,title)>
    <!ELEMENT author (#PCDATA)>
    <!ELEMENT title (#PCDATA)>
Specifies a books object which can contain multiple book objects (+).
Each book has an author and a title.
Simple XSD: Book
    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="books">
        <xs:complexType><xs:sequence>
          <xs:element ref="book" maxOccurs="unbounded"/>
        </xs:sequence></xs:complexType>
      </xs:element>
      <xs:element name="book">
        <xs:complexType><xs:sequence>
          <xs:element name="author" type="xs:string"/>
          <xs:element name="title" type="xs:string"/>
        </xs:sequence></xs:complexType>
      </xs:element>
    </xs:schema>
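An instance document meant to conform to a schema like Simple XSD: Book might look like the sketch below. It assumes the schema has been saved as books.xsd next to the document; that file name is an assumption, not something the slides specify.

```xml
<?xml version="1.0"?>
<!-- books.xsd is an assumed file name for the Simple XSD: Book schema -->
<books xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="books.xsd">
  <!-- each book has exactly one author followed by exactly one title -->
  <book>
    <author>Zeng</author>
    <title>Metadata</title>
  </book>
  <book>
    <author>Rowling, J. K.</author>
    <title>Harry Potter and the philosopher's stone</title>
  </book>
</books>
```

Because the schema uses xs:sequence, swapping the order of author and title inside a book would make the document invalid.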
XML Schema Elements
tag: <xs:element> defines an XML element
    <xs:element name="[element name]" type="[type]"/>
  element name: the name of your element
  type: chosen from a list of built-in types; use xs:string unless you must have a specific data type (e.g. number, datestamp, boolean (true/false))
e.g. <xs:element name="author" type="xs:string"/>
Elements: Mandatory, Optional, Repeatable
The attributes maxOccurs and minOccurs specify how often an element may occur. minOccurs="0" makes an element optional, maxOccurs="unbounded" makes it repeatable, and a number sets the minimum or maximum count. Both default to 1.
e.g.:
  repeatable: <xs:element name="[element]" type="[type]" maxOccurs="unbounded"/>
  optional: <xs:element name="[element]" type="[type]" minOccurs="0"/>
  only once: <xs:element name="[element]" type="[type]"/>
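The three cases can be seen side by side in a small schema. This is a sketch with an invented book element; the subject element and its limit of three are hypothetical and only there to show a numeric maxOccurs.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <!-- exactly once: minOccurs and maxOccurs both default to 1 -->
        <xs:element name="title" type="xs:string"/>
        <!-- required and repeatable: one or more authors -->
        <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
        <!-- optional: zero or one publisher -->
        <xs:element name="publisher" type="xs:string" minOccurs="0"/>
        <!-- optional and repeatable, but at most three subjects -->
        <xs:element name="subject" type="xs:string" minOccurs="0" maxOccurs="3"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

These correspond roughly to the DTD indicators: no marker for once, + for one or more, ? for zero or one. The numeric upper bound has no direct DTD equivalent.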
Attributes
Elements can have attributes:
    <xs:attribute name="[attribute]" type="[type]"/>
Attributes are optional by default; the use attribute can state explicitly whether an attribute is required or optional:
    <xs:attribute name="[attribute]" type="[type]" use="required"/>
An Element with Attributes
    <xs:complexType name="date">
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="format" type="xs:string"/>
          <xs:attribute name="lang" type="xs:string"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
We are defining a new type here, an element with attributes, based on xs:string.
e.g. <date format="yyyy-mm-dd" lang="en">2009-09-30</date>
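To use such a type, an element declaration has to refer to it. The sketch below is one way to do that in a complete schema; the type is renamed dateType here (a hypothetical name, chosen to keep the type and the element apart), and format is marked as required purely to show the use attribute.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- dateType is a hypothetical name for the string-with-attributes type -->
  <xs:complexType name="dateType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <!-- format must be present; lang stays optional -->
        <xs:attribute name="format" type="xs:string" use="required"/>
        <xs:attribute name="lang" type="xs:string"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <!-- declares <date>, whose content and attributes follow dateType -->
  <xs:element name="date" type="dateType"/>
</xs:schema>
```

Under this schema, <date format="yyyy-mm-dd" lang="en">2009-09-30</date> is valid, while a date element without a format attribute is not.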
Sequences of Elements
    <xs:sequence>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="title"/>
        <xs:element ref="creator"/>
      </xs:choice>
    </xs:sequence>
from DC
xs:sequence specifies a list of elements to include, in order
xs:choice specifies that you can choose which of the listed elements to use
XML Schemas
Simple DC XML Schema
  http://dublincore.org/schemas/xmls/simpledc20021212.xsd
  no required elements and no required schemes
  http://www.ukoln.ac.uk/metadata/dcmi/dc-xml-guidelines/
  http://dublincore.org/schemas/xmls/
EAD XML Schema
  http://www.loc.gov/ead/ead.xsd
Markup Languages: http://en.wikipedia.org/wiki/list_of_xml_markup_languages
XML Schema Examples and Tutorials
http://www.codalogic.com/lmx/xsd-overview.html
  a short introduction to XML Schema with examples
http://www.w3schools.com/schema/default.asp
  tutorials for XML and XML Schema
Toybrary: Final Element List
Form a small group. Select one of the toys to test catalogue.
Based on the test cataloguing, we will discuss the following:
  elements to keep
  required/optional/repeatable etc. elements
  descriptions of elements
  suggested value standards/controlled vocabularies for elements
In Class Exercise: Designing an XML Schema Create a simple DTD or XML Schema to describe a schema for toys using our current application profile/data dictionary.