VIVO Dashboard
A Drupal-based tool for harvesting and executing sophisticated queries against data from a VIVO instance
Paul Albert, Miles Worthington and Don Carpenter
Chapter I: The Problem
Administrators are avid consumers of institutional data.
In 2011, the Dean's Office requested the following reports:
- List of publications in which a Weill Cornell author was first or last author
- List of publications in which a Weill Cornell author was first or last author, appearing in journals of impact factor > 15
This is a fairly reasonable request compared to others.
Chapter II: Legacy Approach
Why this approach is poor: asynchronous, can't sort or facet, and requires a lot of sweat
Sample SPARQL query:
SELECT DISTINCT ?article1_pmid ?person1_cwid ?authorship1_authorRank
WHERE {
  ?article1 rdf:type bibo:Document .
  ?article1 vivo:informationResourceInAuthorship ?authorship1 .
  ?article1 bibo:pmid ?article1_pmid .
  ?authorship1 rdf:type vivo:Authorship .
  ?authorship1 vivo:authorRank ?authorship1_authorRank .
  ?authorship1 vivo:linkedAuthor ?person1 .
  ?person1 rdf:type foaf:Person .
  ?person1 wcmc:cwid ?person1_cwid .
}
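Part of the "sweat" in the legacy approach is the post-processing that follows the query. The Python sketch below shows one way to answer the first-or-last-author question from the three selected columns (PMID, CWID, author rank). It rests on assumptions not in the slides: that the rows cover every authorship on an article, with external authors carrying no CWID (the query as written only returns authors who have a CWID, so it would likely need widening), and that the sample PMIDs and CWIDs are made up.

```python
from collections import defaultdict

def first_or_last_author_pmids(rows):
    """Return PMIDs whose first or last author has a CWID.

    rows: iterable of (pmid, cwid, author_rank) tuples; cwid is None
    for authors outside the institution (an assumption of this sketch).
    """
    by_pmid = defaultdict(list)
    for pmid, cwid, rank in rows:
        by_pmid[pmid].append((int(rank), cwid))

    matches = []
    for pmid, authorships in by_pmid.items():
        # Lowest rank is the first author, highest rank is the last author.
        first = min(authorships, key=lambda a: a[0])
        last = max(authorships, key=lambda a: a[0])
        if first[1] is not None or last[1] is not None:
            matches.append(pmid)
    return sorted(matches)

# Hypothetical result rows; the identifiers are illustrative only.
rows = [
    ("100", "abc2001", "1"), ("100", None, "2"), ("100", None, "3"),
    ("200", None, "1"), ("200", "abc2001", "2"), ("200", None, "3"),
    ("300", None, "1"), ("300", "xyz2002", "2"),
]
print(first_or_last_author_pmids(rows))
```

The impact-factor report would need a further join against journal ranking data, which is one more manual step in this workflow.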
Chapter III: Let's create a prototype
Goal of VIVO Dashboard: empower untrained users to run sophisticated semantic queries on Weill Cornell faculty publications
* Secondary directive: kill Sarah Connor
[Screenshot of the prototype: Publications view with Graph, List, and Export options. Caption: "The following publications are for all publications by active Weill Cornell Medical College faculty as represented in VIVO." Facets shown: Date (2009 - Present); Publication Type (Research Article (657), In Process (55), Review (45), Clinical Guideline (32), more...); Journal ranking (15.4-68.3); Journal Name; Author Name.]
"Invention is 1% inspiration and 99% perspiration." Thomas A. Edison (Source: Yahoo Answers)
Chapter IV: Technology Stack of next version
VIVO Dashboard is an installation profile based on Drupal
Why Drupal?
Familiar platform
- Used at many institutions
- Developers are familiar with the technology
- Easy to host
Existing solutions
- Many modules available to solve common use cases
- Same application could be built with any other web platform, but Drupal saves a huge amount of development effort.
VIVO Dashboard leverages existing Drupal modules
Faceted search
- Search API module
- Facet API module
Data import
- Feeds module
- Linked data import module
* All of which are actively maintained and supported
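For readers new to faceted search, the idea can be sketched in a few lines of Python. This is only an illustration of the concept (counting values per field, then narrowing the result set); it is not how the Search API or Facet API modules are implemented, and the records and field names are invented for the example.

```python
from collections import Counter

# Illustrative records only; not data from VIVO.
publications = [
    {"title": "Paper A", "type": "Research Article", "year": 2012},
    {"title": "Paper B", "type": "Review", "year": 2011},
    {"title": "Paper C", "type": "Research Article", "year": 2011},
]

def facet_counts(records, field):
    """Count how many records carry each value of the given field."""
    return Counter(r[field] for r in records if field in r)

def apply_facet(records, field, value):
    """Keep only the records whose field equals the selected value."""
    return [r for r in records if r.get(field) == value]

type_counts = facet_counts(publications, "type")
articles = apply_facet(publications, "type", "Research Article")
# Counts for the next facet are recomputed on the narrowed set.
year_counts_within_articles = facet_counts(articles, "year")
print(type_counts, year_counts_within_articles)
```

Selecting a facet value narrows the list, and the remaining facet counts are recomputed against the narrowed list, which is the behavior users see as the bracketed counts next to each option.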
Key Module: Linked Data Import
- Feeds plugin for linked data as a data source
- Uses an open source library called ARC2 for requesting and parsing RDF
- Authored by Miles Worthington
- Originally created for Cornell's CALS Research & Impact site
Key Module: Feeds
- Offers a generic import system for Drupal
- Handles scheduling
- Maps various data sources (RSS, CSV, SQL) into Drupal content (nodes, taxonomy terms)
- Manages custom data sources via plugins
Other technology
- Stores content using the robust indexing application, Apache Solr
- AJAX
Key modules
- Apache Solr
- Elysia cron
- D3.js (visualization library)
- Charts and graphs
- VIVO Dashboard Core, Publications, and Import (custom modules)
Demonstration
Chapter V: Import process
Get a list of all publication URIs from VIVO
Go through list of URIs, request RDF for each
Request necessary related data for each article
Map VIVO RDF to Drupal structures
Repeat for all publication URIs
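The steps above can be sketched in Python. This is a simplified illustration, not the Feeds or Linked Data Import code (which is PHP). The function names, the in-memory `FAKE_STORE` standing in for HTTP requests to VIVO, the example.org URIs, and the field mapping are all assumptions made for the example.

```python
# Stand-in for VIVO's linked data: URI -> list of (subject, predicate, object).
FAKE_STORE = {
    "http://example.org/pub1": [
        ("http://example.org/pub1", "rdfs:label", "A study"),
        ("http://example.org/pub1", "bibo:pmid", "111"),
        ("http://example.org/pub1", "vivo:informationResourceInAuthorship",
         "http://example.org/authorship1"),
    ],
    "http://example.org/authorship1": [
        ("http://example.org/authorship1", "vivo:linkedAuthor",
         "http://example.org/person1"),
    ],
    "http://example.org/person1": [
        ("http://example.org/person1", "rdfs:label", "Doe, Jane"),
    ],
}

def gather(uri, fetch_rdf, related_predicates):
    """Fetch RDF for a URI, then for resources reached via related predicates."""
    triples, seen, queue = [], set(), [uri]
    while queue:
        current = queue.pop()
        if current in seen:
            continue
        seen.add(current)
        fetched = fetch_rdf(current)
        triples.extend(fetched)
        queue.extend(o for s, p, o in fetched
                     if s == current and p in related_predicates)
    return triples

def follow(triples, start, path):
    """Walk a chain of predicates from start and return the end values."""
    nodes = [start]
    for predicate in path:
        nodes = [o for s, p, o in triples if s in nodes and p == predicate]
    return nodes

def import_publications(list_uris, fetch_rdf, related_predicates, mapping):
    records = []
    for uri in list_uris():                                   # list of URIs
        triples = gather(uri, fetch_rdf, related_predicates)  # RDF + related data
        record = {"uri": uri}
        for field, path in mapping.items():                   # map RDF to fields
            record[field] = follow(triples, uri, path)
        records.append(record)                                # repeat for all URIs
    return records

records = import_publications(
    list_uris=lambda: ["http://example.org/pub1"],
    fetch_rdf=lambda uri: list(FAKE_STORE.get(uri, [])),
    related_predicates={"vivo:informationResourceInAuthorship", "vivo:linkedAuthor"},
    mapping={
        "title": ("rdfs:label",),
        "pmid": ("bibo:pmid",),
        "authors": ("vivo:informationResourceInAuthorship",
                    "vivo:linkedAuthor", "rdfs:label"),
    },
)
print(records)
```

Each publication costs several requests (the article, its authorships, each author), which helps explain why a full import is slow on a large VIVO instance.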
Chapter VI: How to install VIVO Dashboard
"I am not a smart programmer, but even I can set up a VIVO Dashboard." Me
Code and instructions: github.com/paulalbert1/vivodashboard
Install Drush locally and create a repository
- Install Drush on your local machine by following the instructions at: https://github.com/drush-ops/drush
- Run the following command:
  drush make https://raw.github.com/paulalbert1/vivodashboard/master/distro.make vivo-dashboard
Create a new site in Pantheon. This is also easily done in Acquia.
Put the Pantheon site in SFTP mode
Using SFTP connection info on previous screen, copy local code to Pantheon.
Wipe database
Visit development site
Install profile (resulting screen)
You are now taken to VIVO Dashboard's home page.
Select a fetcher (VIVO Class Fetcher requires VIVO 1.7 if you have more than 30,000 objects)
Set up the publications import
Enter the site URL and top-level article class, for example http://vivo.med.cornell.edu and http://purl.org/ontology/bibo/Document
Define URIs of custom classes
Hide certain types
Journal ranking data can be imported from SCImago Journal Rank (the Pearson correlation between Impact Factor and SCImago Journal Rank, c. 2006, is 0.915)
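For reference, the Pearson correlation cited above measures how closely two rankings move together on a scale from -1 to 1. A minimal Python sketch of the standard formula follows; the input values are made up to exercise the function and are not journal data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Made-up values: a perfectly linear relationship and a perfectly inverse one.
r_linear = pearson([1, 2, 3, 4], [2, 4, 6, 8])
r_inverse = pearson([1, 2, 3], [3, 2, 1])
print(r_linear, r_inverse)
```

A value of 0.915 therefore indicates that the two measures rank journals very similarly, which is the case for using the freely available ranking as a stand-in.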
Chapter VII: Final Thoughts
Advantages with this approach
- Leverages standard VIVO features
- No SPARQL endpoint required
- No authentication necessary
- Configured through an admin UI
- Only major ontology changes require code changes
You can use VIVO Dashboard's data harvesting approach to create new apps to do other kinds of data visualization and analysis
Disadvantages with this approach
- The import takes a long time
- Drupal/PHP is not designed for long-running jobs
Future work: make VIVO Dashboard compatible with VIVO-ISF 1.6
Data dashboards tend to elicit a Highlander-type response among administrators