Collaborative Document Review System for Community Engagement




SEG-N-013-2011

L.S. Chin
March 2011

Abstract

This report describes the design and implementation of a web-based collaborative document review and editing system used to support community driven proposals. The application was written in Python using the Django Web Framework which allowed for rapid development and deployment.

Keywords: Collaboration tool, community engagement, web application, django

Email: shawn.chin@stfc.ac.uk

Reports can be obtained from http://www.sesp.cse.clrc.ac.uk

Software Engineering Group
Computational Science & Engineering Department
STFC Rutherford Appleton Laboratory
Harwell Science and Innovation Campus
Didcot, OX11 0QX

© Science and Technology Facilities Council

Enquiries about the copyright, reproduction and requests for additional copies of this report should be addressed to:

Library and Information Services
STFC Rutherford Appleton Laboratory
Harwell Science and Innovation Campus
Didcot, OX11 0QX
Tel: +44 (0)1235 445384
Fax: +44 (0)1235 446403
Email: library@rl.ac.uk

STFC reports are available online at: http://epubs.stfc.ac.uk

ISSN 1358-6254

Neither the Council nor the Laboratory accept any responsibility for loss or damage arising from the use of information contained in any of their reports or in any communication about their tests or investigations.

Contents

1 Introduction
2 Brief Introduction to Django
  2.1 Data model
  2.2 Application Logic
  2.3 User Interface (templates)
3 The Collaborative Document Review Component
  3.1 Requirements
  3.2 Document Data Format
  3.3 Online Editor
  3.4 Document Parsing and Decomposition
  3.5 Data Model
  3.6 Contextual Comment UI
4 Miscellaneous Issues
  4.1 Spam Control
  4.2 Caching
    4.2.1 Caching in Django
    4.2.2 Our Approach
    4.2.3 Caching database queries
  4.3 Site Statistics
  4.4 Visitor Tracking
5 Conclusion

1 Introduction

When EPSRC issued a call for Statement of Needs for new Collaborative Computational Projects (CCPs) [1], the Software Engineering Group deployed a website [2] designed to engage the Agent-Based Modelling (ABM) community in a bid to establish a CCP in ABM. The key component of the site is the collaborative document review system, which allowed registered supporters to review the latest version of the Statement of Needs document and provide targeted feedback on individual page elements.

While the bid was not successful, we view the website as a success: over 60 academicians and industry players registered their support for the proposal and over 30 statements of support were published. Based on 46 contributions by supporters, seven major revisions were made to the Statement of Needs document before it was submitted to EPSRC.

This report describes the design and implementation of the web application, which was written in Python using the Django Web Framework [3]. Using such a framework enabled rapid development; the whole system was designed, developed and launched within weeks. The website itself is easily deployable and can be customised or extended to support future activities with similar customer engagement needs. In addition, the document review component has since been refactored as an isolated pluggable application for Django and published as open source [4].

2 Brief Introduction to Django

Django is a high-level Python web application framework originally developed for the fast-paced environment of a newsroom. It was designed to simplify and accelerate the creation of complex data-driven websites by automating and encapsulating many aspects of web projects such as database interactions, authentication, internationalisation, and administration. It emphasises clean design and reusability, allowing standalone features (such as forums, comment management and wikis) to be isolated as pluggable components that can be reused in other projects.
Django follows the model-view-controller [5] architectural pattern, where the data models, application logic and user interface are isolated, allowing for independent development, testing and maintenance.

2.1 Data model

Django comes with an Object-Relational Mapper (ORM) [6] which hides away the complexity of interacting with the database. The rich data-model syntax provides a simple way for developers to represent their model. The described model is automatically converted at run time to a set of model-specific APIs for interacting with the database, providing a simple and consistent interface to the various databases supported by Django. (Django currently supports PostgreSQL, MySQL, SQLite 3 and Oracle databases.)

For example, to create a data model for storing articles and linking them to specific authors, the following definition can be used:

    from django.db import models

    class Reporter(models.Model):
        name = models.CharField(max_length=70)

    class Article(models.Model):
        content = models.TextField()
        headline = models.CharField(max_length=200)
        pub_date = models.DateTimeField()
        reporter = models.ForeignKey(Reporter)

When the application is first initialised, the corresponding tables will be created in the assigned database. No SQL knowledge is required of the developer. In addition, a suite of APIs is generated at run time which allows users to interact with the data using standard Python idioms. For example:

    from datetime import datetime

    # Get data for the reporter "Malcolm Raynold"
    r = Reporter.objects.get(name__exact="Malcolm Raynold")

    # Print all headlines for articles written by that reporter
    for article in r.article_set.all():
        print(article.headline)

    # Create a new article and assign it to that reporter
    a = Article(content="The article content",
                headline="short and sweet",
                pub_date=datetime.now(),
                reporter=r)
    a.save()  # write to database

Once the data model is written, a data administration site can be automatically generated to allow data entry to begin even before the other components are ready. This interface can be customised and extended to form the administrative backend of the website.

Figure 1: Recommended workflow for transforming source code

2.2 Application Logic

The functionality of a web component is written as standard Python functions which are called when an associated URL is requested. Instead of the traditional method where each web address corresponds to a path within the file system, Django employs a URL mapping scheme which uses Regular Expressions [7] to match web addresses to functions. For example:

    # in the urls.py file
    from django.conf.urls.defaults import patterns

    urlpatterns = patterns("your_app.views",
        (r"^article/(?P<art_id>\d+)/$", "show_article"),
        (r"^reporter/(?P<rep_id>\d+)/$", "articles_by_reporter"),
    )

When the web address http://yoursite.com/article/423/ is accessed, it matches the first pattern in the list, so the show_article function is used to handle that request. The art_id variable captured from the web address is passed on to the function as an argument. The function itself would look like this:

    from django.shortcuts import get_object_or_404, render_to_response

    def show_article(request, art_id):
        a = get_object_or_404(Article, id=art_id)  # get article with that id
        return render_to_response("templates/show_article.html", {"article": a})

Note that at this level no HTML scripting is done. The function simply queries the data model, performs any intermediate processing if required, then renders the user interface using a templating engine.

2.3 User Interface (templates)

The actual web pages that end users see are rendered dynamically based on template files, which can be designed independently of the application logic and data models. Template files are generally standard HTML files with some additional markup called template tags. Template tags allow data passed in during the rendering stage to be inserted into the page. They also support operations such as loops, if/else conditions, and filter operations to transform data. Custom tags can also be easily written to further customise the way data is processed and displayed.

The templating engine supports template inheritance, whereby one template can inherit the contents of a parent template, changing only what is necessary. In practice this feature allows the site layout to be defined in a parent template which all other templates use as a base. This allows for templates which are succinct and have minimal repetition, thus simplifying updates and maintenance.

    <!-- File: templates/show_article.html (extends base/site_layout.html) -->
    {% extends "base/site_layout.html" %}

    <!-- we just need to define the contents section -->
    {% block content %}
    <h1>{{ article.headline }}</h1>
    <div class="byline">By {{ article.reporter.name }}</div>
    <div class="article-body">{{ article.content }}</div>
    {% endblock %}

3 The Collaborative Document Review Component

3.1 Requirements

The collaborative document review system was designed with the following requirements in mind:

- The document can be easily modified and published by multiple authors

- Users can provide feedback on specific elements in the document
- Authors can acknowledge comments posted by users and respond to the comments

3.2 Document Data Format

Selecting a suitable document format was a key design decision as it affects the implementation complexity and impacts the workflow of document authors. Using a format supported by common document editors (such as *.doc or *.odt) makes it easy for authors to create and edit the document on their desktops, but would make online updating and parsing very difficult. A plain text format, on the other hand, would be easy to parse and can be edited online or offline in a text editor, but at the cost of limiting formatting options.

The format eventually chosen for the job was Markdown [8], which is essentially plain text with special formatting syntax. Text written in Markdown can be translated to HTML using a Markdown engine. The advantages are:

- The formatting syntax is non-intrusive, so the text file itself is very readable even before it is rendered to HTML
- Plain text can be easily edited using existing desktop applications or an online editor
- Markdown text is easily diff-able, making it easy to compare one version to another
- Markdown generates clean HTML, making it readily parseable using existing tools
- Django comes with support for Markdown (as well as other comparable formats such as Textile and reST) [9]

The following is an example of text formatted using Markdown:

    This is a first level heading
    =============================

    A paragraph is one or more consecutive lines of text
    separated by one or more blank lines.

    * Bulleted lists can be created too
    * This is another item in the list
    * Yet another item

3.3 Online Editor

For our online editor, the document is entered using Markdown in a standard text box. A preview of the parsed document is displayed next to the text box to give authors an idea of the final output. Figure 2 shows the document editor that was used in the website.
It is also possible to include third-party components (such as Markitup [10]) to produce a much fancier online editor. For example, Figure 3 shows the editor that is used in the example project included in the open source release.

3.4 Document Parsing and Decomposition

In order to assign comments to individual elements in the document, we need to parse the document and demarcate each element.

Figure 2: Online document editor with preview window

Figure 3: A fancier online editor used in the open source version of doccomment

Once the document is translated to HTML, we use the BeautifulSoup [11] library to parse the document and split it into a list of top-level elements. The code to achieve this is trivial. The following function takes in a string representation of the document and returns an array (or list in Python parlance) of HTML elements. The returned elements can be readily printed as a page snippet, or printed in turn to form the full page.

    from markdown import markdown
    from BeautifulSoup import BeautifulSoup, Tag

    def markdown_to_html_elements(text):
        "Returns a list of HTML strings, encoded in unicode"
        soup = BeautifulSoup(markdown(text))
        # keep only top-level tags, skipping bare text nodes such as newlines
        return [unicode(e) for e in soup.contents if isinstance(e, Tag)]

3.5 Data Model

Figure 4 shows a simplified diagram of the data model used in the application. For the sake of brevity and clarity, some fields have been renamed or left out.

Figure 4: Simplified data model for DocComment

Document: the Document model stores the representation of a document. It stores the title and document text as entered in the most recent update, as well as other metadata such as the dates of creation/modification, publication status, modification status, etc.

DocumentVersion: this model stores a snapshot of the document each time it is published. The title and document text are stored for that version along with the publication date and version string (in the form of <major>.<minor>.<revision>). To improve performance in exchange for larger storage requirements, a pre-rendered version of the document can also be stored.

DocumentElement: during the publication process, the rendered document is broken down into separate page elements, each stored as a DocumentElement. This data is associated with a DocumentVersion entry and is used as a placeholder to link comments to specific elements within the document. A copy of the rendered text is stored so that it can be used to give context to a comment listing page.

DocumentComment: comments posted for each document element are stored using the DocumentComment model. This model stores the comment text and date, and is associated with a specific user and document element. The acknowledgement field is a Boolean field used to indicate whether the document author has read and acknowledged that comment. This field can be extended further to include different modes of acknowledgement such as accepted, will consider, or rejected.

3.6 Contextual Comment UI

The design goal for the commenting user interface was one that is non-cluttering, intuitive, and practical. To achieve these goals, we drew inspiration from the Django Book website [12]. It uses a combination of AJAX and CSS to position the commenting interface unobtrusively as a sidebar that blends into the page and only shows up when the mouse cursor hovers over a page element.
We managed to achieve a similar effect using CSS knowledge gleaned from the Django Book website, and using the jQuery [13] library to simplify tasks that require JavaScript and AJAX.
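As a rough illustration of what the AJAX side of such an interface needs from the server, the sketch below computes the per-element comment counts that would drive the speech bubbles. It is not taken from the site's code: the function name `comment_counts_json`, the tuple layout of a comment, and the shape of the JSON payload are all assumptions made for this example.

```python
import json
from collections import Counter


def comment_counts_json(comments):
    """Return a JSON object mapping element id -> number of comments.

    `comments` is an iterable of (element_id, author, text) tuples, a
    hypothetical stand-in for rows of the DocumentComment model.
    """
    counts = Counter(element_id for element_id, _author, _text in comments)
    # JSON object keys must be strings, so element ids are converted.
    return json.dumps({str(k): v for k, v in counts.items()}, sort_keys=True)


if __name__ == "__main__":
    sample = [
        (3, "alice", "Please cite a source here."),
        (3, "bob", "Agreed."),
        (7, "carol", "Typo in the second sentence."),
    ]
    print(comment_counts_json(sample))
```

A client-side script could fetch such a payload once per page and attach a bubble only to the elements that appear in it, leaving uncommented elements uncluttered.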

Figure 5: The commenting UI only shows up when the mouse hovers over a page element. The associated text is highlighted.

Figure 6: Elements that are commented on will have a speech bubble showing the number of comments

Figure 7: Clicking on the speech bubble will display the list of comments
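To tie the pieces of this section together, the publication step described in Section 3.5 can be sketched in plain Python. This is a simplified illustration rather than the application's Django models: the dataclasses, the `publish` and `next_version` helpers, the choice of "1.0.0" as the first version, and the rule of resetting lower version parts on a bump are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DocumentElement:
    position: int   # order of the element within the page
    html: str       # rendered copy, kept to give comments context


@dataclass
class DocumentVersion:
    title: str
    text: str       # Markdown source at the time of publication
    version: str    # "<major>.<minor>.<revision>"
    published: datetime
    elements: List[DocumentElement] = field(default_factory=list)


def next_version(previous: Optional[str], level: str = "revision") -> str:
    """Bump one part of a version string and reset the parts after it."""
    if previous is None:
        return "1.0.0"  # assumed starting point
    major, minor, revision = (int(part) for part in previous.split("."))
    if level == "major":
        return "%d.0.0" % (major + 1)
    if level == "minor":
        return "%d.%d.0" % (major, minor + 1)
    if level == "revision":
        return "%d.%d.%d" % (major, minor, revision + 1)
    raise ValueError("unknown level: %r" % level)


def publish(title, text, element_html, previous=None, level="revision"):
    """Create a snapshot of the document plus one record per page element.

    `element_html` is the list of top-level HTML strings, for example the
    output of a function such as markdown_to_html_elements().
    """
    snapshot = DocumentVersion(
        title=title,
        text=text,
        version=next_version(previous.version if previous else None, level),
        published=datetime.now(),
    )
    snapshot.elements = [
        DocumentElement(position=i, html=html)
        for i, html in enumerate(element_html)
    ]
    return snapshot
```

Because every published version gets its own set of element records, comments stay attached to the text they were written against even after the document is revised.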

4 Miscellaneous Issues

4.1 Spam Control

To avoid getting swamped by irrelevant posts from automated spam engines, we decided to allow postings only from registered users. This meant that we could concentrate on gate-keeping the user registration process in order to limit spam on the site.

Registration on the site is implemented as a two-stage process:

1. Prospective users fill out a form with their details and email address
2. An email is sent to the given address with an encoded key and instructions on how to activate their account

This process ensures that only registrations with validated email addresses are allowed. The site administrators have access to a list of registrations that failed; this information is used to manually activate applications (followed by an email) in the case of typos in the email address, or to improve the spam detection feature in the case of actual spam.

A common approach to reducing spam posted in forms is to use a CAPTCHA [14] system to identify and filter out automated bots. However, we find this approach rather annoying and attempted a less obtrusive method.

To detect automated bots that scour the internet in search of form-driven sites to abuse, we made the following assumptions about bots:

- They do not run a full-blown web browser and therefore may not have JavaScript support.
- They rely on standard field names to guess how a form should be filled.
- They sometimes harvest details of forms in advance and only attempt submissions later on. Conversely, they may respond immediately, possibly faster than a human can.

Based on these assumptions, we implemented a combination of the following techniques:

- Honeypot: the form contains hidden fields that should be left empty. Spam bots that fill in all fields in the form will get rejected. However, some bots may ignore hidden fields and therefore defeat this approach.
- JavaScript check: the form contains control fields that are initialised with wrong answers. When the page is loaded, JavaScript is used to fill in the correct answers and hide the fields from users. Bots without JavaScript capabilities will submit incorrect control data and get rejected. Users whose browsers have JavaScript disabled will be presented with the control fields, so the questions have to be chosen such that they can be easily answered.
- Encoded data: every time a form is rendered it is accompanied by an encoded string that contains information such as the session key, the requester's IP address, and a timestamp. The encoded data is submitted along with the form, and the decoded data can be used for validation:
    - Invalid or incomplete data: the form has been tampered with
    - Timestamp: used to detect submissions that took too long or arrived too soon
    - Session key and IP: used to detect submissions from a different session or machine

Rejected form submissions are accompanied by a suitable error message and error code. Should the problem persist, users are encouraged to report the problem along with the error code. The error code can be encoded with additional information to help us tweak settings to prevent future false positives.

A possible future enhancement would be to fall back to a CAPTCHA system on repeated failures. This would allow users to proceed in the event of false positives.

Note that the techniques above would only be effective against bots targeting generic forms; each of the mechanisms above can be trivially defeated if targeted specifically.

4.2 Caching

One issue with dynamically generated web pages is the overhead involved in producing a page for each user request: the request needs to be processed, the database queried, and the response rendered from templates. At high load, this process will require large amounts of resources in terms of processing power, memory, and I/O bandwidth (disk and network).

High-traffic sites employ various methods to alleviate the problem, one of which is caching. Content that changes infrequently or is used repeatedly can be cached such that it is processed only once and reused for multiple requests. Caching can be done at multiple levels (views, page snippets, data objects) and one can use different cache replacement strategies depending on the site requirements.

In the realm of highly scalable web infrastructure this gets a lot more complicated, with techniques like replication, partitioning and fault tolerance coming into play. Cache implementation at that level is of course way beyond our extremely modest requirements. Considering the amount of traffic we anticipate, one can say that our attempts at employing a caching strategy are merely out of academic interest.
Even so, one benefit we gained from this effort is that our data-driven site runs efficiently off a simple SQLite [15] database. In addition, since data is stored in a single file with no need for a database server, tasks such as backups and deployment are greatly simplified.

4.2.1 Caching in Django

Django comes with a built-in caching framework [16] which provides different levels of cache granularity.

- Per-site cache: every page is cached unless specified otherwise or if the page has GET or POST parameters. Each page is cached for a specific duration before it is refreshed.
- Per-view cache: individual views can be marked for caching. The cache duration is specified per view and can be different for each view.
- Template fragment caching: a cache tag can be used to demarcate sections of the template that should be cached. The cache duration and cache key are specified within the tag.
- Cache API: a set of methods which provide low-level access to caching functionality such as the assignment, retrieval and deletion of cache entries. It allows one to design one's own caching strategy.
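To make the last item more concrete, the following is a minimal sketch of what a low-level cache offers and how a caching strategy can be built on top of it. The `get`, `set` and `delete` method names mirror those of Django's low-level cache API, but `SimpleCache` and `cached_call` are hypothetical, illustrative stand-ins written for this example, not Django code.

```python
import time


class SimpleCache:
    """Toy in-memory cache with per-entry expiry."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so that expiry can be tested
        self._store = {}      # key -> (expires_at, value)

    def set(self, key, value, timeout=300):
        self._store[key] = (self._clock() + timeout, value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop the entry and report a miss
            return default
        return value

    def delete(self, key):
        self._store.pop(key, None)


def cached_call(cache, key, compute, timeout=300):
    """Return the cached value for `key`, computing and storing it on a miss.

    A stored value of None is indistinguishable from a miss in this sketch.
    """
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, timeout)
    return value
```

The same get-or-compute pattern works for any expensive value, such as a rendered page fragment or the result of a database query; the only decisions left to the developer are the key, the timeout, and when to delete an entry early.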

Figure 8: Django Debug Toolbar output before caching

Figure 9: We can also browse details of each database query

The caching mechanism can be configured to use various back ends for storage. These include databases, memcached [17] servers, local memory, or files within a specific directory.

Django also provides the facility to insert headers that give instructions to upstream caches, i.e. web caches that sit between your website and the end user. These can include reverse proxies placed in front of a web server to handle web requests, web proxies and caches hosted by ISPs or internet gateways, and even the cache in the user's browser.

4.2.2 Our Approach

To avoid premature optimisation, caching is applied incrementally just before the application is ready for deployment. We use the Django Debug Toolbar [18] to profile each page and, based on the output, determine if caching is required.

The toolbar is displayed as an overlay on the existing page and provides information such as the time taken to render the page, the number of database requests involved, and full listings of the database queries and HTTP headers sent. For example, in Figure 8 we see the toolbar displaying the stats for the document display page.

Figure 10: The same page, with fragments of the template cached

It shows a total rendering time of over 1100ms (688ms CPU time) with 30ms used to make 50 database queries. This was obviously a very expensive page to render and a prime candidate for caching.

In Figure 10, we see the same page after template caching was implemented. Segments of the page that relied on database content were cached in rendered form, and the cache is only invalidated when the database entries are updated. Once the cache is primed, the page takes only a fraction of the original time to produce (60ms), with only 4 database queries required.

4.2.3 Caching database queries

Apart from caching template fragments, we can also implement caching at the model manager [19] level. (A manager is the interface through which database query operations are provided to Django models.) This enables the application code to perform database queries in the usual manner, with caching performed automatically in the background. A manager that performs caching would look like this:

    from django.db.models import Manager
    from django.core.cache import cache

    ONE_WEEK = 60 * 60 * 24 * 7  # in seconds

    class DocumentManager(Manager):
        def published(self):
            cache_key = "document_published"
            object_list = cache.get(cache_key)
            if object_list is None:  # not in cache
                # perform actual database query, then cache it
                object_list = self.get_query_set().filter(published=True)
                cache.set(cache_key, object_list, ONE_WEEK)
            return object_list

Assigning a manager to a model is trivial:

    # definition of Document model
    class Document(models.Model):
        objects = DocumentManager()  # assign manager
        # ... model fields ...

Once the manager is assigned to the Document model, data can be queried as follows:

    # retrieve all documents that are published
    from models import Document
    docs = Document.objects.published()

The first time that code is called, the cache is still empty so a database query is performed. Subsequent calls will retrieve the data from the cache alone and no database access is required.

We can ensure that the returned data is valid by clearing that cache entry whenever the associated tables in the database change. This can be automated using the signal dispatcher [20] in Django to trigger a cache invalidation function whenever the model data is updated.

    from django.core.cache import cache
    from django.db.models.signals import post_save, post_delete

    # sample code to delete a series of cache items whenever model data is updated
    def invalidate_document_cache(sender, **kwargs):
        for key in ("document_published", "any_other_keys"):
            cache.delete(key)

    # run function whenever a document is created, updated or deleted
    post_save.connect(invalidate_document_cache, sender=Document)
    post_delete.connect(invalidate_document_cache, sender=Document)

4.3 Site Statistics

To allow supporters (and proposal reviewers) to track the amount of activity and interest the site generates, we publish values such as the number of registered supporters, volume of activity, and visitor stats. These values are easy to produce but require access to the database and external resources; they are therefore cached for a short period to reduce overheads while still giving reasonably fresh values.

4.4 Visitor Tracking

To handle the capturing and reporting of visitor statistics we opted for the Google Analytics [21] service instead of rolling our own. While keeping a simple tally of the number of page views is trivial, there is a lot more to visitor tracking than meets the eye.
Among other things, a useful tracking system should provide:

- Data filtering: ignore hits from web crawlers as well as from specific hosts (such as those of the website administrator and developers) so as not to skew the statistics.
- Activity tracking: determine how long each visitor stays at each page, which pages they visited, how they stumbled upon our page (Referred by another site? From a search engine? Search keyword used?), whether they are a new or returning visitor, and so on.
- Visitor details: more information about the visitor such as their location and ISP/institution (determined from the IP address), the type of operating system and browser used, etc.
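Even the first item in the list above takes some care once one tries to write it down. The sketch below shows a deliberately naive version of data filtering, only to illustrate the idea; the dictionary layout of a hit, the crawler substrings and the ignored address are all assumptions, and a real service relies on far more thorough detection.

```python
CRAWLER_MARKERS = ("bot", "crawler", "spider")  # assumed user-agent substrings
IGNORED_HOSTS = {"192.0.2.10"}                  # hypothetical admin/developer address


def is_countable(hit):
    """Return True if a hit should be included in the visitor statistics."""
    agent = hit.get("user_agent", "").lower()
    if any(marker in agent for marker in CRAWLER_MARKERS):
        return False
    return hit.get("ip") not in IGNORED_HOSTS


def page_views(hits):
    """Count views per path, skipping crawler and ignored-host hits."""
    counts = {}
    for hit in filter(is_countable, hits):
        counts[hit["path"]] = counts.get(hit["path"], 0) + 1
    return counts
```

Substring matching on the user agent misses any crawler that does not announce itself, which is one reason a hosted service with maintained filters is attractive for a small site.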

Figure 11: Site stats page on the ABM CCP site

Figure 12: Overview page within the Google Analytics site

The Google Analytics service includes a very rich reporting facility which provides a lot more information than we need, as well as a set of APIs [22] which we can use to import specific values for display on our site.

5 Conclusion

The collaborative document review system as it currently stands is still quite basic, with only the bare requirements implemented. Even so, it was enough to convince us of its potential. We believe that such a system will be useful in many situations and that it should be developed further to provide a richer environment for future users.

The open source version is a first step towards achieving that goal. It was written from scratch based on the strengths and pitfalls identified in the project discussed in this document. All the basic functionality is now in place and it is ready for integration into any Django project. That is just the beginning. Many other features have been earmarked for inclusion, including:

- Revision control
- More advanced access control and permissions
- Optional integration with other commenting systems, such as Disqus [23]

For those who may find a collaborative document system useful but are not in a position to develop their own Django site, we should consider setting up a hosting site. Registered users on the site would be able to publish their documents for public (or member-only) viewing. Such a system would naturally require some effort to develop and maintain, and can only be attempted if there is sufficient interest (and funding!).

References

[1] Collaborative Computational Projects, [Online] http://www.ccp.ac.uk/
[2] CSE Software Engineering, STFC, Agent-Based Modelling CCP - Community Support Site, [Online] http://www.softeng.rl.ac.uk/abm_ccp/
[3] Django - the web framework for perfectionists with deadlines, [Online] http://www.djangoproject.com/
[4] Chin, Shawn, django-doccomment, GitHub, [Online] https://github.com/shawnchin/django-doccomment
[5] Model-View-Controller, Wikipedia, [Online] http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller
[6] Object-relational mapping, Wikipedia, [Online] http://en.wikipedia.org/wiki/Object-relational_mapping
[7] Regular expression, Wikipedia, [Online] http://en.wikipedia.org/wiki/Regular_expression
[8] Gruber, John, Markdown, Daring Fireball, [Online] http://daringfireball.net/projects/markdown/
[9] django.contrib.markup, Django Documentation, [Online] http://docs.djangoproject.com/en/dev/ref/contrib/markup/
[10] Salvat, Jay, markitup! universal markup jquery editor, [Online] http://markitup.jaysalvat.com/home/
[11] Richardson, Leonard, Beautiful Soup, [Online] http://www.crummy.com/software/BeautifulSoup/
[12] Holovaty, Adrian and Kaplan-Moss, Jacob, The Django Book, [Online] http://www.djangobook.com/
[13] Resig, John, jquery, [Online] http://jquery.com/
[14] Carnegie Mellon University, CAPTCHA: Telling Humans and Computers Apart Automatically, [Online] http://www.captcha.net/
[15] SQLite, [Online] http://www.sqlite.org/
[16] Django's Cache Framework, Django Documentation, [Online] http://docs.djangoproject.com/en/dev/topics/cache/
[17] Memcached, [Online] http://memcached.org/
[18] Hudson, Rob, django-debug-toolbar, [Online] http://robhudson.github.com/django-debug-toolbar/
[19] Managers, Django Documentation, [Online] http://docs.djangoproject.com/en/dev/topics/db/managers/
[20] Signals, Django Documentation, [Online] http://docs.djangoproject.com/en/dev/topics/signals/
[21] Google, Google Analytics, [Online] http://www.google.com/analytics/
[22] Google, Google Analytics for Developers, Google Data Protocol, [Online] http://code.google.com/apis/gdata/
[23] Disqus, [Online] http://disqus.com/