Comparative Analysis Report: Visualization Tools & Platforms
By Annabel Weiner, Erol Basusta, Leah Wilkinson, and Quenton Oakes
Table of Contents
Executive Summary
Introduction
Assessment Criteria
  Publishability
  Insight
  Workability
  Integration
Varying Methods & Limitations
  Applications & GUIs
  JavaScript & Libraries
  Choice of Tool or Platform
Platforms & Libraries
  Cytoscape
  Gephi
  D3.js
  NodeGoat
  Neo4j
Conclusions
Executive Summary
We analyzed five tools (Cytoscape, Gephi, D3.js, NodeGoat, and Neo4j) on the basis of four criteria (Publishability, Insight, Workability, and Integration). We discovered a natural division in the tools available: applications and JavaScript libraries. Applications are generally easier to use, but it is more difficult to publish from them. This category includes Cytoscape, Gephi, and NodeGoat. Libraries are collections of reusable code that, in this context, are used to visualize data. This category includes D3.js. Neo4j is less easily categorized, as it features a programming environment within a user interface. The main tension we found is between the steep learning curve of JavaScript libraries and the relatively limited functionality of GUI applications. This tension bears on a decision facing the Digital Scholarship Team: whether to offer visualizations as products or visualization training as a service.
Our recommendations regarding visualization tools are as follows: Gephi seems to be the most appropriate GUI tool due to its unique publishability features, while D3.js and Neo4j are promising tools that require more expert knowledge. Gephi is increasing its market share, but D3.js is very well established, with a knowledgeable user community that could be of great use in anticipating new users' needs when designing training. In broader terms, we recommend introducing user testing as early as possible in this process to determine the added value for researchers in either scenario. Anticipating users' needs before they can be clearly articulated is a great accomplishment for a library service, but a clear picture of what these needs may be is of the highest priority when offering a radically novel service such as visualization assistance.
Introduction
Before our team set out to create visualizations using Cytoscape and D3.js, we decided to first assess whether these two tools were the best platforms and libraries available to us. What led us to this decision was the fact that Cytoscape is designed for visualizing molecular interaction networks and biological pathways. If we could find a platform developed to accomplish specifically what we would ideally like Cytoscape and D3.js to do, it would be the better platform to pursue and invest more time in. For our assessment we focused on four key criteria that we thought would be essential for the final platform or library when it comes to the creation and sharing of visualizations. We derived these criteria from our meetings with the Digital Scholarship Team.
Assessment Criteria
Publishability
This criterion concerns how easy it is to recreate visualizations in a format that is accessible to a wider audience. Details such as how visualizations are created and how much tweaking is required for publication are also part of this criterion. The quality, features, and size of the visualization are key components to be evaluated as well.
Insight
This criterion concerns how powerful the platform is when it comes to research. Do the platform's features and interface facilitate the discovery of new networks and relationships? Are there any limitations on the capabilities or capacity of the analysis done on the platform? These questions are all analyzed as implications for insight.
Workability
This criterion examines how easy the platform or library is to actually use. It encompasses the size of its community, the variety of third party plugins and tutorials, and its overall prevalence in the field of information visualization. If a platform or language is easy to pick up and has quality documentation or tutorials, it is considered to have high workability.
Integration
Our final criterion is how well the platform or library works with various data formats as well as with other platforms and tools. Built-in features for massaging the data and flexibility in data input format also fall under this criterion. Another aspect is the gap, and the number of steps, between the actual data analysis and the recreation of the visualization online.
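As an illustration of the kind of data massaging the Integration criterion has in mind, the following is a minimal sketch in Python that converts tab-delimited text into CSV. It is not part of any tool assessed here; the function name `tab_delimited_to_csv` and the sample `Source` and `Target` columns are assumptions made for the example.

```python
import csv
import io


def tab_delimited_to_csv(text: str) -> str:
    """Convert tab-delimited text into comma-separated text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Skip blank lines entirely.
        if not any(cell.strip() for cell in row):
            continue
        # Trim stray whitespace around each cell; the csv module
        # quotes any cell that itself contains a comma.
        writer.writerow(cell.strip() for cell in row)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical sample data: a two-column edge list.
    sample = "Source\tTarget\nAlice\tBob\n\nBob\tCarol, Jr.\n"
    print(tab_delimited_to_csv(sample), end="")
```

Each such conversion is one more step between the raw data and the published visualization, which is why a tool that accepts more input formats directly would score higher on this criterion.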
Varying Methods & Limitations
From our assessment of the different platforms and tools, we noticed two categories when it came to visualization research and creation. Either there was an application with an easy to use graphical user interface, or there was an HTML/CSS-friendly JavaScript library that had no GUI but was very flexible and powerful.
Applications & GUIs
Applications are what modern end users are most comfortable using, with features such as menus, file selection dialogs, tabs, and form-like option configuration. They usually offer simple tutorials and easily navigable interfaces. These traits make visualization apps high in workability and in insight, since usually the very purpose of the app is to conduct network and node analyses. The tradeoff is that they offer little to nothing in the way of publishability and integration features: means to export the visualization while maintaining its interactivity, or to put it online straight from the app, are rare. When these features are present, they compete with the user interface for the developers' time and energy, leading to lower quality output.
JavaScript & Libraries
JavaScript seems to be a requirement for posting visualizations on the internet: it is the language that allows for the creation of interactive graphical networks. As a programming language it also offers a variety of plugins and tweaks that enable it to work with many other languages and all kinds of datasets. Thus, it offers very high publishability as well as integration. This kind of tool does have drawbacks: it is significantly challenging to understand and use for people who have no background in programming. It offers no real GUI, and can be cumbersome when it comes to actually conducting analyses. For JavaScript libraries, this translates into very low workability and somewhat decreased insight, since the system must be fully understood before fine control is achievable.
Choice of Tool or Platform
The category of tool chosen will have a significant impact, and the right choice depends on whether staff will be trained to produce visualizations for others or users will be enabled to create visualizations themselves for research purposes. The JavaScript route would have a steep learning curve and would entail some amount of instruction and specialization in how to use it effectively. On the other hand, an application that could conduct analyses as well as offer some online publishing capabilities would empower individuals.
Platforms & Libraries
For our comparative analyses we looked at a total of five different platforms and libraries. We chose them based on the prevalence of their use, which we felt would correlate with the size of their user communities. Bigger user communities often translate to more plugins, more comprehensive FAQs and tutorials, and more refined documentation.
Cytoscape
Cytoscape is the tool that has been used so far for network and node analyses. As an app it is stronger on the workability and insight criteria, but lower on publishability and integration. The installation is simple if you have Java installed. Cytoscape offers its own JavaScript library, Cytoscape.js, for creating visualizations in HTML5. After the visualization is designed, the JavaScript library helps substantially in terms of online publishability. There is a good deal of documentation and many tutorials on Cytoscape, which helps offset the need for JavaScript programming, a potential drawback for internal and external training.
Cytoscape requires Java 8, which at the time of writing is the latest and most secure version. Though it has a graphical user interface, it looks out of date and is not very easy to use. This is because it is geared towards experts, and is specifically designed for genetics researchers. It is possible to view the visualizations in a web friendly form by exporting your files through Cytoscape.js. However, the visualizations do not necessarily satisfy all of the users' needs: bigger networks and visualizations fail to display, and interaction is severely limited. Using Cytoscape to begin with requires some amount of manual labor in massaging and cleaning the data. Also, to use Cytoscape.js at all, users would require a background in programming.
The drawbacks of Cytoscape.js weaken the combined Cytoscape app and JavaScript bundle. Seeing as D3.js is more powerful, flexible, and prevalent than Cytoscape.js, little would be lost on the JavaScript side by leaving Cytoscape, so moving to another app would have minimal cost. If the new app offers publishing features separate from JavaScript, the move would even be a benefit. Despite the limitations of its JavaScript library, it would still be possible to use the Cytoscape app and JavaScript combination, provided that different information architectures were experimented with and used in the creation of visualizations.
Gephi
Gephi is another app primarily focused on conducting network and node analyses. Like Cytoscape, it has high workability and insight. Unlike Cytoscape, it doesn't offer its own official JavaScript library per se, but several open source, user created ones are available. For example, Gexf Javascript Web Viewer allows for drag and drop recreation of Gephi visualizations online. Gephi does not accept XML, XLSX, or TXT formatted files, but provided these are converted to CSV files, they can be processed. Gephi can also export networks to SVG files, which have the potential to be made interactive and published online via different plugins. There seems to be a lot of online documentation, along with community support forums.
One major issue with Gephi is that it doesn't work with Java 8; older versions of Java are needed for it to work. The fact that it doesn't run on the latest version of Java can be problematic in installation, and is a security risk. As noted above, Gephi is incompatible with XML, XLSX, and TXT files, which can be a problem with the data provided by the Digital Scholarship Team. Gephi is in open beta, and will likely have a few bugs. It should be fine for internal use, but might not be otherwise. The interface and some simple interactions, such as zooming in and out, can be unintuitive at times. Being an app, Gephi focuses on data analysis and manipulation more than on visualization sharing, leading to lower publishability.
D3.js
D3.js is one of the more popular JavaScript libraries that allow for the creation of complex data visualizations online and in the browser. It offers extensive documentation, user tutorials, and plugins, all user generated, open source, and available on GitHub. Being primarily a JavaScript library, D3 scores exceptionally high in both integration and publishability, being one of the best and most used libraries for online data visualization needs.
D3.js's power and capabilities come at a cost: effective navigation and use of the library, its documentation, and its many features require a strong working knowledge of JavaScript programming. For those unfamiliar with programming, D3 offers very low workability. Also, D3 is primarily about creating visualizations of data, and is therefore considered to do poorly on our insight criterion, as it does not offer any network or node analysis features. Because analyses have to be conducted separately, it might be necessary to clean the exported data so that D3 recognizes it, further slowing and complicating the process.
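The cleaning step described above can be sketched as follows. This is a minimal illustration in Python, assuming the analysis tool exported an edge list as CSV with `source` and `target` columns (hypothetical column names) and that the visualization expects the node-link shape commonly seen in D3 force-layout examples: a `nodes` array of objects with an `id`, and a `links` array of `source` and `target` pairs. The function name is likewise an assumption.

```python
import csv
import io
import json


def edges_csv_to_node_link(csv_text: str) -> dict:
    """Build a {"nodes": [...], "links": [...]} dict from an edge-list CSV."""
    nodes = {}  # node id -> node record, kept in first-seen order
    links = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        source = (row.get("source") or "").strip()
        target = (row.get("target") or "").strip()
        if not source or not target:
            continue  # drop incomplete rows rather than guess at them
        for name in (source, target):
            nodes.setdefault(name, {"id": name})
        links.append({"source": source, "target": target})
    return {"nodes": list(nodes.values()), "links": links}


if __name__ == "__main__":
    # Hypothetical export with one incomplete row and stray whitespace.
    exported = "source,target\na,b\nb,c\nc,\n a ,c\n"
    print(json.dumps(edges_csv_to_node_link(exported), indent=2))
```

The resulting JSON would then be loaded by the page that draws the network. Even in this small form, the sketch shows the extra hand-off between analysis and visualization that lowers D3's workability for non-programmers.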
NodeGoat
NodeGoat requires no installation or Java, and does all of its analyses and visualizations online and in the cloud. Once users log in, they do the rest of their work in a browser environment. Its UI is simple and intuitive, making it easy to learn. It scores high on workability and insight, and average on publishability, since visualizations are already on the web. Though it offers no clear indication of how to get started (see the disadvantages below), it does have a demo along with several video tutorials that demonstrate its features. Network visualizations are very interactive: a user can click on a node for more information and can view the development of the network by adjusting its timeline. It has the potential to be very cross platform.
Making an account to explore NodeGoat with a demonstration set of data is not a straightforward process, and a local installation can only be assumed to be more challenging. According to NodeGoat's About page, a potential user must send them an email to discuss using NodeGoat. There appears to be no integration with other tools or systems, giving it very low accessibility and flexibility. Since it seems to be a standalone tool, it is unknown whether it can be hosted on another site. In addition, a user cannot export an image or a visualization; they can only export a CSV file. To display the visualizations to others, a link to the NodeGoat site may be necessary. All these traits negatively impact this platform's publishability.
Neo4j
Neo4j might be a good compromise between the two categories of apps and JavaScript libraries. It has a good interface with very good documentation. In addition, its engine appears to be entirely server side, not requiring any user installation or logins. Neo4j offers both a community and an enterprise edition. The community edition is free and may be adequate for use.
However, the enterprise edition may still be worth looking into, as the added support and customization options could prove important if the team plans to offer visualizations as a service. Its cost is unknown, so it may be necessary to email the developers to discuss which edition would be best in this instance. It is important to consider the functionality, support, and cost differences between the two editions. As with some of the other tools we examined, opening files proves to be a challenge in Neo4j.
Conclusions
All tools surveyed are potentially as workable as the current tool (Cytoscape). Gephi is a promising alternative thanks to its unique publishability feature. While it poses severe challenges for installation, these are likely to be worked out through the development cycle. JavaScript libraries clearly have power far beyond the GUI tools, and may be more useful in conjunction with a GUI tool: the GUI tool is used to create a prototype visualization, which is then replicated programmatically, reducing time spent developing a solution that may be discarded. This approach is common in other situations (wireframing, for example). Neo4j is worthy of further inquiry as a unique tool, but if the enterprise version is necessary, for full access to its tools or to exceed a certain data size limit, for example, it could be a suboptimal tool for the Digital Scholarship Team.
The largest choice to be made is in the approach moving forward, between ease of use and power, a choice split between GUI applications and JavaScript libraries. The GUI tools are likely easier to teach to researchers with little programming background, but the capabilities desired may not be present in these tools, while expert users would be able to create very novel visualizations with programmatic tools. One example is hyperlinking nodes in a network visualization. This feature appears to be available exclusively in JavaScript based tools, be it Cytoscape.js exports or D3.js.
Ease of installation, as an opportunity cost, was also a concern. JavaScript tools are easily installed if the underlying concept is understood: in a real sense there is no installation, if hotlinking is acceptable. However, to a novice user the installation can be extremely daunting, as daunting as general use. Installation was a greater concern with the GUI tools, as some required outdated versions of Java, potentially leading to actual system insecurity, as well as challenging workarounds created by the community in some cases.
This process is just as alien to novice users as hotlinking a JavaScript library.
The next step should be user research to examine the choice at hand, between GUI tools and JavaScript libraries. Understanding researchers' skill levels with computer tools will reveal a great deal about possible services to be offered, such as the time it takes to train an individual to use a tool and the quality of data that is expected to be used. For visualizations created in house, the challenge of user interaction will lie more in the experience of users discovering the visualization on the web, and in whether it is seen as useful and insightful or simply as decoration. Implementation would also reveal the amount of time and effort required to master the creation and hosting of visualizations, along with other back end concerns that can only truly be discovered by working through the process as it would be performed if it were an established service.