DKAN Data Warehousing, Visualization, and Mapping

Acknowledgements

We'd like to acknowledge the NuCivic team, led by Andrew Hoppin, which has done amazing work creating open source tools to make data available to the world; it's been a pleasure improving DKAN together over the past two years. Gemima Barlow and the NDI Nigeria team initially supported the development of color-shaded maps, teaching us the meaning of the word "choropleth" in the process. We also thank NDI's Gender, Women and Democracy team, a significant user that identified and funded important usability improvements.

This content is available under a Creative Commons Attribution-ShareAlike 4.0 International Public License. You are free to:

Share: copy and redistribute the material in any medium or format;
Adapt: remix, transform, and build upon the material for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow the license terms. The license terms include:

Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use;
ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original;
No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

Table of Contents

Acknowledgements
Table of Contents
Introduction
Purpose of DKAN
Features
Adding Data to DKAN
Adding a New Dataset and Resource(s)
Step 1: Create the Dataset
Step 2: Add one or more Resources to the Dataset
Step 3: Adding Metadata to a Dataset
Visualizations
Charts
Choropleth Maps
Publishing Visualizations
Questions
Data Stories

Introduction

Many governments, institutions, and organizations are now moving towards open data, collecting and publishing large quantities of information in an effort to increase transparency and use data to inform policy. However, open data alone is not enough to improve lives: raw data has to be organized, processed, and presented in meaningful, accessible, human-readable formats so that citizens, analysts, and policymakers can use the information effectively. Many organizations lack the resources and technical capability to use commercial data visualization services or to develop platforms of their own. As a result, the organizations in the best position to collect data and work closely with the communities the data comes from often lack the ability to present and share this information effectively.

Purpose of DKAN

Spreadsheets of raw numbers are difficult for most of us to understand. With DKAN, organizations can take large amounts of data and quickly organize, display, analyze, and visualize this information. This data-driven storytelling can help policymakers quickly understand the data to make better decisions, and each form of visualization can be created as needed. Choropleth maps show regional trends and variations at a glance, and a large dataset can be organized into multiple charts and graphs comparing changes over time, region, funding, or any number of variables. While other programs can easily be used to create individual graphs or sort lists of data, DKAN provides a comprehensive data warehousing, browsing, and visualization solution for large sets of data tagged with multiple variables, with highly customizable options based on the same set of data. DKAN is especially useful for rapidly prototyping multiple visualizations, aggregating data, and displaying changes over time or by geographic region.
It has been particularly successful in releasing data from elections, censuses, health monitoring, and economic analysis.

Features

Ready to use out of the box, DKAN boasts powerful data warehousing, publishing, and visualization capabilities. With this tool, users can quickly publish and display open data, creating powerful data narratives with charts, graphs, and maps. The content management system (CMS) can be integrated with blogs, and DKAN is compatible with major open data standards, including the White House's Project Open Data and data.gov. Since DKAN is open source, users can download the source code from our GitHub or Drupal for free and use the same tool used by governments pursuing open data and by NDI in multiple elections for publishing and visualizing data.

Adding Data to DKAN

DKAN's data publishing model is based on the concept of datasets and resources. A dataset is a collection of one or more resources; a resource is the actual data being published, such as a CSV table or a GeoJSON data file.

Adding a New Dataset and Resource(s)

In our example, we'll be adding a dataset with Wisconsin polling places to a DKAN site. The data may look familiar; it's one of the sample datasets provided with DKAN upon installation.

Step 1: Create the Dataset

By default, only authenticated ("logged in") users can add new Datasets and Resources to a DKAN website. Once logged in, we can use the "Add Dataset" link in the main navigation bar. Depending on your user permissions, you may have access to the administration menu; in that case, you may also use the Content > Add Content > Dataset link to access the Create Dataset form.
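
As a mental model for the dataset and resource concepts introduced above, the following is a minimal Python sketch, not DKAN's actual internal data model. The class names, field names, and the file name `polling_places.csv` are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """One published data file or link (hypothetical structure)."""
    title: str
    format: str      # e.g. "csv" or "geojson"
    location: str    # an uploaded file name or a URL


@dataclass
class Dataset:
    """A container holding one or more resources plus shared information."""
    title: str
    description: str = ""
    tags: List[str] = field(default_factory=list)
    license: str = ""
    resources: List[Resource] = field(default_factory=list)

    def add_resource(self, resource: Resource) -> Resource:
        # A dataset is a collection of resources, so adding one is an append.
        self.resources.append(resource)
        return resource


# Illustrative values only; the file name is invented for this sketch.
dataset = Dataset(title="Wisconsin Polling Places", tags=["elections"])
dataset.add_resource(
    Resource(title="Polling places (CSV)", format="csv",
             location="polling_places.csv")
)
print(len(dataset.resources), dataset.resources[0].format)
```

The point of the sketch is the one-to-many relationship: descriptive information lives once on the dataset, while each resource carries only what is specific to its own file.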
The Dataset is simply the container or folder for the actual data resource files and contains basic higher-level information that applies across all the data, such as title, description, category tags, and license. Once we've entered information about the data, we can click the "Next: Add data" button to begin adding data.

Step 2: Add one or more Resources to the Dataset

After creating a dataset, we're prompted to add one or more data resources to it. There are three types of Resources that can be added to a Dataset, depending on the type and location of the Resource:

Upload a file: this option allows publishers to upload data files to the DKAN site. As with the "Link to a file" option, the data within the file will be imported into your DKAN site's Datastore for preview and analysis by your users. See The DKAN Datastore for more information.
Link to a file: this option allows publishers to create a link to a data file published on another Internet website. Although the file itself will remain on the other site, the data within the file can be imported into your DKAN site's Datastore for preview and analysis by your users. See The DKAN Datastore for more information.
Link to an API: some data resources aren't standalone files but queryable online databases; the interface to these databases is known as an API. Adding links to these types of online database interfaces to your DKAN data catalog can be very useful for developers interested in working with your data.

Typically, you'll need to upload a file (almost always a .csv), so please feel free to ignore the linking options if you don't need them.

To continue with our Wisconsin Polling Places example, we'll add one resource file to the Dataset we created in Step 1. Our resource file is in CSV (comma-separated values) format, a popular file format for exchanging tabular data. Let's explore the example resource shown here and the various fields within:

Resource / Choose File: upload a file from your local hard drive.
Resource / Recline Views: DKAN's Data Preview feature allows visitors to preview published data in three views:
  Map: data with latitude and longitude coordinates can be previewed in a map interface.
  Graph: tabular (spreadsheet) data can be graphed by users, letting them create their own meaningful visualizations. (Please note this is a method for the data intake, not for rendering the graphs themselves.)
  Grid: by default, tabular data is presented in a basic spreadsheet view, with filter, sort, and search capabilities.
Title: this is the title of the individual data file, not the parent dataset container.
Description: a rich text editor field is provided so publishers can offer detailed and useful descriptions.
Format: entering the file format here lets users search for data by specific format.
Dataset: this is the parent dataset container; this field should already be populated if you're adding a Resource right after adding a Dataset.

At the bottom of the Add Resource page, we can choose:

Save: save progress on this resource and immediately return to it for further editing.
Save and add another: save this resource and add another resource to the same dataset.
Next: Additional Info: save this resource and enter optional metadata.

In our example, we're only adding a single resource, so we'll click "Next: Additional Info" to move on to Step 3. If we had more than one resource to add to this dataset, we would choose the "Save and add another" option.

Step 3: Adding Metadata to a Dataset

Organizations may be interested in providing valuable information about their dataset to both human visitors to the website and machines discovering the dataset through one of DKAN's public APIs. All the fields below are optional, but they provide important context on the data's type, kind, and function. Adding metadata to the dataset further clarifies how the data can be used by others.

Let's take a closer look at some of the metadata fields available on this form:

Author: The dataset's author, in plain text.
Spatial / Geographical Coverage Area: Lets us define what region the data applies to; in this case, the US State of Wisconsin. You can use the map widget to draw an outline around the state borders, or click the "Add data manually" button if you already have a GeoJSON string you can paste in.
Spatial / Geographical Coverage Location: The region the data applies to, written in plain text. This can be used instead of or in addition to the Coverage Area field.
Frequency: How often is this dataset updated? We might expect our list of polling places to be updated every year, so we could select "annually." However, often we don't expect the data to be updated (even in this case, perhaps we plan to post the next version of the data as a separate dataset), in which case we can leave this blank.
Temporal Coverage: Like Geographic Coverage, this field lets us give some context to the data, but now for the relevant time period. Here we could enter the year or years for which our polling places data is accurate.
Granularity: This is a somewhat open-ended metadata field that lets you describe the granularity or accuracy of your data. For instance: "Year".
Data Dictionary: Another open-ended field, this is a space for almost any kind of explanation of the terminology, units, column names, and so on in our dataset. In most cases, this will be a simple URL to a Data Dictionary resource elsewhere on the web.
Additional Info: Lets us arbitrarily define other metadata fields. See Additional Info field for more information.
Resources: This field is a reference to the resources you have already added. You should generally leave this field alone and use the workflows outlined here and in Updating Datasets in DKAN to add, edit, and remove resources from your Dataset.

After you click "Save", the metadata we entered will appear on the page for this Dataset.
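
For the Spatial / Geographical Coverage Area field described above, the "Add data manually" option accepts a GeoJSON string. The following Python sketch shows one way such a string could be produced for a simple rectangular area. The helper name `bounding_box_geojson` is hypothetical, the coordinates are a rough, illustrative box around Wisconsin rather than an authoritative boundary, and whether the field expects a bare geometry (as here) or a full Feature is an assumption to verify on your own site.

```python
import json


def bounding_box_geojson(west, south, east, north):
    """Return a GeoJSON Polygon string for a rectangular area.

    GeoJSON positions are [longitude, latitude]; the ring is listed
    counterclockwise and closed by repeating the first position.
    """
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],
    ]
    return json.dumps({"type": "Polygon", "coordinates": [ring]})


# Rough, illustrative extent only (assumption, not survey data).
coverage = bounding_box_geojson(-92.9, 42.5, -86.2, 47.3)
print(coverage)
```

A rectangle is only a coarse stand-in; drawing the outline with the map widget gives a closer fit to the real borders.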

Visualizations

Charts

For numeric data that's best rendered comparatively, you'll want to make charts with your resources. You can make bar charts, pie charts, scatterplots, or line graphs. Navigate to the dataset you want to base your chart on, then click the Explore Data button.
Right-click (or, on a Mac, Control-click) the download button to copy the URL of the resource file. Saving this link will allow you to revisit your resource directly in the future. Now use the administration menu at the top to navigate to Structure > Entity types > Visualization > Chart > Add Chart.
Enter values for the title, description, categories, and tags fields. At the bottom of the form, paste the resource link you just copied into the Source field, then click the Next button. If the URL was loaded properly, you will have two fields to fill in under the heading 'Define Variables'. The first, 'Series', represents the Y axis, and the second, 'X Field', represents the X axis. In these fields, choose the columns that you want to display. Only the Series field can contain multiple values. If the column names are not displayed properly, check again that your source URL is correct. Leave the radio buttons set to 'auto'. After making sure that everything is correct, click the Next button.
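
To make the 'Series' and 'X Field' choices more concrete, here is a small Python sketch of what selecting those columns amounts to: one column supplies the X-axis labels, and one or more columns supply the Y-axis values. This is not DKAN's own code. The function name `chart_data` and the sample CSV (column names and numbers) are invented for illustration.

```python
import csv
import io

# Hypothetical CSV content standing in for a resource file.
CSV_TEXT = """year,registered,turnout
2010,1200,800
2012,1350,910
2014,1400,870
"""


def chart_data(csv_text, x_field, series_fields):
    """Split tabular data into X-axis labels and one list per series."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = [name for name in [x_field, *series_fields]
               if name not in columns]
    if missing:
        # Mirrors the advice above: wrong column names usually mean the
        # source was not loaded as expected.
        raise KeyError(f"columns not found in source: {missing}")
    rows = list(reader)
    x_values = [row[x_field] for row in rows]
    series = {name: [float(row[name]) for row in rows]
              for name in series_fields}
    return x_values, series


# One X field, multiple series (only Series accepts several columns).
x_values, series = chart_data(CSV_TEXT, "year", ["registered", "turnout"])
print(x_values)
print(series)
```

Checking the column names before reading any values matches the troubleshooting step in the text: if a chosen column is absent, the problem is usually with the source, not the chart settings.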
Now you can select the type of chart you want to create. Click on the image of the chart type you would like to use. The charts on this screen are generic images and are not based on the data you loaded. To see the actual chart, click the Next button. If everything went well, you should see your chart displayed. The data might be slightly misplaced, so in the right-hand column you can edit the X Format for the labels (number, date, etc.), the Label Rotation, the color of the lines or columns, the X and Y labels for the axes themselves, and the margins, which move the chart as well as the labels. If you would like to see what this data looks like in another type of chart or graph, click Back at the bottom of the page and repeat these steps with another chart or graph selection. After editing and customizing the chart to your liking, click the Finish button.
Now you have created your chart. On the chart's page, there will be an Embed button. Click on it to reveal the HTML embed code, which you can add to any website to embed a live, dynamic chart that will update if you change the chart on your DKAN site. You can also set the height and width of the embedded chart by typing values into the Height and Width boxes above the embed code.

Choropleth Maps

A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map, such as population density or per capita income. The choropleth map provides an easy way to visualize how a measurement varies across a geographic area or to show the level of variability within a region. Choropleth maps can be used effectively to report area values at virtually any scale, from global to local, and the data can be examined at many different levels of analysis, from general overall patterns to the detection of details. They are especially helpful for finding intriguing hot spots.

1. Look for Content > Add Content > Resource in the admin menu and click on it.
2. Upload a CSV file for the resource.
3. Fill in the required fields and save the resource.
4. Look for Structure > Entity Types > Geo File > geojson > Add geojson in the admin menu and click on it. GeoJSON is a widely used data format for displaying vector data in web maps. It is based on JavaScript Object Notation (JSON), a simple and minimalist format for expressing data structures using syntax from JavaScript. In GeoJSON, a vector feature and its attributes are represented as a JavaScript object, allowing for easy parsing of the geometry and fields.
5. Set the Title.
6. Upload a GeoJSON file.
7. Fill in the name attribute field with the name of the column in the data (the CSV resource) whose values match the name property of the features in the GeoJSON file.
8. Click Save.
9. Look for Structure > Entity Types > Visualization > Choropleth Visualization > Add Choropleth Visualization in the admin menu and click on it.
10. Fill in the Title.
11. Select the GeoJSON file we created for the geojson field.
12. Select the resource file we created for the resource field.
13. Select the colors you would like to use for the choropleth map.
14. Fill in the data column field with the column or columns in your CSV data that you want to display on the map. Separate multiple columns with a comma. The columns that you choose will appear as radio buttons on the side of your visualization, which you can toggle between to see the effect of different data. If you leave this field blank, you'll get a list of radio buttons for all of the columns in your data sheet. Selecting only certain columns can be helpful when, for instance, you are trying to show change over a certain time period: you could choose the April, May, and June columns but leave out July, August, and September.
15. Fill in the data breakpoints with comma-separated numbers. If you leave this field blank, breakpoints will be calculated for you based on the data. Breakpoints determine which data values are captured by each color on the visualization. For instance, if you use 25, 50, 75, 100 as your data breakpoints, your visualization will display 4 different shades: one for values between 0 and 25, a slightly darker shade for values from 25 to 50, an even darker shade for values from 50 to 75, and the darkest shade for values from 75 to 100. Remember to choose your breakpoints wisely based upon the data that you want to display!
16. Click Save, and enjoy!

Publishing Visualizations

After you finish creating the visualization, click on the blue Embed button to get an embed code for sharing it on other platforms. You can alter the height and width of the embedded visualization by entering the desired values in the corresponding text boxes. Once you've copied the code, you can embed your visualization anywhere that accepts an HTML element. Even on other sites, the graph will automatically update to reflect any change made to the source data or settings on DKAN.
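
To make steps 7 and 15 of the choropleth procedure more concrete, the Python sketch below matches CSV rows to GeoJSON features by name and then assigns each matched value to a shade using breakpoints. It is an illustration of the idea, not DKAN's implementation. The function names, region names, and values are invented, and treating each breakpoint as the inclusive upper bound of its shade is an assumption, since the guide does not say which shade a value exactly on a breakpoint receives.

```python
import bisect
import csv
import io
import json

# Hypothetical sample data; names and numbers are invented for illustration.
CSV_TEXT = """region,april,may
North,12,40
South,55,80
East,90,100
"""

SQUARE = [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]  # placeholder geometry
GEOJSON_TEXT = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"name": name},
         "geometry": {"type": "Polygon", "coordinates": SQUARE}}
        for name in ("North", "South", "West")
    ],
})


def shade_index(value, breakpoints):
    """Return the shade for a value, where 0 is the lightest shade.

    Assumption: each breakpoint is the inclusive upper bound of its shade,
    and values above the last breakpoint get the darkest shade.
    """
    index = bisect.bisect_left(breakpoints, value)
    return min(index, len(breakpoints) - 1)


def join_and_shade(csv_text, geojson_text, name_column, data_column,
                   breakpoints):
    """Match features to CSV rows by name and compute a shade for each."""
    values = {row[name_column]: float(row[data_column])
              for row in csv.DictReader(io.StringIO(csv_text))}
    shades, unmatched = {}, []
    for feature in json.loads(geojson_text)["features"]:
        name = feature["properties"]["name"]
        if name in values:
            shades[name] = shade_index(values[name], breakpoints)
        else:
            unmatched.append(name)  # feature with no matching CSV row
    return shades, unmatched


shades, unmatched = join_and_shade(
    CSV_TEXT, GEOJSON_TEXT, "region", "may", [25, 50, 75, 100])
print(shades)
print(unmatched)
```

The `unmatched` list highlights why the name attribute matters: a feature whose name property has no counterpart in the chosen CSV column cannot be given a value to shade.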

Questions

DKAN not only renders data visualizations; it can also serve as a standalone data storytelling platform. The first function available for telling data stories is creating a question, which allows users to combine visualizations with companion text and images. Fill in the fields as desired, attach files, and categorize the question as fits the content. Fields marked with a red asterisk (*) are required to create the question. Make sure the entity URL matches the one auto-generated for the question. Previously rendered visualizations can be added to the question by pasting the embed code into the corresponding field. Click Save at the bottom and your question is ready for viewing.

Data Stories

Telling stories based on data is a primary goal of DKAN. Visualizations can be used to create a clear understanding of a complex situation, and elements of storytelling can be used to illustrate what the findings actually mean. The best method for leveraging the narrative in your data with DKAN is creating a data story. Data stories consist of multiple elements and pieces of content, allowing you to build unique and engaging bulletins showcasing your data.
Give the data story a title and add any images, body text, or tags, then select the layout that best fits how you want to represent your data and content. Click Save and you'll be greeted with a screen prompting you to add and define your content. The functional icons do the following:

Plus icons allow you to add content.
Gear icons let you modify formatting options.
Paintbrush icons allow you to change the style of the content's pane.
Arrow icons enable you to change the position of the content.
Trash can icons allow you to delete the content.

You can add all kinds of content, new or existing, and organize it as you see fit. When you've finished building and organizing content, click the Save button at the bottom and your data story is ready!