Overview of edX Analytics




I. Data Available from edX

EdX provides researchers with data about your institution's classes running on edx.org and edge.edx.org. This includes:

- Course data
- Student information
- Event tracking
- Courseware data
- Discussion forum data
- Wiki data
- Student state

For more details, see Research Data Package Details. Complete data formats are described in edX Data Documentation. In addition, course staff can access some real-time course data from the Instructor Dashboard in the edX Learning Management System.

II. Description of edX Research Packages

EdX provides two types of research data to partners who are running classes on edx.org and edge.edx.org:

- Log (event tracking) data
- Database data, including student information

To access log and database data packages, you download files from edX, then extract the information from those files for analysis, as described in How Do I Get my Research Data Package?

Event tracking data

The edX platform gathers tracking information on almost every interaction of every student. Details about collected information are available in the Tracking Log documentation.

Event tracking data for your institution is collected in a file named Date-Institution-tracking.tar. For example:

    2013-10-27-UniversityA-tracking.tar

When you extract the contents of this TAR file, sub-directories are created for each edX server that the course is running on. For example, you may see the following sub-directories:

    prod-edxapp-003
    prod-edxapp-004
    prod-edxapp-005

Each of these sub-directories contains a file of tracking data for each day. The TAR file is cumulative; that is, it contains files for all previous days your course was running on that server.

The filename format for event tracking data files is Date_Institution.log.gpg. For example:

    2013-10-22_UniversityA.log.gpg

You must decrypt these files.

Note: Because a course runs on multiple servers, during analysis you must combine events from each server to get a complete picture of course activity.

Database data

Database data files are collected in a ZIP file named Institution-Date.zip. For example:

    UniversityA-2013-10-27.zip

When you extract the contents of this ZIP file, files are placed in the same directory as the ZIP file. The filename format of extracted files is Institution-Course-Date-data_type-Server-analytics.sql.gpg. For example:

    UniversityA-Physics101-2013_user_id_map-prod-analytics.sql.gpg

You must decrypt these files.

Data file types

The data files are views on database tables used by the edX Learning Management System. The following table describes the types of data files that edX delivers.

| Type | Filename Format | Description | edX Data Formats Documentation |
|------|-----------------|-------------|--------------------------------|
| Authorized Users | Institution-Course-Date-auth_user-Server-analytics.sql.gpg | Information about users authorized to access the course. | auth_user table |
| Authorized User Profiles | Institution-Course-Date-auth_userprofile-Server-analytics.sql.gpg | Information about student demographics. | auth_userprofile table |
| Generated Certificates | certificates_generatedcertificate-Server-analytics.sql.gpg | Certificate status for graded students after course completion. | certificates_generatedcertificate table |
| Courseware | courseware_studentmodule-Server-analytics.sql.gpg | Information about courseware state for each student. There is a separate row for each unit(?) in the course. For courses that do not have any records in this table, no file is produced. | courseware_studentmodule table |
| Forums | Server.mongo.gpg | Course discussion forum data. | Discussion forum data |
| Course Enrollment | student_courseenrollment-Server-analytics.sql.gpg | Information about students enrolled in the course, enrollment status, and type of enrollment. | student_courseenrollment table |
| User IDs | Institution-Course-Date-user_id_map-Server-analytics.sql.gpg | A mapping of user IDs and obfuscated IDs used in surveys. | user_id_map table |
| Wiki articles | Institution-Course-Date-wiki_article-Server-analytics.sql.gpg | Course wiki data. | Wiki data |

III. Frequently Asked Questions

1. What kind of data does edX store?

EdX collects course data of two different types: stateful data and event data. The stateful data includes course XML, the self-reported demographic data that students supply when they register, the posts they make to course discussions and wikis, and student answers to assessments. The event data is a timestamped record of page requests and explicit events made in a course over a period of time.

2. Who has access to edX data?

Partner institutions can arrange to download raw stateful and event data for their edX courses, even while the courses are still in progress. Course staff have access to some of the data for a course from the Instructor Dashboard, including some aggregate statistics, as soon as they create the course.

3. How is data delivered? What is a data package?

To package course data and deliver it to researchers, edX uses Amazon Web Services (AWS) Simple Storage Service (Amazon S3). EdX creates an account on Amazon S3 for each partner institution. Only the institution's single designated "data czar" can access that account to download the data package, which is the collection of files that contain raw, unprocessed stateful and event data.

4. How often are data packages delivered?

Data packages are available for download from Amazon S3 weekly. They are usually available on Saturdays or Sundays.

5. What do the data packages contain? Do data packages include custom reports about each of my courses?

The data package contains a ZIP file with the database state (that is, the stateful data) and a compressed TAR file with daily event data. The data packages contain only raw, unprocessed data, without aggregation or customizations.

6. Our university does not yet have live courses. Can we get a sample of all the data formats, so that we can begin setting up research projects now?

Sample data is available only on request; however, the data formats are described in the edX Data Documentation guide.

7. What is the typical size of the data for a 7-10 week course?

Data packages contain the data for all courses offered by a partner institution. They include data for courses that are in progress, not yet started, and complete, and that have different course assets and numbers of enrolled students. All of these factors affect the amount of data collected for each course. That said, in general the stateful data for a course can be approximately 100 MB or larger, and the event data can be approximately 1 GB or larger.

8. Is there a sample data package?

An obfuscated data sample is available from edX on request.

9. What resources does a university need to have in order to start doing research with the data?

Educational research draws on several skills and areas of expertise, so you are likely to need a team of contributors. The team is likely to include database administrators who can work with raw NoSQL data to set up a SQL database and queries, engineers who can interpret files in JSON and XML format, statisticians and data analysts to mine the data, and educational researchers to pose questions and interpret results.

10. What is a Data Czar? How do we get one?

A data czar is the single representative of a partner institution who is given credentials to securely download and decrypt edX data packages, and who is the primary contact for data within the organization. The data czar is also responsible for transferring the data securely after it is received by your organization.
After partners select an individual to be their data czar, they work with their edX Program Manager to get the required credentials set up.

11. What technical qualifications should a Data Czar have?

Data czars have experience working with sensitive student data, are familiar with encryption/decryption and file transfer protocols, and can sanity check, copy, move, and store large files. Some data czars are also database administrators who can work with SQL and NoSQL databases and write queries on the data after it is downloaded.

12. I am a Data Czar. How do I get data for my university?

You work with your edX Program Manager to set up a public/private key pair for GNU Privacy Guard. EdX creates an account on Amazon S3, and provides your Program Manager with the credentials for account access. (See How Do I Get my Research Data Package?) You download your data packages from Amazon S3 and decrypt them using the private key.

13. I am a course author, and would like some statistics about my course. Whom should I contact?

The Instructor Dashboard provides access to certain demographics, and provides options to download CSV files of student data and course grades. For complete course data and help working with it, contact your data czar and the team that is working with your institution's data to conduct research.

14. Is there documentation?

The edX Data Documentation guide provides information about the data and its structure. EdX also hosts a wiki with information and a discussion forum.

Getting the first Data Package

1. I am a Data Czar for an xConsortium partner University or Organization. How do I get my data packages?

You download data packages from an Amazon S3 account. To access the account, you use the credentials provided by edX. Contact your edX Program Manager if you have not received your Amazon S3 username and credentials.

2. How many files should I be downloading for one data package?

To deliver the data each week, the edX team uploads two archives of encrypted files to Amazon S3.

3. What format are the files in the data package in? What do I do to "open" the data package?

The data package contains a TAR file of event data and a ZIP file of stateful data. To open the package, you extract each archive. Each archive contains a folder with subfolders and encrypted GPG files. You then use your private GPG key to decrypt the GPG files.
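The steps in question 3 (extract both archives, then decrypt the GPG files) can be sketched in Python. This is a minimal sketch, not a tool provided by edX. It assumes that the `gpg` command-line program is installed and that your private key is already in its keyring. The function names and the destination directory are hypothetical, chosen only for illustration.

```python
import subprocess
import tarfile
import zipfile
from pathlib import Path


def extract_package(tar_path, zip_path, dest_dir):
    """Extract both archives of a data package; return the encrypted files found."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Event data: the TAR file unpacks into one sub-directory per edX server.
    with tarfile.open(tar_path) as tar:
        tar.extractall(dest)
    # Stateful data: the ZIP file holds the encrypted database files.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Search every sub-directory for files that still need decrypting.
    return sorted(dest.rglob("*.gpg"))


def gpg_decrypt_command(encrypted_path):
    """Build a gpg command that writes the decrypted file next to the encrypted one."""
    encrypted_path = Path(encrypted_path)
    output_path = encrypted_path.with_suffix("")  # drop the trailing ".gpg"
    return ["gpg", "--output", str(output_path), "--decrypt", str(encrypted_path)]


def decrypt_all(encrypted_files):
    """Run gpg on each file (assumes gpg is on PATH and can reach the private key)."""
    for path in encrypted_files:
        subprocess.run(gpg_decrypt_command(path), check=True)
```

For example, `extract_package("2013-10-27-UniversityA-tracking.tar", "UniversityA-2013-10-27.zip", "package")` returns the paths of the encrypted files, which can then be passed to `decrypt_all`. Because each server sub-directory is searched recursively, the returned list covers the tracking files from every server, which still have to be combined during analysis.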

4. What encryption mechanism does edX use?

You define a GNU Privacy Guard (GPG) key pair, which consists of a public key and a private key. You share only your public key with edX. EdX uses the public key to encrypt your data files before compressing them.

5. How do I decrypt the files in my data package?

You use your private GPG key to decrypt the data package. Different utilities for the decryption process are available for the Windows and Mac operating systems.

6. What do all the acronyms mean? What are S3, PGP/GPG, and AWS?

AWS is Amazon Web Services, Amazon's cloud computing platform. S3 is the Simple Storage Service, the AWS file storage service that edX uses for transferring data packages. PGP, or Pretty Good Privacy, is a data encryption and decryption program. GPG, or GNU Privacy Guard, is a free implementation of the OpenPGP standard that can be used in place of PGP.

Sources:
https://www.edx.org
https://edge.edx.org
http://edx.readthedocs.org/projects/devdata/en/latest/
https://edx-wiki.atlassian.net/wiki/display/oa/open+edx+analytics+home