Computer Programming for the Social Sciences




Department of Social and Political Sciences

This two-day workshop will teach beginner-level, practical computer programming skills for use in social science research. There will be a focus on the automatic collection of data (from websites, Twitter feeds, etc.) and its processing into clean, formatted datasets ready for analysis. Researchers will leave the course with an installed programming environment and a variety of code snippets for use in their own projects.

4th and 5th of June 2012
The Emeroteca, Badia Fiesolana
Register with Adele Battistini: adele.battistini@eui.eu
Organized by Jonathan Bright: jonathan.bright@eui.eu
Sponsored by Professor Fabrizio Bernardi
10 credits
For more information see: http://www.eui.eu/seminarsandevents/index.aspx?eventid=77143

Overview

The rise of what is often known as "big data" presents a huge opportunity for social scientists. The amount of information available to researchers is growing exponentially: a recent study estimated that in 2010 the quantity of digital data in the world exceeded one trillion gigabytes. Much of this information is available for free for use in research: for example, many governments have started to move towards open data, opening up their own archives and releasing huge quantities of previously unseen data. This flood of data creates a wealth of opportunities both to test and refine existing theories and to create new ones about the way social life works.

However, while it might be available for the first time, much of this information does not come prepared and ready to plug into a statistical analysis package, and is often not quite as open and easy to use as its proponents imagine. To take full advantage of the opportunities provided by big data, social scientists will increasingly require the ability to program: to instruct their computer to understand this information, to capture, analyse and format it, and to put it to use.

This course is intended as an introduction to this type of practical computer programming, geared towards data collection, and created specifically with the needs of social scientists in mind. The workshop is open to all students at the EUI and is specifically aimed at beginners with no previous programming experience. The aim will be to convince even those who think of themselves as computer illiterate that basic programming ability is both a vital part of a researcher's skill set and something that can be learnt relatively easily.
Outcomes

Those participating in the course will:

- Acquire a basic knowledge of computer programming, and of the functioning of elementary programming language statements such as if/else statements and for loops
- Learn how to mine the wealth of data that is available on the internet in both structured and unstructured formats
- Get an insight into available tools for harvesting data from public information services such as Google and Twitter
- Learn about existing open data repositories on political systems in the US and the EU, and how to access them
- Learn how to output clean, formatted datasets for easy use with statistical software packages
- Leave with a fully functioning programming environment installed on their personal laptop, ready for application to their own research projects

The workshop will use Python, which is often considered one of the simplest languages to pick up yet offers a great deal of power and flexibility. Where possible, some translations of the concepts into Stata and R will also be offered.
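
To give a flavour of the first and fifth outcomes, the following is a minimal sketch of a for loop and an if/else statement being used to tidy a few records and write them out as CSV. It is not taken from the course materials: the records, field names and function names are invented for illustration, and the code assumes a current Python 3 interpreter rather than the Python 2.7.3 release named in this programme.

```python
# Assumes Python 3; the workshop itself specifies Python 2.7.3.
import csv
import io

# Hypothetical raw records, as they might look after a messy collection step.
raw_records = [
    {"country": " Italy ", "turnout": "75.2"},
    {"country": "france", "turnout": "n/a"},
    {"country": "SPAIN", "turnout": "68.9"},
]


def clean_records(records):
    """Return tidy rows: trimmed, title-cased names and numeric turnout."""
    cleaned = []
    for record in records:  # a for loop visits each record in turn
        name = record["country"].strip().title()
        try:
            turnout = float(record["turnout"])
        except ValueError:
            turnout = None  # the value could not be read as a number
        if turnout is None:  # an if/else statement picks one of two branches
            status = "missing"
        else:
            status = "ok"
        cleaned.append({"country": name, "turnout": turnout, "status": status})
    return cleaned


def to_csv_text(rows):
    """Serialise rows as CSV text that a statistics package can import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["country", "turnout", "status"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty cells
    return buffer.getvalue()


rows = clean_records(raw_records)
print(to_csv_text(rows))
```

Writing to an in-memory buffer keeps the sketch self-contained; in a real project the same writer could be pointed at a file opened with `open(path, "w", newline="")`.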

Format & Requirements

The course will be structured around a series of workshop sessions. In each session, a new programming concept will be introduced, and students will then be encouraged to try it out with help from the course instructor. A variety of practical examples will be used, drawing on data from government archives, Twitter, Wikipedia, etc., with an emphasis on creating reusable code which can easily be deployed in future research projects.

This is a practical course, so attendees should, if possible, bring a laptop (if you are unable to do so, please contact the course instructor at jonathan.bright@eui.eu). Attendees should also install the Python programming language, which will be used throughout the workshop (see http://www.python.org/getit/). Please install Python version 2.7.3. Please complete this step even if you own a Mac which already has a version of Python on it, as this version may well be out of date.

Participation in all the required sessions on the course will be sufficient to obtain the 10 credits on offer.
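
One way to confirm which version of Python is actually installed is to ask the interpreter itself. The short sketch below is an illustration rather than part of the official instructions, and the helper name `describe_python` is hypothetical; it should run unchanged on both Python 2.7 and Python 3.

```python
import sys


def describe_python(version_info=sys.version_info):
    """Return a short 'major.minor.micro' string for an interpreter version."""
    return "{}.{}.{}".format(version_info[0], version_info[1], version_info[2])


# Prints the version of whichever interpreter runs this script.
print("Running Python " + describe_python())
```

If the printed version differs from the one requested, a Mac's pre-installed copy is probably being used instead of the freshly installed one.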

Programme

Day 1: Monday, June 4th 2012, Emeroteca

Before coming to the workshop, please install Python version 2.7.3. See: http://www.python.org/getit/. If you have any problems, please come along between 9.00 and 9.30.

09.30-11.00 Session 1: Introduction to computer programming
This session will set the context for the rest of the workshop by examining some basic questions about the subject. What is computer programming, and why should social scientists be familiar with basic programming techniques? We will then move on to introducing the specific language being learnt during the workshop (Python), and look at some of the fundamentals of programming: how to create and run a script in Python, how to assign variables, etc.

11.00-11.30 Coffee break

11.30-13.00 Session 2: Basic computer programming
This session examines elementary programming techniques which are fundamental to the use of computing for the social sciences. Simple if/else statements, loops, different types of data structure and file handling will all be discussed. These techniques will be applied to the manipulation and reformatting of a simple dataset, and the basic analysis of a text file.

13.00-14.30 Lunch

14.30-16.00 Session 3: Web Scraping I: Using APIs
One of the advantages of popular scripting languages such as Python is that many special interfaces to a huge variety of data sources are available for use; hence many data collection tasks can be achieved without complex programming. This session discusses the general principles behind such interfaces, which are known as APIs. These concepts will then be put into practice using some existing APIs to access publicly available data generated by Google, Twitter and Facebook, as well as data on political systems in the US and the EU.

16.00-16.30 Coffee break

16.30-18.00 Review session
This is an optional session, and its content will depend on the progress made during the day. It can be used to have another look at concepts covered during the day or to ask the lab instructor any questions.
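
As a rough illustration of the idea behind Session 3, many web APIs return their data as JSON text, which Python can turn into ordinary lists and dictionaries. The sketch below works on a hard-coded sample instead of contacting a real service: the response layout, the field names and the function `flatten_results` are all invented for the example and do not describe the Google, Twitter or Facebook APIs. It assumes Python 3.

```python
# Assumes Python 3. No network access is needed: the "response" is a
# hard-coded string standing in for what an API might send back.
import json

# Hypothetical response body; these field names are invented for
# illustration and do not correspond to any real service's API.
sample_response = """
{
  "results": [
    {"id": 1, "user": {"name": "alice"}, "text": "Voting today!"},
    {"id": 2, "user": {"name": "bob"}, "text": "Turnout looks high."}
  ]
}
"""


def flatten_results(response_text):
    """Parse JSON text and return one flat dictionary per result."""
    payload = json.loads(response_text)
    rows = []
    for item in payload["results"]:
        rows.append(
            {
                "id": item["id"],
                "author": item["user"]["name"],  # pull a nested value up a level
                "text": item["text"],
            }
        )
    return rows


rows = flatten_results(sample_response)
for row in rows:
    print(row["id"], row["author"], row["text"])
```

Flattening nested records into one row per item is what makes the result easy to hand on to a statistics package later.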

Day 2: Tuesday, June 5th 2012, Emeroteca

09.30-11.00 Session 4: Web Scraping II: Semi-structured and unstructured data
Even if APIs and structured data are growing in number, the majority of information on the web is still largely unstructured, which means in practice that it is difficult for computers to read and interpret in a fully flexible way. This session discusses how to access such data, looking at interaction with HTML (the structure of an ordinary web page) and XML (which is used by many webpages for communicating new content).

11.00-11.30 Coffee break

11.30-13.00 Session 5: Further resources: text processing, network analysis, geocoding, etc.
This session will introduce a range of further Python modules which may be of use in certain types of social science project, such as ones which tackle text processing and content analysis, network analysis, and geocoding. The aim is not to offer an introduction to the theoretical or statistical bases of these techniques, merely to show how they can be (simply) executed in Python, thus building familiarity with the language and showcasing the range of tools on offer.

13.00-14.30 Lunch

14.30-16.00 Session 6: Large Project Handling (putting it all together)
This session integrates the previous five sessions, discussing the management of large data projects in the context of social science research. Topics covered include data storage, continuous sampling periods (such as gathering tweets during an election campaign), error handling, the management of code bases and the writing of reusable code.

16.00-16.30 Coffee break

16.30-18.00 Review session
This is an optional session, and its content will depend on the progress made during the day. It can be used to have another look at concepts covered during the day or to ask the lab instructor any questions.
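
To make the HTML and XML distinction from Session 4 more concrete, here is a minimal sketch that pulls link addresses out of a small HTML page and titles and dates out of a small XML feed, using only Python's standard library. The sample page, the sample feed and the names `LinkCollector`, `extract_links` and `extract_items` are assumptions made for this example, not course material, and the code assumes Python 3.

```python
# Assumes Python 3 (in Python 2 the HTML parser lives in a different module).
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical web page and feed, hard-coded so the sketch runs offline.
sample_html = """
<html><body>
<h1>Press releases</h1>
<a href="/news/1">Budget vote</a>
<a href="/news/2">Election date set</a>
<a>No link here</a>
</body></html>
"""

sample_xml = """
<rss><channel>
<item><title>Budget vote</title><pubDate>2012-06-04</pubDate></item>
<item><title>Election date set</title><pubDate>2012-06-05</pubDate></item>
</channel></rss>
"""


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def extract_links(html_text):
    """Return all link targets found in a piece of HTML."""
    parser = LinkCollector()
    parser.feed(html_text)
    return parser.links


def extract_items(xml_text):
    """Return (title, date) pairs for every <item> element in the XML."""
    root = ET.fromstring(xml_text.strip())
    return [
        (item.findtext("title"), item.findtext("pubDate"))
        for item in root.iter("item")
    ]


print(extract_links(sample_html))
print(extract_items(sample_xml))
```

The XML half is shorter because the feed already labels each piece of information, whereas the HTML half has to pick the relevant tags out of markup that was written for display.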