Data Analytics for Campaigns
Assignment 1: Jan 6th, 2015
Due: Jan 13th, 2015

These are sample questions from a hiring exam that was developed for the OFA 2012 Analytics team. Plan on spending no more than 4 hours on this assignment, and feel free to use any online resources and tools/software you want. You are not expected to know everything in this exam. This will help me evaluate where everyone is at the beginning of the course and adapt the class to fit the needs and level of the students. Have fun!

Question 1:
You are an Elections Analyst for a major political campaign. You have the following information available to you: a national database with a record for each individual registered voter in the country, including their name, address, registered party, past vote history, and demographic information (age, gender, ethnicity).

Your campaign's statistical modeling team has built a set of individual-level (voter-specific) propensity scores using the database above. Each voter in your database is given three distinct scores from your modeling team:

Democratic support score (0-100): probability this individual will vote for the Democratic candidate rather than the Republican candidate, given that s/he casts a ballot for one of these two candidates.
2012 turnout score (0-100): probability this individual will cast a ballot in the next election.
Persuasion score (0-10): probability this individual will switch his or her vote from supporting the Republican candidate to supporting the Democratic candidate in response to a single contact from the Democratic campaign. The 0-10 score range indicates that the people most likely to switch from supporting the Republican to supporting the Democrat have a 10% probability of making that switch in response to a single contact from the Democratic campaign.

A. How would you use these scores and other information in the database to construct a universe of targets to contact for persuasion? How would you rank or tier persuasion targets in priority order? Who would you NOT want to contact for persuasion?

B. How would you use these scores and other information in the database to construct a universe of targets to contact for GOTV? How would you rank or tier GOTV targets in priority order? Who would you NOT want to contact for GOTV? (Note: GOTV stands for Get Out the Vote and its purpose is to increase turnout among targeted voters.)

C. One day, you receive an email from a senior campaign staffer who asks the following question:
"According to your support model, my friend has a support score of 50 (out of 100), but I know she always votes for Democratic candidates. Why has your model assigned her a score of only 50?"

How would you explain this to the campaign staffer? Assume this campaign staffer is a smart, educated person with extensive political experience and little or no background in statistics. Then suggest a different way that the campaign staffer can confirm the accuracy of the predictive model created by your colleagues.

Question 2:
It's early 2011, and you are an Elections Analyst for an off-year special election. You are asked to design a controlled experiment to measure the impact of a volunteer telephone call or a volunteer door-knock on voter turnout (the likelihood that a contacted individual will vote in a given election).

You have available to you a voter file database of public information from the state Secretary of State. This database includes the name and address of every registered voter in the state. It also includes each individual's past vote history (which elections they did/did not vote in), the party with which they are registered to vote, their birthdate, and gender. After the election, you will obtain an updated voter file with all of the same information and one additional field: whether they voted in this special election.

A. How would you design this experiment? What data would you collect? How would you supervise the conduct of this experiment? How would you use the post-election voter file from the state Secretary of State to determine the impact of voter contact on turnout in the 2011 election?

After winning the special election in 2011, the same candidate has to run again for the same office in 2012. In the 2012 election, you do NOT conduct an experiment. But after the election, the campaign manager asks you the same question: what was the impact of a phone call or door knock in 2012 on 2012 turnout? You obtain a new voter file including an additional field indicating whether each individual did or did not vote in the 2012 election.
You also have additional fields that indicate whether each person was called or knocked in 2012, the date of attempted contact, and the result of contact (not home, canvassed, etc.).

B. What is the single most important difference between the experiment in part A and this analysis? Is this analysis easier or harder? Why? How would you analyze this data? What caveats would you include with your analysis?

Question 3:
It's August 2012, and you're working as a Statistical Modeling Analyst for a state Democratic campaign. Your campaign has access to a state database with a record for each individual registered voter in the state, including their name, address, party registration, past vote history, demographic information, and more. This information has been combined with a recent telephone poll of 5K random constituents in which each person was asked which candidate they planned on supporting: Democrat Herman Madison or Republican Martha Whistler. There are no other candidates.

Using this combined data set, one of your fellow modeling analysts has built a logistic regression model that predicts the probability an individual voter will support the Democrat. The coefficients from this model are shown below, followed by the definitions of the variables.

Variable          Coefficient   Standard error   z score
Democrat                1.45            0.09       15.81
Republican             -2.11            0.1       -21.95
Ln_Income              -0.109           0.041      -2.63
Age                    -0.013           0.0096     -1.4
Age_Sq                  0.0001          0.00009     1.52
Census_College          1.77            0.33        5.37
AfAm                    2.07            0.399       5.18
AfAm_Democrat          -0.872           0.437      -2.01
Constant                1.2             0.484       2.48

Variable definitions:
Democrat: coded as 1 if the voter is a registered Democrat, 0 if he/she is not
Republican: coded as 1 if the voter is a registered Republican, 0 if he/she is not
Ln_Income: the natural logarithm of the voter's income (in dollars)
Age: the voter's age (in years)
Age_Sq: the voter's age (in years) squared
Census_College: the percentage of residents in the voter's neighborhood who have a college degree (scaled from 0 to 100)
AfAm: coded as 1 if the voter is African American, 0 if he/she is not
AfAm_Democrat: coded as 1 if the voter is both African American and a registered Democrat, 0 if he/she is not
Constant: the constant term

A. Consider 4 voters: Adam, Bob, Chris and David. Adam and Chris share identical characteristics except for their incomes. Bob and David also share identical characteristics (with each other, not necessarily with Adam and Chris), except for their incomes.
Name    Income     Modeled support
Adam    $50,000    50%
Bob     $200,000   50%
Chris   $40,000    ?
David   $190,000   ?

Based on the coefficients above, who would you think has a higher probability of supporting Herman Madison?
Chris
David
They have the same probability
Cannot tell based on the information provided
What is your reasoning? (You need not calculate an exact probability to answer this question. Just explain your reasoning in general terms.)

B. The coefficient for AfAm_Democrat is negative. How do you interpret this? Does this mean that African-American registered Democrats support Herman Madison at lower rates than African-American independents? What about relative to white registered Democrats?

C. How do we interpret the difference in support between voters of different ages? How do the variables in the model estimate such support?

D. Are there any variables in this model that you would choose to drop? Why or why not? Would you need more information in order to make this decision?

Question 4:
You are asked to predict the probability each individual registered voter will turn out to vote in the next election. You have a database that includes the following information for each registered voter:
Name, street address, city, state, zip, phone
Past vote history (whether or not the individual voted in each past election)
Registered party (in states with party registration)
Birthdate, gender
Additional fields (such as education, income, ethnicity)

In addition to this database, you also have survey data for a small sample (ten thousand voters) indicating stated likelihood of voting in the next election and strength of support for the Democratic candidate. This survey sample has been matched to the larger database. You also have similar data from past elections.

How would you use this information to assign each individual registered voter a probability of voting in the next election? Would you use the first data set by itself or would you also use the
second data set? How would you combine them if you decide to use both? How would you validate your model prior to the next election?

Question 5:
It's October, and you've been asked to build a statistical model to help identify likely supporters for the campaign's Get Out the Vote operation. The campaign's data team has assembled the attached dataset, and your task is to use it to build this support score model. There are two steps:
PART A: Model building. You will build a model using some or all of the attached data (consider Part B before starting Part A).
PART B: Validation. You will validate this model using some or all of the attached data.

PART A:
For your convenience, we have put the file into Excel (attached). You may import or copy and paste this data into any statistics package of your choice (Stata, R, SAS, SPSS) to build your model. Your job is to produce a simple model that predicts the probability of identifying for the Democratic candidate based on the attached data. We have also included a data dictionary which defines each variable for your reference. Feel free to use not only the variables included in the attached data set, but also other variables built upon these (such as interactions or transformations).

The data may have some missing values. Please keep this in mind, and explain how you will deal with this missing data and with missing data in general.

Please tell us what kinds of models and algorithms you would consider, and explain your choice of the model you decided to build. Once you have selected a single model type (regression, decision tree, support vector machine, etc.), please build at least two different variations of that model. For example, you may want to vary which variable(s) are included, or you may want to try a variable transformation or interaction. Please copy and paste the results of each variation into your MS Word document. Discuss why your final model is superior to the other models you tried. For your final model, please explain which variables are most important and how the results should be interpreted.
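For the missing-data question, one common and simple option (not the only defensible one) is to fill a missing numeric predictor with the mean of its observed values and add a 0/1 indicator column, so the model keeps the row and can learn whether missingness itself predicts support. The sketch below assumes each predictor arrives as a Python list with None marking missing entries; the function name impute_with_indicator is hypothetical and is not part of the attached dataset or of any statistics package.

```python
from typing import List, Optional, Tuple


def impute_with_indicator(
    values: List[Optional[float]],
    fill_value: Optional[float] = None,
) -> Tuple[List[float], List[int], float]:
    """Mean-impute missing entries (None) and return a missing-value indicator.

    Hypothetical helper for illustration. If fill_value is None, the mean of the
    observed values is used. When transforming a holdout set or new rows, pass
    the training-set mean explicitly so no information leaks from those rows.
    """
    observed = [v for v in values if v is not None]
    if fill_value is None:
        if not observed:
            raise ValueError("cannot compute a mean: every value is missing")
        fill_value = sum(observed) / len(observed)
    # Replace each missing entry with the fill value; keep observed values.
    filled = [fill_value if v is None else float(v) for v in values]
    # 1 marks a row whose original value was missing, 0 otherwise.
    indicator = [1 if v is None else 0 for v in values]
    return filled, indicator, fill_value
```

For an age column [30, None, 50] the function returns the filled values [30.0, 40.0, 50.0], the indicator [0, 1, 0], and the fill value 40.0. When scoring new rows (a holdout set, or the voters whose support_democrat is missing), passing the training mean as fill_value keeps the imputation consistent with how the model was fit.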
Additionally, using your final model, please create a column on the MS Excel spreadsheet that gives a probability that each voter will support the Democratic candidate (return the spreadsheet with your exam). Please note that we would like scores for those voters for whom the dependent variable (support_democrat) is missing.

Please describe one or more graphics you could generate to use as a diagnostic tool to evaluate the quality of the model or as a visual tool to demonstrate the effectiveness of the model. (You may create one or more of these graphics if you have extra time, but it is not necessary.)
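One diagnostic graphic that fits this question is a calibration plot: sort the voters with a known support_democrat value by their predicted probability, cut them into bins (deciles are typical), and plot each bin's average prediction against the share who actually identified as Democratic supporters. The sketch below computes the table behind such a plot in plain Python; the function name calibration_table and the equal-count binning rule are illustrative assumptions, and the plotting itself is left to whatever package you use.

```python
from typing import List, Tuple


def calibration_table(
    probs: List[float], outcomes: List[int], n_bins: int = 10
) -> List[Tuple[int, float, float, int]]:
    """Group voters into equal-count bins by predicted probability.

    Hypothetical helper for illustration. Returns one
    (bin_index, mean_predicted, observed_rate, count) tuple per non-empty bin.
    Plotting mean_predicted against observed_rate gives a calibration plot;
    a well-calibrated model lies close to the 45-degree line.
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    # Sort voter indices by predicted probability, then cut the sorted list
    # into n_bins groups of (nearly) equal size.
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    table = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue  # more bins than voters: skip empty bins
        mean_pred = sum(probs[i] for i in idx) / len(idx)
        observed = sum(outcomes[i] for i in idx) / len(idx)
        table.append((b, mean_pred, observed, len(idx)))
    return table
```

Equal-count bins keep every point on the plot backed by the same number of voters; fixed-width bins (0-0.1, 0.1-0.2, ...) are an equally reasonable choice but can leave sparse bins at the extremes.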
If you had more time, what else would you do? What other variables would you ask for / construct? What other model specifications would you explore and why (briefly)?

PART B:
Use some or all of the attached data to validate your model. How well does your model validate? Why do you say that? Now suggest another way you could validate your model using external data rather than the attached data. What additional value would this validation provide?
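For Part B, one way to validate with the attached data is to set aside a random holdout sample of voters whose support_democrat value is known, fit the model on the remaining rows only, and measure how well the holdout scores separate supporters from non-supporters. The sketch below shows a seeded random split and an AUC computed with the rank-sum (Mann-Whitney) formulation; the function names, the 30% holdout share, and the seed are hypothetical choices, not requirements of the assignment.

```python
import random
from typing import List, Sequence, Tuple


def train_holdout_split(
    n_rows: int, holdout_frac: float = 0.3, seed: int = 2012
) -> Tuple[List[int], List[int]]:
    """Randomly assign row indices to a training set and a holdout set.

    Hypothetical helper; the seed makes the split reproducible.
    """
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    n_holdout = int(round(n_rows * holdout_frac))
    return idx[n_holdout:], idx[:n_holdout]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen supporter (label 1) gets a
    higher score than a randomly chosen non-supporter (label 0); ties count 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    # Compare every supporter's score with every non-supporter's score.
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 means the scores rank supporters no better than chance, and 1.0 means every supporter outscores every non-supporter. The pairwise loop is quadratic in the number of rows, which is fine for a holdout of a few thousand voters but would call for a sort-based implementation on a full voter file.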