DATA SCIENTIST TRAINING FOR LIBRARIANS #DST4L. C. Erdmann DST4L @ Designing Libraries IV @libcce

Similar documents
INDEX. Introduction Page 3. Methodology Page 4. Findings. Conclusion. Page 5. Page 10

Research Data Management Guide

The Libraries Role in Research Data Management: A Case Study from the University of Minnesota

State Library of North Carolina Library Services & Technology Act Grant Projects Narrative Report

MATLAB as a Collaboration Platform Marta Wilczkowiak Senior Applications Engineer MathWorks

The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics

Big Data for Science (and other) Students

Igniting young minds through computer programming

OpenAIRE Research Data Management Briefing paper

University Libraries Strategic Plan 2015

Library Strategic Planning

Digital Public Library of America (DPLA)

Great Expectations: How Digital Project Planning Fosters Collaboration between Academic Libraries and External Entities

COPO: Collaborative Open Plant Omics. Rob Davey Data Infrastructure and Algorithms Group Leader

Using GitHub for Rally Apps (Mac Version)

Digital Marketplace - G-Cloud

Packrat: A Dependency Management System for R

DATA SCIENCE CURRICULUM WEEK 1 ONLINE PRE-WORK INSTALLING PACKAGES COMMAND LINE CODE EDITOR PYTHON STATISTICS PROJECT O5 PROJECT O3 PROJECT O2

Software Configuration Management Best Practices

Talend Metadata Manager. Reduce Risk and Friction in your Information Supply Chain

Engineering Technical Practices Management at BP

Copyright Soleran, Inc. esalestrack On-Demand CRM. Trademarks and all rights reserved. esalestrack is a Soleran product Privacy Statement

NIH Commons Overview, Framework & Pilots - Version 1. The NIH Commons

Reconciliation Best Practice

Core Competencies for Visual Resources Management

CPSC 491. Today: Source code control. Source Code (Version) Control. Exercise: g., no git, subversion, cvs, etc.)

Software Continuous Integration & Delivery

RESEARCH DATA MANAGEMENT POLICY

Cypress Bay High School Broward County Public Schools Weston, Florida

THE HELMHOLTZ INVENIO REPOSITORY PROJECT :

Pitfalls and Best Practices in Role Engineering

Environment and Natural Resources Trust Fund 2016 Request for Proposals (RFP)

Data quality Vision at SBBr Danny Vélez

Middlesex Community College Library Strategic Plan

KNIME Enterprise server usage and global deployment at NIBR

Document Management. Document Management for the Agile Enterprise. AuraTech Pte Ltd

Data Is Integral To Our Culture

Entering its Third Century

Strategic Agenda for Library-based Research Data Support Services

How To Write A Blog Post On Globus

Informatica for Tableau Best Practices to Derive Maximum Value

When companies purchase an integrated learning

Web Application Development for Scientific Data

Development at the Speed and Scale of Google. Ashish Kumar Engineering Tools

Simplifying the audit through innovation. PwC Accounting Symposium August 7, PwC Audit Transformation 19

Appendix A. Functional Requirements: Document Management

In depth study - Dev teams tooling

Action Plan towards Open Access to Publications

FUTURE OF THE LIBRARY AND INFORMATION SCIENCE PROFESSION: TERTIARY EDUCATION LIBRARIES

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

An In-Depth Look at In-Memory Predictive Analytics for Developers

Content creation remains important as ever. Lead generation is still important, but lead nurturing is growing

Judy Jeng, Ph. D. International Conference of Digital Archives and Digital Humanities Taipei, Taiwan December 1-2, 2009

Choosing an LMS FOR EMPLOYEE TRAINING

Library, Information and Technology Services

Monitoring the team's performance

DATA SCIENCE ADVISING NOTES David Wild - updated May 2015

Anne Karle-Zenith, Special Projects Librarian, University of Michigan University Library

Tools for Researchers

Transcription:


On the Same Page
We started speaking the same language. A side conversation with a Harvard faculty member has developed into a partnership involving interactive data analysis in the classroom. In the faculty member's own words, the library moved from being moribund to a must-have service. Another example: http://lorenabarba.com/blog/

Breaking Down Barriers
"I was afraid of lost connections or opportunities, that our patrons saw us as old-fashioned, that we weren't seen as being tech- or data-savvy."
Source: "Breaking down the library fortress," Christian Laursen

Experience the Research Data Lifecycle
"Just by having some idea, some background, of where our researchers are coming from, we can make a connection when the opportunity presents itself, build on it, and develop better services."
Jeffrey Heer

What are they saying?
- Guidance on Tools & Workflows
- Sharing Scholarly Output
- Linking to Data
- Programming Skills (Training)
- Librarians Working w/ Scientists

Core Goals
1. Allow librarians to experience and understand the research data lifecycle.
2. Train librarians to have the skills and practical knowledge to build and use data tools (OpenRefine).
3. Foster a culture within libraries that enables data-centric approaches (change our mindset).
4. Grow a community of data-savvy librarians.
<empower librarians>

Winning Recipe: Tools, Concepts & Context

#DST4L @ DTU http://podcast.llab.dtu.dk/feeds/dst4l-2015/

Take-away message
To build better services, librarians can engage with communities, understand their infrastructural needs, and experiment with new scholarly objects from these communities.

The library perspective
Introduce the core goals of the course, provide additional context from the librarian's perspective, and reassure everyone (this stuff is really hard to learn).

Free your metadata
Much of the work in data science is data wrangling. Using OpenRefine, you can extract data, clean and reconcile it, link it to other data, and explore it.
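OpenRefine's clustering feature groups spelling variants of the same value so they can be merged. As a rough sketch of that idea (not OpenRefine's own code), the Python below implements key-collision clustering modeled on the "fingerprint" method described in OpenRefine's documentation: normalize each value to a key, then group the values that share a key. The function names `fingerprint` and `cluster` and the sample names are hypothetical, chosen for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Normalize a value to a key (a sketch modeled on OpenRefine's fingerprint method)."""
    # Trim surrounding whitespace and lowercase.
    value = value.strip().lower()
    # Fold accented characters to their base letters, e.g. "ö" -> "o".
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    # Drop punctuation and control characters (anything not a word char or whitespace).
    value = re.sub(r"[^\w\s]", "", value)
    # Split into tokens, de-duplicate and sort them, then rejoin with single spaces.
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group values by fingerprint; keep only keys with two or more distinct spellings."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {key: vs for key, vs in groups.items() if len(set(vs)) > 1}


if __name__ == "__main__":
    # Hypothetical messy author column.
    names = ["Smith, Jane", "jane smith", "Jane  Smith.", "Gödel, Kurt", "Godel Kurt", "Doe, John"]
    for key, variants in cluster(names).items():
        print(key, "->", variants)
    # jane smith -> ['Smith, Jane', 'jane smith', 'Jane  Smith.']
    # godel kurt -> ['Gödel, Kurt', 'Godel Kurt']
```

Clustering only proposes candidates; as in OpenRefine, a person should still review each group and choose the value to keep before merging.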

Academia must learn from open source
Importance of reproducibility (explain, repeat, learn). CERN & the EU's commitment to open access. Big data at CERN. Moving beyond the 17th-century publishing model. The benefits of GitHub (reducing friction), with examples from CERN. Using GitHub in services such as Zenodo, while also changing how we do curation.

GitHub Fundamentals
Getting started (w/ GitHub Desktop and Atom), introduction to Git(Hub), creating repositories, Markdown 101, using issues, demonstrating workflows, making commits, pushing changes, branching and merging. How GitHub can be used in libraries and academic settings.

Visualization
Demystifying visualization: why visualize, basic guidelines, visualization in libraries, an overview of techniques & methods, and tools to start with.

Evolving Scholarly Communication
Familiarity with interactive notebooks (Jupyter/IPython), baby Python, text analysis, Naive Bayes, and building a classifier (for automated curation). Demonstrating the use of online algorithm services (such as Algorithmia).
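The slides do not include the course notebook. As a hypothetical, standard-library-only sketch of "building a classifier (for automated curation)", the Python below trains a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It scores each label as log P(label) plus the sum of log P(word | label) over the words in a record, and picks the highest score. The class name, the subject labels, and the toy training records are assumptions for illustration, not the course's data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)        # number of training records per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_score(self, text, label):
        # log P(label) + sum of smoothed log P(word | label); log space avoids underflow.
        score = math.log(self.doc_counts[label] / self.total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for token in tokenize(text):
            if token not in self.vocab:
                continue  # skip words never seen in training
            score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def predict(self, text):
        """Return the label with the highest log score."""
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


if __name__ == "__main__":
    # Hypothetical catalog records with subject labels.
    records = [
        "galaxy telescope survey of distant stars",
        "spectra of stars observed with the telescope",
        "gene expression in plant cells",
        "protein sequencing of cells and genes",
    ]
    labels = ["astronomy", "astronomy", "biology", "biology"]
    clf = NaiveBayes().fit(records, labels)
    print(clf.predict("new telescope images of a galaxy"))  # astronomy
    print(clf.predict("sequencing genes in plant cells"))   # biology
```

A hand-rolled version like this is useful for seeing how the model works; in practice a library implementation such as scikit-learn's `MultinomialNB` would usually take its place.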

Feedback, Experiences & Next Steps

Blog/Student Perspective: http://altbibl.io/dst4l/category/blog-posts/
Projects & Presentations: http://altbibl.io/dst4l/category/data-stories/
Hackathons: http://goo.gl/ly6dxu

DST4L Student Comment: http://www.youtube.com/watch?v=u5zym085bno&t=1m21s
"...very helpful, preparing for long career, going to see it more and more, will keep using skill set, like doing it, fun problem solving"
Library Connect Article: Data Scientist Training for Librarians: A course and a community

Find your data scientist training librarian & weave the training into your organizational fabric.

Training Astronomers

Thank you! Now let's get our hands dirty with data!