DATA SCIENTIST TRAINING FOR LIBRARIANS #DST4L. C. Erdmann Designing Libraries
|
|
|
- Mavis Clarke
- 10 years ago
- Views:
Transcription
1 DATA SCIENTIST TRAINING FOR LIBRARIANS #DST4L C. Erdmann Designing Libraries
2 On the Same Page We started speaking the same language. A side conversation with a Harvard faculty member has developed into a partnership involving interactive data analysis in the classroom. In the faculty s own words, the library moved from being moribund to a must have service. Another Example:
3 Breaking Down Barriers I was afraid of lost connections or opportunities, that our patrons saw us as old fashioned, that we weren t seen as being tech or data savvy. Breaking down the library fortress - Christian Laursen
4 Experience the Research Data Lifecycle Just by having some idea, background of where our researchers are coming from, we can make a connection when the opportunity presents itself and build on it, develop better services. Jeffrey Heer
5 What are they saying? > Guidance on Tools & Workflows Sharing Scholarly Output Linking to Data Programming Skills (Training) Librarians Working w/ Scientists
6
7 Core Goals 1. Allow librarians to experience and understand the research data lifecycle. 2. Train librarians to have skills and practical knowledge to build and use data tools (OpenRefine). 3. Forward a culture within libraries that enables datacentric approaches (change our mindset). 4. Grow a community of data savvy librarians. <empower librarians>
8 Winning Recipe: Tools, Concepts & Context
9 DTU
10 Take away message To build better services, librarians can help by engaging in communities, understanding their infrastructural needs, and to experiment with new scholarly objects from these communities.
11 The library perspective Introduce core goals of the course, provide additional context from the librarian perspective and reassure everyone (that this stuff is really hard to learn).
12 Free your metadata Much of the work in data science is spent on data wrangling. Using OpenRefine, you can extract data, clean, reconcile, link to data and explore the data.
13 Academia must learn from open source Importance of reproducibility (explain, repeat, learn). CERN & the EU s commitment to open access. Big data at CERN. Advance from 17th century publishing model. The benefits of GitHub w/ examples from CERN (by reducing friction). Using it in services such as Zenodo but changing how we do curation.
14 GitHub Fundamentals Getting started (w/ GitHub for Desktop and Atom), introduction to Git(Hub), creating repositories, Markdown 101, using issues, demonstrating workflows, making commits, pushing changes, branching and merging. How GitHub can be used in libraries and academic settings.
15 Visualization Demystifying visualization. Why visualization, basic guidelines, visualization in libraries, overview of techniques & methods, and tools to start with.
16 Evolving Scholarly Communication Familiarity with interactive notebooks (Jupyter/IPython), baby Python, text analysis, Naives Bayes and building a classifier (for automated curation). Demonstrate the use of online algorithm services (such as Algorithmia).
17 Feedback, Experiences & Next Steps
18 Blog/Student Perspective Projects & Presentations Hackathons
19 DST4L Student Comment: v=u5zym085bno&t=1m21s...very helpful, preparing for long career, going to see it more and more, will keep using skills set, like doing it, fun problem solving Library Connect Article: Data Scientist Training for Librarians: A course and a community
20
21
22 Find your data scientist training librarian & weave the training into your organizational fabric.
23 Training Astronomers
24 Thank you! Now let s get our hands dirty with data!
INDEX. Introduction Page 3. Methodology Page 4. Findings. Conclusion. Page 5. Page 10
FINDINGS 1 INDEX 1 2 3 4 Introduction Page 3 Methodology Page 4 Findings Page 5 Conclusion Page 10 INTRODUCTION Our 2016 Data Scientist report is a follow up to last year s effort. Our aim was to survey
Research Data Management Guide
Research Data Management Guide Research Data Management at Imperial WHAT IS RESEARCH DATA MANAGEMENT (RDM)? Research data management is the planning, organisation and preservation of the evidence that
The Libraries Role in Research Data Management: A Case Study from the University of Minnesota
13 September 2012 The Libraries Role in Research Data Management: A Case Study from the University of Minnesota Meghan Lafferty, Chemistry, Chemical Engineering, and Materials Science Librarian, and Lisa
State Library of North Carolina Library Services & Technology Act 2013-2014 Grant Projects Narrative Report
Grant Program: NC ECHO Digitization Grant State Library of North Carolina Library Services & Technology Act 2013-2014 Grant Projects Narrative Report Project Title: Content, Context, & Capacity: A Collaborative
MATLAB as a Collaboration Platform Marta Wilczkowiak Senior Applications Engineer MathWorks
MATLAB as a Collaboration Platform Marta Wilczkowiak Senior Applications Engineer MathWorks 2014 The MathWorks, Inc. 1 Agenda Use other people s code, apps and toolboxes Share your code with others Collaborate
The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics
The Data Engineer Mike Tamir Chief Science Officer Galvanize Steven Miller Global Leader Academic Programs IBM Analytics Alessandro Gagliardi Lead Faculty Galvanize Businesses are quickly realizing that
Big Data for Science (and other) Students
Big Data for Science (and other) Students Randall Pruim Calvin College Questions to My Colleagues 1. What is the largest data set that students encounter in your respective majors? 2. In the context of
Igniting young minds through computer programming
Igniting young minds through computer programming igniting young minds W riting computer programs is a challenging, yet extremely satisfying personal experience that develops essential skills in logic,
OpenAIRE Research Data Management Briefing paper
OpenAIRE Research Data Management Briefing paper Understanding Research Data Management February 2016 H2020-EINFRA-2014-1 Topic: e-infrastructure for Open Access Research & Innovation action Grant Agreement
University Libraries Strategic Plan 2015
University Libraries Strategic Plan 2015 Our Mission: The University Libraries are a dynamic partner with our users in the research, discovery, and creation of knowledge, where state-of-the-art technology
Library Strategic Planning
Library Strategic Planning Develop campus partnerships to collect, manage, share, and preserve Georgia Tech digital research data 1. Needs Assessment Research data survey & interviews NSF DMP content analysis
Digital Public Library of America (DPLA)
Digital Public Library of America (DPLA) Front End Design and Implementation Request For Proposal Summary The Digital Public Library of America (DPLA) seeks a skilled interactive agency to design and develop
Great Expectations: How Digital Project Planning Fosters Collaboration between Academic Libraries and External Entities
Ms. Shelley Barba Metadata Librarian Library Great Expectations: How Digital Project Planning Fosters Collaboration between Academic Libraries and External Entities This article looks at how project management
COPO: Collaborative Open Plant Omics. Rob Davey Data Infrastructure and Algorithms Group Leader [email protected].
: Collaborative Open Plant Omics Rob Davey Data Infrastructure and Algorithms Group Leader [email protected] @froggleston Toni Etuk Felix Shaw Acknowledgements Oxford eresearch Centre Susanna Sansone
Using GitHub for Rally Apps (Mac Version)
Using GitHub for Rally Apps (Mac Version) SOURCE DOCUMENT (must have a rallydev.com email address to access and edit) Introduction Rally has a working relationship with GitHub to enable customer collaboration
Digital Marketplace - G-Cloud
Digital Marketplace - G-Cloud SharePoint Services Core offer 22 services in this area: 1. SharePoint Forms SharePoint comes with out-of-the-box web-based forms that allow for data to be captured for your
Packrat: A Dependency Management System for R
Packrat: A Dependency Management System for R J.J. Allaire June 27, 2014 3/23 Reproducible Research Foundational as a basis for scientific claims "The goal of reproducible research is to tie specific instructions
DATA SCIENCE CURRICULUM WEEK 1 ONLINE PRE-WORK INSTALLING PACKAGES COMMAND LINE CODE EDITOR PYTHON STATISTICS PROJECT O5 PROJECT O3 PROJECT O2
DATA SCIENCE CURRICULUM Before class even begins, students start an at-home pre-work phase. When they convene in class, students spend the first eight weeks doing iterative, project-centered skill acquisition.
Software Configuration Management Best Practices
White Paper AccuRev Software Configuration Management Best Practices Table of Contents page Executive Summary...2 Introduction...2 Best Practice 1: Use Change Packages to Integrate with Issue Tracking...2
Talend Metadata Manager. Reduce Risk and Friction in your Information Supply Chain
Talend Metadata Manager Reduce Risk and Friction in your Information Supply Chain Talend Metadata Manager Talend Metadata Manager provides a comprehensive set of capabilities for all facets of metadata
Engineering Technical Practices Management at BP
INSIGHT# 2009-12ALM FEBRUARY 26, 2009 Engineering Technical Practices Management at BP By Clint Reiser Keywords Technical Practices, Document Management, AIM, Documentum Overview Asset Information Management
Copyright 2005-2010 Soleran, Inc. esalestrack On-Demand CRM. Trademarks and all rights reserved. esalestrack is a Soleran product Privacy Statement
NIH Commons Overview, Framework & Pilots - Version 1. The NIH Commons
The NIH Commons Summary The Commons is a shared virtual space where scientists can work with the digital objects of biomedical research, i.e. it is a system that will allow investigators to find, manage,
Reconciliation Best Practice
Adra Match Ltd Reconciliation Best Practice Technical whitepaper Bjørn Loe managing director London, June 2010 Introduction This whitepaper highlights some of the challenges with transaction reconciliation
Core Competencies for Visual Resources Management
Core Competencies for Visual Resources Management Introduction An IMLS Funded Research Project at the University at Albany, SUNY Hemalata Iyer, Associate Professor, University at Albany Visual resources
CPSC 491. Today: Source code control. Source Code (Version) Control. Exercise: g., no git, subversion, cvs, etc.)
Today: Source code control CPSC 491 Source Code (Version) Control Exercise: 1. Pretend like you don t have a version control system (e. g., no git, subversion, cvs, etc.) 2. How would you manage your source
Software Continuous Integration & Delivery
November 2013 Daitan White Paper Software Continuous Integration & Delivery INCREASING YOUR SOFTWARE DEVELOPMENT PROCESS AGILITY Highly Reliable Software Development Services http://www.daitangroup.com
RESEARCH DATA MANAGEMENT POLICY
Document Title Version 1.1 Document Review Date March 2016 Document Owner Revision Timetable / Process RESEARCH DATA MANAGEMENT POLICY RESEARCH DATA MANAGEMENT POLICY Director of the Research Office Regular
Cypress Bay High School Broward County Public Schools Weston, Florida
ABC-CLIO S ONLINE HISTORY DATABASES SUPPORT SOCIAL STUDIES INSTRUCTION AT CYPRESS BAY HIGH SCHOOL IN BROWARD COUNTY, FLORIDA Cypress Bay High School Broward County Public Schools Weston, Florida A (2008):
THE HELMHOLTZ INVENIO REPOSITORY PROJECT :
THE HELMHOLTZ INVENIO REPOSITORY PROJECT : BETWEEN ORGANIZATIONAL TASKS, NEEDS OF SCIENTISTS, AND CONSTRAINTS 1 Structure of the presentation Libraries of DESY, FZJ and GSI in the Helmholtz Association
Pitfalls and Best Practices in Role Engineering
Bay31 Role Designer in Practice Series Pitfalls and Best Practices in Role Engineering Abstract: Role Based Access Control (RBAC) and role management are a proven and efficient way to manage user permissions.
Environment and Natural Resources Trust Fund 2016 Request for Proposals (RFP)
Project Title: Total Project Budget: Environment and Natural Resources Trust Fund 2016 Request for Proposals (RFP) Scientific Asset Management: Digital Preservation for Future Generations Category: Proposed
Data quality Vision at SBBr Danny Vélez
Data quality Vision at SBBr Danny Vélez 4th workshop SiBBr: data quality and ecological data 25-29 August 2014 SiBBr: national level Community Universities NGO s Government agencies Research centers Citizens
Middlesex Community College Library Strategic Plan 2013-18
Middlesex Community College Library Strategic Plan 2013-18 Table of Contents Introduction: Page 2 Goal 1: Instruction Page 3 Goal 2 Services Page 4 Goal 3 Staffing Page 5 Goal 4 Facilities Page 6 Goal
KNIME Enterprise server usage and global deployment at NIBR
KNIME Enterprise server usage and global deployment at NIBR Gregory Landrum, Ph.D. NIBR Informatics Novartis Institutes for BioMedical Research, Basel 8 th KNIME Users Group Meeting Berlin, 26 February
Document Management. Document Management for the Agile Enterprise. AuraTech Pte Ltd
Document Management Document Management for the Agile Enterprise AuraTech Pte Ltd 30 Robinson Road, #04-01B Robinson Towers, Singapore 048546 http://www.consultaura.com PH: 6224 9238 Try it! Call AuraTech
Data Is Integral To Our Culture
Data Is Integral To Our Culture In the news On the streets Progress Update Recommendation #3 of ITSM Top 5 Service Recommendations from February 2013 Recommendation #3 summary Expand the Enterprise Data
Entering its Third Century
the University Library Entering its Third Century SEPTEMBER 2015 A long with their universities, the best academic libraries constantly adapt to the forces reshaping research, teaching, and learning. This
Strategic Agenda for Library-based Research Data Support Services
Strategic Agenda for Library-based Research Data Support Services Background The Lamar Soutter Library (LSL) provides the University of Massachusetts Medical School (UMMS) research community with the relevant
How To Write A Blog Post On Globus
Globus Software as a Service data publication and discovery Kyle Chard, University of Chicago Computation Institute, [email protected] Jim Pruyne, University of Chicago Computation Institute, [email protected]
Informatica for Tableau Best Practices to Derive Maximum Value
for Best Practices Guide Informatica for Tableau Best Practices to Derive Maximum Value What is Informatica for Tableau Are you struggling to get the most out of Tableau because you need to pull, combine,
When companies purchase an integrated learning
Feature 2. Project team members are required to perform their regular responsibilities in addition to committing their time to the implementation. Organizations can overcome these challenges if they find
Web Application Development for Scientific Data
Web Application Development for Scientific Data Nathaniel Bennett Rebekah David Emily Watson Computer Science The University of North Carolina at Asheville One University Heights Asheville, North Carolina
Development at the Speed and Scale of Google. Ashish Kumar Engineering Tools
Development at the Speed and Scale of Google Ashish Kumar Engineering Tools The Challenge Speed and Scale of Google More than 5000 developers in more than 40 offices More than 2000 projects under active
Simplifying the audit through innovation. PwC Accounting Symposium August 7, 2015. PwC Audit Transformation 19
Simplifying the audit through innovation PwC Accounting Symposium August 7, 2015 PwC Audit Transformation 19 Simplifying the audit through innovation A closer look What s driving our transformation Globalization
Appendix A. Functional Requirements: Document Management
Appendix A. Functional Requirements: Document Management Document Management technology helps organizations better manage the creation, revision, and approval of electronic documents. It provides key features
In depth study - Dev teams tooling
In depth study - Dev teams tooling Max Åberg mat09mab@ Jacob Burenstam Linder ada09jbu@ Desired feedback Structure of paper Problem description Inconsistencies git story explanation 1 Introduction Hypotheses
Action Plan towards Open Access to Publications
OOAAction Plan Draft Version, 17.01.2013 Action Plan towards Open Access to Publications endorsed during the 2 nd Annual Global Meeting, 27 29 May 2013, Berlin / Germany Assuming that providing research
FUTURE OF THE LIBRARY AND INFORMATION SCIENCE PROFESSION: TERTIARY EDUCATION LIBRARIES
FUTURE OF THE LIBRARY AND INFORMATION SCIENCE PROFESSION: TERTIARY EDUCATION LIBRARIES TERTIARY EDUCATION LIBRARIES Tertiary covers both university and vocational education. However, they are very different
Is a Data Scientist the New Quant? Stuart Kozola MathWorks
Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by
An In-Depth Look at In-Memory Predictive Analytics for Developers
September 9 11, 2013 Anaheim, California An In-Depth Look at In-Memory Predictive Analytics for Developers Philip Mugglestone SAP Learning Points Understand the SAP HANA Predictive Analysis library (PAL)
Content creation remains important as ever. Lead generation is still important, but lead nurturing is growing
Introduction As consumers flock to the internet, the marketing industry continues to evolve and expand. As this happens, traditional models of marketing begin to lose efficiency and new types of marketing
Judy Jeng, Ph. D. International Conference of Digital Archives and Digital Humanities Taipei, Taiwan December 1-2, 2009
Judy Jeng, Ph. D. International Conference of Digital Archives and Digital Humanities Taipei, Taiwan December 1-2, 2009 Outline of Presentation New Jersey Digital Highway NJVid 2 New Jersey Digital Highway
Choosing an LMS FOR EMPLOYEE TRAINING
Choosing an LMS FOR EMPLOYEE TRAINING As organizations grow it becomes more challenging to scale your internal learning culture. You must be certain that your staff is trained in the entire organizational
Library, Information and Technology Services
Library, Information and Technology Services Academic Quality and Engagement [I_AQE] UNIT PRIORITY Support Scholarship & Creative Activity [LITS.01] DIVISIONAL PRIORITY Strengthen scholarship [AA.06] Support
Monitoring the team s performance
Monitoring the team s performance Why does your team need to be monitored? How can performance be monitored? You should ensure that you monitor only what is really important. In the two BS2 sessions Making
DATA SCIENCE ADVISING NOTES David Wild - updated May 2015
DATA SCIENCE ADVISING NOTES David Wild - updated May 2015 GENERAL NOTES Lots of information can be found on the website at http://datascience.soic.indiana.edu. Dr David Wild, Data Science Graduate Program
Anne Karle-Zenith, Special Projects Librarian, University of Michigan University Library
Google Book Search and The University of Michigan Anne Karle-Zenith, Special Projects Librarian, University of Michigan University Library 818 Harlan Hatcher Graduate Library Ann Arbor, MI 48103-1205 Email:
Tools for Researchers
Tools for Researchers Microsoft Research & the Scholarly Information Ecosystem Lee Dirks & Alex Wade Directors, Education & Scholarly Communication Microsoft External Research Microsoft Corporation Microsoft
