naming things prepared by Jenny Bryan for Reproducible Science Workshop

Similar documents
NaviCell Data Visualization Python API

Creating Codes with Spreadsheet Upload

Core Fittings C-Core and CD-Core Fittings

Notepads Import File Specification

Filemaker Go 12/13 pre-installed (available free of charge from the App Store)

CashManager OnLine Positive Pay Set-Up Information

PowerPoint Presentation to Accompany. Chapter 3. File Management. Copyright 2014 Pearson Educa=on, Inc. Publishing as Pren=ce Hall


Using the HFSD journal for deleted file recovery

Score Management Software User Guide Addendum

Demographic Batch Search (DBS)

SIX Trade Repository AG

Command Line - Part 1

Welcome Computer System Validation Training Delivered to FDA. ISPE Boston Area Chapter February 20, 2014

CHAD TILBURY.

CORPORATE RECORDS MANAGEMENT STANDARDS - STANDARD NAMING CONVENTIONS FOR ELECTRONIC FILES, FOLDERS AND RECORDS 1

Ad Hoc Reporting: Data Export

AAFCO Check Sample Program New Data Reporting Website Manual Date of Issue: March 1 st 2014

The following features have been added to FTK with this release:

Installing Hortonworks Sandbox 2.1 VirtualBox on Mac

Downloading & Using Data from the STORET Warehouse: An Exercise

1001ICT Introduction To Programming Lecture Notes

Kaseya 2. User Guide. Version 7.0. English

User s Guide for the Texas Assessment Management System

File Management Where did it go? Teachers College Summer Workshop

FTP Service Reference

VMware vcenter Log Insight User's Guide

Setting Up the TCP/IP Scanner

FTP Service Reference

How To Test The Bandwidth Meter For Hyperv On Windows V (Windows) On A Hyperv Server (Windows V2) On An Uniden V2 (Amd64) Or V2A (Windows 2

BRAIN ASYMMETRY. Outline

Package fimport. February 19, 2015

Creating Geospatial Metadata. Kim Durante Geo4Lib Camp

Naming Conventions for Digital Documents

PROGRAMMING FOR BIOLOGISTS. BIOL 6297 Monday, Wednesday 10 am -12 pm

A computer running Windows Vista or Mac OS X

Version Control with Svn, Git and git-svn. Kate Hedstrom ARSC, UAF

VMware vrealize Log Insight User's Guide

Nuix Forensic Focus 2014 Webinar Accelerating investigations using advanced ediscovery techniques 6 th March 2014

Sharing Files and Printers. Mac/PC Compatibility: QuickStart Guide for Business

Introduction to Mac OS X

Big Data and Analytics by Seema Acharya and Subhashini Chellappan Copyright 2015, WILEY INDIA PVT. LTD. Introduction to Pig

Norwex Office Suite: The Consultant Experience

Electronic Remittance Advice (ERA) Processor

Research Data Archival Guidelines

Programming Languages CIS 443

Guidelines for Data Collection & Data Entry

Rapid Game Development Using Cocos2D-JS

Parallels Virtualization SDK ReadMe CONTENTS:

THE ARCHIVAL SECTOR IN DW2.0 By W H Inmon

BusinessObjects XI Release 2. Separated Values (CSV) Exporting

DEBT COLLECTION SYSTEM ACCOUNT SUBMISSION FILE

Archiving Your Photo Collection I

Fast track to HTML & CSS 101 (Web Design)

Computer Forensic Capabilities

UNIX Operating Environment

Chapter 7. Using Hadoop Cluster and MapReduce

Version Control with Git. Kate Hedstrom ARSC, UAF

Applies to Version 6 Release 5 X12.6 Application Control Structure

CatDV Pro Workgroup Serve r

Using Oracle Data Integrator with Essbase, Planning and the Rest of the Oracle EPM Products

Apache Logs Viewer Manual

your Apple warranty; see There are two main failure modes for a mirrored RAID 1 set:

FREQUENTLY ASKED QUESTIONS ABOUT ELECTRONIC DATA TRANSMISSION

Xerox Standard Accounting Import/Export User Information Customer Tip

Identity Finder: Managing Your Results

Introduction to IBM SPSS Statistics

Dyalog for Mac OS Installation and Configuration Guide

MY WORLD GIS. Installation Instructions

MDT-ASD chip and mezz card naming conventions 28 August 2003

Electronic Data Transmission Guide For International Mailers

CS 2112 Lab: Version Control

ITP 101 Project 3 - Dreamweaver

April 2012 Setup, Configuration and Use. Document Version: Product Version: SAP Version: Localization:

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Mac OS X for Unix Geeks. Hit The Ground Running

Unix Scripts and Job Scheduling

FileMaker 12. ODBC and JDBC Guide

Best Practices for Document Naming Conventions

Payment Page Extensions. Online Payment Processing for Businesses Worldwide.

vmprof Documentation Release 0.1 Maciej Fijalkowski, Antonio Cuni, Sebastian Pawlus

Digital Forensics with Open Source Tools

Matching Service Adapter

Chapter 2 Text Processing with the Command Line Interface

There are many useful and free tools available for working with digital materials. Here are some I use...

A Basic Guide To Onsite SEO

SQL 2012 Installation Guide. Manually installing an SQL Server 2012 instance

Transcription:

naming things prepared by Jenny Bryan for Reproducible Science Workshop

Names matter

NO myabstract.docx Joe s Filenames Use Spaces and Punctuation.xlsx figure 1.png fig 2.png JW7d^(2sl@deletethisandyourcareerisoverWx2*.txt YES 2014-06-08_abstract-for-sla.docx joes-filenames-are-getting-better.xlsx fig01_scatterplot-talk-length-vs-interest.png fig02_histogram-talk-attendance.png 1986-01-28_raw-data-from-challenger-o-rings.txt

three principles for (file) names machine readable human readable plays well with default ordering

awesome file names :)

machine readable regular expression and globbing friendly - avoid spaces, punctuation, accented characters, case sensitivity easy to compute on - deliberate use of delimiters

Excerpt of complete file listing: Example of globbing to narrow file listing: Jennifers-MacBook-Pro-3:2014-03-21 jenny$ ls *Plasmid* 2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A01.csv 2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A02.csv 2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A03.csv 2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_B01.csv... 2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_H03.csv 2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_platefile.csv

Same using Mac OS Finder search facilities:

Same using R s ability to narrow file list by regex: > list.files(pattern = "Plasmid") %>% head [1] "2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A01.csv" [2] "2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A02.csv" [3] "2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_A03.csv" [4] "2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_B01.csv" [5] "2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_B02.csv" [6] "2013-06-26_BRAFWTNEGASSAY_Plasmid-Cellline-100-1MutantFraction_B03.csv"

Deliberate use of _ and - allows us to recover metadata from the filenames. > flist <- list.files(pattern = "Plasmid") %>% head > stringr::str_split_fixed(flist, "[_\\.]", 5) [,1] [,2] [,3] [,4] [,5] [1,] "2013-06-26" "BRAFWTNEGASSAY" "Plasmid-Cellline-100-1MutantFraction" "A01" "csv" [2,] "2013-06-26" "BRAFWTNEGASSAY" "Plasmid-Cellline-100-1MutantFraction" "A02" "csv" [3,] "2013-06-26" "BRAFWTNEGASSAY" "Plasmid-Cellline-100-1MutantFraction" "A03" "csv" [4,] "2013-06-26" "BRAFWTNEGASSAY" "Plasmid-Cellline-100-1MutantFraction" "B01" "csv" [5,] "2013-06-26" "BRAFWTNEGASSAY" "Plasmid-Cellline-100-1MutantFraction" "B02" "csv" [6,] "2013-06-26" "BRAFWTNEGASSAY" "Plasmid-Cellline-100-1MutantFraction" "B03" "csv" date assay sample set well This happens to be R but also possible in the shell, Python, etc.

> flist <- list.files(pattern = "Plasmid") %>% head > stringr::str_split_fixed(flist, "[_\\.]", 5) [,1] [,2] [,3] [,4] [,5] [1,] "2013-06-26" "BRAFWTNEGASSAY" "Plasmid-Cellline-100-1MutantFraction" "A01" "csv" [2,] "2013-06-26" "BRAFWTNEGASSAY" "Plasmid-Cellline-100-1MutantFraction" "A02" "csv" [3,] "2013-06-26" "BRAFWTNEGASSAY" "Plasmid-Cellline-100-1MutantFraction" "A03" "csv" [4,] "2013-06-26" "BRAFWTNEGASSAY" "Plasmid-Cellline-100-1MutantFraction" "B01" "csv" [5,] "2013-06-26" "BRAFWTNEGASSAY" "Plasmid-Cellline-100-1MutantFraction" "B02" "csv" [6,] "2013-06-26" "BRAFWTNEGASSAY" "Plasmid-Cellline-100-1MutantFraction" "B03" "csv" _ underscore used to delimit units of meta-data I want later - hyphen used to delimit words so my eyes don t bleed

machine readable easy to search for files later easy to narrow file lists based on names easy to extract info from file names, e.g. by splitting new to regular expressions and globbing? be kind to yourself and avoid - spaces in file names - punctuation - accented characters - different files named foo and Foo

human readable name contains info on content connects to concept of a slug from semantic URLs

human readable Jennifers-MacBook-Pro-3:analysis jenny$ ls -1 01_marshal-data.md 01_marshal-data.r 02_pre-dea-filtering.md 02_pre-dea-filtering.r 03_dea-with-limma-voom.md 03_dea-with-limma-voom.r 04_explore-dea-results.md 04_explore-dea-results.r 90_limma-model-term-name-fiasco.md 90_limma-model-term-name-fiasco.r Makefile figure helper01_load-counts.r helper02_load-exp-des.r helper03_load-focus-statinf.r helper04_extract-and-tidy.r tmp.txt 01.md 01.r 02.md 02.r 03.md 03.r 04.md 04.r 90.md 90.r Makefile figure helper01.r helper02.r helper03.r helper04.r tmp.txt Which set of file(name)s do you want at 3a.m. before a deadline?

human readable 01_marshal-data.r 02_pre-dea-filtering.r 03_dea-with-limma-voom.r embrace the slug 04_explore-dea-results.r 90_limma-model-term-name-fiasco.r helper01_load-counts.r helper02_load-exp-des.r helper03_load-focus-statinf.r helper04_extract-and-tidy.r

human readable easy to figure out what the heck something is, based on its name

plays well with default ordering put something numeric first use the ISO 8601 standard for dates left pad other numbers with zeros

plays well with default ordering chronological order logical order 01_marshal-data.r 02_pre-dea-filtering.r 03_dea-with-limma-voom.r 04_explore-dea-results.r 90_limma-model-term-name-fiasco.r helper01_load-counts.r helper02_load-exp-des.r helper03_load-focus-statinf.r helper04_extract-and-tidy.r

plays well with default ordering 01_marshal-data.r 02_pre-dea-filtering.r 03_dea-with-limma-voom.r 04_explore-dea-results.r 90_limma-model-term-name-fiasco.r helper01_load-counts.r helper02_load-exp-des.r helper03_load-focus-statinf.r helper04_extract-and-tidy.r put something numeric first

plays well with default ordering use the ISO 8601 standard for dates YYYY-MM-DD

http://xkcd.com/1179/

Comprehensive map of all countries in the world that use the MMDDYYYY format https://twitter.com/donohoe/status/597876118688026624

left pad other numbers with zeros 01_marshal-data.r 02_pre-dea-filtering.r 03_dea-with-limma-voom.r 04_explore-dea-results.r 90_limma-model-term-name-fiasco.r helper01_load-counts.r helper02_load-exp-des.r helper03_load-focus-statinf.r helper04_extract-and-tidy.r if you don t left pad, you get this: 10_final-figs-for-publication.R 1_data-cleaning.R 2_fit-model.R which is just sad

plays well with default ordering put something numeric first use the ISO 8601 standard for dates left pad other numbers with zeros

three principles for (file) names machine readable human readable plays well with default ordering

three principles for (file) names easy to implement NOW payoffs accumulate as your skills evolve and projects get more complex

go forth and use awesome file names :) 01_marshal-data.r 02_pre-dea-filtering.r 03_dea-with-limma-voom.r 04_explore-dea-results.r 90_limma-model-term-name-fiasco.r helper01_load-counts.r helper02_load-exp-des.r helper03_load-focus-statinf.r helper04_extract-and-tidy.r