Data Management Exercises

Exercise 1 Defining Post-Graduate Research Data

1. Discuss your research project and research data in groups of 3-4. Think about the following questions:
o What is your research topic and research location?
o What physical data will you work with, e.g. published reports, existing/own field docs, artefacts, bones, etc.?
o What is the origin of your data, e.g. published material, physical archive held in museum, samples from ... now in ... Museum in the UK, etc.?
o What types of digital data will you derive from the physical data, e.g. text docs, scans, spreadsheets, etc.?
o What types of data will you create digitally?
o Where will your data end up after the project?
o How do you look after your data?
o Are there any other issues for management and curation of your digital data? Risks? Ownership? Sharing? Ethical issues?

2. Fill in the form on the next page about your own research data.
Exercise 1 Defining Post-Graduate Research Data

Name:
Research Topic:
Laboratory:
Individual research or part of project/institution (e.g. museum):
Research / Project Code (if any):

Physical data: artefacts, samples, paper archives, etc.
Data origin: Where is it from?
Digitally captured data: scans, spreadsheets, etc.
End point: Where does the data (physical and digital) go after the project?

Digitally created data: survey data, images, analysis readouts, types of text docs, etc.
End point: Where does the digital data go after the project?

Looking after your data:
1. How do you organise your data?
2. How do you back up your data?
3. What is your most important data management issue?

1.
2.
3.

Any other issues?
Exercise 2 Post-Graduate Research Projects: File Structure and Naming

1. Read the following notes on file structuring and naming.
2. Fill out the form on the next page.

A systematically organised file structure is really important. You are the one who will be working with it every day, and the way you organise your data might be obvious to you, but the sign of a logical file structure is that it is easily understandable to others who know nothing about the research project. Many people work closely with other researchers or as part of larger projects, and a logical system helps to share and exchange data. Think carefully about a sensible file structure. For those working with GIS, for example, the file structure needs to stay consistent so that files can still be retrieved from the Geodatabase into ArcGIS.

Data Hierarchies

When deciding how to organise your research data it is useful to first decide what the primary data of the project is. In archaeology this often comes down to projects being organised by either:
o Material, in the widest sense, so everything from types of material culture to archaeological samples (bones, soils, genetic samples, etc.).
o Location, where data are grouped by region or archaeological site.

A third possible way would be to organise the data chronologically, but there is often so much temporal overlap, with sites or material spanning several periods, that it is difficult to create distinct sets of data.

A useful tip is to define the end-product of the research project, that is:
o The data that will comprise the project archive and what will be made public and shared with others in the future.
o Try to keep this clean of temporary folders and files.
o Bear in mind that you need to be able to distinguish between the different projects you are working on and, in particular, between sub-folders that might end up with the same name as folders in other projects.
o It is important to acknowledge that research designs change, and so must the file structure.
o Some people advise against the overuse of folders, as it can take forever to find files, though this might be easier said than done.

File Naming

File naming should be considered from the very outset of a project. Names tell us what a file is; that is, they contain contextual information about the file so we know what it is without having to open it. Names also order files, making them easy to find. The most important thing is to define your system - and stick to it.

Useful Tips
o Different data may require different naming conventions.
o File names can contain contextual information, for example on date, author, site, project, material.
o Capitals in file names affect ordering, so be consistent.
o Numbers order files only if zeros are used before units and tens:
- 001, 002, 003, etc. will order files up to 999.
o Dates are useful for version control and ordering files:
- YY-MM-DD (11-03-02) first in a file name orders files by date.
- Name_YY-MM-DD orders files of the same name by date.
o Spaces in file names cause havoc in GIS. Use_underscores instead. Slashes (/) in file names can cause problems too, for example for files that will be uploaded to a website.
o CAPITALS ARE HARD TO READ!

Version Control

Being consistent with what you call files makes it much easier to keep track of which version of a document is the most up to date, particularly when you get feedback from others in the form of a document with reviewer's comments or highlighted sections.

Tips
o Add a draft or version number and/or the date to the file name.
o Initials in file names tell people who worked on the file last.
o Another important thing is to clean out older drafts of the same data.
o It is wise to keep older drafts until the final version is made, but whether you want to keep old versions of files and data after that is debatable.
You have to ask yourself: are you ever going to look at them again?
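The naming tips above can also be applied by script when files are created or renamed in bulk. The following is a minimal sketch in Python, not a prescribed convention: the function names make_file_name and pad_number, the date-first layout, and the choice to lower-case the description are all assumptions made for illustration, and should be adapted to whatever system you define.

```python
from datetime import date


def pad_number(number: int, width: int = 3) -> str:
    """Zero-pad a number so that file names sort in numeric order."""
    return f"{number:0{width}d}"


def make_file_name(description: str, number: int, created: date,
                   ext: str, width: int = 3) -> str:
    """Build a name of the form YY-MM-DD_description_NNN.ext (hypothetical layout)."""
    # Replace slashes, then join the words with underscores instead of spaces.
    words = description.replace("/", " ").split()
    # Lower-casing is an assumption: it stops capitals from affecting ordering.
    safe = "_".join(words).lower()
    stamp = created.strftime("%y-%m-%d")  # two-digit year, month and day
    return f"{stamp}_{safe}_{pad_number(number, width)}.{ext}"


if __name__ == "__main__":
    print(make_file_name("Trench A photos", 7, date(2011, 3, 2), "jpg"))
    # prints: 11-03-02_trench_a_photos_007.jpg

    # Without leading zeros, 10 sorts before 2; with them, the order is correct.
    print(sorted(f"scan_{n}" for n in (1, 2, 10)))
    # prints: ['scan_1', 'scan_10', 'scan_2']
    print(sorted(f"scan_{pad_number(n)}" for n in (1, 2, 10)))
    # prints: ['scan_001', 'scan_002', 'scan_010']
```

A width of 3 covers up to 999 files; if more are expected, pass a larger width from the start rather than renaming later.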
Exercise 2 Post-Graduate Research Projects: File Structure and Naming

Researcher:
Project Title:
Project Duration:
Project Context: Where is the research being carried out, and what is under study?

1. File Structure
List the primary folders, and then summarise the organisation of their sub-folders.
Does the file structure follow conventions from a host project, laboratory or institution?

2. File Naming
Describe the logic behind the file naming system and give examples from different types of digital data.
Does the file naming follow conventions from a host project, laboratory or institution?
If a coding or numbering system is used to name files, where will the explanation of this system be saved?

Good Practice: Use underscores instead of spaces, and write dates in numbers. If numbering files, consider how many potential files are needed: 001, 002, etc. will order files up to 999. DO NOT WRITE IN CAPITALS AS THIS IS HARD TO READ.

Signed:
Version:
Date Created:
Date Amended:
Exercise 3 Data Management Plan for Post-Graduate Research Projects

1. Complete a Data Management Plan using the form on the following page as a template.

The form is organised so that it follows the data lifecycle model. Writing a full plan may take a few hours, but they will be hours well spent and will save you many hours or even days of wasted time in the future.

First you define what data will be studied and how the data will be documented. You should describe the timetable for the data management tasks over the project. There is then a check-box reminder to make sure that the form setting out how the data will be organised (see Exercise 2) has been completed. The last three sections put in place plans for what will happen to the data at the end of the project. In particular, they ask you to make clear arrangements for any ethical or legal issues relating to the data, and to think about long-term preservation and potential re-use of the data.

The more detail you can put down the better, but at the same time try to keep things simple. What may seem logical and self-explanatory to you is often utterly confusing and bizarre to others. A key point to bear in mind is that somebody else, who knows nothing about your work, should be able to read the plan and then navigate around, understand and re-use the data in the future.

Lastly, it is important to recognise that the way we organise our data will change as our research progresses. It is recommended that Data Management Plans are reviewed and updated every year.
Exercise 3 Data Management Plan for Post-Graduate Research Projects

Researcher:
Project Title:
Project Duration:
Project Context:

1. What Data will be Produced?
What physical data will you study, and what digital data will be captured/derived from these (field notes, images, measurements, spreadsheets, survey data, etc.)?
What data will be created digitally and what are the methods/standards for data creation?
What file formats and software will you use, how many individual files do you expect to make, and what are the anticipated file sizes and total storage volume?

2. How will the Data be Documented and Described?
What contextual information is required to make the data understandable to others?
What standards will be used to record the data?
What information on the data collection methods, standards, and context ("metadata") will be recorded for each data type/set?
Where will the metadata for each data type/set be located?
3. Has a File Structure/Naming Form been completed? (add separate document)
Date Created:
Date Amended:
Version no.:

4. Deposition of E-Thesis: delete as appropriate and state reasons:
A. Intend to deposit e-thesis with ... with open access.
B. Intend to deposit e-thesis with ... with a time-limited embargo on open access.
C. Do not intend to deposit e-thesis.
Give Reasons:
NB. If you intend to deposit your thesis with a digital repository, agreement must be sought with all concerned third parties (museums etc.), particularly for use of any copyright material.

5. What are the plans for data sharing and access after submission of the thesis?
Who, if any, are the anticipated future users of any digital data / resources from the research?
Will any of the digital data supporting the thesis be made available to others on request or by open access?
Are there any ethical issues that need to be taken into account? If so, what actions will safeguard these data?
Are there any funding body / institutional requirements regarding re-use of, or open access to, data?

6. What are the plans for long-term archiving of data supporting the thesis?
Where will the digital data be archived?
What arrangements are there to archive the digital data with a laboratory or institution?
Will a copy of the digital data be archived with the physical data (in a laboratory / institution)?
If no institutional archiving is possible, how will the data be safeguarded by the individual?

Signed:
Date Created:
Version:
Date Amended: