ThesisIsCode

Your Thesis is Code

Code is an increasingly important part of research, whether it takes the form of R or MATLAB “snippets,” integrated documents using Jupyter or RMarkdown, or more complicated workflows that draw on research databases and instrumental measurements.

This workshop is aimed at early career researchers who are working on projects and want to improve their skills and learn new techniques. Participants will register for GitHub accounts, set up a repository, and learn how to write journal-quality documents that include all the code required to download data, run statistical tests, and produce publication-quality plots.

Participants will be introduced to concepts such as test-driven development and continuous integration to produce research-quality code. This workshop also includes an element meant to help build an Earth Science code cookbook. Participants will survey blog posts, code repositories, and other online resources to discuss strategies for credit and discoverability of the code they and others produce.

Before the Workshop

Set up online accounts

We will be using GitHub as our primary platform for tracking our code. GitHub is a free platform owned by Microsoft, and it is only one of a number of platforms available to host code online. You are welcome to use a different platform, but we cannot offer support for it during the workshop.

We will be using ORCID as a tool to provide you with a unique personal identifier as a researcher. ORCID provides a persistent digital identifier (an ORCID iD) that you own and control, and that distinguishes you from every other researcher. ORCID also provides tools for tracking your publications, grants and awards, and tools for linking to and discovering similar products by other researchers.

  1. Create a GitHub account (note: Register your account as a GitHub Student account to obtain extra benefits)
  2. Create an ORCID account because you should!

Install Software

We will be using several pieces of software in this workshop. These are widely used software packages that will serve you for some time to come.

  1. R is a free statistical software package that runs on all major platforms (Windows, Mac and Linux). If you already have R, please ensure your version of R is above v4.0.
  2. RStudio is an Integrated Development Environment (IDE) that is widely used and includes a number of tools to help with scripting.
  3. git is a widely used version control software package that keeps track of changes you make in any kind of file, across multiple projects. NOTE: Most installations only need the default settings.
  4. pandoc is a piece of software that helps users change document formats between Markdown, HTML, PDF, docx and others. Pandoc also supports embedded citations, graphics and other stuff that’s cool.
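
If you already have R installed, a quick way to check the version requirement is from the R console. This is a minimal sketch: the helper name `r_version_ok` is hypothetical, and it assumes that v4.0.0 itself counts as meeting the requirement.

```r
# Hypothetical helper: TRUE when the running R is at least `minimum`.
# Assumption: v4.0.0 itself is acceptable for the workshop.
r_version_ok <- function(minimum = "4.0.0") {
  getRversion() >= package_version(minimum)
}

if (r_version_ok()) {
  message("R ", getRversion(), " looks recent enough.")
} else {
  message("R ", getRversion(), " is older than 4.0.0; consider updating.")
}
```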

Install R Packages

Open up RStudio and, in the console, enter the following:

install.packages(c('revealjs', 'assertthat', 'jsonlite'))
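
If some of these packages may already be installed, one option is to install only the ones that are missing. The sketch below is an optional alternative to the line above, not a required step; the helper name `missing_packages` is hypothetical.

```r
# Packages named in the workshop instructions.
needed <- c("revealjs", "assertthat", "jsonlite")

# Hypothetical helper: return the packages in `pkgs` that R cannot find.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(needed)

# Only install when run interactively (for example, from the RStudio console).
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Running `missing_packages(needed)` again afterwards should return an empty character vector if every package installed correctly.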

Check that everything works

Testing knit

Once you’ve installed your programs, navigate to this RMarkdown document and right-click to Save As. Save it somewhere you’ll find it, and then open that file in RStudio. Once it’s open, click the knit button on the toolbar.

Successfully knitting the Rmd file to an HTML output will let us know that your installations of R, RStudio and pandoc are working.
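
The same check can be run from the console rather than with the knit button. The sketch below writes a small throwaway Rmd file (its contents and the name `knit-check.Rmd` are made up for illustration) and renders it with `rmarkdown::render()`. It assumes the rmarkdown package is installed and skips the render step if the package or pandoc cannot be found.

```r
# Build a tiny R Markdown file in a temporary folder (hypothetical content).
fence <- strrep("`", 3)  # the three backticks that open and close an R chunk
rmd_file <- file.path(tempdir(), "knit-check.Rmd")
writeLines(c(
  "---",
  "title: \"Knit check\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_file)

# Knitting needs both the rmarkdown package and a pandoc that R can locate.
can_knit <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_knit) {
  out <- rmarkdown::render(rmd_file, quiet = TRUE)
  message("Knitted to: ", out)
} else {
  message("rmarkdown or pandoc was not found, so nothing was knitted.")
}
```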

It Didn’t Work: Sometimes the knit button doesn’t appear if the file is saved with the wrong file extension. For example, Windows likes adding .txt to the end of text files. If this is the case, rename the file to make sure it has a .Rmd extension.
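
If you prefer to fix the extension from the R console, something like the following would work. It is a sketch only: the helper name `fix_rmd_extension` and the example path are hypothetical, and it assumes the only problem is a trailing .txt.

```r
# Hypothetical helper: drop a trailing ".txt" from a file name and rename the
# file on disk if it exists. Returns the corrected path.
fix_rmd_extension <- function(path) {
  fixed <- sub("\\.txt$", "", path, ignore.case = TRUE)
  if (!identical(fixed, path) && file.exists(path)) {
    file.rename(path, fixed)
  }
  fixed
}

# Example call (hypothetical path, adjust to where you saved the file):
# fix_rmd_extension("~/Downloads/workshop.Rmd.txt")
```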

Testing git

To test whether or not git is working, open up a terminal. Linux and macOS users should open their application launcher and search for terminal. Windows users can open the Run dialog (Windows key + R) and type in cmd. Once the terminal is open, type git --version. You should get a result that gives you the version of git that you’re running.
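
The same test can also be run from the R console, which is handy if you already have RStudio open. This sketch assumes git is on the search path that R sees, which may differ from the one your terminal uses.

```r
# Look for a git executable on the PATH that R knows about.
git_path <- Sys.which("git")

if (nzchar(git_path)) {
  # Ask git for its version, the same as typing `git --version` in a terminal.
  print(system2("git", "--version", stdout = TRUE))
} else {
  message("R could not find git. Check the installation, or try the terminal.")
}
```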

If you’re having problems, feel free to ask for help in the Thesis Is Software Slack channel. This is a public channel. If you aren’t a member yet, please feel free to join.

During The Workshop

Please note that we will be following both the Code of Conduct for this repository and the Geological Society of America’s RISE Slide.

Presentation Order

Time (PST)  Talks
10:00am     Land acknowledgment, Introductions
10:05am     Introduce Throughput
10:15am     Why build your thesis as software? (Why Software?)
10:45am     An introduction to GitHub (Slides)
11:45am     Short Nature Break
12:00pm     Git workflows & gitignore (Slides)
12:15pm     Active work (http://bit.ly/githubrepos)
12:45pm     Summarize & Questions
1:00pm      Lunch-ish Break
1:30pm      Moving to Markdown (Your Thesis is Code)
3:00pm      Another Short Nature Break
3:05pm      Tips & Tricks (Tips Document)
3:45pm      Questions & Stuff
4:00pm      End of Workshop, thanks everyone!

Acknowledgements

Funding

NSF-1928366

Contributors

This is an open project, and contributions are welcome from anyone. All contributors to this project are bound by a code of conduct. Please review and follow this code of conduct as part of your contribution.