Getting Started with USGS R Packages

USGS R Packages: Collaborative and reproducible data analysis using R

As free, open-source software, the statistical programming language R has a growing community of developers and users for data analysis and visualization. Getting started with R can be challenging, but there are many online resources for learning the basics of the language. Many users accomplish simple analyses with self-taught skills, while others learn R in a more formal setting. R users can also turn to online forums for solutions to common errors and for help with widely used R packages. However, many R packages are built by the USGS for USGS-specific purposes. The user community for these packages is small, so fewer online resources and help forums are available to offer assistance. This online course provides canonical online resources for learning USGS packages while also building a larger user community.

Every step of a typical data processing pipeline is subject to human error: accessing data, analyzing data, and producing final figures. Multi-site analyses are especially error-prone because the same workflow must be repeated many times. This course teaches a modular approach to that workflow. It builds on basic R data analysis skills and uses existing USGS R packages to create advanced, reproducible workflows, such as accessing gridded climate data, analyzing high-frequency water observations, and taking full advantage of the USGS ScienceBase repository. The USGS packages covered in this course span a variety of applications: accessing web data, accessing personally stored data, and releasing data for publication.
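To make the idea of a modular workflow concrete, here is a minimal sketch in base R. It is an illustration only, not the course's own code: the function names (get_site_data, summarize_site, process_site), the site identifiers, and the synthetic values are all hypothetical placeholders. In practice the access step would call one of the USGS packages, and a figure step would be added as one more function.

```r
# Hypothetical step 1: access data for one site.
# Synthetic values stand in for a real data request so the sketch runs offline.
get_site_data <- function(site_id) {
  data.frame(
    site = site_id,
    date = seq(as.Date("2020-01-01"), by = "day", length.out = 5),
    value = c(1, 2, 3, 4, 5),
    stringsAsFactors = FALSE
  )
}

# Hypothetical step 2: analyze one site's data (here, a simple mean).
summarize_site <- function(site_data) {
  data.frame(
    site = site_data$site[1],
    mean_value = mean(site_data$value),
    stringsAsFactors = FALSE
  )
}

# One function chains the steps, so every site is processed identically.
process_site <- function(site_id) {
  summarize_site(get_site_data(site_id))
}

# Placeholder identifiers; a multi-site analysis only changes this vector.
site_ids <- c("site_A", "site_B", "site_C")
results <- do.call(rbind, lapply(site_ids, process_site))
print(results)
```

Because each step is its own function, adding a site means adding one identifier rather than repeating the workflow by hand, which is where the errors described above tend to creep in.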

The modular workflows taught in this course will prepare researchers to create automated, robust data processing workflows through more efficient code development. Following the course, students will be capable of integrating these packages into their own scientific workflows.

Suggested prerequisite knowledge

This course assumes a moderate to advanced knowledge of the statistical programming language R. If you’re interested in using USGS packages for data analysis but have no R experience, please visit the Introduction to R curriculum available at this site.

  1. Experience using R to import, view, and summarize data
  2. Recommended: experience creating simple plots in R
  3. Recommended: familiarity with RStudio

Course outline

Summary of Modules

Module | Description | Duration
dataRetrieval | Accessing time series data. | 2 hours
geoknife | Accessing gridded data. | 1 hour
sbtools | Interacting with ScienceBase to access data, add data to ScienceBase, or release data from R output for a data release. | 1.5 hours
Application | Use the packages introduced in previous modules to create and use a robust modular workflow. | 1.5 hours

Software requirements

See the R installation instructions page for how to install or upgrade R and RStudio, add GRAN (the Geological Survey R Archive Network) to your settings, and install some basic packages. Then run the following line so that you have the most up-to-date versions of the packages used in this course.

install.packages(c('dataRetrieval', 'geoknife', 'sbtools'))
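After installing, it can be useful to confirm that all three packages are available before starting the modules. The following is a small sketch using only base R; the variable names are illustrative and not part of the course materials.

```r
# Packages used in this course (taken from the install line above)
course_pkgs <- c("dataRetrieval", "geoknife", "sbtools")

# requireNamespace() returns TRUE when a package can be loaded,
# without attaching it to the search path
installed <- vapply(course_pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- course_pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All course packages are installed.")
}
```

If any package is reported missing, rerun the install line and check that your repository settings match the installation instructions.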

Lindsay R. Carr