USGS - science for a changing world

Package Mechanics

This lesson provides definitions and examples of the structural components of an R-package, and the minimum setup requirements.

Lesson Objectives

  1. List the structural components of an R-package.
  2. Understand package dependency trees.
  3. Be familiar with different ways data can be included in packages.
  4. Correctly define what licenses and disclaimers are needed for USGS software.
  5. Apply the build and check features to a package.
  6. Define internal functions and know their benefits.

Minimum Package Requirements

There are many excellent resources on how to create an R package. The official guide is Writing R Extensions, and R-package developers should refer back to it as the definitive reference. A common companion is Hadley Wickham’s R Packages, a very complete guide to the mechanics of R-packages. This section provides the briefest of introductions, but enough to get you started.

Package Skeleton

Let’s build a bare-bones package from scratch, defining each section as we go. We are using RStudio as our working environment, so let’s first create a new Project called “demoPackage”. Note we could do all of this by hand as well.

Step 1: Open New Project:

New Project

Step 2: Choose New Package option:

New Package

Step 3: Name your package:

Name Package

Step 4: Discover your package!

Minimum Package Requirements

Minimum Required Content

There are just a few files and folders created in the above process. These are the absolute minimum required to build a package. As this workshop goes on, we will introduce more options for directories and files.

R folder

Any R code that will be executed goes in the “R” folder. You can save one or many files (with a .R extension); the functions they define become available when the package is loaded. RStudio’s package wizard created a file, “hello.R”, which defines a “Hello World” function.

man folder

The “man” folder contains the files used to create each function’s “help” file (the documentation). We will discuss the requirements at length in the Documentation section. Assuming you use the roxygen2 package, you will not need to create these files by hand; they will be generated from information in the R folder.

DESCRIPTION file

The DESCRIPTION file contains basic information on the package. Do not leave the fields with their defaults. See also here for more information.

One thing to remember: we will be using roxygen2 for most of our documentation work, but roxygen2 will not add dependencies to the DESCRIPTION file. If you add a new package dependency, you must manually add it to this file, in the “Depends”, “Imports”, or “Suggests” field. More on this below.

NAMESPACE file

The NAMESPACE file shows what functions and methods are exported and imported. Assuming we use the roxygen2 package, this file should not need to be changed by hand. However, it is a very important file for R packages. It is also a somewhat difficult subject to explain in a simple way. See here for a detailed discussion.
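For readers working outside RStudio, the pieces described above can also be written by hand. The following is a minimal sketch in base R, assuming a throwaway location under tempdir(); every field value in the DESCRIPTION is a placeholder rather than the wizard’s exact output.

```r
# Sketch: create the minimum package structure by hand (base R only).
# The location and all field values are illustrative placeholders.
pkg_dir <- file.path(tempdir(), "demoPackage")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
# "man" stays empty until documentation files are generated
dir.create(file.path(pkg_dir, "man"), showWarnings = FALSE)

# DESCRIPTION: basic information on the package
writeLines(c(
  "Package: demoPackage",
  "Type: Package",
  "Title: Throwaway Demo Package",
  "Version: 0.1.0",
  "Author: A Developer",
  "Maintainer: A Developer <a.developer@example.com>",
  "Description: A bare-bones package used to explore package structure.",
  "License: CC0",
  "Encoding: UTF-8"
), file.path(pkg_dir, "DESCRIPTION"))

# NAMESPACE: export the single function defined below
writeLines("export(hello)", file.path(pkg_dir, "NAMESPACE"))

# R folder: one file defining a "Hello World" function
writeLines(c(
  "hello <- function() {",
  "  print(\"Hello, world!\")",
  "}"
), file.path(pkg_dir, "R", "hello.R"))

# Inspect what was created and read a field back from DESCRIPTION
list.files(pkg_dir, recursive = TRUE)
desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))
desc[1, "Package"]
```

Reading the DESCRIPTION back with read.dcf() is a quick way to confirm the file is well-formed before trying a build.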

Build, Check, Share

RStudio created a package skeleton. Let’s build the package to see if it builds correctly. Then, we can check it, to see if it passes all of the R-package requirements.

Step 1: Build:

Build Package

Our demoPackage built! This screenshot used RStudio’s “Build” tab. There are other ways to build a package, but during this workshop, we will focus on the tools embedded in RStudio.
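As one of those other ways, a source build can be started from the R console. The sketch below wraps R CMD build in a hypothetical helper, build_source(); the helper name, its arguments, and the choice of destination folder are assumptions made for illustration, not part of RStudio or devtools.

```r
# Sketch: run "R CMD build" from the console (base R only).
# build_source() is a hypothetical helper, not an existing API.
build_source <- function(pkg_dir, dest_dir = tempdir()) {
  pkg_dir <- normalizePath(pkg_dir, mustWork = TRUE)
  dest_dir <- normalizePath(dest_dir, mustWork = TRUE)

  # The tarball is named <Package>_<Version>.tar.gz
  desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"),
                   fields = c("Package", "Version"))

  # R CMD build writes the tarball into the current working directory
  old_wd <- setwd(dest_dir)
  on.exit(setwd(old_wd), add = TRUE)

  r_bin <- file.path(R.home("bin"), "R")
  status <- system2(r_bin, c("CMD", "build", shQuote(pkg_dir)))
  if (status != 0) {
    stop("R CMD build failed with status ", status)
  }

  file.path(dest_dir,
            sprintf("%s_%s.tar.gz", desc[1, "Package"], desc[1, "Version"]))
}
```

Calling build_source("path/to/demoPackage") would return the path of the resulting tarball, assuming the folder holds a valid package.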

Step 2: Check:

Check Package

The initial “Check” shows 0 errors (great!), 1 warning (need to fix!), and 0 notes (great!). The warning’s message is pretty informative: “Non-standard license specification: What license is it under?” The License field in the DESCRIPTION file still holds the template’s placeholder text, so we need to update it with the license information. See below for more discussion on licenses.

Step 3: Share:

Let’s say you’ve fixed all the errors, warnings, and notes on your package, and now you’d like to share it with a few people. You can build the package “source” file. Here is how you build the source:

Create Source

The output in the “Build” tab window is:

==> devtools::build()

"C:/PROGRA~1/R/R-34~1.0PA/bin/x64/R" --no-site-file --no-environ --no-save  \
  --no-restore --quiet CMD build "D:\LADData\RCode\demoPackage"  \
  --no-resave-data --no-manual 

* checking for file 'D:\LADData\RCode\demoPackage/DESCRIPTION' ... OK
* preparing 'demoPackage':
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* building 'demoPackage_0.1.0.tar.gz'

[1] "D:/LADData/RCode/demoPackage_0.1.0.tar.gz"

So, now I can take my file “demoPackage_0.1.0.tar.gz” and share it. Maybe to start, I want to just email it to a co-worker, and have them install it. If you can send them that file, and they put it in their working directory, they can install like this:

install.packages("demoPackage_0.1.0.tar.gz", repos = NULL, type = "source")

If you are a Windows developer sending to a Windows developer, or Mac sending to Mac, you could choose the “Build Binary Package” option. To avoid confusion, assuming you have a package that doesn’t rely on compiled code, choosing “source” makes it easier to handle cross-operating-system collaborations.

Then to run the package:

library(demoPackage)
hello()
[1] "Hello, world!"

We will discuss other methods to share your package, including GitHub, GRAN, and CRAN.

More Sub-Directories

Aside from the “R” and “man” folders, there are others that can be included in a package. Here is a brief introduction to those folders. Some will be discussed further in this course. Others can be explored further from online references.

Table 1. Possible Folders in an R-Package
Folder     Description
R          Collection of R code
data       Example data sets saved in R binary format
demo       R scripts to show example workflows. These are losing favor to vignettes
exec       Executable scripts
inst       Installed files; any file that you want to include in its original format can be stored here
man        Documentation/help files
po         Translations for R- and C-level error and warning messages
src        Compiled code such as C, C++, or Fortran
tests      Code testing files (for example, testthat tests)
tools      Auxiliary files needed during configuration
vignettes  Files to create the detailed vignettes/user guides

More Files

Aside from the “DESCRIPTION” and “NAMESPACE” files, there are others that can be included in a package. Here is a brief introduction to those files. Some will be discussed further in this course. Others can be explored further from online references. This list is not exhaustive; many other files can be seen in R-package GitHub repositories. Some that we may explore in this workshop, for example “.travis.yml”, “appveyor.yml”, and “codecov.yml”, are used to configure continuous integration services.

Table 2. Possible Files in an R-Package
File           Description
DESCRIPTION    Basic information on the package
NAMESPACE      Exported and imported functions and methods
NEWS           Information on package changes/updates
README         Useful information to package users, but ignored by R
CITATION       Official information on how to cite the package
LICENSE        Specific information on licensing
CONTRIBUTING   Information on how to contribute to the package
.Rbuildignore  List of files to ignore when doing the package build
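Each line of “.Rbuildignore” is a Perl-style regular expression that is matched, ignoring case, against paths relative to the top of the package. The sketch below previews which entries a set of patterns would drop; the patterns and the folder contents are illustrative assumptions, and the matching loop only approximates what the build does.

```r
# Sketch: patterns one might place in .Rbuildignore (illustrative choices).
ignore_patterns <- c(
  "^.*\\.Rproj$",       # RStudio project file
  "^\\.Rproj\\.user$",  # RStudio user settings folder
  "^\\.travis\\.yml$",  # continuous integration configuration
  "^appveyor\\.yml$"    # continuous integration configuration
)

# Hypothetical top-level contents of a package source directory
top_level <- c("DESCRIPTION", "NAMESPACE", "R", "man",
               "demoPackage.Rproj", ".Rproj.user",
               ".travis.yml", "appveyor.yml")

# Preview which entries match at least one pattern
is_ignored <- vapply(
  top_level,
  function(path) {
    any(vapply(ignore_patterns, grepl, logical(1),
               x = path, perl = TRUE, ignore.case = TRUE))
  },
  logical(1)
)

# Entries that would remain in the built package
top_level[!is_ignored]

# Write the patterns, one per line, to a throwaway .Rbuildignore
writeLines(ignore_patterns, file.path(tempdir(), ".Rbuildignore"))
```

Anchoring each pattern with ^ and $ avoids accidentally excluding files whose names merely contain the same text.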

Data

There are several ways to add data to your package. See here for more information. We will describe 3 methods here:

inst/extdata subdirectory

You can put data in the “inst/extdata” folder. This data will be available to the user in its original format. For example, in the dataRetrieval package, we include example data that shows various data formats. This data can be accessed as follows:

library(dataRetrieval)

path_to_data <- system.file("extdata", "RDB1Example.txt", package = "dataRetrieval")

path_to_data
## [1] "C:/Users/lcarr/Documents/R/R-3.3.2/library/dataRetrieval/extdata/RDB1Example.txt"

This data is not automatically exported in the package. For dataRetrieval, we could open the data file by using dataRetrieval parsing functions:

rdb_data <- importRDB1(path_to_data)
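One detail worth knowing is that system.file() returns an empty string, rather than an error, when the file does not exist. A small wrapper can make that failure explicit. This is a sketch only; find_pkg_file() is a hypothetical helper, and the stats package is used simply because it ships with R.

```r
# Sketch: locate a file shipped with an installed package, failing loudly
# when it is missing. find_pkg_file() is a hypothetical helper.
find_pkg_file <- function(..., package) {
  path <- system.file(..., package = package)
  if (!nzchar(path)) {
    stop("File not found in package '", package, "': ", file.path(...))
  }
  path
}

# Works for any installed package; 'stats' ships with R
find_pkg_file("DESCRIPTION", package = "stats")

# A missing file gives "" from system.file(); the helper would stop instead
system.file("extdata", "no_such_file.txt", package = "stats")
```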

data subdirectory

If the raw file format is not important, data sets can be included in the “data” subdirectory. This data needs to be saved as either “.RData” or “.rda” (both R binary extensions). The EGRET package offers some example data sets in the “data” folder: Choptank_eList and Arkansas_eList. The data still needs to be exported (this will be discussed later). See here for the EGRET example.

If you have LazyData: true in the DESCRIPTION file, the data can be accessed easily:

library(EGRET)
names(Choptank_eList)
## [1] "INFO"     "Daily"    "Sample"   "surfaces"

A minor detail: this data will not appear in your Environment until you specifically load it:

Choptank_eList <- Choptank_eList

EGRET data

sysdata.rda

If data is to be used within the R code (for example, a look-up table), it can go in an R binary file called “sysdata.rda” that is placed within the “R” subdirectory. By default, this data is available to functions within your package, but not exposed to users.

To expose the sysdata, see the example in dataRetrieval. The important lines in the roxygen2 code (much more to be discussed on that in the Documentation Lesson):

#' @docType data
#' @export stateCd
NULL

This adds the line export(stateCd) to the NAMESPACE file.
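Both binary formats described in this section can be written with base R’s save(). The following sketch writes one user-facing data set and one internal look-up table into a throwaway package folder; the object names and values are hypothetical.

```r
# Sketch: write both kinds of binary data into a throwaway package directory.
pkg_dir <- file.path(tempdir(), "demoPackage")
dir.create(file.path(pkg_dir, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# User-facing example data set -> data/<name>.rda (hypothetical values)
demo_flow <- data.frame(
  Date = as.Date("2017-01-01") + 0:2,
  Flow = c(10.5, 12.1, 11.8)
)
save(demo_flow, file = file.path(pkg_dir, "data", "demo_flow.rda"))

# Internal look-up table -> R/sysdata.rda (hypothetical values)
param_lookup <- c("00060" = "Discharge", "00010" = "Temperature")
save(param_lookup, file = file.path(pkg_dir, "R", "sysdata.rda"))

# Confirm which objects each file holds by loading into a scratch environment
scratch <- new.env()
load(file.path(pkg_dir, "data", "demo_flow.rda"), envir = scratch)
load(file.path(pkg_dir, "R", "sysdata.rda"), envir = scratch)
ls(scratch)
```

Several internal objects can share the single sysdata.rda file by passing them all to one save() call.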

Depending on other packages

There are many good reasons for your package to use functions from other packages. These are called dependencies.

Use care, however, to decide if the package is stable and actively maintained. If you depend on packages that change their behavior or results, your package will also start to change its behavior.

When you do require a package, it’s best to only import the specific functions that you need. So, rather than depending on the entire dataRetrieval package, only depend on the functions within dataRetrieval that you need. Let’s create a throwaway function in our demoPackage that gets daily discharge values from a site, and then uses dplyr to calculate daily summaries.

Here’s the function if we were to write it as a script:

library(dataRetrieval)
library(dplyr)

summary_dv <- function(site){
  dv_data <- readNWISdv(siteNumbers=site, 
                        parameterCd = "00060", startDate = "", endDate = "")

  dv_summ <- renameNWISColumns(dv_data)
  dv_summ <- mutate(dv_summ, doy = as.numeric(strftime(Date, format = "%j")))
  dv_summ <- group_by(dv_summ, doy)
  dv_summ <- summarise(dv_summ,
                       max = max(Flow, na.rm = TRUE),
                       min = min(Flow, na.rm = TRUE))
  return(dv_summ)

}

summary_dv("08279500")

To get that into our demoPackage, first we need to add dataRetrieval and dplyr to the “Imports” field of the DESCRIPTION file. It is best practice to add them to “Imports” rather than “Depends”: packages in “Imports” are loaded for your package’s use without being attached to the user’s search path, whereas packages in “Depends” are attached whenever your package is. So, first, let’s update the DESCRIPTION file to this:

Package: demoPackage
Type: Package
Title: Throwaway demo package
Version: 0.1.0
Authors@R: c(person("Laura", "DeCicco", role = c("aut","cre"),
    email = "ldecicco@usgs.gov"))
Description: More about what it does (maybe more than one line)
    Use four spaces when indenting paragraphs within the Description.
License: CC0
Encoding: UTF-8
LazyData: true
Imports: 
  dataRetrieval,
  dplyr

Now, we need to add some roxygen2 info to the function (this will be described in detail in the Documentation section). So, save this to a file in the “R” folder in the demoPackage project:

#' summary_dv
#'
#' Get max and min of discharge for day of the year.
#'
#' @export
#' @param site character USGS site ID
#' @importFrom dataRetrieval readNWISdv
#' @importFrom dataRetrieval renameNWISColumns
#' @importFrom dplyr mutate
#' @importFrom dplyr group_by
#' @importFrom dplyr summarise
#' @examples
#' site <- "08279500"
#' sum_dv <- summary_dv(site)
summary_dv <- function(site){
  Date <- doy <- Flow <- ".dplyr.var"

  dv_data <- readNWISdv(siteNumbers=site,
                        parameterCd = "00060", startDate = "", endDate = "")

  dv_summ <- renameNWISColumns(dv_data)
  dv_summ <- mutate(dv_summ, doy = as.numeric(strftime(Date, format = "%j")))
  dv_summ <- group_by(dv_summ, doy)
  dv_summ <- summarise(dv_summ,
                       max = max(Flow, na.rm = TRUE),
                       min = min(Flow, na.rm = TRUE))
  return(dv_summ)

}

Your first question might be, what is this line?

Date <- doy <- Flow <- ".dplyr.var"

dplyr uses “Non-Standard Evaluation” (NSE) so that we can refer to columns in a data frame without surrounding them in quotes. During the package Check, though, these unquoted names are flagged with a Note (“no visible binding for global variable”). Creating these “dummy” variables eliminates the Note. See more here for details. There are also ways within the dplyr package to perform Standard Evaluation.
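A different way to silence the same Note, swapped in here as an alternative rather than the approach this lesson uses, is utils::globalVariables(). The sketch assumes the call sits at the top level of a file in the “R” folder, outside any function.

```r
# Sketch: declare the NSE column names once for the whole package instead of
# assigning dummy values inside each function.
utils::globalVariables(c("Date", "doy", "Flow"))
```

With this in place, the dummy assignment inside summary_dv would no longer be needed. The trade-off is that the names are declared for every function in the package, so a genuine typo using one of them would no longer be flagged.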

I also deleted the hello function and its corresponding .Rd file in the “man” folder. We could keep it, but now that we’re depending on roxygen2 to create the documentation, we’d need to add the mandatory header information to that function. Since this is just a throwaway package, I just deleted it. Also, to get roxygen2 to build the NAMESPACE file, I had to delete the originally created file. This is a precaution in roxygen2: it will not overwrite a file that lacks the header “# Generated by roxygen2: do not edit by hand”.

When we build the package, if we have roxygen2 working properly, our NAMESPACE file should now look like this:

# Generated by roxygen2: do not edit by hand

export(summary_dv)
importFrom(dataRetrieval,readNWISdv)
importFrom(dataRetrieval,renameNWISColumns)
importFrom(dplyr,group_by)
importFrom(dplyr,mutate)
importFrom(dplyr,summarise)

Run the Check to make sure there are still no errors or warnings.

We can load the updates to the package, and see our in-development help files:

library(demoPackage)
?summary_dv

#Try out the example:
site <- "08279500"
sum_dv <- summary_dv(site)

Finally, be aware there is a third type of package dependency. We could add packages to the “Suggests” field in the DESCRIPTION file. These are packages that might be used in the vignettes or help-file examples, or packages such as knitr or testthat that are not directly used by the package’s own functions but are needed to build or test it.
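Because a suggested package may not be installed on a user’s machine, code that uses one usually checks for it first. Below is a minimal sketch of that guard using requireNamespace(); need_suggested() and knit_report() are hypothetical names, and knitr stands in for any package listed under “Suggests”.

```r
# Sketch: use a package from "Suggests" only when it is installed.
# need_suggested() and knit_report() are hypothetical helpers.
need_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is needed for this function. Please install it.",
         call. = FALSE)
  }
  invisible(TRUE)
}

knit_report <- function(input) {
  need_suggested("knitr")
  # Refer to the suggested package's function with the :: operator
  knitr::knit(input)
}
```

The guard gives the user a clear instruction instead of a failure deep inside the function.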

License and Disclaimers

It is important to understand the policies of distributing code as a federal US employee. Please check for updates to any policy before releasing software. We do not guarantee that the following information will always be up-to-date and correct. See USGS-manual, DOI, and software release for the most official guidance on licenses and disclaimers.

Technically, U.S. federal employees should not have a license associated with their code (completely open, unrestricted code). R requires a license, however, and we have been told that it’s OK to use the CC0 license:

CC0

USGS employees should also be adding a Disclaimer to their README:

Disclaimer: This software is in the public domain because it contains materials that originally came from the U.S. Geological Survey, an agency of the United States Department of Interior. For more information, see the official USGS copyright policy

Although this software program has been used by the U.S. Geological Survey (USGS), no warranty, expressed or implied, is made by the USGS or the U.S. Government as to the accuracy and functioning of the program and related program material nor shall the fact of distribution constitute any such warranty, and no responsibility is assumed by the USGS in connection therewith.

This software is provided "AS IS."

We also require a message on packages that have not been officially released. Adding this to any file in the R folder will cause a message to be displayed when a user attaches the package with library():

.onAttach <- function(libname, pkgname) {
  packageStartupMessage("Although this software program has been used by the U.S. Geological Survey (USGS), no warranty, expressed or implied, is made by the USGS or the U.S. Government as to the accuracy and functioning of the program and related program material nor shall the fact of distribution constitute any such warranty, and no responsibility is assumed by the USGS in connection therewith.")
}

If this was added to our demoPackage, we would see this on load:

library(demoPackage)
Although this software program has been used by the U.S. Geological Survey (USGS), no warranty, expressed or implied, is made by the USGS or the U.S. Government as to the accuracy and functioning of the program and related program material nor shall the fact of distribution constitute any such warranty, and no responsibility is assumed by the USGS in connection therewith.
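Using packageStartupMessage(), rather than message() or print(), lets users silence the text when they need quiet output, for example in a script. The sketch below shows this with a hypothetical stand-in function, show_disclaimer(), and a shortened message, so that it runs without installing demoPackage.

```r
# Sketch: startup messages can be silenced or captured by the user.
# show_disclaimer() is a hypothetical stand-in for the body of .onAttach().
show_disclaimer <- function() {
  packageStartupMessage("This software is provided \"AS IS.\"")
}

# Silenced, as with suppressPackageStartupMessages(library(demoPackage))
suppressPackageStartupMessages(show_disclaimer())

# Captured, to confirm the condition class that makes silencing possible
caught <- tryCatch(
  show_disclaimer(),
  packageStartupMessage = function(m) conditionMessage(m)
)
caught
```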

Laura DeCicco