This is more of a general tip for reproducibility. For example, we can reference a directory at the same level as the working directory by getting the parent directory with dirname(getwd()) and then using file.path to build the path.
parent <- dirname(getwd())
dataPath <- file.path(parent, "data")
Use file.path instead of paste to write platform-independent code. Please see the examples below.
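As a small illustration of the tip above, the sketch below builds the same path with file.path and with paste. The folder and file names are placeholders chosen for this example, and csvPath and csvPathPaste are illustrative variable names.

```r
# Hypothetical folder and file names, used only for illustration
parent   <- dirname(getwd())
dataPath <- file.path(parent, "data")

# file.path joins any number of components with the path separator
csvPath <- file.path(dataPath, "recs2009.csv")

# The paste version has to spell out the separator by hand
csvPathPaste <- paste(dataPath, "recs2009.csv", sep = "/")

# Compare the two results
identical(csvPath, csvPathPaste)

# basename() and dirname() take a path apart again
basename(csvPath)
dirname(csvPath)
```

Both calls produce the same string here. The advantage of file.path is that the intent is explicit and the separator is never typed by hand.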
For csv files, we set mode = "wb" to tell R to write the file in binary mode. Otherwise download.file writes in text mode, which on Windows can add extra line breaks between records.
# EIA residential energy consumption survey (RECS) 27 MB
download.file(url = "http://www.eia.gov/consumption/residential/data/2009/csv/recs2009_public.csv",
              mode = "wb",
              destfile = file.path(dataPath, "recs2009.csv"))
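Because these files are large, it can help to avoid downloading them again on every run. The following is a minimal sketch, assuming we want to create the data folder when it is missing and skip files that already exist. download_if_missing is a hypothetical helper written for this example, not a function from base R or any package.

```r
# download_if_missing() is a hypothetical helper, not part of base R
download_if_missing <- function(url, destfile, mode = "wb") {
  # Create the target folder if it does not exist yet
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)

  # Skip the download when the file is already there
  if (file.exists(destfile)) {
    message("Skipping download, file already exists: ", destfile)
    return(invisible(destfile))
  }

  # download.file returns 0 on success and non-zero on failure
  status <- download.file(url = url, destfile = destfile, mode = mode)
  if (status != 0) stop("Download failed with status ", status)
  invisible(destfile)
}

# Example call (commented out so the sketch runs without network access):
# download_if_missing(
#   url = "http://www.eia.gov/consumption/residential/data/2009/csv/recs2009_public.csv",
#   destfile = file.path(dirname(getwd()), "data", "recs2009.csv"))
```

The helper defaults to mode = "wb" so that the same call works for csv, zip and gz files.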
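To download zip files, we don’t need any additional arguments for download.file. On Windows, download.file switches to binary mode automatically when mode is not supplied and the URL ends in .zip.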
# Porto taxi cab data, 520 MB zip file
download.file(url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00339/train.csv.zip",
              destfile = file.path(dataPath, "train.zip"))
We’ll unzip the file for demonstration, but this isn’t necessary, because R can read zipped files without extracting them first (in base R, through a connection such as unz). By default, unzip does not remove the original zip file.
# Extract into dataPath without changing the working directory
unzip(file.path(dataPath, "train.zip"), exdir = dataPath)
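To show what reading a zipped file without extracting it might look like, here is a minimal sketch in base R. read_first_rows_from_zip is a hypothetical helper written for this example. It assumes the first entry in the archive is the csv file we want, which may not hold for every archive.

```r
# read_first_rows_from_zip() is a hypothetical helper for illustration
read_first_rows_from_zip <- function(zipPath, n = 5) {
  # List the archive contents without extracting anything
  contents <- unzip(zipPath, list = TRUE)

  # Assumption: the first entry is the csv file of interest
  member <- contents$Name[1]

  # unz() opens a read connection to a single file inside the archive
  read.csv(unz(zipPath, member), nrows = n)
}

# Only runs if the zip file from the download step is present
zipPath <- file.path(dirname(getwd()), "data", "train.zip")
if (file.exists(zipPath)) {
  print(read_first_rows_from_zip(zipPath))
}
```

Reading only the first few rows is a cheap way to check the column layout of a large file before loading all of it.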
Downloading gz files works the same way as for zip files. Setting mode = "wb" explicitly is optional here, but it does no harm.
# weather data, 206 MB gz file
download.file(url = "http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/2010.csv.gz",
              mode = "wb",
              destfile = file.path(dataPath, "2010.csv.gz"))
To unzip gz files, you need the R.utils package, though as for zip files, this usually isn’t necessary. gunzip removes the original gz file by default, but you can set remove = FALSE to keep the gz file.
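# Install R.utils only if it is not already available
if (!requireNamespace("R.utils", quietly = TRUE)) install.packages("R.utils")
library(R.utils)
# Decompress next to the archive and keep the original gz file
gunzip(file.path(dataPath, "2010.csv.gz"), remove = FALSE)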
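As noted above, decompressing a gz file is usually not required before reading it. The sketch below illustrates this with a tiny made-up data frame written to a temporary gz file, so it runs without the 206 MB download. The names gzPath, toy and toyBack are illustrative.

```r
# Write a tiny gz-compressed csv to a temporary file (made-up data)
gzPath <- tempfile(fileext = ".csv.gz")
toy <- data.frame(station = c("A", "B"), value = c(1.5, 2.5))
write.csv(toy, gzfile(gzPath), row.names = FALSE)

# read.csv detects the gzip compression and reads the file directly
toyBack <- read.csv(gzPath)
toyBack
```

The same read.csv call should work on the downloaded 2010.csv.gz, although a file of that size will take noticeably longer to read.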