started 2017-11-03 and updated 2023-05-26
I keep a database of high-frequency flow and concentration data, which my colleagues and I have used to calculate uncertainties on annual nutrient and material loads [1–8]. Until now, I had found it very practical to keep the data in hydrological years, which for the Northern Hemisphere and in temperate climates start around the first of September and end in late August the next year. I thus kept in the filenames, e.g., “99-00” to show that the data contained were obtained between Sep 01, 1999 and Aug 31, 2000.
I have recently worked with daily rainfall data from Oxford, U.K., dating back to 1828. The two-digit system has thus become obsolete… So I decided to change everything to four digits for years. Two problems arose:
[51] "/Data/Yar/MO/99-00/Lin_1min_Yar_MO_99-00.txt"
[52] "/Data/Yar/MO/99-00/Lin_2min_Yar_MO_99-00.txt"
[53] "/Data/Yar/MO/99-00/Lin_5min_Yar_MO_99-00.txt"
I decided to use the file.rename() function in R to deal with the problem. I found this very good “Programatically rename files (or do other stuff to them) in R” post and tried to follow it, but it did not work… After a bit of frustration, I realized that I was trying to rename both the files and the directories at the same time, which is why I got the angry 'No such file or directory' error. This is the reason for the post.
When one needs to change the same pattern in the names of both directories and files, things must be done in sequence: directory names first and file names second. Otherwise, the new file path points into a directory that does not exist yet.
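The failure can be reproduced on a throwaway tree. The sketch below is only an illustration: the directory rename_demo and the file flow_99-00.txt are made-up names created under tempdir(), not part of my database. It shows that sub() only replaces the first match in a path, so renaming a file directly sends it into a directory that does not exist yet, whereas renaming the directory first and the file second works.

```r
# Hypothetical throwaway tree created in R's temporary directory
root <- file.path(tempdir(), "rename_demo")
unlink(root, recursive = TRUE)
dir.create(file.path(root, "99-00"), recursive = TRUE)
old_file <- file.path(root, "99-00", "flow_99-00.txt")
file.create(old_file)

# sub() replaces only the FIRST match, which here is the directory part;
# fixed = TRUE treats the pattern as a literal string
new_file <- sub("99-00", "1999-2000", old_file, fixed = TRUE)

# The target directory does not exist yet, so this rename fails (returns FALSE)
ok_direct <- suppressWarnings(file.rename(old_file, new_file))

# In sequence: directory first ...
ok_dir <- file.rename(file.path(root, "99-00"), file.path(root, "1999-2000"))

# ... then the file, whose first match is now in the file name itself
moved <- file.path(root, "1999-2000", "flow_99-00.txt")
final <- sub("99-00", "1999-2000", moved, fixed = TRUE)
ok_file <- file.rename(moved, final)
```

After running this, ok_direct is FALSE while ok_dir and ok_file are TRUE.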
Now, I made lots of mistakes along the way and I certainly spent more time than I would have changing everything by hand… But then I would not have learned and shared things.
I used this code to generate a list of “00-01” and corresponding “2000-2001”:
years <- 1979:2017
pattern <- matrix(NA, nrow = length(years) - 1, ncol = 2)
j <- 1
for (i in head(years, -1)) {
  # defining the years to be used
  # column 1: two-digit label, column 2: four-digit label
  pattern[j, 1] <- paste0(substr(as.character(i), 3, 4), "-", substr(as.character(i + 1), 3, 4))
  pattern[j, 2] <- paste0(i, "-", i + 1)
  j <- j + 1
}
head(pattern)
## [,1] [,2]
## [1,] "79-80" "1979-1980"
## [2,] "80-81" "1980-1981"
## [3,] "81-82" "1981-1982"
## [4,] "82-83" "1982-1983"
## [5,] "83-84" "1983-1984"
## [6,] "84-85" "1984-1985"
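The same table can probably be built without a loop. This is only a sketch of an alternative, not the code I actually ran; the names start and pattern_vec are introduced here for illustration. It relies on sprintf() with the %02d format to keep the leading zero of years such as 2000.

```r
years <- 1979:2017
start <- head(years, -1)          # first year of each hydrological year

# Column 1: two-digit label, column 2: four-digit label
pattern_vec <- cbind(
  sprintf("%02d-%02d", start %% 100L, (start + 1L) %% 100L),
  paste0(start, "-", start + 1L)
)
head(pattern_vec)
```

The modulo 100 takes care of the turn of the century, so that 1999 gives "99-00" and 2000 gives "00-01".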
To rename the directories, I used the code below, which contains a few interesting tricks:
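for (i in 1:nrow(pattern)) {
  # indices of the directories whose path contains the two-digit label
  is.pattern <- grep(pattern[i, 1], dirlist)
  # only rename when grep() found at least one match
  if (identical(is.pattern, integer(0)) == FALSE) {
    sapply(dirlist[is.pattern], FUN = function(eachPath) {
      file.rename(from = eachPath, to = sub(pattern = pattern[i, 1], replacement = pattern[i, 2], eachPath))
    })
  }
}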
First, the code below makes a list of all directories down a directory tree, as long as the file containing the code sits at the top of the tree whose subdirectories one wants to list (by default, list.dirs() starts from the working directory). I go into more detail about this further down.
dirlist<-list.dirs(recursive = TRUE)
This yielded:
[1] "." "./A1" "./A1/Cl" "./A1/Cl/98-99" "./A1/DIC" "./A1/DIC/98-99" "./A1/DOC"
[8] "./A1/DOC/98-99" "./A1/NH4" "./A1/NH4/98-99" "./A1/NO3" "./A1/NO3/98-99" "./A1/ON"
[15] "./A1/PO4" "./A1/PO4/98-99" "./A1/TN" "./A1/TN/98-99" "./A1/TP" "./A1/TP/98-99"
and more
Then I used the grep() function to get the indices of the elements of dirlist that match the pattern. When there was no match, R returned integer(0), which means an integer vector of length 0. I only had to make changes when there was a match, that is, when the return from the grep() function was not integer(0). To test this, I found the identical() function, which yields a boolean TRUE or FALSE.
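Here is a small illustration of what grep() returns with and without a match. The vector dirs is a made-up, shortened stand-in for dirlist. The last line shows length(), which I believe is an equivalent and more common way to test for an empty result.

```r
# Made-up, shortened stand-in for dirlist
dirs <- c(".", "./A1", "./A1/Cl/98-99", "./A1/NO3/98-99")

hits <- grep("98-99", dirs)   # positions of the matching elements
miss <- grep("05-06", dirs)   # no match: an integer vector of length 0

# The test used in this post
has_hits <- identical(hits, integer(0)) == FALSE
has_miss <- identical(miss, integer(0)) == FALSE

# An equivalent test based on the length of the result
has_hits_len <- length(hits) > 0
```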
The heart of the automatic changes is really with the sapply() function here, which I borrowed from the post. It is very smart and efficient.
sapply(dirlist[is.pattern], FUN = function(eachPath) {
  file.rename(from = eachPath, to = sub(pattern = pattern[i, 1], replacement = pattern[i, 2], eachPath))
})
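As far as I can tell, file.rename() and sub() both accept vectors, so the sapply() wrapper may not be strictly necessary. The sketch below is an alternative, not the code from the post; it works on two made-up files created in a hypothetical vectorised_demo directory under tempdir().

```r
# Hypothetical throwaway files created in R's temporary directory
root <- file.path(tempdir(), "vectorised_demo")
unlink(root, recursive = TRUE)
dir.create(root)
old <- file.path(root, c("a_98-99.txt", "b_98-99.txt"))
file.create(old)

# sub() builds all the new names at once; fixed = TRUE matches literally
new <- sub("98-99", "1998-1999", old, fixed = TRUE)

# file.rename() renames element by element, returning one logical per file
renamed <- file.rename(from = old, to = new)
```

The returned logical vector makes it easy to check afterwards which renames succeeded.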
The last important thing I want to share is that to have the code work for any directory on your computer, it is better to provide absolute file names. For this, I have chosen to define a path and make the directory and file lists from it. Now, there is a subtlety: notice in the code below that I purposely did not add a / at the end of the path name. This is because list.dirs() returns full paths by default (full.names = TRUE) and adds the / itself, whereas list.files() by default returns names relative to the path (full.names = FALSE)…! So to get the full file names, I just pasted the path and a / in front of the file list.
path <- "/Users/francoisbirgand/Google Drive/Echantillonnage/Methods/Data"
dirlist <- list.dirs(path, recursive = TRUE) # recursive = TRUE means that all dir in the tree will appear

### this makes a list of all the years that need to change
years <- 1979:2017
pattern <- matrix(NA, nrow = length(years) - 1, ncol = 2)
j <- 1
for (i in head(years, -1)) {
  # defining the years to be used
  pattern[j, 1] <- paste0(substr(as.character(i), 3, 4), "-", substr(as.character(i + 1), 3, 4))
  pattern[j, 2] <- paste0(i, "-", i + 1)
  j <- j + 1
}

### This changes all the directory names
for (i in 1:nrow(pattern)) {
  is.pattern <- grep(pattern[i, 1], dirlist)
  if (identical(is.pattern, integer(0)) == FALSE) {
    sapply(dirlist[is.pattern], FUN = function(eachPath) {
      file.rename(from = eachPath, to = sub(pattern = pattern[i, 1], replacement = pattern[i, 2], eachPath))
    })
  }
}

# the file list is made AFTER the directories have been renamed
filelist <- paste0(path, "/", list.files(path, recursive = TRUE)) # recursive = TRUE means that all files in the tree will appear

### This changes all the file names
for (i in 1:nrow(pattern)) {
  is.pattern <- grep(pattern[i, 1], filelist)
  if (identical(is.pattern, integer(0)) == FALSE) {
    sapply(filelist[is.pattern], FUN = function(eachPath) {
      file.rename(from = eachPath, to = sub(pattern = pattern[i, 1], replacement = pattern[i, 2], eachPath))
    })
  }
}
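The difference between list.dirs() and list.files() mentioned above can be checked on a small tree. This is a sketch on made-up names (fullnames_demo, A1, Cl_98-99.txt) created under tempdir(). It also suggests that calling list.files() with full.names = TRUE gives the same result as pasting the path and a / by hand, at least when the path has no trailing slash.

```r
# Hypothetical throwaway tree created in R's temporary directory
root <- file.path(tempdir(), "fullnames_demo")
unlink(root, recursive = TRUE)
dir.create(file.path(root, "A1", "98-99"), recursive = TRUE)
file.create(file.path(root, "A1", "98-99", "Cl_98-99.txt"))

# list.dirs() has full.names = TRUE by default: paths start with root
dirs <- list.dirs(root, recursive = TRUE)

# list.files() has full.names = FALSE by default: paths are relative to root
rel <- list.files(root, recursive = TRUE)

# Asking for full names explicitly ...
full <- list.files(root, recursive = TRUE, full.names = TRUE)

# ... gives the same result as pasting the path and a "/" by hand
same <- identical(full, paste0(root, "/", rel))
```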