Blog: renaming directories and files

by François Birgand

started 2017-11-03 and updated 2018-10-30

Keywords

• integer(0)
• grep()
• sapply()
• list.dirs()
• list.files()
• file.rename()
• identical()

File database in a directory tree

I keep a database of high frequency flow and concentration data, which my colleagues and I have used to calculate uncertainties on annual nutrient and material loads18. Until now, I had found very practical to keep the data in hydrological years, which for the Northern hemisphere and in temperate climates starts around the first of september and ends late August the next year. I thus kept in the filenames, e.g., “99-00” to show that the data contained were obtained between Sep 01, 1999 and Aug 31, 2000.

I have recently worked with daily rainfall data from Oxford, U.K., dating back from 1828. The two digit system has thus become obsolete… So I decided to change everything back to four digits for years. Two problems arose:

1. I had lots of files to change…
2. The directory tree also contained years in the same “99-00” format just like below:

[51] "/Data/Yar/MO/99-00/Lin_1min_Yar_MO_99-00.txt"
[52] "/Data/Yar/MO/99-00/Lin_2min_Yar_MO_99-00.txt"
[53] "/Data/Yar/MO/99-00/Lin_5min_Yar_MO_99-00.txt" 

I decided to use the file.rename() function in R to deal with the problem. I found this very good “Programatically rename files (or do other stuff to them) in R” post, I tried to follow it but it did not work…

After a bit of frustration, I realized that as I was trying to rename both the files and the directories at the same time, and I got the angry 'No such file or directory'. This is the reason for the post.

When one needs to change the same pattern in the names of both directories and files, things must be done in sequence, directory names first and filenames in second.

Now, I stumbled upon lots of mistakes and I certainly spent more time than I would have changing everything by hand… But I would not have learned and shared things.

Make a list of patterns

I used this code to generate a list of “00-01” and corresponding “2000-2001”:

years<-(1979:2017)
pattern<-matrix(NA,nrow = length(years)-1,2)
j = 1
#defining the years to be used
pattern[j,1] = paste0(substr(as.character(i),3,4),"-",substr(as.character(i+1),3,4))
pattern[j,2] = paste0(i,"-",i+1)
j= j+1
}
head(pattern)
##      [,1]    [,2]
## [1,] "79-80" "1979-1980"
## [2,] "80-81" "1980-1981"
## [3,] "81-82" "1981-1982"
## [4,] "82-83" "1982-1983"
## [5,] "83-84" "1983-1984"
## [6,] "84-85" "1984-1985"

Automatically rename directories

For this, I used the code below, which contains interesting tricks

for (i in 1:nrow(pattern)){
is.pattern = grep(pattern[i,1],dirlist)
if (identical(is.pattern,integer(0)) == FALSE){
sapply(dirlist[is.pattern],FUN=function(eachPath){
file.rename(from=eachPath,to= sub(pattern= pattern[i,1],replacement = pattern[i,2],eachPath))
})
}
}

Listing directory names

First, the code to make a list of all directories down a directory tree is, as long as the file for the code is physically located in a directory downstream of which one wants all the recursive directories. I go into more details ablout this further down.

dirlist<-list.dirs(recursive = TRUE)

This yielded this:

[1] "."  "./A1"  "./A1/Cl"  "./A1/Cl/98-99"  "./A1/DIC" "./A1/DIC/98-99" "./A1/DOC"
[8] "./A1/DOC/98-99" "./A1/NH4" "./A1/NH4/98-99" "./A1/NO3" "./A1/NO3/98-99" "./A1/ON"
[15] "./A1/PO4" "./A1/PO4/98-99" "./A1/TN"  "./A1/TN/98-99"  "./A1/TP"  "./A1/TP/98-99"
and more

Then I used the grep() function, to list the row name of the return dirlist. When there was no match, R returned integer(0) which means a vector of length 0. I had to only make changes when there was a match so that was in the case the return from the grep() function was not integer(0). I found the identical() function that yields a boolean TRUE or FALSE.

Applying changes through the whole directory tree

The heart of the automatic changes is really with the sapply() function here, which I borrowed from the post. It is very smart and efficient.

sapply(dirlist[is.pattern],FUN=function(eachPath){
file.rename(from=eachPath,to= sub(pattern= pattern[i,2],replacement = pattern[i,1],eachPath))

The last important thing I want to share is that to have the code work for any directory on your computer, it is better to provide absolute file names. For this, I have chosen to add a path to files and make directory and file lists from it. Now, there is a subtlety: Notice in the code below that I purposely did not add a / at the end of the path name. This is because the function list.dirs() automatically adds one, but not the function list.files()…! So to get the full file name, I just added a / in the code so that the file list be correct.

path<-"/Users/francoisbirgand/Google Drive/Echantillonnage/Methods/Data"
dirlist<-list.dirs(path,recursive = TRUE)      # recursive = TRUE means that all dir in the tree will appear

### this makes a list of all the years that need to change
years<-(1979:2017)
pattern<-matrix(NA,nrow = length(years)-1,2)
j = 1
#defining the years to be used
pattern[j,1] = paste0(substr(as.character(i),3,4),"-",substr(as.character(i+1),3,4))
pattern[j,2] = paste0(i,"-",i+1)
j= j+1
}

### This changes all the directory names
for (i in 1:nrow(pattern)){
is.pattern = grep(pattern[i,1],dirlist)
if (identical(is.pattern,integer(0)) == FALSE){
sapply(dirlist[is.pattern],FUN=function(eachPath){
file.rename(from=eachPath,to= sub(pattern= pattern[i,1],replacement = pattern[i,2],eachPath))
})
}
}

filelist<-paste0(path,"/",list.files(path,recursive = TRUE))    # recursive = TRUE means that all files in the tree will appear

### This changes all the file names
for (i in 1:nrow(pattern)){
is.pattern = grep(pattern[i,1],filelist)
if (identical(is.pattern,integer(0)) == FALSE){
sapply(filelist[is.pattern],FUN=function(eachPath){
file.rename(from=eachPath,to= sub(pattern= pattern[i,1],replacement = pattern[i,2],eachPath))
})
}
}

References

1. Moatar, F., Meybeck, M., Raymond, S., Birgand, F. & Curie, F. River flux uncertainties predicted by hydrological variability and riverine material behaviour. Hydrol. Process. 27, 3535–3546 (2013).

2. Birgand, F., Lellouche, G. & Appelboom, T. W. Measuring flow in non-ideal conditions for short-term projects: Uncertainties associated with the use of stage-discharge rating curves. J. Hydrol. 503, 186–195 (2013).

3. Birgand, F., Appelboom, T. W., Chescheir, G. M. & Skaggs, R. W. Estimating nitrogen, phosphorus, and carbon fluxes in forested and mixed-used watersheds of the lower coastal plain of north carolina: Uncertainties associated with infrequent sampling. Trans. ASABE 54, 2099–2110 (2011).

4. Birgand, F., Faucheux, C., Gruau, G., Moatar, F. & Meybeck, M. Uncertainties in assessing annual nitrate loads and concentration indicators: Part 2. deriving sampling frequency charts in brittany, france. Trans. ASABE 54, 93–104 (2011).

5. Birgand, F. et al. Uncertainties in assessing annual nitrate loads and concentration indicators: Part 1. impact of sampling frequency and load estimation algorithms. Trans. ASABE 53, 437–446 (2010).

6. Birgand, F. et al. Une approche quantitative du rôle de la fréquence d’échantillonnage sur les incertitudes associées aux calculs des flux et des concentrations moyennes en nitrate en bretagne. Ingénieries 59-60, 23–37 (2009).

7. Moatar, F., Birgand, F., Meybeck, M., Faucheux, C. & Raymond, S. Incertitudes sur les métriques de qualité des cours d’eau (médianes et quantiles de concentrations, flux, cas des nutriments) évaluées a partir de suivis discrets. La Houille Blanche 68–76 (2009). doi:10.1051/lhb/2009029

8. Birgand, F. et al. Mesure des flux et échantillonnage des matières en suspension sur de petits cours d’eau. Ingénieries 40, 21–35 (2004).