started 2011-09-11 and updated 2023-05-26
Lately, we have been researching and developing techniques of analyzing continuous water quality and hydrology data in order to explain hydrological and biogeochemical processes controlling stream chemistry in watersheds. One of the techniques in which we have applied to an entire year of water quality and hydrology data is a method that assesses how relatively reactive nutrient/sediment loading is to flow and the relative flashiness of a watershed. This method, referred to as cumulative load and flow volume duration curves, determines the percentages of the total load or Mass (Mk%) and volume of Water (Wk%) that occur in a percentage of the total sampling time, termed probability of occurrence.
Mk% and Wk% duration curves can be plotted as a function of the
probability of occurrence, which provides an interesting way of
demonstrating how loading relates to flow and varies among water quality
constituents. This method can also be applied to individual storm events
to assess if there is a first flush response.
There are lots of great new (in 2017) plotting tools in R, such as
ggplot
, qqplot
, or plotly
. These
tools have developped to manipulate data as function of factors,
parameter, etc. But as far as we can tell, they are not well designed
for handling time series of different units. This is too bad because
they are otherwise rather attractive. As a result, for continuous flow
and concentration data, we are still forced to use the more basic
R graphics
. But maybe me are missing something. In the
meantime, it is important to be able to have some basic knowledge on how
to plot continuous hydrographs and chemographs.
In the code below, we have added quite a bit of comments to explain
what code line does what.
data<-read.csv(file="https://raw.githubusercontent.com/francoisbirgand/francoisbirgand.github.io/master/data/sample_1hr_QC_data.csv",header = TRUE) #Reads file into table format
WSarea<-24.2 #Area of watershed in km2
WS<-"Maudouve at Saint-Donan, France"
names(data)=c("datetime","Q","C") # renames the columns in simpler names
data<-as.data.frame(data)
data$datetime<-as.POSIXct(strptime(data$datetime, "%Y-%m-%d %H:%M:%S")) # transforms characters into date values understood by R
D<-data$datetime
Q<-data$Q #Defines Q as the flow value (m3/s)
C<-data$C #Defines C as the Concentration value (mg NO3-N/L)
L<-Q*C
N=nrow(data) #Sets N to the value equal to the number of total rows in the table
# definition of the x and y axes limits
xlim = as.POSIXct(c(D[1],D[N])) # this renders the first and last date understandable for plotting purposes
ylimQ = c(0,max(Q)) # ylim for flow
ylimC = c(0,max(C)) # ylim for concentrations
ScaleF = 1.2 # scaling factor for size of fonts and other things
y1lab<-expression("Flow rate (" * m^3 * "/s)") # defines the label for flow
y2lab<-substitute(paste("Nitrate concentration (mg ",NO[x]^{y},"-N)",sep=""),list(x=3,y="-")) # defines the label for concentrations
par(mar=c(4.5,4.5,4,4.5)) # defines the sizes, in number of lines, for the margins (bottom, left, top, right)
ltyp=c(1,2)
plot(D,Q,col="blue",type="l",cex=0.1,yaxt="n",
lty=ltyp[1],xaxt="n",xlab="",ylab="",xlim=xlim,ylim=ylimQ)
# we are taking all the default addition of axis tick marks and numbers out by using xaxt and yaxt = "n"
# and setting the axis labels at nothing using xlab = "" and ylab = ""
abline(h=0)
axis.POSIXct(1, at=seq(D[1], D[N], by="month"), format="%m/%d",cex.axis=ScaleF)
# this tells R that we want the X axis ticks and values to be displayed as dates, be added on a monthly basis,
# using the month/day format
axis(2,cex.axis=ScaleF)
# this tells R that the first Y axis ticks can be displayed (that function was repressed earlier by 'yaxt="n" ')
par(new=TRUE)
# this tells R that a new plot has already been opened, in other words you are telling R to keep adding things
# on the existing plot
ColElmt="deeppink1"
plot(D,C,col=ColElmt,type="l",cex=0.1,yaxt="n",
lty=ltyp[1],xaxt="n",xlab="",ylab="",xlim=xlim,ylim=ylimC)
# plots the concentration data
axis(4,cex.axis=ScaleF)
# this tells R that the second Y axis ticks can be displayed (that function was repressed earlier by 'yaxt="n" ')
par(new=TRUE)
mtext("Dates in 1997-1998",side=1,line=3,cex=ScaleF) # add in the margin the defined labels and title
mtext(y1lab,side=2,line=3,cex=ScaleF)
mtext(y2lab,side=4,line=3,cex=ScaleF)
mtext(WS,side=3,line=1.5,cex=ScaleF)
legend("topleft",c("Flow","Conc"),lty = c(1,1), col = c("blue",ColElmt))
Flow volume duration curves represent the percentage of the total
flow that occurred in x% of the time corresponding to the highest flows.
The same applies for loads. This might sound a bit merky, but hopefully
it will not with the further explanations below. To get there, one first
needs to order flow and loads in descending order.
QSort=sort(Q,decreasing = TRUE) #Sorts instantaneous flow rates in descending order
LSort=sort(L,decreasing = TRUE) #Sorts instantaneous flux values in descending order
The blue hydrograph from above now becomes:
par(mar=c(4.5,4.5,1,1))
plot(QSort,col="blue",type="l",cex=0.1,yaxt="n",
lty=ltyp[1],xaxt="n",xlab="",ylab="",ylim=ylimQ)
abline(h=0)
axis(2,cex.axis=ScaleF)
mtext(y1lab,side=2,line=3,cex=ScaleF)
mtext("Cumulative number of flow values",side=1,line=3,cex=ScaleF)
legend("topright",c("Sorted Flow"),lty = c(1), col = c("blue"))
The next thing to do is to integrate under the sorted curves to
obtain the cumulative flow as a function for each represented flow
value. To do this, there is a really nice way to calculate this using
the cumcum()
function as below. All values are put in
percentage values
cumQSort<-c(0,(cumsum(QSort[-1])+cumsum(head(QSort,-1)))/2)
cumQSort<-cumQSort/tail(cumQSort,1)*100
cumLSort<-c(0,(cumsum(LSort[-1])+cumsum(head(LSort,-1)))/2)
cumLSort<-cumLSort/tail(cumLSort,1)*100
So now the cumulative flow volume curve corresponding to the highest flow rates as a function of the cumulative number of flow values looks like this:
par(mar=c(4.5,5.5,1,1))
plot(cumQSort,col="blue",type="l",cex=0.1,yaxt="n",
lty=ltyp[1],xaxt="n",xlab="",ylab="", ylim=c(0,100))
abline(h=0)
axis(2,cex.axis=ScaleF)
mtext("Cumulative number of flow values",side=1,line=1,cex=ScaleF)
mtext("%age of cumulative flow corresponding to\n the highest sorted flow rates ",side=2,line=3,cex=ScaleF)
# the \n in the text allows for line break in the title
Notice that there is still no unit added for x axis because I decided that the cumulative number of flow value does not really add a lot to the analysis. However, it becomes very interesting to transform these values in probability of occurence. Each value has 1/N the probability to occur. We can also calculate the cumulative probability of occurence of flow values. Flow volume and cumulative load duration curves are thus derived this way.
In more details, the cumulative discharge calculated at each
instantaneous flow rate can be calculated as a percentage of the total
discharge yielding Wk% values corresponding to the kth cumulative
probability and the time elapsed at each point can be calculated as a
percentage of the total time. This works because even though flow rates
are rearranged, the same amount of data points exist within the dataset
with the same time increment occurring between each value. Wk% values
can then be plotted as a function of the percentage of the total time.
This is what is referred to as Flow Volume Duration Curves. This
provides a way of demonstrating of how relatively flashy the watershed
may be, either relatively to other watersheds or to previous
years.
The flashiness of a watershed refers to how rapidly flow is altered
as a result of storm events/varying conditions. More frequent spikes in
flow in response to precipitation events, in which flow increases and
decreases more greatly and rapidly, are typically indicative of
watersheds with predominant portions of streamflow being influenced by
surface runoff, a quicker responding contributor of water to
streamflow.
Wk=quantile(cumQSort,probs=seq(0.01,1,0.01))/tail(cumQSort,1) #Calculates and assigns Wk% values to a probability occurring in 1-100% of the total time
Mk=quantile(cumLSort,probs=seq(0.01,1,0.01))/tail(cumLSort,1)
This method allows us to see the percentage of the total discharge
that occurs in a fraction of the total time with the lowest
probabilities of occurrence corresponding with the highest flow rates
associated with event flow. Therefore, if one watershed produces a
majority of the total discharge in 50% of the time versus a watershed
that produces a majority of the total discharge in 80% of the time, that
watershed may be considered relatively flashier because a greater
portion of the total discharge occurs in association with higher flow
rates. In other words, streamflow would be considered more reactive to
event water because the event hydrograph rises and recedes more quickly
than the other watershed. This quick rise and recession allows for most
flow to occur in a smaller percentage of the time versus the watershed
that has a much wider event hydrograph spanning across a greater range
of instantaneous flow values over a greater period of time. Visually,
this method can provide a relative comparison of the flashiness of
multiple watersheds. In the example above, the shape of the curve in the
first watershed would have a greater slope towards the lower
percentages/probabilities of occurrence and the curve for the second
watershed would be somewhat flatter.
xlim=c(0,100);ylim=c(0,100);
plot(1:100,Wk*100,xlab="Prob of Occurrence %",ylab="Mk% & Wk%",xlim=xlim,ylim=ylim,pch=21,col="black",bg="blue")
par(new=TRUE)
plot(1:100,Mk*100,xlab="Prob of Occurrence %",ylab="Mk% & Wk%",xlim=xlim,ylim=ylim,pch=22,col="black",bg=ColElmt)
par(new=TRUE)
abline(1,1,xlab="Prob of Occurrence %",ylab="Mk% & Wk",col="black",lty="dashed",xlim=xlim,ylim=ylim)
par(new=TRUE)
legend("bottomright",c("Cumul Q","Cumul NO3 load"),
pch=c(19,19),
col=c("blue",ColElmt),
bg="white")
title(main="Nitrate Cumulative Load and flow volume duration curves") # the \n in the text allows for line break in the title
#Code for plotting the double cumulative plot using the normal distribution probabilities to zoom in on the lower tails
# Q is understood as the entire time series of flow
# C is understood as the entire time series for nitrate
par(mar=c(4.5,4.5,3,0.5)) # this gives the sizes expressed in lines of the size of the margins around the plot, in that order bottom, left, top, right.
# The first thing to do is to select a few values among the 100 possible, such that the details at the tail ends are highlighted. This is what the next line does.
SelectPerc<-c(1,2,3,5,10,25,50,75,90,95,97,98,99,100) # SelectPerc is a set of user defined values of percentages of the total time to be used for calculating and plotting specific corresponding Wk% (and Mk% values)
qnormSelectPerc<-qnorm(SelectPerc/100) #All this does is transforming the selected values chosen in SelectPerc (divided by 100 to express them as percentage values) such that in the plotting, the tail ends of the values be highlighed. Technically, the qnorm function takes a given probability (x values in this case) and returns the corresponding cumulative distribution value (Z-score) based on a normal distribution curve
qnormWk<-qnorm(Wk[SelectPerc]) #This does the same thing here with the Wk values corresponding with percentages of the total time defined by values in SelectPerc. Technically the qnorm function takes percentages of the flow occurring associated with percentages of the total time defined by SelectPerc values and assigns Z-score values to each probability based on the normal distribution curve
plot(qnormSelectPerc,qnormWk,xlab="Probability of Occurrence (%)",ylab="Wk% and Mk%",
type="o",xaxt="n",yaxt="n",xlim=c(-2.5,2.5),ylim=c(-2.5,2.5),pch=19,col="blue") #Plots the qnorm transformed values of Wk as a function of qnorm transformed values of the selected percentage values. The option xaxt="n",yaxt="n" tell R that none of the axis ticks and tick labels should be plotted at this point.
par(new=TRUE) # this tells R that we want to plot another curve on top of the existing one.
# The Mk has been defined above so there is no need to redefine it here. Now, one needs to transform the values of Mk into this new stretched out axis at the tail ends using the qnorm function:
qnormMk<-qnorm(Mk[SelectPerc]) #Sets y as the set of Mk% values corresponding with percentages of the total time defined by values in SelectPerc
color<-c("deeppink1") #Defines the plot color for the element to be plotted, here nitrate.
plot(qnormSelectPerc,qnormMk,xlab="",ylab="",
type="o",xaxt="n",yaxt="n",xlim=c(-2.5,2.5),ylim=c(-2.5,2.5),pch=2,col=color) #Plots yy and yV values as a function of xx values
par(new=TRUE)
axis(1,at=qnormSelectPerc,labels=SelectPerc)
axis(2,at=qnormSelectPerc,labels=SelectPerc)
mtext("Flow volume and cumulative mass duration curve for the \nMaudouve River at Saint-Donan, France", side=3, line = 1) # the \n in the text allows for line break in the title
abline(h=qnormSelectPerc,lty=3,col="grey") #Plots gridlines
abline(v=qnormSelectPerc,lty=3,col="grey")
legend("bottomright",c("Cumul Q","Cumul NO3 load"),
pch=c(19,2),
col=c("blue",color),
bg="white")
Once again, the cumulative load calculated at each instantaneous flux value can be calculated as a percentage of the total load yielding Mk% values and the time elapsed at each point can be calculated as a percentage of the total time. Mk% and Wk% values can be plotted as functions of the probability of occurrence and multiple Mk% curves for various water quality components can be plotted simultaneously for comparison of loading as a function of probability of occurrence. For example, 50% of the nitrate load may be exported in 25% of the time whereas 50% of the ammonium load may be exported in only 5% of the time. The interaction of flow and concentrations could be examined further to explain differences among loading and flashiness of concentrations with event flow.
With the above plot, we see that the Maudouve River at Saint-Donan in 1997-1998 is not very flashy as only 10% of the flow occurs in 2% of the time. Some watersheds are a lot flashier and can export more than 25% of the flow volume in 2% of the time. We can also see that there is a very slight difference between the flow volume and the cumulative load duration curves, with the cumulative load duration tending to be a bit lower. This is due to the “dilution” effect of nitrate concentrations during flow events: nitrate concentrations exhibit troughs during flow peaks.
It is helpful to use the qnorm
function in R to zoom in
on the very low and high probabilities of occurrence, or on the tails of
the normal distribution curve.