THIS WORKS IS YET TO BE REVIEWED
In this paper contains estimates for the reproductive number $R_{t} over time. This is done as described in [1]. These have been implemented in R using EpiEstim
package [2] which is what is used here. This report should be updated roughly daily and is available online.
As this paper is updated over time this section will summarise significant changes. The code producing this paper is tracked using Git. The Git commit hash for this project at the time of generating this paper was 43c938c362fee38df7d97ae1a08fbc1d9189601c.
2020-06-12
The project uses the following libraries.
require(EpiEstim)
require(EnvStats)
require(ggplot2)
require(ggpubr)
require(lubridate)
require(utils)
require(httr)
require(dplyr)
require(tidyr)
require(scales)
Data is downloaded from the Git repository associated with [3].
# Provincial Deaths
GET(url = "https://raw.githubusercontent.com/dsfsi/covid19za/master/data/covid19za_provincial_cumulative_timeline_deaths.csv",
write_disk("covid19za_provincial_cumulative_timeline_deaths.csv", overwrite = TRUE))
Response [https://raw.githubusercontent.com/dsfsi/covid19za/master/data/covid19za_provincial_cumulative_timeline_deaths.csv]
Date: 2020-06-12 19:58
Status: 200
Content-Type: text/plain; charset=utf-8
Size: 7.19 kB
<ON DISK> C:\Users\lrossou\Desktop\COVID-19\rt_estimates\covid19za_provincial_cumulative_timeline_deaths.csv
# Provincial Cases
GET(url = "https://raw.githubusercontent.com/dsfsi/covid19za/master/data/covid19za_provincial_cumulative_timeline_confirmed.csv",
write_disk("covid19za_provincial_cumulative_timeline_confirmed.csv", overwrite = TRUE))
Response [https://raw.githubusercontent.com/dsfsi/covid19za/master/data/covid19za_provincial_cumulative_timeline_confirmed.csv]
Date: 2020-06-12 19:58
Status: 200
Content-Type: text/plain; charset=utf-8
Size: 9.08 kB
<ON DISK> C:\Users\lrossou\Desktop\COVID-19\rt_estimates\covid19za_provincial_cumulative_timeline_confirmed.csv
First read in the data from the downloaded comma-separated values text file.
# Read from CSVs above
data_cases <- read.csv("covid19za_provincial_cumulative_timeline_confirmed.csv",
stringsAsFactors = FALSE)
data_deaths <- read.csv("covid19za_provincial_cumulative_timeline_deaths.csv", stringsAsFactors = FALSE)
In the case data file row 21 and 32 contain no provincial details. We estimated it by spreading the national total to the provinces in proportion to a mixture of the prior day and the next day.
data_cases[21, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC", "UNKNOWN")] <- colSums(data_cases[c(20,
22), c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC", "UNKNOWN")])/sum(data_cases[c(20,
22), ]$total) * data_cases[21, ]$total
data_cases[32, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC", "UNKNOWN")] <- colSums(data_cases[c(31,
33), c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC", "UNKNOWN")])/sum(data_cases[c(31,
33), ]$total) * data_cases[32, ]$total
The following function will be applied to case and death data. In this function the following occurs:
SA
column is added as the sum of the new per province data.fix_data <- function(data, start_date = as.Date("2020-03-01"), end_date = as.Date("2020-03-31")) {
# Scale provinces by scale factor (assume unknown are in proportion)
data[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")] <- data[, c("EC",
"FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")] * (1 + data$UNKNOWN/rowSums(data[,
c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")]))
# Only select columns we need
data <- data %>% select("date", "EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW",
"WC")
data$date <- as.Date(data$date, "%d-%m-%Y")
# Round data so we have integer cases
data[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")] <- round(data[,
c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")], 0)
# Calculate a new SA column
data$SA <- rowSums(data[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW",
"WC")])
# 'Melt' the data
data <- pivot_longer(data, cols = c("EC", "FS", "GP", "KZN", "LP", "MP", "NC",
"NW", "WC", "SA"), names_to = "province", values_to = "count")
# Getting daily data from the cumulative data set
data = data %>% group_by(province) %>% arrange(date) %>% mutate(count = count -
lag(count, default = 0)) %>% ungroup()
# add missing dates
all_dates <- expand_grid(date = seq(start_date, end_date, 1), province = levels(as.factor(data$province)))
# join
data <- left_join(all_dates, data, by = c("date", "province"))
# province factor
data$province <- as.factor(data$province)
# 0 for NAs
data$count <- ifelse(is.na(data$count), 0, data$count)
# remove negatives
data$count <- ifelse(data$count < 0, 0, data$count)
data <- data %>% group_by(province) %>% mutate(cumulative_count = cumsum(count)) %>%
ungroup()
return(data)
}
Below we use the function above to process deaths and cases and then combine them into a single dataset.
start_date <- min(as.Date(data_cases$date, "%d-%m-%Y"))
end_date <- max(as.Date(data_cases$date, "%d-%m-%Y"))
data_cases <- fix_data(data_cases, start_date = start_date, end_date = end_date)
data_deaths <- fix_data(data_deaths, start_date = start_date, end_date = end_date)
data_cases <- cbind("cases", data_cases)
data_deaths <- cbind("deaths", data_deaths)
colnames(data_cases)[1] <- "type"
colnames(data_deaths)[1] <- "type"
# combined
data <- rbind(data_cases, data_deaths)
# remove data sets no longer needed
rm("data_cases", "data_deaths", "start_date", "end_date")
Below we plot cumulative case count on a log scale by province:
ggplot(data %>% filter(type == "cases"), aes(x = date, y = cumulative_count)) + geom_line(aes(color = province),
size = 1) + scale_y_log10(labels = comma) + ggtitle("Cumulative Cases by Province") +
theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right") +
guides(fill = guide_legend(ncol = 1)) + scale_color_hue(l = 50)
Below we plot the cumulative deaths by province on a log scale:
ggplot(data %>% filter(type == "deaths"), aes(x = date, y = cumulative_count)) +
geom_line(aes(color = province), size = 1) + scale_y_log10(labels = comma) +
ggtitle("Cumulative Deaths by Province") + theme_bw() + theme(axis.text.x = element_text(angle = 45,
hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) +
scale_color_hue(l = 50)
To do further analysis an serial interval assumption is needed. The serial interval is taken from [4]. It’s assumed to be Gamma distributed with mean of 6.48 and standard deviation of 3.83. This corresponds to the effective infectiousness of an individual since acquiring the infection themselves.
We plot this serial distribution below:
mean1 <- 6.48
sd1 <- 3.83
cv1 <- sd1/mean1
serial_interval = rep(0, 1)
serial_interval[1] = (pgammaAlt(1.5, mean1, cv1) - pgammaAlt(0, mean1, cv1))
for (i in 2:50) {
serial_interval[i] = (pgammaAlt(i + 0.5, mean1, cv1) - pgammaAlt(i - 0.5, mean1,
cv1))
}
ggplot(data.frame(days = seq(1, 20, 1), serial_interval = serial_interval[1:20]),
aes(y = serial_interval, x = days)) + geom_line(size = 1) + theme_bw() + theme(axis.text.x = element_text(angle = 45,
hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) +
scale_color_hue(l = 50)
The following code works backwards and fits \(R_{t}\) values for whole weeks (ending on the last date in the data) using the EpiEstim
package. Using deaths or infection the as the cases in the puts the estimate of \(R_{t}\) as at the date of the cases being tracked.
Two values are estimated for each province (including South Africa as a whole):
Note that the time periods are left unadjusted, though in reality the \(R_{t,m}^{deaths}\) should be shifted back approximately 2 weeks relative to \(R_{t,m}^{cases}\).
Rt_data <- NULL
for (p in levels(data$province)) {
for (t in levels(data$type)) {
# filter out province data of type t
p_data <- data %>% filter(province == p & type == t)
# vector of count of cases/deaths
I <- p_data$count
# t=1 corresponds to this date:
t1_date <- min(p_data$date)
# the day after the first case/death:
t_start_date <- min((p_data %>% filter(count > 0))$date) + 1
t_start <- min(seq(1, length(I))[I > 0]) + 1
# last day of cases/deaths
t_end <- length(I)
# how many full weeks do we have
full_weeks <- floor((t_end - t_start)/7)
# only continue if we have 1 full week or more
if (full_weeks > 0) {
# then divide period into weeks
T.Start <- seq(t_end - 7 * full_weeks, t_end - 7, by = 7)
T.End <- T.Start + 7
# estimate $R_{t,m}^{type}$ for each week:
p_Rt <- EstimateR(I, T.Start = T.Start, T.End = T.End, Mean.SI = mean1,
Std.SI = sd1, method = "ParametricSI")$R
# add province and type designations
p_Rt <- cbind(p, t, p_Rt)
# and add dates
p_Rt$date_start <- t1_date + p_Rt$T.Start
p_Rt$date_end <- t1_date + p_Rt$T.End
# combine the results
if (is.null(Rt_data)) {
Rt_data <- p_Rt
} else {
Rt_data <- rbind(Rt_data, p_Rt)
}
}
}
}
Below we prep column headings and prepare CI data:
# Column names
colnames(Rt_data) <- c("province", "type", "t_start", "t_end", "Rt_mean", "Rt_std",
"Rt_li_95", "Rt_li_90", "Rt_li_50", "Rt_median", "Rt_ui_50", "Rt_ui_90", "Rt_ui_95",
"date_start", "date_end")
# mid week data point
Rt_data$date_mid <- Rt_data$date_start + (Rt_data$date_end - Rt_data$date_start)/2
# split out CI data into different dataset
Rt_ci_data <- NULL
for (ci in c("50", "90", "95")) {
r <- data.frame(province = Rt_data$province, type = Rt_data$type, ci = rep(paste0(ci,
"%"), nrow(Rt_data)), date_mid = Rt_data$date_mid, Rt_mean = Rt_data$Rt_mean,
Rt_li = Rt_data[[paste0("Rt_li_", ci)]], Rt_ui = Rt_data[[paste0("Rt_ui_",
ci)]])
Rt_ci_data <- rbind(r, Rt_ci_data)
}
Let’s generate our plots for each province in a list:
province_plots <- list()
for (p in levels(Rt_data$province)) {
p1 <- ggplot(Rt_ci_data %>% filter(province == p & ci == "95%"), aes(x = date_mid,
y = Rt_mean)) + geom_crossbar(position = position_dodge2(padding = 0), aes(ymin = Rt_li,
ymax = Rt_ui, colour = type, fill = type), width = 7) + scale_fill_manual(values = c(alpha("deepskyblue4",
0.45), alpha("#8b0068", 0.45))) + scale_colour_manual(values = c(alpha("deepskyblue4"),
alpha("#8b0068"))) + theme_bw() + theme(axis.text.x = element_text(angle = 45,
hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) +
xlab("Week") + ylab(expression(R[t, m])) + ggtitle(p)
province_plots[[p]] <- p1
}
Let’s look at SA below. In blue we plot the reproductive number estimated from cases \(R_{t,m}^{cases}\) and in red we plot the reproductive number estimated from deaths \(R_{t,m}^{deaths}\). It’s clear that the \(R_{t,m}^{deaths}\) shows a delay when compared to \(R_{t,m}^{cases}\). During April \(R_{t,m}^{deaths}\) remains high while \(R_{t,m}^{cases}\) has dropped (presumably to the lockdown). For some periods no death data is available and we only estimate using case data.
province_plots[["SA"]] + ggtitle("Reproductive number in SA with 95% confidence intervals")
Below we plot all the provinces:
ggarrange(province_plots[["EC"]], province_plots[["FS"]], province_plots[["GP"]],
province_plots[["KZN"]], province_plots[["LP"]], province_plots[["MP"]], province_plots[["NC"]],
province_plots[["NW"]], province_plots[["WC"]], ncol = 2, nrow = 5) + ggtitle("Reproductive number in SA, by province, with 95% confidence intervals")
The above shows estimates for reproductive number using cases and deaths (\(R_{t,m}^{cases}\) and \(R_{t,m}^{deaths}\)) for each province \(m\) over time \(t\) in weeks.
It’s clear that:
It appears that the \(R_{t,m}^{cases}\) and \(R_{t,m}^{deaths}\) would be useful as a monitoring tool, especially the \(R_{t,m}^{deaths}\) as testing backlogs and issues may bias \(R_{t,m}^{cases}\) derived. This report may be useful in this regard.
[1] A. Cori, N. M. Ferguson, C. Fraser, and S. Cauchemez, “A new framework and software to estimate time-varying reproduction numbers during epidemics,” American Journal of Epidemiology, vol. 178, no. 9, pp. 1505–1512, Sep. 2013, doi: 10.1093/aje/kwt133. [Online]. Available: https://doi.org/10.1093/aje/kwt133
[2] A. Cori, EpiEstim: EpiEstim: A package to estimate time varying reproduction numbers from epidemic curves. 2013 [Online]. Available: https://CRAN.R-project.org/package=EpiEstim
[3] V. Marivate et al., “Coronavirus disease (COVID-19) case data - South Africa.” Zenodo, 2020 [Online]. Available: https://zenodo.org/record/3888499
[4] N. Ferguson et al., “Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand,” Imperial College London, 2020 [Online]. Available: https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-9-impact-of-npis-on-covid-19/