This paper describes the methodology for estimating the effective reproduction number \(R_{t,m}\) over time \(t\) in various countries or provinces \(m\). The methodology is described in [1] and has been implemented in the R package `EpiEstim` [2], which is what is used here.

This paper and its results are intended to be updated roughly daily and are available online.

To view results of this methodology please refer to the following:

As this paper is updated over time, this section summarises significant changes. The code producing this paper is tracked using Git. The Git commit hash for this project at the time of generating this paper was 81f9a976acb2b0257b80d7eeab7313fbb14473b7.

**2020-06-12**

- Initial version.

**2020-06-13**

- Combine the version for South Africa and other countries.

**2020-06-14**

- Add Google Mobility data for comparison and further modelling.

**2020-06-15**

- Add more details about methodology.

**2020-06-17**

- Tidy up to remove the code.

**2020-10-19**

- Tidy up reports and split out into multiple reports for South Africa and the world.

**2020-10-29**

- Switch from using a serial interval assumption to a generation interval assumption.
- Introduce uncertainty around this assumption.

**2020-12-27**

- Update graphics in all reports.
- Add risk quadrants.
- Switch to an overlapping sliding 7-day window for calculations (instead of distinct 7-day periods).

This analysis follows the method proposed in [1].

Essentially this models the following relationship:

\(E(I_{t,m})=R_{t,m}\sum_{s=1}^tI_{t-s,m}w_{s}\) where:

- \(m\) is the country or province/state of a country.
- \(I_{t,m}\) is the number of cases (or deaths) reported at time \(t\) in \(m\).
- \(R_{t,m}\) is the “instantaneous reproduction number”, in the terminology of [1].
- \(I_{t,m} \sim \mathrm{Poisson}\) with the above mean.
- \(w_{s}\) is the infectivity profile of an individual at time \(s\) since infection.

The formulation of the instantaneous reproduction number proposed in [1] is convenient because \(R_{t,m}\) depends only on information known at time \(t\); no knowledge of future transmission is needed.
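To make the relationship concrete, here is a minimal Python sketch for a single location \(m\). It computes the total infectiousness \(\Lambda_t=\sum_{s=1}^{t}I_{t-s}w_{s}\) (the symbol \(\Lambda_t\) follows [1]) and the expected incidence \(R_t\Lambda_t\). The daily counts, the four-day infectivity profile and the function names are hypothetical, and this is not how `EpiEstim` is implemented internally.

```python
from typing import Sequence


def total_infectiousness(incidence: Sequence[float], w: Sequence[float], t: int) -> float:
    """Lambda_t = sum_{s=1}^{t} I_{t-s} * w_s for one location m.

    w[0] holds w_1, w[1] holds w_2, and so on; w_s is treated as zero beyond
    the supplied profile. Only incidence from days before t is used.
    """
    return sum(incidence[t - s] * w[s - 1] for s in range(1, min(t, len(w)) + 1))


def expected_incidence(r_t: float, incidence: Sequence[float], w: Sequence[float], t: int) -> float:
    """E(I_t) = R_t * Lambda_t."""
    return r_t * total_infectiousness(incidence, w, t)


cases = [10, 12, 15, 20, 24, 30]  # hypothetical daily counts for days 0..5
w = [0.2, 0.4, 0.3, 0.1]          # toy infectivity profile w_1..w_4
print(total_infectiousness(cases, w, 5))     # uses days 1..4 only; day 5 itself is ignored
print(expected_incidence(1.2, cases, w, 5))
```

Changing the count on day 5 would not change \(\Lambda_5\), which is the property that lets \(R_{t,m}\) be estimated from information available at time \(t\).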

Estimating \(R_{t,m}\) also requires an assumption about the generation interval, which determines \(w_{s}\).

Based on the meta-analysis in [3], we assume that the mean of the generation interval is 5.19 days and that this mean is itself uncertain, following a normal distribution with a standard deviation of 0.45 days. We truncate this distribution at a minimum of 2.19 days and a maximum of 7.19 days. Note that the mean of the generation interval is the same as the mean of the serial interval.

Most of the studies included in the meta-analysis measure the standard deviation of the serial interval rather than that of the generation interval. [4] points out that the generation interval distribution has a lower variance than the serial interval distribution, and hence that the generation interval should be used when estimating \(R\); using the serial interval introduces bias.

In [4] the estimates of the standard deviation of the generation interval distribution are between 1.75 and 2.29 days. We assume that this standard deviation has a mean of 2 days and is itself uncertain, with a standard deviation of 0.75 days. We also assume that it is normally distributed, truncated at a minimum of 1 day and a maximum of 4 days.

We assume that the generation interval itself is Gamma distributed, with the mean and standard deviation described above.
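The Python sketch below shows one way the uncertain generation interval could be sampled. The mean \(\mu\) and standard deviation \(\sigma\) are drawn from the truncated normal distributions above by rejection sampling, converted to a Gamma shape \(k=\mu^2/\sigma^2\) and scale \(\theta=\sigma^2/\mu\), and then discretised into \(w_1,\dots,w_{20}\) by normalising the Gamma density at whole days. The 20-day cut-off, this discretisation and the function names are assumptions for illustration; `EpiEstim` uses its own discretisation, so the resulting \(w_{s}\) may differ from it.

```python
import math
import random


def truncated_normal(rng: random.Random, mean: float, sd: float, lower: float, upper: float) -> float:
    """Draw from N(mean, sd^2) truncated to [lower, upper] by rejection sampling."""
    while True:
        x = rng.gauss(mean, sd)
        if lower <= x <= upper:
            return x


def gamma_pdf(x: float, shape: float, scale: float) -> float:
    """Gamma density, evaluated on the log scale for numerical stability."""
    log_pdf = (shape - 1) * math.log(x) - x / scale - math.lgamma(shape) - shape * math.log(scale)
    return math.exp(log_pdf)


def discretised_profile(mean: float, sd: float, max_days: int = 20) -> list[float]:
    """Hypothetical discretisation: w_1..w_max_days proportional to the Gamma density at whole days."""
    shape = (mean / sd) ** 2  # Gamma mean = shape * scale
    scale = sd ** 2 / mean    # Gamma variance = shape * scale^2
    density = [gamma_pdf(s, shape, scale) for s in range(1, max_days + 1)]
    total = sum(density)
    return [d / total for d in density]


def sample_profiles(n: int, seed: int = 1) -> list[tuple[float, float, list[float]]]:
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        mean = truncated_normal(rng, 5.19, 0.45, 2.19, 7.19)  # mean generation interval (days)
        sd = truncated_normal(rng, 2.0, 0.75, 1.0, 4.0)       # its standard deviation (days)
        draws.append((mean, sd, discretised_profile(mean, sd)))
    return draws


for mean, sd, w in sample_profiles(3):
    print(f"mean={mean:.2f} sd={sd:.2f} w_1..w_5={[round(x, 3) for x in w[:5]]}")
```

Each draw gives a different \(w_{s}\); repeating the estimation over many such draws is one way to carry the uncertainty in the generation interval through to \(R_{t,m}\).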

[1] discusses the choice of the time window over which \(R_{t,m}\) is estimated. Based on this discussion we chose a time window of 7 days (\(\tau\) in the notation of [1]). Within this window \(R_{t,m}\) is assumed to be constant.

Note that the window size does not interact with the generation interval, because incidence prior to the window is still taken into account through \(w_{s}\). The window is only the period during which \(R_{t,m}\) is assumed to be constant.

Based on the table in Appendix 2 of [1], we consider a 7-day window reasonable provided that the window contains at least 12 cases (or deaths).
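Under the Poisson model, [1] shows that a Gamma prior on \(R\) with shape \(a\) and scale \(b\) gives a Gamma posterior over a window of \(\tau\) days, with shape \(a+\sum_s I_s\) and scale \(1/(1/b+\sum_s \Lambda_s)\), both sums running over the days in the window. The Python sketch below computes that posterior for one window. The prior (mean 5 and standard deviation 5, i.e. shape 1 and scale 5) is an assumption, since the text does not state it, and the function names are hypothetical. The posterior coefficient of variation \(1/\sqrt{a+\sum_s I_s}\) shrinks as the window accumulates cases, which gives some intuition for a minimum-count rule; the threshold of 12 is taken from the text, not derived here.

```python
import math
from typing import Sequence

# Assumption: Gamma prior on R with mean 5 and sd 5 (shape 1, scale 5).
PRIOR_SHAPE = 1.0
PRIOR_SCALE = 5.0
MIN_CASES = 12  # minimum cases (or deaths) per window, as stated in the text


def total_infectiousness(incidence: Sequence[float], w: Sequence[float], t: int) -> float:
    # Lambda_t = sum_{s>=1} I_{t-s} * w_s, with w[0] holding w_1
    return sum(incidence[t - s] * w[s - 1] for s in range(1, min(t, len(w)) + 1))


def posterior_r(incidence: Sequence[float], w: Sequence[float], t_end: int, tau: int = 7) -> dict:
    """Gamma posterior for R over days t_end - tau + 1 .. t_end, where R is held constant.

    Days before the window still enter through Lambda_s, so tau only sets the
    period over which R is assumed constant.
    """
    t_start = t_end - tau + 1
    if t_start < 1:
        raise ValueError("window must start after the first day of data")
    days = range(t_start, t_end + 1)
    sum_i = sum(incidence[s] for s in days)
    sum_lambda = sum(total_infectiousness(incidence, w, s) for s in days)
    shape = PRIOR_SHAPE + sum_i
    scale = 1.0 / (1.0 / PRIOR_SCALE + sum_lambda)
    return {
        "mean": shape * scale,
        "cv": 1.0 / math.sqrt(shape),  # posterior coefficient of variation
        "enough_data": sum_i >= MIN_CASES,
    }


w = [0.2, 0.4, 0.3, 0.1]  # toy infectivity profile (hypothetical)
print(posterior_r([20] * 30, w, t_end=29))  # steady incidence: mean close to 1
print(posterior_r([1] * 30, w, t_end=29))   # only 7 cases in the window
```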

The analysis works backwards, fitting \(R_{t,m}\) over 7-day periods (the last ending on the last date in the data) with the `EpiEstim` package, using an overlapping sliding window.
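A small Python sketch of the overlapping sliding windows, assuming each window advances by one day (the text does not state the step) and that windows are built backwards so that the last one ends on the last date in the data. The function name and dates are hypothetical.

```python
from datetime import date, timedelta


def sliding_windows(first_day: date, last_day: date, tau: int = 7) -> list[tuple[date, date]]:
    """Overlapping tau-day windows, built backwards from the last date in the data.

    Assumes consecutive windows are shifted by one day.
    """
    windows = []
    end = last_day
    while end - timedelta(days=tau - 1) >= first_day:
        windows.append((end - timedelta(days=tau - 1), end))
        end -= timedelta(days=1)
    windows.reverse()  # chronological order for plotting
    return windows


for start, end in sliding_windows(date(2020, 6, 1), date(2020, 6, 10)):
    print(start, end)
```

For data from 1 to 10 June 2020 this yields four windows, the first covering 1 to 7 June and the last 4 to 10 June; each window produces one estimate of \(R_{t,m}\).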

Here \(t\) is the date of reporting (of cases or deaths) and \(m\) is the country or province. Two values are estimated for each \(m\):

- \(R_{t,m}^{cases}\) is the reproductive number implied by the cases reported at time \(t\) in country \(m\).
- \(R_{t,m}^{deaths}\) is the reproductive number implied by the deaths reported at time \(t\) in country \(m\).

Note that the time periods are left unadjusted, although in reality \(R_{t,m}^{deaths}\) should be shifted back by approximately 2 weeks relative to \(R_{t,m}^{cases}\).

Limitations of this method for estimating \(R_{t,m}\) are noted in [1]:

- It is sensitive to changes in transmissibility, changes in contact patterns, depletion of the susceptible population and control measures.
- It relies on assumptions about the generation interval.
- The size of the time window can affect the volatility of the results.
- Results lag true infections, more so when deaths are used.
- It is sensitive to changes in case (or death) detection.
- The generation interval may change over time.

In addition, the estimates assume that cases and deaths are reported consistently over time. For cases, this means that testing needs to remain at similar levels and be reported with a similar lag. Should these change rapidly over an interval of a few weeks, the above estimates of the effective reproduction number would be biased. For example, a rapid expansion of testing over the last 3 weeks would result in overestimating recent effective reproduction numbers. Similarly, any changes in the reporting of deaths (in reporting lags or in the degree of underreporting) would bias the reproduction numbers estimated from deaths. It may well be that some catch-up in reported deaths is exaggerating the estimates for October.
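A small numerical illustration of the testing bias described above, written in Python. It assumes a flat epidemic (true incidence of 100 per day, so the true \(R\) is 1) and a hypothetical detection rate that is 30% until three weeks before the last date and then rises linearly to 60%. It uses the ratio of reported cases to total infectiousness over the final 7-day window, which is the posterior mean from [1] with the prior terms dropped; all numbers and function names are invented for illustration.

```python
from typing import Sequence


def total_infectiousness(incidence: Sequence[float], w: Sequence[float], t: int) -> float:
    # Lambda_t = sum_{s>=1} I_{t-s} * w_s, with w[0] holding w_1
    return sum(incidence[t - s] * w[s - 1] for s in range(1, min(t, len(w)) + 1))


def naive_r(incidence: Sequence[float], w: Sequence[float], t_end: int, tau: int = 7) -> float:
    """Reported cases divided by total infectiousness over the last tau days
    (the posterior mean of [1] with the prior terms dropped)."""
    days = range(t_end - tau + 1, t_end + 1)
    return sum(incidence[s] for s in days) / sum(total_infectiousness(incidence, w, s) for s in days)


w = [0.2, 0.4, 0.3, 0.1]   # toy infectivity profile (hypothetical)
true_cases = [100.0] * 60  # flat epidemic, so the true R is 1

# Hypothetical detection rate: 30% up to day 38, then rising linearly to 60% on day 59.
detection = [0.3 if d < 39 else 0.3 + 0.3 * (d - 38) / 21 for d in range(60)]
reported = [c * p for c, p in zip(true_cases, detection)]

print(round(naive_r(true_cases, w, 59), 3))  # 1.0
print(round(naive_r(reported, w, 59), 3))    # 1.063
```

Even though nothing about transmission has changed, the estimate for the last week is about 6% too high, because recent reported counts grow faster than the earlier counts that make up \(\Lambda_t\).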

Estimates of the reproduction number are plotted against the time period in which the relevant measure is recorded. In reality, the infections giving rise to those estimates would have occurred roughly one to four weeks earlier, depending on whether cases or deaths are used. These figures have not been shifted back.

Despite these limitations, we believe that the ease of calculation of this method and its ability to use multiple data sources make it useful as a monitoring tool.

[1] A. Cori, N. M. Ferguson, C. Fraser, and S. Cauchemez, “A new framework and software to estimate time-varying reproduction numbers during epidemics,” *American Journal of Epidemiology*, vol. 178, no. 9, pp. 1505–1512, Sep. 2013, doi: 10.1093/aje/kwt133.

[2] A. Cori, *EpiEstim: A package to estimate time varying reproduction numbers from epidemic curves.* 2013.

[3] B. Rai, A. Shukla, and L. K. Dwivedi, “Estimates of serial interval for COVID-19: A systematic review and meta-analysis,” *Clinical Epidemiology and Global Health*, Aug. 2020, doi: 10.1016/j.cegh.2020.08.007.

[4] T. Ganyani *et al.*, “Estimating the generation interval for coronavirus disease (COVID-19) based on symptom onset data, March 2020,” *Eurosurveillance*, vol. 25, no. 17, p. 2000257, Apr. 2020, doi: 10.2807/1560-7917.ES.2020.25.17.2000257.