neopolitan's philosophical blog: Infectious Statistics

Sunday, 16 February 2020

Infectious Statistics

In A Worry of Climate Change Scientists (coming soon), I address a claim associated with the 97% consensus figure which the media, such as the Guardian, picked up and ran with. It’s possible that they are using it ironically in the rubric “Climate Consensus - the 97%”, echoing the 99% quoted by the Occupy movement.

Any statistic, like the 97% figure, should be taken with a pinch of salt and an assortment of questions. What does it actually mean? What is it really measuring? Were there any caveats associated with the figure? And so on and so on and so on.

A recent example of a similar problem came to mind as I was developing that post, at a time when the novel coronavirus, recently dubbed COVID-19, had been confirmed to have infected more than 60,000 people and killed more than 1,300. It’s a bit messy because China released additional figures based on a new detection technique, leading to a one-day jump of more than 15,000 cases, and the day after 100 people miraculously came back to life (that is, during the early reporting hours, the numbers were adjusted down by 122 and later down by about 100). For the purposes of this discussion, I am going to work from the basis of it being 12 February 2020 when the figures, although probably not accurate, were at least consistent.

There was a question occupying the minds of many people trying to work out whether they should be bothered by a virus that has, so far, killed in the order of 0.2-0.4% as many people as die each year from influenza. Sure, the flu is everywhere, while COVID-19 had so far been largely contained to China, but it seemed unlikely that, by the end of the year, more than half a million people would die from it.

The question was: what is the mortality rate due to COVID-19? An easy question, but not so easy to answer. The official answer, to avoid any unnecessary concerns, is about 2%. The figures are rubbery because we are unlikely to have a good idea of precisely how many have been infected until much later. We only knew the confirmed case numbers, and these might have just been those who were sickest – sick enough to present to a medical clinic of some kind.

A simple way to calculate the mortality rate is to take the number of deaths and divide that by the number of (confirmed) cases (all figures taken from here with downloadable datasets here):

Deaths	Cases	Mortality Rate
12 Feb 2020	12 Feb 2020	D/C
1117	45206	2.5%

So that seems accurate, about 2% just like the authorities were telling us.
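As a minimal sketch, the same-day calculation is a single division. The function name below is made up for illustration; the figures are the ones in the table above.

```python
def naive_rate(deaths: int, cases: int) -> float:
    """Deaths divided by confirmed cases reported on the same day, as a percentage."""
    return 100.0 * deaths / cases

# Figures quoted in the post for 12 Feb 2020.
rate_12_feb = naive_rate(deaths=1117, cases=45206)
print(f"{rate_12_feb:.1f}%")
```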

However, there is going to be a lag between a person presenting with symptoms, being confirmed as having COVID-19, getting progressively sicker and finally succumbing. Surely the number of deaths should be compared not against the number of people confirmed to have the virus at the time of death, but against the number confirmed at the time of confirmation. The question then is what is the lag between confirmation and death?

For the man who died in the Philippines, that lag was seven days. There appears to be a six-day lag between the downturn in the rate of new cases (6 Feb) and the downturn in the rate of new deaths (12 Feb). Let’s use six days:

Deaths	Cases	Mortality Rate
12 Feb 2020	6 Feb 2020	D/C
1117	30808	3.6%

But it could be worse than that. Should we not consider it from the time that the virus was contracted, which is two to fourteen days prior to symptoms developing? Let’s split the difference there and say eight days before being confirmed as having the virus and fifteen days before succumbing:

Deaths	Cases	Mortality Rate
12 Feb 2020	28 Jan 2020	D/C
1117	6082	18%

Whoa Nelly!

That would be something to be worried about.
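The lagged versions of the calculation can be sketched the same way: look up the deaths on one day and the confirmed cases some number of days earlier. The dictionaries and the `lagged_rate` helper are illustrative names, and only the dates quoted in the tables above are filled in.

```python
from datetime import date, timedelta

# Cumulative figures quoted in the post; only the dates needed here.
CASES = {
    date(2020, 1, 28): 6082,
    date(2020, 2, 6): 30808,
    date(2020, 2, 12): 45206,
}
DEATHS = {date(2020, 2, 12): 1117}

def lagged_rate(day: date, lag_days: int) -> float:
    """Deaths on `day` divided by confirmed cases `lag_days` earlier, as a percentage."""
    return 100.0 * DEATHS[day] / CASES[day - timedelta(days=lag_days)]

# No lag, the six-day lag and the fifteen-day lag, all measured from 12 Feb.
for lag in (0, 6, 15):
    print(f"lag {lag:>2} days: {lagged_rate(date(2020, 2, 12), lag):.1f}%")
```

Nothing about the deaths changes between the three lines of output; only the denominator moves, which is the whole reason the answer swings so widely.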

Another way to calculate the figure is to consider those who had run the course of the disease. Some recovered and some died, which permits another calculation to be made, namely the percentage of those who ran the course of the disease and succumbed to it:

Deaths	Recovered	Mortality Rate
12 Feb 2020	12 Feb 2020	D/(D+R)
1117	5123	18%
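For comparison, here is a small sketch that puts this outcome-based ratio next to the fifteen-day-lag ratio from earlier, using the tabulated figures. Both function names are hypothetical.

```python
def resolved_rate(deaths: int, recovered: int) -> float:
    """Deaths as a percentage of resolved cases (deaths plus recoveries)."""
    return 100.0 * deaths / (deaths + recovered)

def lagged_rate(deaths: int, earlier_cases: int) -> float:
    """Deaths as a percentage of the cases confirmed some days earlier."""
    return 100.0 * deaths / earlier_cases

by_outcome = resolved_rate(deaths=1117, recovered=5123)   # 12 Feb deaths and recoveries
by_lag = lagged_rate(deaths=1117, earlier_cases=6082)     # 12 Feb deaths, 28 Jan cases

print(f"D/(D+R): {by_outcome:.1f}%   D/C with fifteen-day lag: {by_lag:.1f}%")
```

With the figures as tabulated, the two ratios land slightly either side of 18%, even though they are built from different denominators.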

My rounding here hides the fact that one 18% is actually 17.7% and the other is 18.3%, making them look precisely the same when they are different by more than half a percent. Also, an interesting thing happens when you push back a couple of days and run the previous calculation (deaths/cases, fifteen-day lag):

Deaths	Cases	Mortality Rate
10 Feb 2020	26 Jan 2020	D/C
910	2829	32%

What?!

Another two days:

Deaths	Cases	Mortality Rate
8 Feb 2020	24 Jan 2020	D/C
725	941	77%

Of course, that can’t be right. I’m conflating the notion of people who contract the virus on a particular day with figures reported as confirmed on that day. Phew!
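The runaway figures can be reproduced with a short sketch over the three pairs of dates in the tables above. The names are illustrative, and the growth comparison at the end is an addition for context rather than part of the original calculation.

```python
from datetime import date, timedelta

# Cumulative figures quoted in the post's tables.
DEATHS = {date(2020, 2, 8): 725, date(2020, 2, 10): 910, date(2020, 2, 12): 1117}
CASES = {date(2020, 1, 24): 941, date(2020, 1, 26): 2829, date(2020, 1, 28): 6082}

LAG = timedelta(days=15)

def lagged_rates() -> dict:
    """Map each death-report date to deaths / cases fifteen days earlier (%)."""
    return {day: 100.0 * dead / CASES[day - LAG] for day, dead in DEATHS.items()}

for day, rate in sorted(lagged_rates().items()):
    print(f"{day:%d %b}: {rate:.0f}%")

# Growth over the same four days: confirmed cases versus deaths.
case_growth = CASES[date(2020, 1, 28)] / CASES[date(2020, 1, 24)]
death_growth = DEATHS[date(2020, 2, 12)] / DEATHS[date(2020, 2, 8)]
print(f"cases grew {case_growth:.1f}x, deaths grew {death_growth:.1f}x")
```

Over those four days the confirmed case count grows far faster than the death count, which is consistent with the early confirmed counts being a poor stand-in for the number of people infected on those days.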

Note that the 18% for deaths divided by the number of people for whom the virus has run its course still stands, but if I go back in time and check the figures, the mortality appears to be worse the further back I go, so that doesn’t seem right either.

What about the six-day lag figure? If I do the same thing, pushing back a couple of days:

Deaths	Cases	Mortality Rate
10 Feb 2020	4 Feb 2020	D/C
910	24506	3.7%

It’s not as much of an increase, but as I push back further, the apparent mortality rate goes up – slowly, but inevitably, eventually getting into the 20% range in the early days of the outbreak.

So what I did was project into the future, assuming that the numbers of cases and deaths increase linearly (which as of 12 Feb was worst case, given that there seemed to be a slowdown in both new cases and deaths), and found that the mortality rate based on cases, irrespective of the lag time, tended towards about 3%. Mortality based on number of people who run the course of the viral infection, assuming a fifteen-day illness and borrowing case numbers from the previous projection, also tends towards about 3%.
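A rough sketch of why a linear projection behaves this way: if cases and deaths each grow by a fixed amount per day, the starting totals and the lag are eventually swamped, and the ratio drifts towards deaths-per-day divided by cases-per-day. The daily increments below are assumptions made for illustration (two-day differences taken from the tables above: cases between 4 and 6 Feb, deaths between 10 and 12 Feb); the inputs to the original projection are not given, and `projected_rate` is a hypothetical helper.

```python
def projected_rate(day: int, lag: int, cases0: float = 45206, deaths0: float = 1117,
                   cases_per_day: float = 3151.0, deaths_per_day: float = 103.5) -> float:
    """Projected deaths on `day` divided by projected cases `lag` days earlier (%).

    `day` counts days after 12 Feb 2020 and both series are extrapolated
    linearly. The per-day increments are illustrative assumptions only.
    """
    deaths = deaths0 + deaths_per_day * day
    cases = cases0 + cases_per_day * (day - lag)
    return 100.0 * deaths / cases

# The long-run limit is just the ratio of the two slopes.
limit = 100.0 * 103.5 / 3151.0

for day in (30, 365, 3650):
    row = ", ".join(f"lag {lag}: {projected_rate(day, lag):.2f}%" for lag in (0, 6, 15))
    print(f"day {day:>4}: {row}")
print(f"limit: {limit:.2f}%")
```

If everyone confirmed fifteen days earlier is assumed to have either recovered or died, the outcome-based ratio in this toy model is the same expression as the fifteen-day-lag ratio, so it heads to the same limit.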

So, as of 12 Feb and depending on whether my objective was to be accurate, informative, calming, inflammatory or some combination thereof, I could possibly justify saying that the mortality rate was:

2% (official)

2.5% (current deaths divided by current cases)

3% (long term trend)

3.6% (current deaths divided by cases six days ago), or

18% (current deaths divided by current number of people who have run the course of the viral infection, one way or the other)

That’s quite a spread.

The point that I am trying to make here, long-windedly, is that unless I went to some serious effort to explain to the reader how I arrived at whatever percentage I provided, my use of an apparently more precise figure (3.6% versus 2% or 3%) should not be thought of as providing you with any more confidence that my figure was accurate, or that it gave you the information that you thought it was giving.

The same applies to the climate consensus 97% figure – for the most part, when quoted by a denialist, it’s being used as a distraction with the intent of keeping rational people engaged on minutiae and, therefore, not on the scientific evidence that is so damaging to the denialist cause.

---

Note, since writing this, the topic came up on More or Less. I am not “Ian” so someone else out there is maintaining a spreadsheet and arrived at a figure of 18%.
