
Analysis China’s Response to Covid-19

Better modelling of infectious diseases: lessons from covid-19 in China

BMJ 2021; 375 doi: (Published 02 December 2021) Cite this as: BMJ 2021;375:n2365


  1. Yongyue Wei, associate professor of biostatistics1,
  2. Feng Sha, associate professor of biostatistics2,
  3. Yang Zhao, professor of biostatistics1,
  4. Qingwu Jiang, professor of epidemiology3,
  5. Yuantao Hao, professor of epidemiology and biostatistics4,
  6. Feng Chen, professor of biostatistics1
  1. 1Department of Biostatistics, School of Public Health, Center of Global Health, China International Cooperation Center for Environment and Human Health, Nanjing Medical University, Nanjing 211166, China
  2. 2Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  3. 3School of Public Health, Fudan University, Shanghai 200433, China
  4. 4Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China
  1. Correspondence to: F Chen fengchen{at}

More timely, accurate, and relevant data and methodological innovation could exploit the full power of modelling, argue Feng Chen and colleagues

Since Daniel Bernoulli studied smallpox inoculation from a mathematical perspective in 1760, mathematical models have proved invaluable to understanding and helping control infectious disease epidemics.1 By simplifying real world phenomena to limited numbers of settings, transmission dynamics modelling uses mathematical models to describe, analyse, and predict infectious disease transmission dynamics and to produce tractable solutions in the face of quickly changing situations. Covid-19 has spread across the world since December 2019, causing millions of deaths and substantial economic losses.

Role of modelling

Since the start of the pandemic in China, transmission dynamics models have been at the forefront of understanding, predicting, preventing, and controlling the situation. Objectives include identification of epidemiological features to understand the disease, prediction of trends in disease, evaluation of control measures to inform decision making, and exploration of uncertainty.

Identification of epidemiological features

At the beginning of the outbreak, when almost nothing was known of the novel pathogen, the models explored the virus’s crucial epidemiological features, such as the incubation period (the period between exposure to the pathogen and the appearance of the first symptoms) and the basic reproductive number (R0; the average number of secondary infections generated by a single infectious individual in a fully susceptible population). Estimates of these key parameters advanced understanding of a disease that was not yet well characterised and made clear the severity of the situation.23
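To illustrate how these two parameters enter a transmission dynamics model, the following is a minimal sketch of a susceptible-exposed-infectious-removed (SEIR) model. It is not the model used in any of the cited studies. It assumes homogeneous mixing, treats the incubation period as the non-infectious latent period (a simplification), and integrates with simple Euler steps; the function name and all parameter values a reader might pass in are hypothetical.

```python
def simulate_seir(r0, incubation_days, infectious_days, population,
                  initial_infected, days, dt=0.1):
    """Integrate a basic SEIR model with forward Euler steps.

    Returns a list of (S, E, I, R) tuples, one per day, starting at day 0.
    Assumption: the incubation period is used as the latent period.
    """
    beta = r0 / infectious_days    # transmission rate implied by R0
    sigma = 1.0 / incubation_days  # rate of leaving the exposed state
    gamma = 1.0 / infectious_days  # rate of recovery or removal

    s = float(population - initial_infected)
    e, i, r = 0.0, float(initial_infected), 0.0
    steps_per_day = round(1 / dt)

    daily = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        daily.append((s, e, i, r))
    return daily
```

Calling, for example, `simulate_seir(2.5, 5.0, 5.0, 1_000_000, 10, 200)` with illustrative values gives a trajectory whose peak and timing shift markedly when R0 or the incubation period is changed, which is why early estimates of these quantities mattered so much.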

Short term prediction

As more data became available, these models could be fitted to the observed data and steadily refined to improve prediction of future trends, such as infection numbers and hospitalisation needs.4 Models proved useful in predicting short term trends, on the scale of days or weeks, which was one of the major tasks of the Covid-19 Prevention and Control Expert Committee organised by the Chinese Preventive Medicine Association in February 2020. These predictions allowed response teams to allocate healthcare resources efficiently and optimise containment strategies.

Evaluation of control measures

Because successful public health measures change the course of an epidemic within days, comparing observed infection trends with those predicted by the models helped to quantify the effectiveness of prevention and control measures. For example, modelling indicated that the Wuhan shutdown and the national emergency response delayed the spread of the epidemic and averted a large number of cases in China.5 Models also captured the contribution of clinical diagnostic criteria and the universal symptom survey to epidemic control in Wuhan.6
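The comparison of observed and predicted trends can be sketched in a few lines. The example below is a deliberately crude stand-in for a full transmission model: it fits exponential growth to pre-intervention daily counts by least squares on the log scale, projects that trend forward as a "no intervention" counterfactual, and subtracts the observed post-intervention counts. The function names are hypothetical, and the approach ignores depletion of susceptible people, so it is only reasonable over a short window.

```python
import math


def fit_exponential_growth(daily_cases):
    """Least-squares fit of log(cases) = a + r * day; returns (a, r).

    All counts must be positive, because the fit is done on the log scale.
    """
    n = len(daily_cases)
    xs = list(range(n))
    ys = [math.log(c) for c in daily_cases]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    r = sxy / sxx
    a = y_mean - r * x_mean
    return a, r


def averted_cases(pre_intervention, post_intervention):
    """Compare observed counts with a counterfactual exponential projection.

    Returns (averted_total, projected_counts) for the post-intervention days.
    """
    a, r = fit_exponential_growth(pre_intervention)
    start = len(pre_intervention)
    projected = [math.exp(a + r * (start + k))
                 for k in range(len(post_intervention))]
    averted = sum(projected) - sum(post_intervention)
    return averted, projected
```

Any estimate produced this way depends entirely on the counterfactual being credible, which is the same caveat that applies to the far more detailed models cited above.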

Exploration of uncertainty

Facing fast changing situations, the models are naturally designed to explore uncertainties by sensitivity analysis incorporating different parameters. For instance, based on current understanding of virus control strategies and vaccine effectiveness, the mathematical models warned that completely lifting the non-pharmaceutical interventions, even with highly effective vaccines, would lead to a substantial increase in covid-19 transmission.7 Modelling results provide rapid feedback to inform future decision making.
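A sensitivity analysis of this kind can be sketched as a sweep over plausible parameter values. The example below uses a common simplification, not the method of the cited study: the effective reproductive number is taken as R0 scaled down by the vaccinated and protected fraction and by the proportional reduction in transmission from non-pharmaceutical interventions, assuming homogeneous mixing. All names and any values supplied to the functions are hypothetical.

```python
from itertools import product


def effective_r(r0, coverage, vaccine_effectiveness, npi_reduction):
    """Effective reproductive number under a simple multiplicative model.

    coverage, vaccine_effectiveness, and npi_reduction are fractions in [0, 1].
    """
    return r0 * (1 - coverage * vaccine_effectiveness) * (1 - npi_reduction)


def sensitivity_grid(r0_values, coverages, effectiveness_values, npi_reductions):
    """Evaluate effective_r over every combination of the supplied values."""
    rows = []
    for r0, cov, ve, npi in product(r0_values, coverages,
                                    effectiveness_values, npi_reductions):
        r_eff = effective_r(r0, cov, ve, npi)
        rows.append({
            "r0": r0,
            "coverage": cov,
            "vaccine_effectiveness": ve,
            "npi_reduction": npi,
            "r_eff": r_eff,
            "grows": r_eff > 1,  # epidemic grows when R_eff exceeds 1
        })
    return rows
```

Scanning the resulting rows for the combinations in which `grows` is true shows which assumptions drive the conclusion, for example whether transmission would still increase if interventions were lifted entirely at a given vaccine coverage.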


However, despite their extensive use in this pandemic, mathematical models have several important limitations, as we discuss below.

We cannot model what we do not understand

Key information required for modelling includes duration of incubation period, transmission route and transmissibility of the pathogen, and difference in transmissibility of cases during the incubation period and symptomatic period, which can be obtained from real data, previous experience, or expert opinions. The genome of the novel virus was sequenced on 2 January 2020, and shared with the global community nine days later.

Although the virus, named SARS-CoV-2 by the International Committee on Taxonomy of Viruses, was soon identified as belonging to the beta-coronavirus family, crucial epidemiological characteristics remained largely unclear. In the absence of data, scientists had to rely on similar respiratory infections, such as severe acute respiratory syndrome and Middle East respiratory syndrome, to inform model design.3 Many of these models ignored the virus’s incubation period (the gap between infection and development of symptoms)8 or underestimated its length as two to three days, as with severe acute respiratory syndrome. Other models assumed that transmission during the incubation period was zero or was equal to transmission during the symptomatic stage; both assumptions proved false.39

While several early epidemiological studies did provide key information that improved model accuracy,1011 our limited understanding of the new virus resulted in models with inappropriate structures and unverified parameters, which produced inherently flawed predictions.

Models are less powerful if data are inaccessible

If there is one thing more important than vaccines in this pandemic, it is data. Data are urgently needed not only on daily confirmed cases but also on transmission dynamics, population migration, individual symptoms, hospital admissions, treatment records, and contact tracing. Longitudinal data are particularly necessary for better understanding of the impact of covid-19 on population health and on health systems. While global response teams have exabytes of data, much of the data are kept in private databases. Most of the mathematical models developed so far are based largely on daily laboratory confirmed case numbers, with only a few incorporating population movement or migration data to reveal how the virus spread across the world.1213 These exceptions show the importance of data availability in understanding and combating the pandemic.

For us to exploit the full power of modelling, more relevant and accurate data must be made accessible. Individual databases should be combined, becoming greater than the sum of their parts and offering novel insights. A global, open minded sharing of data, combined with big data approaches and high speed network technologies, will help us to find better solutions to the covid-19 pandemic and to future epidemiological events, while protecting personal privacy and social security.

Accurate data are essential to truly understand the pandemic

In late January 2020, in the early stage of the epidemic in China, polymerase chain reaction (PCR) testing capacity was insufficient throughout the country; the situation was even more serious in Wuhan, the centre of the pandemic in China. This insufficiency resulted in a considerable delay between symptom onset and laboratory confirmation of infection. For infections before 22 January, average delays were 17.9 days in Wuhan, 15.8 days in the cities of Hubei province excluding Wuhan, and 12.7 days in China excluding Hubei (fig 1). These delays, though greatly reduced as testing capacity expanded, still averaged about three to seven days in February. Similar delays were seen in Germany.

Fig 1

Distribution of the delays between symptom onset and laboratory confirmation of viral infection. Distributions were fitted in (A) Wuhan, (B) Hubei province excluding Wuhan, (C) mainland China excluding Hubei province, and (D) Germany, using the statistical simulation incorporating lognormal distribution and daily frequency data (extracted from references 10, 11, 15 and
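As a rough illustration of what fitting a lognormal distribution to confirmation delays involves, the sketch below computes maximum likelihood estimates from a list of individual delays. This is an assumption made for simplicity: the figure was produced by statistical simulation from daily frequency data, not from individual records, and the function names here are hypothetical.

```python
import math
import statistics


def fit_lognormal(delays_days):
    """Maximum likelihood fit of a lognormal distribution.

    Returns (mu, sigma), the mean and standard deviation of log(delay).
    Delays must be strictly positive.
    """
    logs = [math.log(d) for d in delays_days]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)  # the MLE uses the population form
    return mu, sigma


def lognormal_mean(mu, sigma):
    """Mean delay implied by the fitted lognormal parameters."""
    return math.exp(mu + sigma ** 2 / 2)


def lognormal_median(mu):
    """Median delay implied by the fitted lognormal parameters."""
    return math.exp(mu)
```

Because the lognormal distribution is right skewed, the fitted mean exceeds the median; a long tail of very late confirmations therefore pulls average delays upwards.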

According to a systematic review,14 the 33 models of the covid-19 pandemic in China, with few exceptions, relied on officially released numbers of laboratory confirmed cases rather than on numbers of symptomatic individuals.15 These models should therefore be interpreted with caution. One study incorporating dates of both symptom onset and laboratory confirmation, obtained from the National Notifiable Disease Report System database, estimated a peak effective reproductive number (Rt, or the mean number of people infected by a single infectious individual in an infection period) of about 3.54, considerably higher than the value (R0=2.38) from the study relying on laboratory confirmed case numbers.15 In addition, another study using the same data estimated the epidemic’s turning point (that is, the point at which the daily emerging case rate began to decelerate) as 31 January, about nine days earlier than models incorporating laboratory confirmed case numbers.4

Insufficient testing capacity during the pandemic’s early stages also resulted in a considerable number of unconfirmed infections—cases with typical symptoms or radiological evidence but without a positive PCR test or without PCR testing. We previously reported that unconfirmed infections accounted for about 40% of all cases in Wuhan in January 2020.6 A model incorporating individual level data with symptom onset information and accounting for presymptomatic infectiousness estimated that 87% of all infections in Wuhan from 1 January to 8 March were unconfirmed, potentially including asymptomatic and mildly symptomatic individuals.15

While this estimate may seem high, it is supported by recent infection rate data. The latest large scale seroprevalence study in Wuhan found a 6.92% positive rate for pan-immunoglobulins against SARS-CoV-2 in the population,16 equivalent to about 0.73 million infections, of which 94% were unconfirmed during the pandemic in Wuhan. This number far exceeds public perception and all existing model predictions, though cross reaction with antibodies to other coronaviruses is possible and may have led to an overestimated infection rate.
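The relation between these figures can be checked with simple arithmetic. The sketch below back-calculates the population denominator and the number of confirmed cases implied by the rounded values quoted above (6.92%, 0.73 million, and 94%); the results are approximations derived from those rounded values, not numbers reported by the cited study, and the function names are hypothetical.

```python
def implied_population(infections, seroprevalence):
    """Population size implied by an infection count and a positive rate."""
    return infections / seroprevalence


def implied_confirmed(infections, unconfirmed_fraction):
    """Confirmed cases implied by total infections and the unconfirmed share."""
    return infections * (1 - unconfirmed_fraction)


# Rounded figures quoted in the text above
population = implied_population(730_000, 0.0692)  # roughly 10.5 million people
confirmed = implied_confirmed(730_000, 0.94)      # roughly 44 thousand cases
```

The implied denominator of about 10.5 million is in line with the size of Wuhan’s population, and the implied number of confirmed cases is in the tens of thousands.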

Models must keep up with a rapidly changing situation

Accurate prediction of the pandemic using models is a seemingly impossible task. Time varying control measures, continually updated treatment protocols, increasing public health consciousness, and vaccination all affect the pandemic’s trajectory. Without incorporating fast changing parameters, models would result in less accurate predictions. Two major factors may have shaped the pandemic trend: vaccination and viral mutation. Vaccination promises to end the covid-19 pandemic while allowing restoration of social activities. As of 13 September 2021, China had over 969 million people fully vaccinated against SARS-CoV-2, and about 2.3 billion people worldwide were fully vaccinated,17 reducing the number of infections, critically ill cases, and deaths. However, a broad list of real world factors could impact on vaccine effectiveness, including immune response heterogeneity, financing shortfalls, regional inequalities, logistical challenges, difficulties in expanding manufacturing capacity, and viral mutations. Future modelling studies should account for these factors to provide more reliable results to inform decision making.

Viral mutation is facilitated by the pathogen’s large scale spread. More virulent and transmissible variants, such as the delta variant, which is 40% to 60% more contagious than previous variants,18 greatly alter vaccine effectiveness and global infection dynamics. Accounting for possible mutations in models, especially before a new variant becomes widespread, remains a major challenge for prediction.

In addition, the dominant viral transmission route has changed from potential animal-to-human transmission at the beginning of the outbreak to broad human-to-human transmission. On 30 November 2020, routine PCR testing among the staff at an aquatic product company in Jiaozhou, Qingdao, China, identified one asymptomatic positive case; further contact tracing identified an additional infection among the co-workers. After scientists from the Centre for Disease Control and Prevention comprehensively studied the tracing investigation, gene sequencing, and video records, the case was recognised as a potential contaminated cold chain product-to-human transmission.19 Going forward, scientists must continually incorporate new information into their models, while also ensuring that this information is reliable.


In 1976, the British statistician George Box famously stated that all models are wrong, but some are useful. Over four decades later, referring specifically to covid-19 modelling, Siegenfeld and colleagues noted that understanding what models cannot predict is sometimes more important than understanding what they can predict.20

Given the highly dynamic nature of disease outbreaks, even models that account for rapidly changing parameters cannot predict future numbers with total accuracy. Meanwhile, long term predictions of mathematical models have mostly proved badly wrong. In fact, if effective interventions are in place, changing the epidemic’s dynamics, these predictions are bound to be wrong; they will be correct only in the absence of such interventions.

Despite these limitations, mathematical modelling is one of our most powerful tools for detecting, understanding, and combating infectious disease outbreaks. As stated above, a model is only as reliable as the data underlying it. Increased amounts of data, obtained through improved case identification methodology and expanded information sharing, are crucial for models to effectively recognise and mitigate future public health emergencies. At the same time, model optimisation and methodological innovation are urgently needed to deal with imperfect data and give early warning of major public health emergencies. Importantly, mathematical modelling should be valued as a tool for conveying large uncertainties and warning of worst case scenarios. An appreciation of the shortcomings of models not only clarifies what they can’t do but helps anticipate what they can do.

Key messages

  • Mathematical modelling can help us understand and control infectious disease outbreaks, including the covid-19 pandemic

  • Accuracy of prediction is limited by insufficient, inaccessible, or inaccurate data

  • Greater information sharing and methodological innovation to deal with uncertainty are needed to improve accuracy

  • Nevertheless, transmission modelling is a powerful tool for early warning and short term predictions


The study was partially supported by the National Natural Science Foundation of China (82041024 to Feng Chen) and the Bill & Melinda Gates Foundation (INV-006371).


  • Contributors and sources: FC, president of the China Statistical Theory and Methodology Committee and vice president of the International Biostatistics Society China Branch, conceived, designed, and supervised this analysis. YH, professor president of the China Biostatistics Society, helped interpret the second arguments in this analysis and critically reviewed the manuscript. QJ critically reviewed and helped interpret the arguments in this analysis. YZ helped interpret the arguments in this analysis and critically reviewed and revised the manuscript. FS helped interpret the arguments in this analysis and critically reviewed and revised the manuscript. YW performed analysis, reviewed the literature, interpreted results, and drafted the manuscript. YW, YZ, QJ, YH, and FC served as members of the covid-19 prevention and control expert committee organised by the Chinese Preventive Medicine Association. All material supporting the arguments in this analysis was taken from publications or publicly accessible resources. FC is the guarantor.

  • Competing interests: We have read and understood BMJ policy on declaration of interests and have no relevant interests to declare.

  • Provenance and peer review: Commissioned; externally peer reviewed.

  • This article is part of a collection proposed by the Peking University Center for Public Health and Epidemic Preparedness and Response. Open access fees were funded by individual institutions. The BMJ commissioned, peer reviewed, edited, and made the decision to publish. Li-Ming Li advised on commissioning for this collection. Jin-Ling Tang, Di Wang, and Kamran Abbasi were the lead editors for The BMJ.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:

