From molecular tools to Artificial Intelligence (AI): Covid-19 diagnosis

by Barbara Illi1, Matteo Chiara2,3, Graziano Pesole3,4
1 Institute of Molecular Biology and Pathology, National Research Council (IBPM-CNR) c/o Department of Biology and Biotechnology “Charles Darwin”
2 Department of Bioscience, University of Milano
3 Institute of Molecular Biology and Pathology, National Research Council (IBPM-CNR) 
4 Department of Bioscience, Biotechnology and Biopharmaceutics, University of Bari «A. Moro»


Since the emergence of the pandemic, the view of COVID-19 and of SARS-CoV-2 (hereafter simply CoV-2) itself has changed considerably. The clinical characteristics of COVID-19 range from no symptoms at all, through mild symptoms, to organ failure caused by an exacerbated immune response (Figure 1).

This variety of clinical manifestations probably depends on the viral load to which a patient has been exposed, on pre-existing and co-existing pathologies, and on age, gender and even the genetics of the immune system (in particular of the HLA system). Indeed, it has been demonstrated that in severe cases the mean viral load is 60 times higher than in mild cases [1]. CoV-2 may infect multiple tissues and may be recovered from different biological fluids, which represent the starting materials for COVID-19 diagnosis.

Figure 1 Overview of Covid-19 symptoms.

Abbreviations: CRP, C-Reactive Protein; AST, Aspartate Aminotransferase; ALT, Alanine Aminotransferase; TNFα, Tumour Necrosis Factor α; IL6, Interleukin-6; RT-PCR, Reverse Transcription-Polymerase Chain Reaction; LAMP, Loop-Mediated Isothermal Amplification; RPA, Recombinase Polymerase Amplification; CT, Computed Tomography; AI, Artificial Intelligence. [Based upon Tu et al., Int J Mol Sci, 2020; picture in the middle: SciePro/Shutterstock]

Target tissues

ACE2-expressing tissues and cells are the direct target of CoV-2 infection. These include not only epithelial cells of the upper and lower respiratory tract, but also vascular cells (such as pericytes [2]) and the epithelial cells of the buccal mucosa, gingiva and tongue [3]. ACE2 has also been detected in cells from salivary glands [4]. Epithelial cells of the minor salivary gland ducts of rhesus macaques have been shown to be targeted by SARS-CoV 48 hours after infection [5]. It is therefore conceivable that CoV-2 may also infect salivary gland ductal cells. Moreover, ACE2 is expressed in the esophagus and in other epithelial cells of the gastrointestinal tract, including enterocytes (see Figure 3A in Lamers et al., Science 2020). Indeed, it has recently been shown that the latter express high levels of ACE2 and may be productively infected by CoV-2 [6].

Transmission of CoV-2

Like every respiratory virus, CoV-2 is transmitted through the emission of droplets, mainly by coughing and/or sneezing (Figure 2). Whereas an individual may emit about 75,000 droplets per cough [7], it has recently been demonstrated that ordinary speech may produce an even higher number of droplets, which may act as vehicles of infection. Furthermore, this number has been shown to vary with the loudness of the voice.

Figure 2 Relationship between droplet size and airborne dispersion.

[Based upon International Federation of Infection Control, 2016]

Droplet size may range from 1 to 500 μm. As small droplets evaporate quickly and larger droplets settle rapidly under gravity, droplets of medium diameter (that is, from 30 to 50 μm) reach the maximum horizontal distance, which has been estimated to range from less than 1 m when breathing to 6 m when sneezing [8].

Loud speaking may produce 2600 particles per second. These particles dehydrate quickly, resulting in droplet nuclei. For example, droplets with a diameter of 12-21 μm, with a total volume of 60 to 320 nL, dehydrate rapidly to 4 μm. The half-life of these particles is 8 minutes and, considering for CoV-2 a viral load in oral fluid of about 7 × 10^6 copies per milliliter (mL) (the maximum average load has been estimated at 2.35 × 10^11 copies/mL [9]), it has been calculated that 1 minute of loud speaking may generate 1000 virion-containing droplet nuclei, which remain in the air for between 8 and 14 minutes.

The probability that a particle contains virions declines from 37% for droplets of 50 μm diameter to 0.37% for particles of 10 μm, while the probability that the latter contain more than one virion is negligible [10]. Smaller particles may remain in the air indefinitely; however, for dehydrated particles of 1 μm (originating from 3 μm hydrated droplets) the probability of containing a virion is 0.01% [10] (see Figure 2B in Asadi et al., Sci Rep 2019).
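The percentages quoted above can be approximated with a short calculation. The sketch below is only an illustration: it assumes that droplets are perfect spheres and that the number of virions per droplet follows a Poisson distribution, which is a modelling assumption of this example rather than something stated in the text. The function names are hypothetical; the viral load is the average value of 7 × 10^6 copies/mL given above.

```python
import math

# Average oral-fluid viral load quoted in the text (copies per mL).
VIRAL_LOAD_PER_ML = 7e6


def droplet_volume_ml(diameter_um: float) -> float:
    """Volume of a spherical droplet in mL (1 um = 1e-4 cm, 1 cm^3 = 1 mL)."""
    diameter_cm = diameter_um * 1e-4
    return math.pi / 6.0 * diameter_cm ** 3


def prob_at_least_one_virion(diameter_um: float,
                             load_per_ml: float = VIRAL_LOAD_PER_ML) -> float:
    """Probability that a droplet carries one or more virions.

    Assumption (not from the article): virion counts are Poisson-distributed,
    with a mean equal to viral load times droplet volume.
    """
    expected_virions = load_per_ml * droplet_volume_ml(diameter_um)
    return 1.0 - math.exp(-expected_virions)


# Hydrated droplet diameters mentioned in the text, in micrometres.
for diameter in (50, 10, 3):
    percent = 100 * prob_at_least_one_virion(diameter)
    print(f"{diameter:>2} um droplet: {percent:.4f} %")
```

Under these assumptions the sketch gives roughly 37%, 0.37% and 0.01% for droplets of 50, 10 and 3 μm, in line with the figures reported above.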

Although the particle emission rate increases with loudness, the size distribution of the particles is not affected. Moreover, the existence of droplet “superemitters” has been demonstrated, which may explain the occurrence of CoV-2 “superspreaders” [11].

Fecal transmission has also been hypothesized and is still debated. CoV-2 RNA copies have been detected in stool, irrespective of a positive or negative naso-pharyngeal swab and of the presence or absence of symptoms [12]. The detection of viral RNA in stool does not mean that stool contains infectious virions. Nevertheless, the presence of viable virions in stool specimens has been reported, as observed by electron microscopy [13]. The viral load of stool specimens has been determined recently [14]: 0.6 and 0.7 RNA copies/mL have been detected in stool specimens of patients without and with diarrhea, respectively. This is an important point, since CoV-2 may have a low infective dose. Therefore, although the stool viral load is believed to be low, fecal transmission should be regarded as a concern. Furthermore, viral RNA has also been detected in urine and blood [13]. Contamination may also occur by contact with surfaces on which droplets deposit. In fact, the half-life of the virus in deposited droplets varies with the surface material [15].

COVID-19 diagnosis

A variety of methods are currently employed to detect CoV-2 and allow COVID-19 diagnosis.

Molecular tools

Polymerase Chain Reaction (PCR)-based methods

The initial molecular identification of CoV-2 relied on reverse transcription-polymerase chain reaction (RT-PCR) performed on naso-pharyngeal or oro-pharyngeal swabs, using either the SYBRGreen® or the TaqMan® procedure (Video 1).

At the beginning of the pandemic, the Food and Drug Administration (FDA) approved a commercial SARS-CoV-2 test system from Roche (cobas® SARS-CoV-2). This test takes 3.5 hours to yield results. A faster system was therefore approved later, the Xpert® Xpress SARS-CoV-2 from Cepheid Inc. (USA), which yields results within 45 minutes.

Video 1 How TaqMan works.


Loop-mediated isothermal amplification (LAMP)

Loop-mediated isothermal amplification (LAMP) is a cheaper and faster method than RT-PCR. Furthermore, because it uses multiple primers, it is more sensitive and specific (Video 2). The final products may be detected by gel electrophoresis and/or real-time PCR. This method had already been used to detect other coronaviruses (SARS-CoV [16], MERS-CoV [17], HCoV-NL63 [18]). With it, as few as 3.4 copies of MERS-CoV [17] and 0.01 plaque-forming units (PFU) of SARS-CoV [16] have been detected. For SARS-CoV-2, LAMP has a detection limit of 2 × 10^2 copies per reaction [19].

Video 2 Loop-mediated isothermal amplification (LAMP).

The most recent advance of this technique is LAMP-sequencing, which is based on the fragmentation of RT-LAMP products by the Tn5 transposase which, at the same time, adds “codes” (“indexes”, Figure 3) to the fragments [20]. Up to 96 different codes may be added, so that 96 samples may be processed together. The fragments are then amplified by a DNA polymerase and sequenced. This technique is a valid alternative to RT-LAMP [20].
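To give an idea of how indexes allow pooled samples to be told apart after sequencing, here is a minimal sketch of read sorting ("demultiplexing"). It is a toy illustration, not the pipeline used in the cited study: the 4-base indexes, the sample names, the reads and the assumption that the index sits at the start of each read are all hypothetical.

```python
from collections import defaultdict

# Hypothetical index-to-sample table; a real run would list 96 distinct indexes.
INDEX_TO_SAMPLE = {
    "ACGT": "sample_01",
    "TGCA": "sample_02",
    "GATC": "sample_03",
}
INDEX_LENGTH = 4  # assumed index length, chosen only for this example


def demultiplex(reads):
    """Group reads by the index found at the start of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        index, fragment = read[:INDEX_LENGTH], read[INDEX_LENGTH:]
        # Reads whose index is not in the table are set aside.
        sample = INDEX_TO_SAMPLE.get(index, "undetermined")
        by_sample[sample].append(fragment)
    return dict(by_sample)


# Made-up reads: index followed by a short fragment.
reads = ["ACGTTTGACC", "TGCAGGCTAA", "ACGTCCATGA", "NNNNACGTAC"]
result = demultiplex(reads)
for sample, fragments in sorted(result.items()):
    print(sample, fragments)
```

Once reads are grouped in this way, each sample can be scored separately for the presence of viral sequence.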

Figure 3 Transposase-dependent DNA fragment generation and addition of index-containing adapters. Fragments are amplified and sequenced.

Recombinase Polymerase Amplification (RPA)

An evolution of PCR-based methods is RPA, which, like PCR, uses two opposing primers complementary to the target sequence. The primers are complexed with the recombinase protein and form a D-loop in the double-stranded target sequence. They are then extended by a mesophilic (that is, working at moderate temperatures) DNA polymerase (Figure 4). If fluorescent probes are added, RPA may be monitored in real time. This method has recently been shown to detect CoV-2 RNA within 20 minutes, with the first results available after 7 minutes [21].

Figure 4 Scheme of an RPA reaction.


Microarray-based assays may also be implemented for CoV-2 detection. Other coronaviruses have already been detected by this method [22]. Basically, the viral RNA is reverse-transcribed into cDNA, which is labelled and then identified by specific probes: labelled cDNAs are hybridized onto a microarray containing the probes, and a series of washes removes unbound cDNA.

Next Generation Sequencing (NGS)-based methods

NGS methods may detect multiple viral genes simultaneously by sequencing the pool of RNAs of infected cells. The bioinformatic analysis of the resulting sequences, starting from their alignment with other sequences available in GenBank, leads to the classification of novel viruses. These methods are also used to study viral genomic diversity and phylogeny. For example, the recent analysis of 7666 CoV-2 genomic sequences from all over the world led to the identification of recurrent mutations that arose independently over time (homoplasies) in specific regions of the CoV-2 genome [23].
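As a rough illustration of the first step of such an analysis, the sketch below compares already aligned genomes with a reference and counts the substitutions seen in more than one genome. The sequences are invented and very short, and the function names are hypothetical. Note that simple counting only flags recurrent mutations: showing that they arose independently (true homoplasies) requires placing the genomes on a phylogenetic tree, which is not attempted here.

```python
from collections import Counter


def substitutions(reference: str, genome: str):
    """Return (position, reference_base, observed_base) for each mismatch.

    Assumes the genome is already aligned to the reference (same length).
    Ambiguous bases (e.g. N) and gaps are ignored.
    """
    return [
        (position + 1, ref_base, base)
        for position, (ref_base, base) in enumerate(zip(reference, genome))
        if base != ref_base and base in "ACGT"
    ]


def recurrent_substitutions(reference: str, genomes, min_count: int = 2):
    """Count substitutions across genomes and keep those seen min_count times or more."""
    counts = Counter()
    for genome in genomes:
        counts.update(substitutions(reference, genome))
    return {mutation: n for mutation, n in counts.items() if n >= min_count}


# Made-up toy alignment (10 bases); real CoV-2 genomes are about 30,000 bases long.
reference = "ATGGCTACGT"
genomes = ["ATGGCTACGT", "ATGACTACGT", "ATGACTACGA", "ATGGCTNCGA"]
recurrent = recurrent_substitutions(reference, genomes)
for (position, ref_base, base), n in sorted(recurrent.items()):
    print(f"{ref_base}{position}{base}: seen in {n} genomes")
```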

CRISPR/Cas-based technology

The most recent advance in molecular CoV-2 diagnosis is the combinatorial arrayed reactions for multiplexed evaluation of nucleic acids (CARMEN) platform, based on CRISPR/Cas technology [24]. The CRISPR/Cas system relies on a guide RNA complexed with the Cas enzyme. If the guide RNA binds to a complementary sequence in a target nucleic acid, Cas cuts the target. CARMEN takes advantage of Cas13, which cuts only RNA and not DNA.

In this method, Cas13 cuts a reporter RNA in a non-specific manner once it has been activated by the recognition of a specific sequence. PCR or RPA is needed in the first step to amplify viral nucleic acids (if present). Fluorescent dyes are then mixed with each amplified sample to give it a specific colour code, in ratios that provide up to 1050 colour combinations. Separate one-nanolitre, oil-emulsified droplets are then generated for each of the amplification reactions. A second series of emulsified droplets, also with unique colour codes, is generated as well.

These coloured droplets contain a quenched, fluorescently labelled reporter RNA and Cas13 bound to the guide RNA needed to detect a viral target. The two sets of droplets are mixed in a single tube and loaded onto a chip containing microwells, each of which can hold only two droplets. On the chip, each amplified nucleic-acid target is thus likely to be exposed to each detection mix, in multiple replicates at different locations. Exposure to an electric field merges the droplet pairs, initiating the detection reaction. If Cas13, in complex with a guide RNA, recognizes an amplified viral sequence in the same well, it is activated and, through its non-specific RNA-cleavage activity, generates a fluorescent signal from the reporter RNA (Figure 5).

This platform is extremely innovative and permits the detection of more than one virus per experiment. It has been shown to distinguish SARS-CoV-2 from other human coronaviruses, SARS-CoV and MERS-CoV [24].
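Why random loading of droplet pairs is enough to test every sample against every detection mix can be seen with a small simulation. The sketch below is purely illustrative and is not the CARMEN software: the numbers of samples, detection mixes and wells are arbitrary, and it assumes that every droplet type is equally abundant and that each microwell receives two droplets drawn at random.

```python
import random
from collections import Counter


def simulate_chip(n_samples: int, n_detection_mixes: int, n_wells: int, seed: int = 0):
    """Randomly fill two-droplet microwells and count sample/detection pairings.

    Assumption for this sketch: all droplet types are equally abundant, so each
    droplet in a well is drawn uniformly at random from the pool of types.
    """
    rng = random.Random(seed)
    droplet_types = (
        [("sample", i) for i in range(n_samples)]
        + [("detect", j) for j in range(n_detection_mixes)]
    )
    pair_counts = Counter()
    for _ in range(n_wells):
        first, second = rng.choice(droplet_types), rng.choice(droplet_types)
        # Only wells holding one sample droplet and one detection droplet are informative.
        if {first[0], second[0]} == {"sample", "detect"}:
            sample = first if first[0] == "sample" else second
            detect = first if first[0] == "detect" else second
            pair_counts[(sample[1], detect[1])] += 1
    return pair_counts


counts = simulate_chip(n_samples=8, n_detection_mixes=4, n_wells=5000)
print("distinct sample/detection pairs observed:", len(counts), "of", 8 * 4)
print("fewest replicates for any pair:", min(counts.values()))
```

With many more wells than possible combinations, every pairing is expected to occur several times, which is what provides the replicates mentioned above. Wells that receive two droplets of the same kind are simply uninformative.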

Figure 5 Graphical representation of the CARMEN platform.

Artificial intelligence (AI)

Molecular detection of CoV-2 by RT-PCR is to date the reference method for diagnosing COVID-19. It can be a long process, as multiple tests may be required to exclude false negatives, and it is poorly suited to emergency conditions. Alternative and complementary diagnostic approaches may therefore be useful for triage, before a definitive diagnosis is reached.

The rapid diagnosis of CoV-2 infection in hospitals and in settings critical for pandemic containment (e.g. harbours and airports) is mandatory to minimize the risk of viral transmission and to speed up the treatment of symptomatic, false-negative patients before their clinical condition worsens.

Computed tomography (CT) cannot be used to diagnose patients with mild disease, who may not show the typical radiological signs. Furthermore, chest radiologists may not be available in every hospital. Finally, CT diagnosis may not be completely trustworthy because of human error: interpretations may be wrong and are not always reproducible. For these reasons, in the past few years different authors have suggested the use of artificial intelligence (AI) to improve diagnostic processes based on image analysis. AI and related technologies are now employed in many aspects of our society and have the potential to improve patient diagnosis as well. The term AI covers a group of computational techniques, of which the so-called “machine learning” techniques are the most widely used in clinical practice and diagnostics.

Machine learning is a computational technique that generates models for solving problems by learning directly from the data, and it is at the basis of many AI approaches. A large number of machine learning implementations exist. The most common clinical application is precision medicine, a discipline that identifies the best clinical protocol according to the characteristics of the patient and the treatment context.

To train a learning model, the vast majority of these applications require a set of already classified data (e.g. patient characteristics at the onset of a given pathology), from which the computational method extracts the most prominent features; this approach is called supervised learning, as the algorithm “learns” from known examples.

One of the most successful algorithms applied to machine learning problems is the neural network, a technology that has been available since the sixties and has been used for decades in healthcare research to classify patients or to predict putative risk factors for a specific disease. In its simplest formulation, the neural network frames a problem in terms of relationships between inputs and outputs. These relationships are described by defined characteristics, to each of which the network assigns an optimal importance. The more fundamental an input characteristic is for describing or obtaining the output, the greater its associated importance. For example, in patient classification the inputs may be the subjects of the investigation (e.g. patients and healthy controls), for whom the system automatically determines the diagnosis (e.g. sick or healthy) on the basis of their features (e.g. blood parameters or other clinical characteristics).
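The idea of a network that learns the "importance" of each input from classified examples can be illustrated with the smallest possible case, a single artificial neuron trained by supervised learning. The sketch below is a toy: the two markers and all the values are invented, the function names are hypothetical, and real diagnostic models are far larger and are trained on real patient data.

```python
import math


def sigmoid(z: float) -> float:
    """Squash a real number into the 0-1 range."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, labels, epochs: int = 2000, learning_rate: float = 0.5):
    """Fit one weight per input feature (its 'importance') plus a bias term."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            output = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = output - label  # how far the prediction is from the known class
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, features) -> int:
    """Return 1 (sick) or 0 (healthy) for one subject."""
    score = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
    return 1 if score >= 0.5 else 0


# Invented training set: [marker_a, marker_b], both scaled to the 0-1 range.
# marker_a separates the two groups; marker_b carries no information.
samples = [[0.9, 0.2], [0.8, 0.8], [0.7, 0.5], [0.2, 0.2], [0.1, 0.8], [0.3, 0.5]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = sick, 0 = healthy

weights, bias = train_neuron(samples, labels)
print("learned importance of marker_a and marker_b:", weights)
print("new subject with high marker_a classified as:", predict(weights, bias, [0.85, 0.3]))
```

After training, the weight attached to the informative marker is much larger than the one attached to the uninformative marker, which is the sense in which the network assigns an importance to each characteristic.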

The most advanced forms of machine learning are known as “deep learning”: neural network models with a large number of layers, which analyze different subsets of a problem and have a highly structured topology. A deep learning system may contain thousands of neural networks, each involved in solving a specific problem, whose results are then integrated into a single prediction. These approaches are increasingly applied in medical diagnostics, as they are more efficient than the human eye at evaluating complex radiological images in their entirety. They promise greater diagnostic accuracy than the previous generation of instruments and methods.

AI-based approaches have recently been applied to COVID-19 diagnosis as well. A group of researchers at the eminent Icahn School of Medicine at Mount Sinai, in Manhattan, has developed an automatic, neural-network-based algorithm which applies AI principles in support of molecular methods. This algorithm has been designed to speed up the diagnosis of suspected COVID-19 patients. Besides containing an elaborate model for analyzing patients' radiological data, the software integrates their clinical history, the results of laboratory tests, different physiological parameters, symptoms and, whenever possible, data on episodes of exposure to other affected individuals or information on the local chains of contagion [25]. The authors have demonstrated that the neural network is more sensitive and specific (84.3% and 82.8%, respectively) than models based only on radiological images (83.6% and 75.9%) or clinical data (80.6% and 68.3%) [25]. Given this evidence, the algorithm may be used to rapidly identify CoV-2-infected individuals before the results of molecular testing are available.
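For readers unfamiliar with the two measures, sensitivity is the fraction of truly infected subjects that a test flags as positive, and specificity is the fraction of non-infected subjects that it flags as negative. The sketch below shows how they are computed; the ten subjects and their predictions are invented for illustration and have no relation to the data of the cited study.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Compute sensitivity and specificity from paired 0/1 labels (1 = infected)."""
    pairs = list(zip(true_labels, predicted_labels))
    true_positives = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_negatives = sum(1 for t, p in pairs if t == 1 and p == 0)
    true_negatives = sum(1 for t, p in pairs if t == 0 and p == 0)
    false_positives = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = true_positives / (true_positives + false_negatives)
    specificity = true_negatives / (true_negatives + false_positives)
    return sensitivity, specificity


# Invented example: 4 infected and 6 non-infected subjects.
true_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sensitivity, specificity = sensitivity_specificity(true_labels, predicted_labels)
print(f"sensitivity: {sensitivity:.1%}, specificity: {specificity:.1%}")
```

In this made-up case one infected subject is missed and one healthy subject is wrongly flagged, so the sensitivity is 3 out of 4 and the specificity is 5 out of 6.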


As the COVID-19 pandemic has not yet been eradicated and is rapidly growing worldwide, fast COVID-19 diagnosis is mandatory to isolate patients and their contacts. A plethora of methods is currently being implemented, demonstrating how different research and diagnostic expertise may converge on a common field. From standard RT-PCR methods to the latest frontiers of CRISPR/Cas technology and AI, a tremendous effort is ongoing to allow the fast and confident identification of CoV-2-infected patients, a prerequisite for setting up proper therapeutic protocols.

For non-experts


RT-PCR is a method based on the production and detection of multiple copies of a gene of interest (in this case, CoV-2-specific genes such as the Nucleocapsid (N) gene), starting from an RNA molecule. The reverse transcriptase enzyme synthesizes a single first strand of complementary DNA (cDNA), using total RNA extracted from cells as a template, short sequences (oligonucleotides) as primers and nucleotides as “building blocks”. Afterwards, special heat-resistant DNA polymerases of microbial origin (such as Taq polymerase, from Thermus aquaticus, or Pfu, from Pyrococcus furiosus) make multiple copies of the gene, starting from the cDNA and using as primers specific oligonucleotides that recognize the target gene. The DNA products may be visualized on an agarose gel, which separates DNA molecules according to their molecular weight. However, this is a qualitative analysis.

To monitor the quantity of a transcript in a given sample, real-time PCR has to be used. Real-time PCR methods are based on the emission of fluorescence as the DNA molecules increase in the sample. Either double-stranded DNA binding dyes, like SYBRGreen (less specific), or fluorescent reporter probes, like TaqMan probes (more specific), may be used. The former bind to every double-stranded DNA molecule, so that fluorescence increases with the amount of DNA, while the latter specifically detect the DNA molecules complementary to the probe. In this case, fluorescence is released during DNA synthesis, when the reporter probe is degraded. In either case, the fluorescence is detected by a real-time PCR machine and the quantity of DNA molecules is calculated according to the method described in Livak and Schmittgen 2001 [26].
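The calculation of Livak and Schmittgen, known as the 2^(-ΔΔCt) method, compares the cycle threshold (Ct) of the target gene with that of a reference gene, in the sample of interest and in a calibrator sample. A minimal sketch is given below: the Ct values are invented, the function name is hypothetical, and the formula assumes that both genes are amplified with an efficiency close to 100%.

```python
def relative_quantity(ct_target_sample: float, ct_reference_sample: float,
                      ct_target_calibrator: float, ct_reference_calibrator: float) -> float:
    """Relative amount of target in a sample versus a calibrator, as 2^(-ddCt)."""
    # Normalize the target gene to the reference gene within each sample.
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    # Compare the normalized sample with the normalized calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Invented Ct values: the target crosses the threshold 3 cycles earlier
# in the sample than in the calibrator, with the same reference Ct.
fold_change = relative_quantity(
    ct_target_sample=22.0, ct_reference_sample=18.0,
    ct_target_calibrator=25.0, ct_reference_calibrator=18.0,
)
print(f"target is {fold_change:.0f} times more abundant than in the calibrator")
```

Because each PCR cycle ideally doubles the amount of DNA, a difference of 3 cycles corresponds to a 2 × 2 × 2 = 8-fold difference in starting material.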


  1. Liu, Y. et al. Viral dynamics in mild and severe cases of COVID-19. Lancet Infect Dis 20, 656-657, doi:10.1016/S1473-3099(20)30232-2 (2020).
  2. Chen, L., Li, X., Chen, M., Feng, Y. & Xiong, C. The ACE2 expression in human heart indicates new potential mechanism of heart injury among patients infected with SARS-CoV-2. Cardiovasc Res 116, 1097-1100, doi:10.1093/cvr/cvaa078 (2020).
  3. Xu, H. et al. High expression of ACE2 receptor of 2019-nCoV on the epithelial cells of oral mucosa. Int J Oral Sci 12, 8, doi:10.1038/s41368-020-0074-x (2020).
  4. Song, J. et al. Systematic analysis of ACE2 and TMPRSS2 expression in salivary glands reveals underlying transmission mechanism caused by SARS-CoV-2. J Med Virol, doi:10.1002/jmv.26045 (2020).
  5. Liu, L. et al. Epithelial cells lining salivary gland ducts are early target cells of severe acute respiratory syndrome coronavirus infection in the upper respiratory tracts of rhesus macaques. J Virol 85, 4025-4030, doi:10.1128/JVI.02292-10 (2011).
  6. Lamers, M. M. et al. SARS-CoV-2 productively infects human gut enterocytes. Science, doi:10.1126/science.abc1669 (2020).
  7. Lindsley, W. G. et al. Quantity and size distribution of cough-generated aerosol particles produced by influenza patients during and after illness. J Occup Environ Hyg 9, 443-449, doi:10.1080/15459624.2012.684582 (2012).
  8. Xie, X., Li, Y., Chwang, A. T., Ho, P. L. & Seto, W. H. How far droplets can move in indoor environments–revisiting the Wells evaporation-falling curve. Indoor Air 17, 211-225, doi:10.1111/j.1600-0668.2007.00469.x (2007).
  9. Wolfel, R. et al. Virological assessment of hospitalized patients with COVID-2019. Nature 581, 465-469, doi:10.1038/s41586-020-2196-x (2020).
  10. Stadnytskyi, V., Bax, C. E., Bax, A. & Anfinrud, P. The airborne lifetime of small speech droplets and their potential importance in SARS-CoV-2 transmission. Proc Natl Acad Sci U S A 117, 11875-11877, doi:10.1073/pnas.2006874117 (2020).
  11. Asadi, S. et al. Aerosol emission and superemission during human speech increase with voice loudness. Sci Rep 9, 2348, doi:10.1038/s41598-019-38808-z (2019).
  12. Amirian, E. S. Potential fecal transmission of SARS-CoV-2: Current evidence and implications for public health. Int J Infect Dis 95, 363-370, doi:10.1016/j.ijid.2020.04.057 (2020).
  13. Wang, W. et al. Detection of SARS-CoV-2 in Different Types of Clinical Specimens. JAMA, doi:10.1001/jama.2020.3786 (2020).
  14. Cheung, K. S. et al. Gastrointestinal Manifestations of SARS-CoV-2 Infection and Virus Load in Fecal Samples From a Hong Kong Cohort: Systematic Review and Meta-analysis. Gastroenterology, doi:10.1053/j.gastro.2020.03.065 (2020).
  15. van Doremalen, N. et al. Aerosol and Surface Stability of SARS-CoV-2 as Compared with SARS-CoV-1. N Engl J Med 382, 1564-1567, doi:10.1056/NEJMc2004973 (2020).
  16. Kim, J. H. et al. A Simple and Multiplex Loop-Mediated Isothermal Amplification (LAMP) Assay for Rapid Detection of SARS-CoV. Biochip J 13, 341-351, doi:10.1007/s13206-019-3404-3 (2019).
  17. Shirato, K. et al. Development of fluorescent reverse transcription loop-mediated isothermal amplification (RT-LAMP) using quenching probes for the detection of the Middle East respiratory syndrome coronavirus. J Virol Methods 258, 41-48, doi:10.1016/j.jviromet.2018.05.006 (2018).
  18. Pyrc, K., Milewska, A. & Potempa, J. Development of loop-mediated isothermal amplification assay for detection of human coronavirus-NL63. J Virol Methods 175, 133-136, doi:10.1016/j.jviromet.2011.04.024 (2011).
  19. Baek, Y. H. et al. Development of a reverse transcription-loop-mediated isothermal amplification as a rapid early-detection method for novel SARS-CoV-2. Emerg Microbes Infect 9, 998-1007, doi:10.1080/22221751.2020.1756698 (2020).
  20. Dao Thi, V. L. et al. A colorimetric RT-LAMP assay and LAMP-sequencing for detecting SARS-CoV-2 RNA in clinical samples. Sci Transl Med 12, doi:10.1126/scitranslmed.abc7075 (2020).
  21. Behrmann, O. et al. Rapid detection of SARS-CoV-2 by low volume real-time single tube reverse transcription recombinase polymerase amplification using an exo probe with an internally linked quencher (exo-IQ). Clin Chem, doi:10.1093/clinchem/hvaa116 (2020).
  22. Chen, Q. et al. Comprehensive detection and identification of seven animal coronaviruses and human respiratory coronavirus 229E with a microarray hybridization assay. Intervirology 53, 95-104, doi:10.1159/000264199 (2010).
  23. van Dorp, L. et al. Emergence of genomic diversity and recurrent mutations in SARS-CoV-2. Infect Genet Evol 83, 104351, doi:10.1016/j.meegid.2020.104351 (2020).
  24. Ackerman, C. M. et al. Massively multiplexed nucleic acid detection with Cas13. Nature 582, 277-282, doi:10.1038/s41586-020-2279-8 (2020).
  25. Mei, X. et al. Artificial intelligence-enabled rapid diagnosis of patients with COVID-19. Nat Med, doi:10.1038/s41591-020-0931-3 (2020).
  26. Livak, K. J. & Schmittgen, T. D. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 25, 402-408, doi:10.1006/meth.2001.1262 (2001).
