Can upper respiratory tract specimens be used to diagnose tuberculosis?

In a recent study published in The Lancet Microbe, researchers evaluated the diagnostic accuracy of upper respiratory tract samples for active pulmonary tuberculosis (TB) compared to standard sputum or gastric aspirate testing.

Study: Accuracy of upper respiratory tract samples to diagnose Mycobacterium tuberculosis: a systematic review and meta-analysis. Image Credit: Gorodenkoff/Shutterstock.com


Diagnosing pulmonary disease caused by Mycobacterium tuberculosis is challenging because of limited access to testing, difficulty obtaining samples, and the limited sensitivity of available tests.

In 2021, more than 4 million of the estimated 10.6 million people with TB remained undiagnosed, and only 63% of those diagnosed with pulmonary TB had their diagnosis confirmed bacteriologically. Bacteriological confirmation is crucial for an accurate diagnosis and for effective treatment of drug-resistant TB.

Adults are usually diagnosed using sputum, with urine as an alternative sample, whereas children, who often cannot produce sputum, may instead be tested using induced sputum, nasopharyngeal aspirate, or gastric lavage.

The World Health Organization (WHO) recently recommended stool as a sample for TB diagnosis, but there remains a need for patient-centered, easily obtained samples that are compatible with a range of tests.

Upper respiratory tract sampling, which is quick and minimally invasive, could widen access to diagnosis, especially for people who cannot produce sputum. Further research is needed to refine these diagnostic methods so that they better serve patients' needs.

About the study

This systematic review and meta-analysis followed a protocol published online and was conducted with the expertise of the Liverpool School of Tropical Medicine library.

Comprehensive searches were run in databases including the Medical Literature Analysis and Retrieval System Online (MEDLINE), Global Health, CINAHL, Web of Science, and Global Health Archive, up to December 2022.

Search criteria encompassed various terms related to TB and methods of upper respiratory tract sampling.

The review analyzed studies that compared the accuracy of upper respiratory tract sampling for TB diagnosis with that of conventional reference methods. Cohort, cross-sectional, and controlled studies from any setting were eligible, with no language restrictions; studies that did not focus on active TB were excluded.

Titles and abstracts were screened and irrelevant records excluded, with a subset checked for consistency by a second reviewer. Two reviewers then independently assessed the selected manuscripts for eligibility, and a third reviewer resolved any disagreements.

Non-English studies were translated, and data were extracted using a structured form; when a manuscript contained multiple datasets, each was treated as a separate report.

For the analysis, data were summarized and forest plots were created for each sampling method. Pooled sensitivity and specificity were estimated using a hierarchical random-effects model.
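To illustrate what "pooling" means here, the Python sketch below combines invented 2×2 counts from four imaginary studies. It is a deliberate simplification and not the review's method: it pools sensitivity and specificity separately with a univariate DerSimonian-Laird random-effects model on the logit scale, whereas a hierarchical (bivariate) model estimates both jointly and accounts for the correlation between them. The function names and the data are hypothetical.

```python
import math

# Hypothetical (TP, FN, FP, TN) counts for four imaginary studies; NOT data from the review.
STUDIES = [
    (30, 20, 5, 95),
    (45, 25, 8, 122),
    (12, 13, 2, 60),
    (60, 30, 15, 145),
]


def logit_with_variance(events, total):
    """Logit of a proportion and its approximate variance, using a 0.5 continuity correction."""
    e = events + 0.5
    non_events = total - events + 0.5
    return math.log(e / non_events), 1 / e + 1 / non_events


def pool_logit_random_effects(pairs):
    """Pool (events, total) pairs with a univariate DerSimonian-Laird random-effects model.

    Returns the pooled proportion (back-transformed from the logit scale) and tau^2,
    the estimated between-study variance on the logit scale.
    """
    ys, vs = zip(*(logit_with_variance(e, n) for e, n in pairs))
    w = [1 / v for v in vs]
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum_w
    # Cochran's Q measures how far study estimates scatter around the fixed-effect mean.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if len(ys) > 1 else 0.0
    # Random-effects weights add tau^2 to each study's within-study variance.
    w_re = [1 / (v + tau2) for v in vs]
    y_pooled = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return 1 / (1 + math.exp(-y_pooled)), tau2


pooled_sens, tau2_sens = pool_logit_random_effects([(tp, tp + fn) for tp, fn, _, _ in STUDIES])
pooled_spec, tau2_spec = pool_logit_random_effects([(tn, tn + fp) for _, _, fp, tn in STUDIES])
print(f"pooled sensitivity {pooled_sens:.3f} (tau^2 {tau2_sens:.3f})")
print(f"pooled specificity {pooled_spec:.3f} (tau^2 {tau2_spec:.3f})")
```

A larger tau^2 signals more disagreement between studies than chance alone would explain, which is the kind of heterogeneity the review later reports for specificity.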

Meta-regression was used to account for the two different reference standards, and diagnostic odds ratios and positive and negative likelihood ratios were calculated. For studies in children, separate models using a combined reference standard were fitted.
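The diagnostic odds ratio and the likelihood ratios follow from sensitivity and specificity using standard formulas. A short Python sketch; the function name is hypothetical and the input values are made up for illustration, not taken from the review.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, DOR) for a test with the given sensitivity and specificity.

    Both inputs are proportions strictly between 0 and 1.
    """
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1 - specificity)  # factor by which a positive result raises the odds of TB
    lr_neg = (1 - sensitivity) / specificity  # factor by which a negative result changes the odds of TB
    dor = lr_pos / lr_neg  # equals (sens * spec) / ((1 - sens) * (1 - spec))
    return lr_pos, lr_neg, dor


# Illustrative values only, not results from the review.
lr_pos, lr_neg, dor = likelihood_ratios(0.60, 0.90)
print(f"LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}, DOR {dor:.1f}")  # LR+ 6.00, LR- 0.44, DOR 13.5
```

An LR+ well above 1 makes a positive result informative, while an LR- close to 1 means a negative result does little to rule TB out.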

The study was registered with the International Prospective Register of Systematic Reviews (PROSPERO) (CRD42021262392), and risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool.

Study results 

Searches up to January 2021 identified 9,680 records, and an extended search to December 2022 found a further 1,364. After duplicates were removed, 10,159 records remained, of which 71 were included in the systematic review.
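The reported counts imply how many duplicates were removed, a figure the article does not state directly. A quick arithmetic check in Python using only the reported numbers (the variable names are ours):

```python
# Record counts as reported in the article.
initial_search = 9_680   # identified up to January 2021
extended_search = 1_364  # identified in the extended search to December 2022
after_dedup = 10_159     # remaining after duplicates were removed
included = 71            # included in the systematic review

total_identified = initial_search + extended_search
duplicates_removed = total_identified - after_dedup
not_included = after_dedup - included

print(total_identified, duplicates_removed, not_included)  # 11044 885 10088
```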

These studies contributed reports comparing tests, a subset of which were included in meta-analyses. Upper respiratory tract samples were grouped into four main categories: nasopharyngeal aspirates, laryngeal swabs, oral swabs, and a miscellaneous group comprising nasal swabs, mouthwash, saliva, and other mucosal or dental samples.

The studies span May 1933 to December 2022 and came from many countries, including the United Kingdom, South Africa, and Norway; some countries contributed only a single study.

Together, the studies included 24,899 samples collected in settings such as clinics, hospitals, and sanatoriums; some studies focused specifically on children and others on adults. Where reported, 57.7% of 3,173 participants were male and 42.3% were female, and 19.5% of 3,709 participants were HIV positive.
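Because the sex and HIV percentages are reported against different denominators, converting them to approximate counts can make them easier to compare. A minimal sketch, assuming each percentage applies to the denominator stated alongside it; the counts are estimates because the percentages are rounded to one decimal place.

```python
# Assumption: each percentage is relative to the denominator reported with it.
male_share, sex_denominator = 0.577, 3_173
hiv_share, hiv_denominator = 0.195, 3_709

approx_male = round(male_share * sex_denominator)
approx_female = sex_denominator - approx_male  # remaining participants with reported sex
approx_hiv_positive = round(hiv_share * hiv_denominator)

print(approx_male, approx_female, approx_hiv_positive)  # 1831 1342 723
```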

Forty-one reports assessed the accuracy of laryngeal swabs. These studies, dating from May 1941 to March 1968, used varying methods depending on the technology available at the time.

With culture of expectorated sputum or gastric lavage as the primary reference test, pooled estimates of sensitivity and specificity were calculated for laryngeal swabs; however, meta-regression showed notable variation in specificity, indicating considerable inconsistency across the studies.

Nine studies, conducted between November 1998 and May 2021, evaluated the accuracy of nasopharyngeal aspirates, all of them in children. Methods varied, and meta-regression accounting for the type of reference test used showed a significant difference in specificity.
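Meta-regression asks whether a study-level characteristic, such as the type of reference test, explains differences in accuracy between studies. The Python sketch below uses invented specificity counts and hypothetical group labels. With a single binary covariate, a fixed-effect meta-regression on the logit scale reduces to comparing inverse-variance-weighted mean logits between the two groups. The review's hierarchical models are more elaborate (they also allow for between-study heterogeneity and the correlation between sensitivity and specificity), so this only illustrates the idea.

```python
import math

# Hypothetical specificity data as (true negatives, total without TB) per study,
# grouped by an invented reference-standard label. NOT data from the review.
CULTURE_REFERENCE = [(95, 100), (120, 130), (88, 95)]
COMBINED_REFERENCE = [(60, 80), (70, 95), (50, 70)]


def logit_with_variance(events, total):
    """Logit of a proportion and its approximate variance, using a 0.5 continuity correction."""
    e = events + 0.5
    non_events = total - events + 0.5
    return math.log(e / non_events), 1 / e + 1 / non_events


def weighted_mean_logit(pairs):
    """Inverse-variance-weighted mean logit and the total weight of a group of studies."""
    ys, vs = zip(*(logit_with_variance(e, n) for e, n in pairs))
    w = [1 / v for v in vs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w), sum(w)


def subgroup_difference(group_a, group_b):
    """Fixed-effect meta-regression with one binary covariate.

    Returns the difference in mean logit (group_b minus group_a), its standard
    error, and a two-sided p-value from a normal (z) test.
    """
    mean_a, weight_a = weighted_mean_logit(group_a)
    mean_b, weight_b = weighted_mean_logit(group_b)
    diff = mean_b - mean_a
    se = math.sqrt(1 / weight_a + 1 / weight_b)
    p_value = math.erfc(abs(diff / se) / math.sqrt(2))
    return diff, se, p_value


diff, se, p_value = subgroup_difference(CULTURE_REFERENCE, COMBINED_REFERENCE)
print(f"difference in logit specificity {diff:.2f} (SE {se:.2f}), p = {p_value:.2g}")
```

A small p-value here would suggest, as in the review, that the choice of reference standard is associated with the specificity a study reports.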

In addition, 18 studies compared oral swab sampling with microbiological reference standards.

For these studies, conducted between 2015 and 2022, pooled estimates of sensitivity and specificity were calculated, and meta-regression found that the choice of reference standard significantly affected specificity.

A few studies evaluated other sample types, including mouthwash and saliva, but there were too few of them to permit a meta-analysis.

Finally, in the risk-of-bias assessment, older studies, especially those before 1950, often did not report enough information to assess specific domains. More recent studies also showed a high risk of bias in participant selection, owing to their diverse designs and varied reference standards.