Thyroid nodules: diagnostic evaluation based on thyroid cancer risk assessment
BMJ 2020; 368 doi: https://doi.org/10.1136/bmj.l6670 (Published 07 January 2020) Cite this as: BMJ 2020;368:l6670
- Naykky Singh Ospina, assistant professor of medicine, endocrinologist¹
- Nicole M Iñiguez-Ariza, assistant professor of medicine, endocrinologist²
- M Regina Castro, associate professor of medicine, endocrinologist³
- ¹ Division of Endocrinology, Department of Medicine, University of Florida, Gainesville, FL, USA
- ² Department of Endocrinology and Metabolism, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Mexico City, Mexico
- ³ Division of Endocrinology, Department of Medicine, Mayo Clinic, Rochester, MN, USA
- Correspondence to: N Singh Ospina
Thyroid nodules are extremely common and can be detected by sensitive imaging in more than 60% of the general population. They are often identified in patients without symptoms who are undergoing evaluation for other medical complaints. Indiscriminate evaluation of thyroid nodules with thyroid biopsy could cause a harmful epidemic of diagnoses of thyroid cancer, but inadequate selection of thyroid nodules for biopsy can lead to missed diagnoses of clinically relevant thyroid cancer. Recent clinical guidelines advocate a more conservative approach in the evaluation of thyroid nodules based on risk assessment for thyroid cancer, as determined by clinical and ultrasound features to guide the need for biopsy. Moreover, newer evidence suggests that for patients with indeterminate thyroid biopsy results, a combined assessment including the initial ultrasound risk stratification or other ancillary testing (molecular markers, second opinion on thyroid cytology) can further clarify the risk of thyroid cancer and the management strategies. This review summarizes the clinical importance of adequate evaluation of thyroid nodules, focuses on the clinical evidence for diagnostic tests that can clarify the risk of thyroid cancer, and highlights the importance of considering the patient’s values and preferences when deciding on management strategies in the setting of uncertainty about the risk of thyroid cancer.
Thyroid nodules are extremely common and are frequently identified in patients with no symptoms who are undergoing evaluation for other medical conditions.12 Their incidental detection often precedes the diagnosis of thyroid cancer, and an unselective biopsy strategy of all newly detected thyroid nodules can lead to harm. Similarly, an extremely conservative strategy for thyroid biopsy could result in a missed diagnosis of a clinically relevant thyroid cancer.3 New evidence supports careful selection of patients who should undergo thyroid biopsy based on the assessment of risk of thyroid cancer. The presence of high risk features for thyroid cancer and nodule size on an ultrasound scan can help to determine the need for further diagnostic investigation with a fine needle aspiration biopsy (FNA).45 Once the results of a thyroid FNA are available, the risk of thyroid cancer can be estimated and can guide the next steps in management in patients with benign or malignant results.46 Management of patients with indeterminate FNA results is more challenging, as the estimated risk of thyroid cancer is highly variable (5-75%).467 For these patients, a careful evaluation of their situation (that is, clinical presentation, risk factors for thyroid cancer, values, context, and preferences), as well as other ancillary studies (for example, molecular markers and second opinion on thyroid cytology) can help to determine a more individualized risk of thyroid cancer and guide subsequent management.78
In this review, we summarize unique features of the epidemiology of thyroid nodules and the clinical evidence guiding our diagnostic approach. We focus on the concept of risk assessment for thyroid cancer at different stages of the diagnostic pathway for thyroid nodules and highlight management strategies as a response to different risks of thyroid cancer. We suggest that further studies should focus on distinguishing clinically relevant thyroid cancer from low risk thyroid cancer and benign thyroid nodules. Clinical recommendations are often based on imperfect evidence with methodological limitations that can result in broad estimates of any individual’s risk of thyroid cancer. Understanding the patient’s values and preferences before making decisions about further intervention is therefore important.
Sources and selection criteria
We did a literature search designed by an expert medical librarian (LP) and including the following search terms: thyroid nodule, thyroid cancer/neoplasm/carcinoma/metastasis, thyroid ultrasound, fine needle aspiration/biopsy, genetic/molecular markers. We searched EBM Reviews - Cochrane Central Register of Controlled Trials December, EBM Reviews - Cochrane Database of Systematic Reviews, Embase, and Ovid Medline(R) from inception to January 2019. We included peer reviewed publications without any language restrictions. We excluded case reports and studies published in non-peer reviewed journals. Two authors (NSO, NIA) reviewed the titles and abstracts of these articles and categorized them according to topic (epidemiology, risk factors, diagnosis, treatment, follow-up) and by type of article (guidelines, systematic reviews, randomized trials, observational studies). We selected articles for inclusion in each section of the manuscript on the basis of their design, size, and protection against bias. We favored randomized clinical trials, observational studies of high quality (that is, large and representative of the population, adequate statistical methods with adjustment for confounders, protection against bias), systematic reviews, and clinical practice guidelines. We reviewed the references of relevant systematic reviews and clinical practice guidelines for completeness, as well as relevant articles identified by authors and reviewers.
Incidence and prevalence
Thyroid nodules are common. Their prevalence can be affected by several factors, such as iodine sufficiency status and age, and detection rates differ according to the modality of imaging used and the experience of the operator.29
Depending on the mode of discovery, the detection rate of thyroid nodules varies widely, from 4% to 67%. They are found by physical examination in around 4-7% of the population, with a higher detection rate of 30-67% by ultrasound.2910 In a cross sectional study in Germany (an iodine deficient country), of 96 278 people screened by high resolution ultrasound, goiter and/or nodules larger than 0.5 cm were found in 33%.9 A retrospective study of consecutive patients attending preventive check-ups showed a noticeably higher prevalence of 68% when 13 MHz ultrasound was used.2 As expected, thyroid nodules are often discovered incidentally by multiple imaging modalities.1112131415161718 On computed tomography, the prevalence of incidental thyroid nodules ranges from 5% to 25%.1213141516
The prevalence of thyroid nodules increases with age, from approximately 42% for younger patients (<40 years) to about 76% in the older population (>61 years) when screened by ultrasound.2 In a Chinese centenarian cohort study of 874 people screened with ultrasound, the overall prevalence rate of thyroid nodules was 74%.19
Clinical significance and association with overdiagnosis of thyroid cancer
The incidence of thyroid cancer has increased worldwide over the past three to four decades. In the US, a retrospective population based evaluation of patients with thyroid cancer found that the incidence increased from 3.6 per 100 000 in 1973 to 8.7 per 100 000 in 2002, a 2.4-fold increase; almost all the increase was attributable to papillary thyroid cancer (PTC) histology.20 In this analysis, 49% of the increase between 1988 and 2002 consisted of microcarcinomas (≤1 cm) and 87% were small tumors (≤2 cm). Mortality from thyroid cancer remained stable.
Screening for thyroid cancer in South Korea led to an epidemic of thyroid cancer. Diagnoses of thyroid cancer increased 15-fold between 1993 and 2011, with the entire increase ascribed to PTC detection and no change in thyroid cancer mortality.2122 Similarly, the effect of diagnostic changes with increased access to healthcare, improved diagnostic technology, and increased surveillance may account for more than 60% of thyroid cancer diagnoses in selected high income countries.323 Taken together, these results suggest that increased incidental detection of highly prevalent thyroid nodules is associated with overdiagnosis of thyroid cancer. However, overdiagnosis is unlikely to be the only driver of the current thyroid cancer epidemic. A population based study (1980-2005) found that the increase in detection corresponded not only to small PTCs but also to larger PTCs (1992-95, 20% of the increase due to tumors >2 cm).24 Another study using cancer registry data (1974-2013) found that PTC incidence increased for all sizes and stages, including larger tumors (tumors >4 cm increased by 6.1% per year). During 1994-2013, incidence based mortality increased 1.1% per year overall (from 0.40 per 100 000 person years in 1994-97 to 0.46 in 2010-13).25
The widespread use of highly sensitive imaging techniques is likely contributing to the current epidemic of thyroid nodules and low risk PTC, with the associated risks of overdiagnosis and overtreatment. The US Preventive Services Task Force recommends against screening for thyroid cancer in adults without symptoms. However, thyroid cancer is frequently detected indirectly, through imaging performed for other reasons.2627
Pathogenesis of thyroid nodule formation
Follicular cells are naturally heterogeneous, with thyroid follicles having variable growth capability and sensitivity to thyroid stimulating hormone (TSH). Iodine deficiency is associated with thyroid gland growth and propensity for thyroid nodule development.28 Somatic mutations can occur, giving thyrocytes different growth potential.2930 A BRAF mutation occurs frequently in PTC (60%),31 whereas a RAS driver mutation is characteristic of follicular adenomas, follicular thyroid cancer,32 and follicular variants of PTC, as well as the newly named non-invasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP). The latter originated from the reclassification of non-invasive encapsulated follicular variants of PTC to NIFTP with the intent of eliminating the word “cancer” from the name of the tumor.33 NIFTP was included as a new tumor in the 2017 World Health Organization Classification of Tumors of Endocrine Organs. The American Thyroid Association (ATA) endorses this reclassification to NIFTP, given the excellent prognosis of these very low risk neoplasms.34
Risk factors for thyroid nodules and thyroid malignancy
Most thyroid nodules are benign.35 Exposure to ionizing radiation is one of the best characterized risk factors for development of thyroid cancer.363738394041 Iodine intake also matters: a strong U-shaped association exists between iodine intake and the frequency of diffuse goiter, with both low and high intakes associated with increased rates of thyroid disease, although the risk of nodular goiter is perhaps increased only at low intakes.424344 Table 1 details other risk factors.1835384546474849505152535455565758596061
Clinical evaluation of patients with thyroid nodules
The goals of management when caring for a patient with thyroid nodules are largely determined by the presence or absence of symptoms, laboratory testing, and assessment of risk of thyroid cancer. For euthyroid patients with no symptoms, the risk of thyroid cancer usually guides management strategies (fig 1).
Symptoms related to thyroid nodules include a palpable neck mass, anterior neck pain, globus sensation, dysphagia, dyspnea, and dysphonia/hoarseness.62 Swallowing complaints are common but non-specific, with more than 50% associated with laryngopharyngeal reflux.6364 The onset and rate of progression of symptoms are important in terms of risk of thyroid cancer, with persistent hoarseness and rapidly growing thyroid nodules more likely to indicate a malignant cause.6566
The ATA guidelines recommend routine TSH measurement during the evaluation of thyroid nodules to rule out hyperthyroidism, as most hyperfunctioning thyroid nodules are benign. Controversy exists about the benefit of routine measurement of serum calcitonin.4 The specificity of an elevated serum calcitonin (>100 pg/mL) for diagnosis of medullary thyroid cancer (MTC) is improved with pentagastrin stimulation; however, pentagastrin is no longer available in the US and supply is limited in Europe.67 The ATA makes no recommendation for or against routine measurement of calcitonin during thyroid nodule evaluation, whereas in Europe it is considered standard of care; nevertheless, some authors consider routine calcitonin measurement of uncertain benefit during initial thyroid nodule work-up.6768
Assessment of thyroid cancer risk
Clinicians will estimate the risk of thyroid cancer and assess the need for thyroid biopsy on the basis of clinical, laboratory, and ultrasound findings. If a thyroid biopsy is performed, the next management step (observation or surgery) is based on the FNA result, molecular markers and/or repeat FNA, and the individual patient’s preferences and context. This decision making and management process is based on a clinical response to variable risks for thyroid cancer that should incorporate all the available clinical evidence.469 We will critically appraise the literature evaluating commonly used tests to estimate the risk of thyroid cancer.670
Limitations of diagnostic literature for thyroid nodules
Conducting studies at low risk of bias on diagnosis of thyroid nodules is challenging, as only a relatively small proportion of patients with thyroid nodules undergo final histological assessment.67172 Moreover, the recent reclassification of NIFTP as an entity with indolent behavior affects the diagnostic estimates. Table 2 summarizes other methodological limitations.337374
Clinical evaluation has focused on differentiating benign from malignant thyroid nodules. However, the use of clinically relevant thyroid cancer as a new outcome can help clinicians to differentiate between benign and low risk thyroid cancer and to identify those lesions that would result in important adverse outcomes to patients. No consensus exists on how to define this concept, but definitions based on high risk histopathological variants have been proposed.75 Other factors that could help to identify clinically relevant thyroid cancer include size of thyroid nodules, presence of cervical lymphadenopathy, and results of cytology and molecular markers. For example, thyroid cancers (PTC) of 4 cm or larger are associated with more aggressive behavior and smaller tumors of 1.5 cm or less with an overall good prognosis based on observational studies.476777879 Similarly, the presence of suspicious lymphadenopathy is associated with concern for clinically relevant disease and a lower size threshold for FNA.4
The Bethesda FNA reporting system can also provide prognostic information.8081 A study including 1291 thyroid malignancies reported that the proportion of high risk malignancies increased across the atypia of undetermined significance or follicular lesion of undetermined significance (AUS/FLUS) (4%), suspicious for malignancy (9%), suspicious for follicular neoplasm (14%), and malignant (27%) categories. A composite endpoint of recurrence, distant metastases, or death also increased across these categories.80 An evaluation of 42 studies and 11 109 patients concluded that the identification of specific mutations such as RAS, TERT, and RET/PTC was associated with distant metastases in patients with PTC.82
Estimating the risk of thyroid cancer and need for thyroid biopsy
Clinical and laboratory findings
A meta-analysis of observational studies showed an increased risk of malignancy in patients with thyroid nodules who were male, had a family history of thyroid cancer, or had a history of head or neck irradiation.83 Included studies were at moderate risk of bias (table 1 and table 3).838485 No association was found between age or serum TSH concentrations and the risk of thyroid cancer.83 Two large observational studies including around 10 000 patients and a meta-analysis including 28 observational studies reported an increase in thyroid cancer with increases in serum TSH, even with values within the normal range.85868788 However, the diagnostic accuracy of TSH for the diagnosis of thyroid cancer was limited.8689 This evidence is limited by confounders, different baseline risks of thyroid cancer, and inappropriate comparisons.83868990
Observational studies have shown increased risk of thyroid nodular disease with advancing age.18384 A prospective study including 6391 patients with thyroid nodules of 1 cm or larger found a lower prevalence of thyroid cancer (13%) for patients in the highest age group (>70 years) than for those in the youngest age group (20-30 years, prevalence 23%), suggesting a decreased risk of thyroid cancer with advancing age. However, the risk for high risk thyroid cancer was higher in patients in the oldest age group.184
Thyroid ultrasound is extremely important in the evaluation of thyroid nodules, as it can clarify the presence, location, and size of nodules and the risk of thyroid cancer.4 A systematic review including 14 studies at moderate risk of bias found the odds ratio for thyroid cancer to be lower in patients with a multinodular goiter than in those with single nodules (0.8, 95% confidence interval 0.67 to 0.96).91 Initial observations associated individual ultrasound features such as nodule composition, echogenicity, shape, margins, and presence of echogenic foci or suspicious cervical lymphadenopathy with the risk of thyroid cancer.45 A meta-analysis evaluating 31 observational studies reported a high diagnostic odds ratio for malignancy for thyroid nodules that were taller than they were wide (11.1, 6.6 to 18.9) and for benign disease for those with spongiform appearance (12.0, 0.6 to 234.3).92 Subsequent studies used a combined assessment of ultrasound features either qualitatively or quantitatively to develop ultrasound risk stratification systems that can estimate the prevalence of malignancy (POM) (table 4 and table 5).4593949596979899100101102103104105106107108109110111
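The diagnostic odds ratio quoted above summarizes a feature's sensitivity and specificity in one number: the odds of the feature among malignant nodules divided by the odds among benign nodules. The minimal Python sketch below computes it from a 2x2 table and, equivalently, from sensitivity and specificity. The counts are hypothetical and are not taken from the meta-analysis; they illustrate that a diagnostic odds ratio of about 11 can coexist with a sensitivity of only 40%, so a high value alone does not mean that a feature detects most cancers.

```python
def diagnostic_odds_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """DOR from a 2x2 table: (TP/FN) / (FP/TN), i.e. (TP*TN) / (FP*FN)."""
    if fp == 0 or fn == 0:
        # A zero cell makes the ratio undefined without a continuity correction
        raise ValueError("DOR is undefined when FP or FN is zero")
    return (tp * tn) / (fp * fn)


def dor_from_sens_spec(sensitivity: float, specificity: float) -> float:
    """Equivalent form: odds of a positive finding in malignant v benign nodules."""
    return (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity)


# Hypothetical counts for one ultrasound feature (illustrative only)
tp, fp, fn, tn = 40, 20, 60, 330
sensitivity = tp / (tp + fn)   # 0.40
specificity = tn / (tn + fp)   # about 0.94
print(diagnostic_odds_ratio(tp, fp, fn, tn))  # 11.0
print(round(dor_from_sens_spec(sensitivity, specificity), 1))  # 11.0
```

Zero or very small cells make the estimate unstable, which is one reason individual feature estimates can carry very wide confidence intervals; meta-analyses commonly apply a continuity correction in that situation.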
Studies that evaluate the performance of these ultrasound stratification systems provide clinicians with an estimated POM according to the category and an expected clinical performance in terms of avoided biopsies and missed thyroid cancer diagnoses, according to different biopsy thresholds. The final goal is to identify the size threshold that would allow us to identify clinically relevant thyroid cancer.
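To make category-specific size thresholds concrete, the Python sketch below scores a nodule with point values modeled on the 2017 ACR-TIRADS lexicon and applies its FNA size thresholds (at least 2.5 cm for TR3, 1.5 cm for TR4, and 1.0 cm for TR5). It is a simplified illustration, not a clinical tool: the point tables and thresholds are our transcription and should be checked against the ACR white paper, surveillance thresholds are omitted, the `Nodule` class and function names are hypothetical, and treating a total of 0-1 points as TR1 is an assumption.

```python
from dataclasses import dataclass, field

# Point values modeled on the 2017 ACR-TIRADS lexicon (simplified transcription)
COMPOSITION = {"cystic": 0, "spongiform": 0, "mixed": 1, "solid": 2}
ECHOGENICITY = {"anechoic": 0, "hyperechoic": 1, "isoechoic": 1,
                "hypoechoic": 2, "very_hypoechoic": 3}
SHAPE = {"wider_than_tall": 0, "taller_than_wide": 3}
MARGIN = {"smooth": 0, "ill_defined": 0, "lobulated_or_irregular": 2,
          "extrathyroidal_extension": 3}
FOCI = {"none": 0, "large_comet_tail": 0, "macrocalcification": 1,
        "peripheral_rim": 2, "punctate": 3}

# Minimum nodule size (cm) for FNA by TR level; TR1 and TR2 have no FNA threshold
FNA_THRESHOLD_CM = {3: 2.5, 4: 1.5, 5: 1.0}


@dataclass
class Nodule:  # hypothetical container for one nodule's ultrasound description
    composition: str
    echogenicity: str
    shape: str
    margin: str
    size_cm: float
    foci: list = field(default_factory=list)  # echogenic foci points are additive


def tirads_points(n: Nodule) -> int:
    if n.composition == "spongiform":
        return 0  # spongiform nodules receive no further points
    return (COMPOSITION[n.composition] + ECHOGENICITY[n.echogenicity]
            + SHAPE[n.shape] + MARGIN[n.margin]
            + sum(FOCI[f] for f in n.foci))


def tirads_level(points: int) -> int:
    # Assumption: 0-1 -> TR1, 2 -> TR2, 3 -> TR3, 4-6 -> TR4, 7 or more -> TR5
    if points <= 1:
        return 1
    if points <= 3:
        return points
    return 4 if points <= 6 else 5


def recommend_fna(n: Nodule) -> bool:
    threshold = FNA_THRESHOLD_CM.get(tirads_level(tirads_points(n)))
    return threshold is not None and n.size_cm >= threshold


nodule = Nodule("solid", "hypoechoic", "wider_than_tall", "smooth", size_cm=1.3)
points = tirads_points(nodule)
print(points, tirads_level(points), recommend_fna(nodule))  # 4 4 False
```

Under these assumptions, a solid, hypoechoic, smooth-margined 1.3 cm nodule scores 4 points (TR4) and falls below the 1.5 cm threshold, so no FNA is recommended, whereas the same nodule would meet the 1 cm ATA threshold for intermediate suspicion nodules. Applying such a function to a cohort with known histology yields the avoided biopsies and missed cancers that comparative studies report.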
A comparative analysis of the ATA, Korean Thyroid Imaging Reporting and Data System (K-TIRADS), and American College of Radiology Thyroid Imaging Reporting and Data System (ACR-TIRADS) in 902 nodules, with a mean diameter of 1.5 cm and a 30% malignancy rate, found that the ATA and K-TIRADS systems had higher sensitivity for diagnosis of thyroid cancer than ACR-TIRADS (95% and 100% v 80%) but lower specificity (38% and 28% v 69%).100 A similar study including 3422 nodules noted that the ATA and Korean systems have the limitation of not providing a category for every nodule, with about 10% of these non-classifiable nodules harboring malignancy.104
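A back-of-the-envelope translation of these percentages into counts can make the trade-off easier to see. The Python sketch below assumes a cohort of 1000 nodules with exactly 30% malignancy and applies the sensitivity and specificity reported for each system; the resulting counts are approximations for illustration, not results from the study, and `expected_counts` is a hypothetical helper.

```python
def expected_counts(sensitivity, specificity, prevalence, n=1000):
    """Expected 2x2 cell counts for a cohort of n nodules at a given malignancy prevalence."""
    malignant = n * prevalence
    benign = n - malignant
    tp = sensitivity * malignant
    tn = specificity * benign
    return {"TP": tp, "FN": malignant - tp, "FP": benign - tn, "TN": tn}


# Sensitivity and specificity as reported in the comparative analysis
systems = {"ATA": (0.95, 0.38), "K-TIRADS": (1.00, 0.28), "ACR-TIRADS": (0.80, 0.69)}
for name, (sens, spec) in systems.items():
    c = expected_counts(sens, spec, prevalence=0.30)
    print(f"{name}: missed cancers {c['FN']:.0f}, benign nodules test positive {c['FP']:.0f}")
# ATA: missed cancers 15, benign nodules test positive 434
# K-TIRADS: missed cancers 0, benign nodules test positive 504
# ACR-TIRADS: missed cancers 60, benign nodules test positive 217
```

Under these assumptions, the ATA system would detect roughly 45 more cancers per 1000 nodules than ACR-TIRADS, at the cost of roughly 217 more benign nodules testing positive.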
A study of 1000 nodules found the ATA high risk category had a higher sensitivity and negative predictive value, whereas the TIRADS-5 category was more specific and had a higher positive predictive value. In this study, very hypoechoic nodules, those taller than they were wide, and those with punctate, macro, or peripheral calcification harbored malignancy in more than 70% of cases. A non-significant difference in the rate of detection of thyroid cancer was noted, favoring the ATA guidelines (97% v 91%; P=0.180). As expected, this was due to a lower number of FNAs recommended by TIRADS (482 v 582 for ATA), with a rate of unnecessary FNAs of 23% for ACR-TIRADS and 35% for ATA.108 However, the definition of unnecessary biopsies was based on a benign result; clinically, other factors might justify the need for biopsy.
A study that evaluated 502 nodules, with a malignancy rate of 7%, found that the ACR-TIRADS system had the highest rate (53%) of avoidance of thyroid biopsy. This study offered further insight into the causes of missed thyroid cancer diagnosis. All systems missed three thyroid cancers that were isoechoic or hyperechoic without any other suspicious features (1.1-1.4 cm, PTC). The ATA system missed five nodules that were isoechoic but had irregular margins and were not classifiable. Two solid, mildly hypoechoic nodules measuring 1.3 and 1.2 cm were missed by the ACR-TIRADS system but not by the ATA system, likely because of the lower biopsy threshold at 1 cm for intermediate risk nodules in the ATA system. A total of 11 malignancies were missed by at least one of the ultrasound systems.98
The diagnostic accuracy of ultrasound risk stratification systems is largely based on the diagnosis of differentiated thyroid cancer, mostly PTC. A systematic review that included 249 cases of MTC found that 97% of the cases were of high or intermediate suspicion on ultrasound according to the ATA criteria, suggesting that high risk features might also help to identify patients with MTC and supporting the biopsy threshold of 1 cm for ATA intermediate risk nodules.112
One study evaluated the likelihood for high risk FNA results (high risk indeterminate lesion, suspicious for malignancy, and malignant; Italian reporting) according to ultrasound assessment. The odds ratio for a high risk cytological result was 4.1 for TIRADS-4 and 25 for TIRADS-5; odds ratios were 3.3 for the ATA intermediate suspicion category, 20 for the high suspicion category, and 7.2 for those that were not classified.113
A study that included 3323 consecutive thyroid nodules with a mean size of 1.4 cm and a 26% rate of malignancy, with more than 58% of nodules measuring at least 1 cm, explored the effect of size thresholds. Using the current FNA indications for ATA, K-TIRADS, and ACR-TIRADS, ACR-TIRADS was associated with the highest specificity (75%), the lowest sensitivity (60%), and the lowest rate of unnecessary biopsy (21%); the corresponding values were 34%, 93%, and 55% for ATA and 29%, 93%, and 60% for K-TIRADS. However, if the threshold for FNA was increased from 1.0 cm to 1.5 cm for intermediate risk thyroid nodules and to 2.5 cm for low suspicion thyroid nodules, and FNA was not indicated for very low risk nodules, the specificity, sensitivity, and unnecessary biopsy rate for the ATA system would be 76%, 61%, and 20%, comparable to those for ACR-TIRADS.101
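The effect of raising size thresholds can be explored with a small simulation. The Python sketch below encodes ATA pattern-based FNA thresholds (1 cm for high and intermediate suspicion, 1.5 cm for low suspicion, 2 cm for very low suspicion, none for benign; our reading of the 2015 guideline, to be verified) alongside the modified thresholds described above, and applies both to a hypothetical cohort of nine nodules with invented histology. The names and the cohort are illustrative only.

```python
# ATA pattern-based FNA size thresholds in cm (None = FNA not indicated); assumed values
ATA_DEFAULT = {"high": 1.0, "intermediate": 1.0, "low": 1.5, "very_low": 2.0, "benign": None}
# Modified thresholds: 1.5 cm intermediate, 2.5 cm low, no FNA for very low suspicion
ATA_MODIFIED = {"high": 1.0, "intermediate": 1.5, "low": 2.5, "very_low": None, "benign": None}


def fna_indicated(pattern, size_cm, thresholds):
    threshold = thresholds[pattern]
    return threshold is not None and size_cm >= threshold


def summarize(cohort, thresholds):
    """Return (biopsies, benign nodules biopsied, cancers not biopsied)."""
    biopsied = [n for n in cohort if fna_indicated(n[0], n[1], thresholds)]
    benign_biopsied = sum(1 for n in biopsied if not n[2])
    missed = sum(1 for n in cohort if n[2] and not fna_indicated(n[0], n[1], thresholds))
    return len(biopsied), benign_biopsied, missed


# Hypothetical cohort: (ATA pattern, size in cm, malignant on histology)
cohort = [
    ("high", 1.2, True), ("intermediate", 1.2, True), ("intermediate", 1.3, False),
    ("intermediate", 1.8, False), ("low", 1.7, False), ("low", 2.8, True),
    ("very_low", 2.2, False), ("very_low", 3.0, False), ("benign", 4.0, False),
]
for name, thresholds in [("default", ATA_DEFAULT), ("modified", ATA_MODIFIED)]:
    print(name, summarize(cohort, thresholds))
# default (8, 5, 0)
# modified (3, 1, 1)
```

In this toy cohort the modified thresholds cut biopsies from 8 to 3 and benign biopsies from 5 to 1, but miss the 1.2 cm intermediate suspicion cancer, mirroring the direction, though not the magnitude, of the changes reported in the study.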
A multicenter prospective study that included 380 patients with 948 thyroid nodules evaluated the reproducibility of the ACR-TIRADS classification among three experienced radiologists. The overall inter-observer agreement statistic for ACR-TIRADS categorization was 0.636.114 A prospective study evaluated two sets of around 500 thyroid nodule images reviewed by two clinicians with six years of experience and found that inter-observer agreement increased after a joint review and discussion of discrepancies.115 Reproducibility of ultrasound assessment is important, as most studies are reported from centers with high expertise, and the diagnostic performance for less experienced users is unclear. Computer aided diagnosis systems can aid in the accurate identification of ultrasound features, with promising results, and could assist in the implementation of ultrasound risk assessment in practice.116117
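Agreement between readers is usually reported with a chance-corrected statistic. The text does not specify which statistic underlies the value of 0.636; with three readers, studies often use Fleiss' kappa or an intraclass correlation coefficient. As a simpler illustration, the Python sketch below computes Cohen's kappa for two readers assigning hypothetical TIRADS levels to ten nodules: raw agreement is 80%, but kappa is about 0.71 once agreement expected by chance is removed.

```python
from collections import Counter


def cohens_kappa(reader_1, reader_2):
    """Cohen's kappa for two readers assigning categorical labels to the same items."""
    if len(reader_1) != len(reader_2) or not reader_1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(reader_1)
    observed = sum(a == b for a, b in zip(reader_1, reader_2)) / n
    counts_1, counts_2 = Counter(reader_1), Counter(reader_2)
    # Chance agreement expected from each reader's marginal category frequencies
    categories = counts_1.keys() | counts_2.keys()
    expected = sum(counts_1[c] * counts_2[c] for c in categories) / n**2
    if expected == 1:
        return 1.0  # both readers used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical TIRADS levels assigned to the same ten nodules by two readers
reader_1 = [3, 4, 4, 5, 2, 3, 4, 5, 3, 4]
reader_2 = [3, 4, 3, 5, 2, 3, 4, 4, 3, 4]
print(round(cohens_kappa(reader_1, reader_2), 2))  # 0.71
```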
A meta-analysis including seven studies and 10 817 thyroid nodules with definitive histological diagnosis found an increased risk of thyroid cancer (odds ratio 1.26, 1.13 to 1.39) for nodules measuring 3-5.9 cm compared with those measuring less than 3 cm and a decreased risk of thyroid cancer in nodules measuring more than 6 cm (odds ratio 0.84, 0.73 to 0.98) compared with those smaller than 3 cm.118 A systematic review including 15 studies and 13 180 participants found inconsistent results related to the association of size and the POM.119 Most studies have not adjusted for ultrasound risk pattern and other clinical variables that can affect the POM to a larger extent than size of thyroid nodules.120121 A study of 2000 consecutive thyroid nodules of at least 1 cm found no change in the risk of thyroid malignancy with increase in nodule size. However, the POM in nodules of 3 cm or larger compared with those below 3 cm was higher for those in the intermediate risk (40% v 23%) and low suspicion (11% v 7%) categories.121 These studies are limited by confounders and inappropriate exclusions.
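Odds ratios are relative measures, so the absolute change in POM that they imply depends on the baseline risk. The Python sketch below converts a baseline probability and an odds ratio into the implied probability, using the pooled odds ratios above (1.26 for 3-5.9 cm and 0.84 for more than 6 cm, each compared with nodules under 3 cm) and hypothetical baseline POMs for nodules under 3 cm. It ignores the confounding by ultrasound pattern noted above.

```python
def apply_odds_ratio(baseline_risk: float, odds_ratio: float) -> float:
    """Probability implied by applying an odds ratio to a baseline probability."""
    if not 0 < baseline_risk < 1:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    odds = baseline_risk / (1 - baseline_risk) * odds_ratio
    return odds / (1 + odds)


# Pooled odds ratios from the meta-analysis, each versus nodules < 3 cm
OR_3_TO_5_9_CM = 1.26
OR_OVER_6_CM = 0.84

for baseline in (0.05, 0.10, 0.20):  # hypothetical POM for nodules < 3 cm
    mid = apply_odds_ratio(baseline, OR_3_TO_5_9_CM)
    large = apply_odds_ratio(baseline, OR_OVER_6_CM)
    print(f"baseline {baseline:.0%}: 3-5.9 cm {mid:.1%}, >6 cm {large:.1%}")
```

For example, at a baseline POM of 10%, an odds ratio of 1.26 corresponds to about 12.3% and an odds ratio of 0.84 to about 8.5%, a change of only a few percentage points in either direction.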
Estimating the risk of thyroid cancer and management after thyroid biopsy
Fine needle aspiration biopsy
FNA is a safe procedure that helps to distinguish benign from malignant thyroid nodules. Serious complications are rare, although pain and small hematomas can occur.124125 The main limitation of this diagnostic test is the indeterminate result found in around 20-30% of samples.126127
A systematic review that included 24 articles and 4428 thyroid nodules found that the capillary technique had a pooled rate of non-diagnostic results of 13% compared with 16% for the aspiration technique (relative risk 0.57, 95% confidence interval 0.34 to 0.92). In addition, studies including FNAs using a larger needle (21-23 gauge) had a pooled non-diagnostic rate of 19.2% compared with 14% for studies including FNAs using a smaller needle (25-27 gauge) (relative risk 0.60, 0.24 to 1.50).128
Standardized reporting systems for FNA are available. In the US, the Bethesda System for Reporting Thyroid Cytopathology (BSRTC) is commonly used (table 6). Each FNA result category is associated with an estimated POM and management strategy.47739394129130131 Rates of non-diagnostic specimens vary but are reported at approximately 10%.132 The use of rapid onsite evaluation for adequacy is associated with a lower risk of inadequate samples (relative risk 0.44, 0.26 to 0.73); however, the value of this tool depends on the baseline risk of inadequacy.133134 Although benign and malignant results can accurately guide clinical decisions, the management of patients with indeterminate FNA results is more challenging given the uncertainty related to risk of thyroid cancer.
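The dependence of rapid onsite evaluation's value on the baseline risk of inadequacy follows from the arithmetic of relative risks. The Python sketch below applies the pooled relative risk of 0.44 to several hypothetical baseline rates of inadequate samples and reports the absolute risk reduction and the number of procedures with onsite evaluation needed to avoid one inadequate sample. It assumes the relative risk is constant across baseline rates, which may not hold in practice.

```python
def absolute_effect(baseline_risk: float, relative_risk: float):
    """Risk with the intervention, absolute risk reduction, and number needed to treat."""
    new_risk = baseline_risk * relative_risk
    arr = baseline_risk - new_risk
    nnt = 1 / arr if arr > 0 else float("inf")
    return new_risk, arr, nnt


RR_ONSITE = 0.44  # pooled relative risk of an inadequate sample with onsite evaluation

for baseline in (0.05, 0.10, 0.20):  # hypothetical baseline inadequacy rates
    risk, arr, nnt = absolute_effect(baseline, RR_ONSITE)
    print(f"baseline {baseline:.0%}: with onsite evaluation {risk:.1%}, "
          f"absolute reduction {arr:.1%}, about {nnt:.0f} procedures per inadequate sample avoided")
```

At a baseline inadequacy rate of 10%, close to the rate quoted above, onsite evaluation would avoid roughly one inadequate sample for every 18 procedures; at 5% the figure is roughly 36.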
The AUS/FLUS category describes a heterogeneous group of aspirates in which atypical changes or features are present (table 7). This category should be used as a last resort and should represent no more than 7% of aspirates.135 For many patients (30-50%), a repeat FNA can result in reclassification.136137 A long term follow-up study of patients with 2893 FNAs found adequate diagnostic performance of the Bethesda system in the hands of expert pathologists, who used the AUS/FLUS category in less than 1% of cases. This decrease in AUS/FLUS was not associated with an increase in other indeterminate categories but rather with an increase in benign diagnoses, which had a negative predictive value of 96%.138
The 2017 BSRTC recommends providing a sub-classification of the AUS/FLUS category.7132 In a meta-analysis including 15 articles, the risk of malignancy varied between these subcategories, with the highest value found for the cytological atypia category (44%, 37% to 52%).139 Another systematic review that evaluated 20 studies and specifically compared indeterminate thyroid nodules with and without nuclear atypia confirmed increased odds of malignancy in those with nuclear atypia (odds ratio 3.63, 3.06 to 4.35).140
An observational study evaluated 776 surgically resected thyroid nodules measuring at least 1 cm in 14 academic centers and 35 community centers. This study reported 90% concordance between pathologists for diagnosis of thyroid cancer, highlighting that even the gold standard used for the diagnosis of thyroid cancer has limitations. Moreover, the concordance in the Bethesda system categories was 64% for central pathologists (three experienced thyroid pathologists), with more variation noted in the AUS/FLUS (35% concordance) and suspicious for malignancy categories; central pathologists were less likely to use an indeterminate diagnosis than were local pathologists (41% v 55%).138141
A systematic review evaluating the value of a second opinion for thyroid cytology, which included 7154 thyroid FNAs in nine studies, found an overall discordance rate between initial and secondary evaluation of 29% (range 13-60%). Commonly, the second opinion reclassified nodules initially labeled as indeterminate into a benign or malignant category, and the second opinion was better supported by subsequent diagnostic or follow-up information. The second opinion resulted in changes in clinical management in 30% of the cases (range 15-62%).142 These findings underscore the value of expert cytological interpretation of FNAs.
Core needle biopsy (CNB) attempts to overcome some of the diagnostic limitations of FNA. Two meta-analyses, including studies at high risk of bias, found that the overall rate for non-diagnostic results was higher for FNA than CNB.143144 Similarly, a systematic review including studies in which CNB was performed as a first line diagnostic procedure found a proportion of 3.5% for non-diagnostic results and 13.8% for indeterminate results.145 However, a systematic review that included studies at high risk of bias evaluated the outcomes of 14 818 patients who underwent CNB and found an overall complication rate of 1.11%, including rare major complications (permanent voice changes and hematomas requiring hospital admission).146
Another concern about thyroid FNA is its reliability in patients with large thyroid nodules. A systematic review of observational studies that categorized thyroid nodules according to size found that the false negative rate of benign cytology ranged from 0% to 22% (median 4.8%) for thyroid nodules under 3 cm, compared with 7-17% (median 11.7%) for those above 3 cm.147 A retrospective study that evaluated 632 nodules of at least 3 cm found an overall false negative rate of 3.6%, ranging from 0.9% to 12% according to the ultrasound risk category. A large variation in false negative rates in nodules of 3 cm or larger has been reported in the literature (0.7-25%), suggesting that various clinical factors, rather than nodule size alone, might affect the diagnostic performance of FNA.147148
Molecular markers have the potential to help to guide management in patients with indeterminate results in whom no other indications for surgery exist, and where further refinement of the risk of thyroid cancer is needed to decide between observation and surgery.4 The most studied molecular tests are the Afirma and ThyroSeq tests, with newer versions available (table 7).149150151152153154155156157158159160
The positive and negative predictive values of a test are determined by the prevalence of the disease and the test’s intrinsic diagnostic characteristics. Owing to variability of FNA interpretation and other clinical variables, the pre-test probability associated with the indeterminate categories varies widely among centers, affecting the performance of molecular markers in practice.161 In general, a lower disease prevalence will result in a decrease in the positive predictive value (PPV) and an increase in the negative predictive value (NPV).159162163 As expected, a decrease in the PPV occurs when NIFTP is not considered a malignancy, ranging from 5% to 22%.164
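The dependence of PPV and NPV on pre-test probability follows from Bayes' theorem. The Python sketch below holds a hypothetical molecular test's sensitivity (90%) and specificity (60%) fixed and varies the pre-test risk of malignancy in indeterminate nodules; the values are illustrative and do not describe any commercial test.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given pre-test probability (Bayes' theorem)."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


SENSITIVITY, SPECIFICITY = 0.90, 0.60  # hypothetical test characteristics

for prevalence in (0.15, 0.30, 0.50):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"pre-test risk {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

With these assumptions, the PPV rises from about 28% to 69% and the NPV falls from about 97% to 86% as the pre-test risk increases from 15% to 50%, which is why the same test can perform differently at centers with different rates of malignancy in indeterminate nodules.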
In addition, molecular marker studies commonly report the benign call rate (BCR). The BCR represents the percentage of indeterminate nodules that are found to have a benign or negative molecular result. This variable provides information for management regardless of the rate of surgery; however, it is likely affected by factors leading to the selection of patients for molecular testing.
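The benign call rate is a simple proportion, and combined with an NPV it gives a rough sense of how many malignancies might sit among benign calls. The Python sketch below uses a hypothetical batch of 20 results and a BCR of 65% with an NPV of 96%, values of the kind reported for ThyroSeq v2 in this section. It rests on a strong assumption: that an NPV estimated mainly from resected nodules also applies to benign-call nodules that are observed rather than resected, which selection for surgery may violate. The function names are hypothetical.

```python
def benign_call_rate(results):
    """Proportion of tested indeterminate nodules with a benign (negative) molecular result."""
    if not results:
        raise ValueError("no tested nodules")
    return sum(1 for r in results if r == "benign") / len(results)


def residual_cancers_per_100(bcr, npv):
    """Expected malignancies among benign calls per 100 tested nodules.

    Assumes the NPV (usually estimated from resected nodules) also applies
    to benign-call nodules that were observed rather than resected.
    """
    return 100 * bcr * (1 - npv)


# Hypothetical batch of 20 molecular results from indeterminate FNAs
batch = ["benign"] * 13 + ["suspicious"] * 7
bcr = benign_call_rate(batch)
print(f"BCR {bcr:.0%}")                                           # BCR 65%
print(f"{residual_cancers_per_100(bcr, 0.96):.1f} per 100 tested")  # 2.6 per 100 tested
```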
A review that included 22 studies evaluating the Afirma Gene Expression Classifier (GEC) found wide ranges of sensitivity (75-100%), specificity (5-53%), PPV (13-100%), and NPV (20-100%).165 The newer version, the Afirma Gene Sequencing Classifier (GSC), was evaluated in a retrospective study including 486 samples tested with GEC and 114 tested with GSC, which found a higher BCR for the GSC (66% v 48%; P<0.001). The GSC BCR was also higher than the GEC BCR for AUS nodules with Hürthle cell changes (P=0.0063).166 A retrospective study in a single tertiary center analyzed 343 nodules tested with GEC and 164 tested with GSC. GSC had a higher BCR (76% v 48%), PPV (60% v 33%), and specificity (94% v 61%) (P<0.001). Similarly, the BCR was significantly higher with GSC in nodules with Hürthle cell changes (89% v 26%).167168
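The four measures quoted in these reviews all come from the same 2×2 table of molecular result against surgical histology. A minimal sketch with hypothetical counts (not from any cited study) shows how each is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Molecular result (suspicious/benign) cross-tabulated against surgical histology."""
    tp: int  # suspicious result, malignant histology
    fp: int  # suspicious result, benign histology
    fn: int  # benign result, malignant histology
    tn: int  # benign result, benign histology

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Hypothetical counts among resected nodules, for illustration only.
table = TwoByTwo(tp=18, fp=32, fn=2, tn=48)
print(f"sensitivity {table.sensitivity():.0%}, specificity {table.specificity():.0%}, "
      f"PPV {table.ppv():.0%}, NPV {table.npv():.0%}")
```

Because nodules with a benign molecular result are less often resected, the cells of such a table can be filled unevenly, which is one plausible contributor to the wide ranges reported.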
An analysis of eight studies evaluating ThyroSeq v2 found sensitivity of 40-100%, specificity of 56-96%, a PPV of 13-90%, and an NPV of 48-97%.165 A retrospective study including 190 indeterminate thyroid nodules found a negative ThyroSeq v2 result in 76% of cases, with a higher area under the curve for Bethesda IV than for Bethesda III nodules (0.84 v 0.57).169 A retrospective multicenter study of 156 indeterminate thyroid nodules evaluated with ThyroSeq v2 found a BCR of 65%, a PPV of 22% (10-38%), and an NPV of 96% (78-99%).170 In a retrospective study including 224 indeterminate thyroid nodules, the BCR for the newer ThyroSeq v3 genomic classifier was 74%.171 These studies suggest that, given the high NPV, a negative result is associated with a significantly lower risk of malignancy, whereas a positive result could reflect non-specific genetic alterations.169170
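The area under the curve (AUC) quoted above can be read as the probability that a randomly chosen malignant nodule receives a higher test score than a randomly chosen benign one, with ties counted as one half (the Mann-Whitney formulation). A minimal sketch with hypothetical results coded 1 for positive and 0 for negative:

```python
from __future__ import annotations

from itertools import product


def auc(malignant_scores: list[float], benign_scores: list[float]) -> float:
    """AUC as P(score_malignant > score_benign), counting ties as one half."""
    if not malignant_scores or not benign_scores:
        raise ValueError("need at least one malignant and one benign nodule")
    wins = 0.0
    for m, b in product(malignant_scores, benign_scores):
        if m > b:
            wins += 1.0
        elif m == b:
            wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))


# Hypothetical binary results: 3 malignant and 4 benign nodules.
print(f"AUC {auc([1, 1, 0], [0, 0, 1, 0]):.2f}")
```

For a binary test like this one, the AUC reduces to (sensitivity + specificity) / 2; here sensitivity is 2/3 and specificity 3/4, giving about 0.71.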
This diagnostic literature suggests that molecular markers could inform clinical practice; however, other reports raise concerns about their reproducibility and generalizability. A retrospective study that included thyroid nodule samples from four institutions evaluated the diagnostic performance of ThyroSeq v2 in patients with indeterminate thyroid nodules and found a PPV of 35% (22-43%) and an NPV of 93% (88-100%).172 Such variability in results suggests that other clinical variables are important when integrating molecular markers into practice.172173 For example, in thyroid nodules in the AUS category, the performance of the Afirma GEC varies according to the qualifiers of cytological atypia. In a study of 227 thyroid nodules, the risk of cancer in resected GEC suspicious nodules was 19% for those with architectural atypia only, 45% for those with cytological atypia, and 57% for those with both cytological and architectural atypia.173
A meta-analysis evaluated the BCR in patients who had GEC testing, comparing 19 publications including 2568 thyroid nodules with the original validation study. Most of the included nodules were AUS/FLUS (73%), and 53% of the nodules underwent surgery, most commonly those with GEC suspicious results. The relation between BCR and PPV fell outside the 95% confidence interval obtained from the original GEC validation study, suggesting that the original cohort was not representative and that other clinical, ultrasound, and institutional factors affect diagnostic performance.174 Similarly, a study of the diagnostic properties of GEC in 145 patients with indeterminate thyroid nodules compared its performance with that in 1303 patients pooled from published series. Results confirmed significant variability in diagnostic performance that was not completely explained by differences in disease prevalence.175
Because of these limitations, the best way to integrate molecular markers into clinical practice so that most patients benefit is unknown, which raises concerns about inadequate implementation. For example, two small retrospective studies (114-140 patients) found that molecular markers altered the management plan in around 8-10% of patients compared with management algorithms.176177 A cohort study of 134 patients with nodules suspicious for Hürthle cell neoplasm, or AUS/FLUS concerning for Hürthle cell neoplasm, evaluated with the Afirma GEC found a malignancy rate of only 14% in GEC suspicious nodules, suggesting a high rate of unnecessary surgery in this group.178 In a retrospective study of 649 patients with a single indeterminate thyroid nodule, rates of total thyroidectomy (45% v 28%) and central lymph node dissection (19% v 12%) were higher in those who had oncogene panel testing than in the control group, driven by more aggressive management of those with a positive panel.179 Although high cost and limited availability in some countries and settings are commonly cited as limitations of molecular markers, a better understanding of their performance in clinical practice is needed before their cost effectiveness can be assessed validly.4131162180181
A retrospective study including 3297 patients with proven PTC found the highest odds ratios for malignancy (41-53) for malignant and suspicious for malignancy FNA results, compared with 9 for high risk ultrasound features and 13 for the BRAF mutation. A nomogram that combines these variables into a more precise cancer estimate was proposed; similar models are available.113182183184185186187188 A study using the K-TIRADS classification and FNA cytology confirmed the expected relation between the ultrasound-based pre-test probability of malignancy and the FNA result. In this study of 1651 nodules of at least 1 cm, nodules with benign cytology and K-TIRADS 2-4 had a very low risk of malignancy (0-2.4%), compared with 4-20% for those with K-TIRADS 5. In the AUS/FLUS and suspicious for malignancy/suspicious for follicular neoplasm categories, nodules with K-TIRADS 3 had a risk of malignancy of 3.6-20%, compared with 34-80% for those with K-TIRADS 4-5.189 A smaller study of 463 nodules that used the ATA risk classification found that the POM in indeterminate nodules varied with the ultrasound risk pattern.190 These data suggest that combining predictive factors improves our estimation of the risk of thyroid cancer.
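Nomograms for a binary outcome such as malignancy are often graphical forms of a multivariable logistic regression model, in which each predictor adds its log odds ratio to a baseline log odds. The cited nomogram’s exact form is not given here, so the sketch below is an assumption-laden illustration: the intercept and the three odds ratios (20, 5, and 8) are invented and are not the published estimates.

```python
from __future__ import annotations

import math

# Hypothetical model: the intercept and per-predictor log odds ratios are
# invented for illustration and are NOT the published nomogram.
INTERCEPT = -3.0
LOG_ODDS_RATIOS = {
    "suspicious_cytology": math.log(20.0),
    "high_risk_ultrasound": math.log(5.0),
    "braf_mutation": math.log(8.0),
}


def probability_of_malignancy(findings: dict[str, bool]) -> float:
    """Combine binary predictors additively on the log-odds scale (logistic model)."""
    log_odds = INTERCEPT + sum(
        beta for name, beta in LOG_ODDS_RATIOS.items() if findings.get(name, False)
    )
    return 1 / (1 + math.exp(-log_odds))


ultrasound_only = probability_of_malignancy({"high_risk_ultrasound": True})
combined = probability_of_malignancy(
    {"high_risk_ultrasound": True, "suspicious_cytology": True}
)
print(f"ultrasound only: {ultrasound_only:.1%}; ultrasound + cytology: {combined:.1%}")
```

With these made-up numbers, adding suspicious cytology to a high risk ultrasound pattern moves the estimate from about 20% to about 83%. The additive log-odds form assumes the predictors act independently, with no interaction terms.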
Integration of clinical evidence and patient’s values, preferences, and context
Three moments in the patient’s experience of a diagnosis of thyroid nodules are critical (fig 1). The first is the detection of the thyroid nodules. Patients who present with nodular growth, a sensation of a mass, or compressive symptoms should be evaluated with the goal of ruling out high risk thyroid cancer, assessing the need for surgery to relieve symptoms, or both.46566 In contrast, the threshold for reporting thyroid nodules found incidentally on imaging studies varies. For example, the European Thyroid Association suggests that only thyroid nodules larger than 5 mm should be reported unless they are highly suspicious, and the ACR uses age, comorbidities, and size to guide reporting.95191
The second pivotal step is the decision between performing an FNA and follow-up with serial ultrasound. This decision should consider not only clinical and radiological features but also the patient’s preferences and context.69192 We expect the values and preferences of patients to vary during the diagnostic process for thyroid nodules, so active participation of patients in decision making is extremely important.6192193 For example, a young patient with a family history of thyroid cancer and previous neck radiation for a non-thyroid malignancy might value further diagnostic information on a 1.2 cm thyroid nodule classified as intermediate risk for thyroid cancer by the ATA (biopsy threshold 1 cm) and as ACR-TIRADS 4 (biopsy threshold 1.5 cm). Although the risk of thyroid cancer strongly influences the FNA thresholds and recommendations in clinical guidelines, a survey of 196 patients who had just undergone FNA found that 56% were not aware of their risk of thyroid cancer.193
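The vignette turns on a nodule whose size falls between two guideline thresholds. A minimal sketch encoding only the two size thresholds quoted above (ATA intermediate suspicion, 1 cm; ACR-TIRADS 4, written TR4 here, 1.5 cm) shows the disagreement; the table and function names are hypothetical, and the full guidelines include criteria beyond size that are not modeled.

```python
# Hypothetical lookup: FNA size thresholds (cm) for the two categories named in
# the text only. The full ATA and ACR-TIRADS systems contain many more rules.
FNA_SIZE_THRESHOLD_CM = {
    ("ATA", "intermediate"): 1.0,
    ("ACR-TIRADS", "TR4"): 1.5,
}


def meets_fna_size_threshold(system: str, category: str, size_cm: float) -> bool:
    """True if the nodule reaches the size threshold for FNA in this system."""
    try:
        threshold = FNA_SIZE_THRESHOLD_CM[(system, category)]
    except KeyError:
        raise ValueError(f"category {category!r} of {system} is not encoded") from None
    return size_cm >= threshold


nodule_cm = 1.2  # the 1.2 cm nodule from the vignette
for system, category in FNA_SIZE_THRESHOLD_CM:
    print(system, category, meets_fna_size_threshold(system, category, nodule_cm))
```

For the 1.2 cm nodule, the ATA threshold is met but the ACR-TIRADS 4 threshold is not, which is the kind of gray zone where the patient’s values and context can reasonably tip the decision.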
Finally, once the FNA results are available, clinicians should take into account the pre-test probability of thyroid cancer derived from pre-biopsy testing (ultrasound, clinical variables) and use it to estimate an updated risk of thyroid cancer. The next management decision (surgery, observation, further testing) should be made in collaboration with the patient (fig 2).194 For example, an older patient with multiple medical comorbidities, a 2 cm intermediate risk thyroid nodule (ATA), and AUS cytology might place high value on further refining the risk of thyroid cancer before deciding on diagnostic thyroidectomy, compared with a healthy young patient with a low tolerance for uncertainty and other risk factors for thyroid cancer.46
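One standard way to update a pre-test probability with a test result is Bayes’ theorem in odds form: post-test odds equal pre-test odds multiplied by the likelihood ratio of the result. The sketch below uses hypothetical pre-test risks (5% and 20%) and a hypothetical likelihood ratio of 1.5 for an AUS result; neither value comes from the review.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds x likelihood ratio."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if likelihood_ratio <= 0:
        raise ValueError("likelihood ratio must be positive")
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical inputs: ultrasound-based pre-test risk and an assumed LR for AUS cytology.
for pre in (0.05, 0.20):
    post = post_test_probability(pre, likelihood_ratio=1.5)
    print(f"pre-test {pre:.0%} -> post-test {post:.1%}")
```

Under these assumptions the same AUS result moves a 5% pre-test risk to about 7% but a 20% pre-test risk to about 27%. Multiplying likelihood ratios in this way assumes the FNA result is conditionally independent of the ultrasound findings used for the pre-test estimate, which may not hold exactly in practice.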
Management and follow-up
Management recommendations will depend on our certainty about an underlying malignant process and ideally the likelihood of a clinically relevant malignancy. For patients with benign disease, surgery is reserved for symptomatic disease.4 For patients with malignancy, surgery is usually recommended, with the extent of surgery guided by initial clinical features and active surveillance considered for those with micro-PTC.4195 For patients with indeterminate thyroid nodules, the next step in management will depend on their risk of thyroid cancer after diagnostic evaluation and consideration of the values, preferences, and context of the patient.4
For thyroid nodules in which follow-up ultrasound is recommended, the interval varies according to the ultrasound features and FNA results: overall, the more suspicious the ultrasound pattern, the sooner a follow-up scan should be performed (table 8).4593949596196
If thyroid nodules grow during follow-up, most guidelines recommend repeat FNA (table 8). However, the value of thyroid nodule growth in predicting malignancy in thyroid nodules with benign cytology has been challenged. A meta-analysis of 2743 patients showed that thyroid nodule growth cannot accurately discriminate between benign and malignant lesions, although there is low confidence in the body of evidence.197
Thyroid nodule doubling time is not a reliable indicator of benign or malignant nature. In a small retrospective study of 61 thyroid nodules using a doubling time threshold of 1100 days, doubling time predicted malignancy with a sensitivity of 19% and a specificity of 87%.198 In a recent retrospective cohort study of 100 consecutive patients with follicular neoplasms (FNs), most FNs grew exponentially, with similar growth rates for benign and malignant nodules.199
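If a nodule grows exponentially, its doubling time can be estimated from two volume measurements as DT = t × ln 2 / ln(V_end / V_start), where t is the interval between measurements. The sketch below assumes exponential growth between measurements, as described for the follicular neoplasms above; the volumes and interval are hypothetical.

```python
import math


def doubling_time_days(volume_start_ml: float, volume_end_ml: float,
                       interval_days: float) -> float:
    """Doubling time assuming exponential growth between two measurements.

    Returns math.inf when the volume is unchanged and a negative value when
    the nodule shrank (a halving time expressed with a negative sign).
    """
    if volume_start_ml <= 0 or volume_end_ml <= 0 or interval_days <= 0:
        raise ValueError("volumes and interval must be positive")
    growth = math.log(volume_end_ml / volume_start_ml)
    if growth == 0:
        return math.inf
    return interval_days * math.log(2) / growth


# Hypothetical nodule measured one year apart.
dt = doubling_time_days(volume_start_ml=2.0, volume_end_ml=2.5, interval_days=365)
print(f"doubling time {dt:.0f} days")
```

With these inputs, a 25% volume increase over one year corresponds to a doubling time of roughly 1130 days.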
For patients with small thyroid nodules with proven PTC, or thyroid nodules of less than 1 cm with high risk ultrasound features, active surveillance is an alternative to surgery. In a large cohort of 1235 Japanese patients (mean follow-up 6.25 years) with low risk micro-PTC who opted for observation, an increase in tumor size of 3 mm or more occurred in only 8% and new regional lymph node metastases in 3.8%, and younger age was associated with a higher likelihood of progression.200201 More recently, a US tertiary referral center showed the feasibility of this strategy for low risk PTC of 1.5 cm or less in a cohort of 291 patients: during a median follow-up of 25 months, tumor growth of at least 3 mm was observed in 3.8% of patients, and no regional or distant metastases developed.79
Minimally invasive techniques have recently been investigated as alternatives to surgery for symptomatic benign thyroid nodules. They include ultrasound guided percutaneous ethanol injection (PEI), mainly for cystic thyroid nodules, and thermal ablation techniques (radiofrequency ablation, laser therapy, high intensity focused ultrasound, and ultrasound guided microwave ablation).202203
A systematic review of randomized controlled trials of minimally invasive techniques for benign thyroid nodules (low to moderate quality evidence) showed that PEI, laser therapy, and radiofrequency ablation reduced thyroid nodule volume and improved pressure symptoms and cosmetic complaints; reported adverse events included mild to moderate periprocedural pain.204 Compared with cyst aspiration alone, PEI was associated with a thyroid nodule volume reduction of at least 50% in 83% versus 44% of participants, and with improvement in compression symptoms in 78% versus 38%, after six to 12 months. A retrospective study of radiofrequency ablation in 251 benign thyroid nodules showed a volume reduction ratio of 81% after 12 months.205 A multicenter study of 601 thyroid nodules treated with laser therapy or radiofrequency ablation showed that, for larger thyroid nodules (volume >30 mL), the percentage volume reduction at 12 months was greater with laser therapy than with radiofrequency ablation (73% v 54%).206 A systematic review of microwave ablation, including three studies and 522 thyroid nodules, showed mean thyroid nodule volume reductions ranging from 46% to 65%.207
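Volume reduction ratios like those quoted above are commonly calculated from an ellipsoid estimate of nodule volume (π/6 × length × width × depth) measured before and after treatment. The sketch below assumes that convention, which the cited studies may or may not have used exactly, and uses hypothetical diameters.

```python
import math


def ellipsoid_volume_ml(length_cm: float, width_cm: float, depth_cm: float) -> float:
    """Ellipsoid approximation of nodule volume: pi/6 x L x W x D (1 cm^3 = 1 mL)."""
    return math.pi / 6 * length_cm * width_cm * depth_cm


def volume_reduction_ratio(initial_ml: float, final_ml: float) -> float:
    """Percentage volume reduction: (initial - final) / initial x 100."""
    if initial_ml <= 0:
        raise ValueError("initial volume must be positive")
    return (initial_ml - final_ml) / initial_ml * 100


# Hypothetical nodule before and 12 months after ablation (diameters in cm).
before = ellipsoid_volume_ml(4.0, 3.0, 2.5)
after = ellipsoid_volume_ml(2.2, 1.8, 1.5)
print(f"{before:.1f} mL -> {after:.1f} mL, VRR {volume_reduction_ratio(before, after):.0f}%")
```

Here the volume falls from about 15.7 mL to about 3.1 mL, a VRR of about 80%. Because the π/6 factor cancels in the ratio, the VRR depends only on the products of the three diameters before and after treatment.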
Several guidelines are available for the evaluation of thyroid nodules. The British Thyroid Association,94 ATA,4 and American Association of Clinical Endocrinologists/American College of Endocrinology/Italian Association of Clinical Endocrinologists93 grade the strength and quality of evidence for their recommendations. Most guidelines recommend ultrasound risk assessment to determine the need for FNA, and standardized reporting of thyroid FNAs, but they propose different systems. Moreover, not all groups recommend the use of molecular markers in routine clinical practice. For follow-up, most guidelines advise against follow-up ultrasound at intervals shorter than one year, except for proven cancers, although the recommended frequency of ultrasound assessment varies (table 4, table 6, and table 8).
Important advances have been made in our understanding of the clinical significance and possible harms associated with inadequate evaluation of thyroid nodules, which are detected with increasing frequency owing to the widespread availability and use of imaging techniques. The development and validation of multiple ultrasound risk stratification systems have allowed clinicians to estimate, to some extent, the POM in thyroid nodules. Thresholds for thyroid biopsy have been set on the basis of this POM, balancing the value of diagnosing clinically relevant thyroid cancer against the potential risk of a missed diagnosis. Similarly, standardized systems for reporting FNAs allow an estimation of thyroid cancer risk. Furthermore, replacing a linear evaluation with a combined clinical, cytological, and ultrasound assessment can better guide clinical management in patients with indeterminate results. Molecular markers have also emerged as a diagnostic aid that can help to avoid diagnostic thyroidectomies; however, their diagnostic accuracy varies across practices and needs further evaluation. Clinicians who are familiar with the available tools for estimating the risk of thyroid cancer in thyroid nodules are encouraged to share these estimates with their patients and to decide together between thyroid biopsy and observation with thyroid ultrasound once a thyroid nodule is discovered, or between surgery and observation when initial investigation yields indeterminate results.
How can we clinically distinguish benign thyroid nodules and low risk thyroid cancer from clinically significant thyroid cancer that would lead to important adverse outcomes for patients?
How can we improve the quality and reproducibility of ultrasound and cytological assessment of thyroid nodules?
How should we integrate molecular testing into the diagnostic pathway for thyroid nodules given variable diagnostic performance?
How should we follow up thyroid nodules that have benign biopsy results or that do not meet criteria for thyroid biopsy?
Glossary of abbreviations
ACR-TIRADS—American College of Radiology Thyroid Imaging Reporting and Data System
ATA—American Thyroid Association
AUS/FLUS—atypia of undetermined significance/follicular lesion of undetermined significance
BSRTC—Bethesda System for Reporting Thyroid Cytopathology
CNB—core needle biopsy
FNA—fine needle aspiration biopsy
GEC—Afirma Gene Expression Classifier
GSC—Afirma Gene Sequencing Classifier
K-TIRADS—Korean Thyroid Imaging Reporting and Data System
MTC—medullary thyroid cancer
NIFTP—non-invasive follicular thyroid neoplasm with papillary-like nuclear features
NPV—negative predictive value
PEI—percutaneous ethanol injection
POM—prevalence of malignancy
PPV—positive predictive value
PTC—papillary thyroid cancer
TSH—thyroid stimulating hormone
We thank Larry Prokop for his assistance with the literature search.
Series explanation: State of the Art Reviews are commissioned on the basis of their relevance to academics and specialists in the US and internationally. For this reason they are written predominantly by US authors.
Contributors: All authors contributed to developing the outline for this article. NSO and NIA wrote the first draft of the manuscript. All authors reviewed and appraised the manuscript. NSO is the guarantor.
Competing interests: We have read and understood the BMJ policy on declaration of interests and declare the following interests: none.
Provenance and peer review: Commissioned; externally peer reviewed.
Patient involvement: A 69 year old woman who was incidentally found to have a thyroid nodule on imaging and underwent thyroid biopsy was invited to serve as a patient reviewer for the BMJ. She supported including short clinical vignettes in the final manuscript as examples of how different patients may value diagnostic and treatment options differently.