Richard C. Webb, Robin S. Howard, Alexander Stojadinovic, David Y. Gaitonde, Mark K. Wallace, Jehanara Ahmed, Henry B. Burch, The Utility of Serum Thyroglobulin Measurement at the Time of Remnant Ablation for Predicting Disease-Free Status in Patients with Differentiated Thyroid Cancer: A Meta-Analysis Involving 3947 Patients, The Journal of Clinical Endocrinology & Metabolism, Volume 97, Issue 8, 1 August 2012, Pages 2754–2763, https://doi.org/10.1210/jc.2012-1533
Abstract
Decisions regarding initial therapy and subsequent surveillance in patients with differentiated thyroid cancer (DTC) depend upon an accurate assessment of the risk of persistent or recurrent disease.
The objective of this study was to examine the predictive value of a single measurement of serum thyroglobulin (Tg) just before radioiodine remnant ablation (preablation Tg) on subsequent disease-free status.
Sources included MEDLINE and BIOSYS databases between January 1996 and June 2011 as well as data from the authors' tertiary-care medical center.
Included studies reported preablation Tg values and the outcome of initial therapy at surveillance testing or during the course of long-term follow-up.
Two investigators independently extracted data and rated study quality using the Quality Assessment of Studies of Diagnostic Accuracy included in Systematic Reviews-2 (QUADAS-2) tool.
Fifteen studies involving 3947 patients with DTC were included. Seventy percent of patients had preablation Tg values lower than the threshold value being examined. The negative predictive value (NPV) of a preablation Tg below threshold was 94.2% (95% confidence interval = 92.8–95.3%) for an absence of biochemical or structural evidence of disease at initial surveillance or subsequent follow-up. The summary receiver operating characteristic curve based on a bivariate mixed-effects binomial regression model showed a clustering of studies using a preablation Tg cutoff of 10 ng/ml near the summary point of optimal test sensitivity and specificity.
Preablation Tg testing is a readily available and inexpensive tool with a high NPV for future disease-free status. A low preablation Tg should be considered a favorable risk factor in patients with DTC. Further study is required to determine whether a low preablation Tg may be used to select patients for whom radioiodine remnant ablation can be avoided.
After thyroidectomy and radioiodine remnant ablation (RRA) for differentiated thyroid cancer (DTC), 10–50% of patients will have persistent biochemical evidence of disease, manifested by detectable TSH-stimulated levels of serum thyroglobulin (Tg), and approximately one half of these will have or develop structurally identifiable disease, most frequently in cervical lymph nodes or distant sites, such as the lungs (1–3). Less commonly, patients with DTC have unresectable disease at diagnosis or recurrent structural disease after a period in which biochemical and anatomical surveillance had suggested a complete remission.
In 2009, the American Thyroid Association (ATA) thyroid cancer management guidelines proposed a risk-stratification system for predicting persistent or recurrent disease (4). Although this system demonstrates an enhanced ability to predict disease recurrence over traditional staging systems, even low-risk patients in the ATA classification system have been found to have a 14% risk of persistent biochemical or structural disease (2). Additional risk stratification based on the response to initial therapy improves long-term prognostication (2), but additional tools are clearly needed to enhance the accuracy of the initial risk appraisal, thus allowing a rational approach to the use of RRA and formulation of an appropriate surveillance strategy. This particularly applies to routine clinical practice, in which more than 70% of DTC patients are American Joint Committee on Cancer, Seventh Edition (AJCC-7) stage I or II (5), for whom ATA guidelines advise either no or only selected use of RRA (4).
Serum Tg measured just before RRA after near-total or total thyroidectomy (preablation Tg) has been examined in numerous studies for its ability to predict the presence of persistent and recurrent disease on follow-up (6–20) or the presence of metastatic disease on posttreatment whole-body scan (8, 16, 21–23). A consistent finding in these studies is the high negative predictive value (NPV) associated with low preablation Tg levels, using cutoff values generally ranging from 1.0 to 10.0 ng/ml. The current study provides a structured meta-analysis of existing studies in this area, including data from the authors' institution, with the objective of examining the utility of preablation Tg measurement in patients with DTC.
Materials and Methods
Meta-analysis
Identification of articles
A systematic literature search including MEDLINE and BIOSYS was performed using the search terms Tg and differentiated thyroid cancer, with publication over the 15-yr period between 1996 and 2011. The search was conducted in June 2011 and extended back through January 1996. Abstracts were reviewed to determine potential eligibility for inclusion in the meta-analysis. All articles potentially meeting eligibility criteria were then obtained in complete text and screened for inclusion into the study using a checklist of preferred items. Article preprints or additional data not included in the paper were requested directly from study authors. Additional articles were considered when found during the review of the initially obtained references.
Inclusion criteria
Studies sought for the meta-analysis involved predominantly adults with DTC. Preablation Tg levels were required for all patients, obtained at least 4 wk after total or near-total thyroidectomy under thyroid hormone withdrawal (THW) conditions, with negative assays for anti-Tg antibodies (TgAb). Patients had to be followed until at least one surveillance evaluation including a stimulated Tg measurement had been performed. Adequate data for calculating the sensitivity and specificity of preablation Tg were required.
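The inclusion criteria amount to a conjunction of study-level checks. The sketch below encodes them as a single predicate; the `StudyRecord` class and its field names are hypothetical, chosen only for illustration, and "predominantly adults" is simplified to one boolean.

```python
from dataclasses import dataclass


@dataclass
class StudyRecord:
    """Hypothetical study-level fields mirroring the stated inclusion criteria."""
    mostly_adults: bool
    total_or_near_total_thyroidectomy: bool
    weeks_from_surgery_to_tg: float     # timing of the preablation Tg draw
    tg_drawn_under_thw: bool            # thyroid hormone withdrawal
    tgab_negative_only: bool            # TgAb-positive patients excluded
    has_stimulated_tg_surveillance: bool
    has_2x2_data: bool                  # enough data for sensitivity/specificity


MIN_WEEKS_POSTOP = 4  # "at least 4 wk postoperatively"


def meets_inclusion_criteria(s: StudyRecord) -> bool:
    """Return True only if every stated criterion is satisfied."""
    return (
        s.mostly_adults
        and s.total_or_near_total_thyroidectomy
        and s.weeks_from_surgery_to_tg >= MIN_WEEKS_POSTOP
        and s.tg_drawn_under_thw
        and s.tgab_negative_only
        and s.has_stimulated_tg_surveillance
        and s.has_2x2_data
    )


if __name__ == "__main__":
    example = StudyRecord(True, True, 4, True, True, True, True)  # hypothetical
    print(meets_inclusion_criteria(example))
```

A strict predicate like this would reject the one included study that did not exclude TgAb-positive patients (11); the Results explain that this study was retained by reviewer judgment, so in practice the criteria were applied with some discretion.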
Data extraction
For included articles, two reviewers (R.W. and H.B.) extracted descriptive data including first author, year of publication, number of subjects, mean or median age and gender (when provided), types of DTC, preablation Tg cutoffs used, definitions for and methods used to detect biochemical or structural evidence of persistent and recurrent disease, and study exclusion criteria. Surveillance parameters were recorded for each of the studies and organized into three categories: A) metastatic disease on postablation whole-body scan, B) stimulated Tg testing at greater than 6 months after initial therapy, or C) the development of anatomical evidence of disease during the follow-up period.
Assessment of study quality
Two reviewers (H.B. and A.S.) independently assessed study quality using the Quality Assessment of Studies of Diagnostic Accuracy Included in Systematic Reviews-2 (QUADAS-2) tool (24), adapted to surveillance for thyroid cancer. The QUADAS-2 tool, which is recommended by the Cochrane Diagnostic Test Accuracy Working Group for use in meta-analysis (25), assesses the quality of included studies in terms of risk of bias and applicability to the clinical question being addressed. In assessing the risk of bias for an individual study, each of the following four domains is examined: 1) patient selection, 2) the index test, 3) the reference standard, and 4) flow and timing. Likewise, the applicability of an individual study to the clinical question at hand is assessed in the areas of 1) patient selection, 2) the index test, and 3) the reference standard. The signaling questions used to adapt the QUADAS-2 tool to the current study comprised 12 questions assessing risk of bias and seven questions assessing applicability (see Supplemental Material published on The Endocrine Society's Journals Online web site at http://jcem.endojournals.org). In each subsection of the QUADAS-2 tool, the number of affirmative responses to the signaling questions was used to classify studies as having high, low, or intermediate risk.
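The text states that the count of affirmative signaling-question responses determined each domain's rating but does not give the cut points. The sketch below assumes one plausible rule (all affirmative gives low risk, none gives high risk, anything in between is intermediate); treat the thresholds as an illustration, not the authors' scoring scheme.

```python
from collections.abc import Sequence


def domain_rating(answers: Sequence[bool]) -> str:
    """Rate one QUADAS-2 domain from its signaling-question answers.

    ASSUMED cut points (not stated in the text): all affirmative -> "low",
    none affirmative -> "high", anything in between -> "intermediate".
    """
    if not answers:
        raise ValueError("a domain needs at least one signaling question")
    n_yes = sum(answers)  # True counts as 1
    if n_yes == len(answers):
        return "low"
    if n_yes == 0:
        return "high"
    return "intermediate"


if __name__ == "__main__":
    # Hypothetical reference-standard domain in which the blinding question
    # could not be answered "yes"
    print(domain_rating([True, True, False]))  # intermediate
```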
Walter Reed study subjects: data collection
This study was granted exempt status by the Institutional Review Board at Walter Reed Army Medical Center (WRAMC). A clinical database containing 22,000 active and inactive patient convenience files maintained in the Endocrinology Clinic from approximately 1975 to 2003 was searched for individuals with a diagnosis of papillary thyroid cancer (PTC). Data were abstracted into a database by four reviewers (H.B., D.G., M.W., and J.A.). All patients had histologically confirmed PTC and underwent total or near-total thyroidectomy. Patients with clinically apparent nodal disease preoperatively or at the time of surgery also underwent compartment-oriented neck dissection. Prophylactic neck dissection for the removal of lymph nodes that appeared normal on preoperative imaging and inspection was not performed. All patients underwent RRA under hypothyroid conditions (TSH > 25 mIU/liter) 4–6 wk after thyroidectomy, generally using 30–150 mCi of ¹³¹I. Serum TSH and preablation Tg were measured approximately 72 h before RRA. Patients routinely underwent withdrawal- or recombinant human TSH (rhTSH)-stimulated whole-body scanning and Tg testing 6–12 months after RRA and at 6- to 12-month intervals thereafter, based on their physician's assessment of baseline risk and the clinical course of disease. Serum Tg was measured in reference laboratories, first (in the early 1980s) by RIA with a reported lower limit of detection of 4 ng/ml, followed by assays with lower limits of 1 ng/ml from 1989–1997 and 0.5 ng/ml from 1998–2001, an immunoradiometric assay with a lower limit of 0.9 ng/ml from 2002–2005, and finally an assay with a lower limit of 0.2 ng/ml starting in 2006.
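Because the Tg assay changed several times, the lower limit of detection depends on the calendar year of the measurement. A minimal lookup sketch is shown below; the names are hypothetical, and it assumes the original RIA applies to every year before 1989, since the text dates that assay only to "the early 1980s".

```python
# Reported lower limits of detection (ng/ml) of the reference-laboratory Tg
# assays, keyed by the first calendar year each assay was used.
ASSAY_CHANGES = [
    (1989, 1.0),  # 1989-1997
    (1998, 0.5),  # 1998-2001
    (2002, 0.9),  # 2002-2005, immunoradiometric assay
    (2006, 0.2),  # 2006 onward
]

# ASSUMPTION: the original RIA (limit 4 ng/ml) is applied to every year before
# 1989, since the text dates its introduction only to "the early 1980s".
RIA_LIMIT = 4.0


def detection_limit(year: int) -> float:
    """Return the assay lower limit of detection (ng/ml) in use in `year`."""
    limit = RIA_LIMIT
    for first_year, value in ASSAY_CHANGES:  # list is sorted by year
        if year >= first_year:
            limit = value
    return limit


if __name__ == "__main__":
    for y in (1985, 1995, 2000, 2003, 2008):
        print(y, detection_limit(y))
```

One thing the lookup makes visible: the 2 ng/ml stimulated-Tg threshold used to define NED in the WRAMC cohort lies below the 4 ng/ml limit of the earliest RIA.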
Patients from the WRAMC database were considered to have no evidence of disease (NED) if the stimulated Tg level was below 2 ng/ml with negative TgAb and no tumor was identified on whole-body scan or on cross-sectional imaging, if obtained. Persistent disease was classified as structural if found on cross-sectional imaging, biopsy, or post-RRA scan, or as biochemical, defined as a stimulated Tg value of at least 2 ng/ml in the absence of structural evidence of disease. Disease recurrence was defined as anatomic or biochemical evidence of disease after a period in which the patient had NED.
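The outcome definitions above form a small decision rule. The sketch below encodes them for a single surveillance evaluation; the function and enum names are hypothetical, and TgAb-positive results are rejected outright because such patients were excluded from the WRAMC analysis.

```python
from enum import Enum

NED_TG_CUTOFF = 2.0  # ng/ml, stimulated Tg threshold used for the WRAMC cohort


class Status(Enum):
    NED = "no evidence of disease"
    BIOCHEMICAL = "biochemical disease"
    STRUCTURAL = "structural disease"


def classify(stimulated_tg: float, tgab_positive: bool,
             structural_finding: bool) -> Status:
    """Apply the WRAMC definitions to one surveillance evaluation.

    structural_finding: tumor seen on cross-sectional imaging, biopsy,
    or post-RRA/whole-body scan.
    """
    if tgab_positive:
        # TgAb-positive patients were excluded from the WRAMC analysis
        raise ValueError("stimulated Tg not interpreted with positive TgAb")
    if structural_finding:
        return Status.STRUCTURAL
    if stimulated_tg >= NED_TG_CUTOFF:
        return Status.BIOCHEMICAL
    return Status.NED


def label(status: Status, previously_ned: bool) -> str:
    """Persistence vs. recurrence (disease after a period of NED)."""
    if status is Status.NED:
        return "disease free"
    return "recurrence" if previously_ned else "persistence"


if __name__ == "__main__":
    s = classify(stimulated_tg=3.4, tgab_positive=False, structural_finding=False)
    print(s.value, "/", label(s, previously_ned=True))
```

Structural findings are checked first, so a patient with both a detectable Tg and an imaging finding is labeled structural, matching the definition of biochemical disease as elevated Tg "in the absence of structural evidence of disease".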
Statistical analysis
Based on the 2 × 2 contingency tables for the 15 studies, we estimated a summary receiver operating characteristic (sROC) curve together with pooled measures of sensitivity, specificity, likelihood ratios, diagnostic odds ratio (DOR), and the area under the sROC curve. The DOR combines sensitivity and specificity into a single measure of diagnostic performance and is defined as DOR = (true positive/false positive) ÷ (false negative/true negative) = LR(+)/LR(−), where LR(+) = sensitivity/(1 − specificity) is the likelihood ratio for disease given a positive test, and LR(−) = (1 − sensitivity)/specificity is the likelihood ratio for disease given a negative test. These estimates were derived using a bivariate mixed-effects logistic regression model (26). Data were analyzed with STATA software (version 11; StataCorp, College Station, TX) using the Midas module (26). Estimated parameters are presented together with 95% confidence intervals (CI). Heterogeneity was examined quantitatively using the I² statistic, with low heterogeneity defined as an I² value below 25%, high heterogeneity as above 75%, and moderate heterogeneity as 25–75%. Bayesian analysis was performed comparing the pretest probability of disease, based on prevalence data from the included studies, with the posttest probability of disease after a positive or negative test result. In this analysis, the summary LR(+) and LR(−) estimated from the bivariate model are used in Bayes' theorem: for a pretest probability of disease P, the posterior (posttest) probability of disease is P(D) = (LR × P) ÷ [(1 − P) + (LR × P)], using LR(+) for a positive test and LR(−) for a negative test.
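For a single study, the quantities defined above follow directly from the 2 × 2 counts. The sketch below (hypothetical class name and counts) computes them and confirms that the two DOR expressions agree. It is per-study arithmetic only; the pooled estimates in this analysis came from the bivariate mixed-effects model, which this code does not implement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """One study's 2 x 2 table.

    Test positive = preablation Tg at or above the study's cutoff.
    Disease = persistent or recurrent disease during follow-up.
    """

    tp: int  # Tg above cutoff, disease present
    fp: int  # Tg above cutoff, disease free
    fn: int  # Tg below cutoff, disease present
    tn: int  # Tg below cutoff, disease free

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def lr_pos(self) -> float:
        """LR(+) = sensitivity / (1 - specificity)."""
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_neg(self) -> float:
        """LR(-) = (1 - sensitivity) / specificity."""
        return (1 - self.sensitivity) / self.specificity

    @property
    def dor(self) -> float:
        """DOR = (TP/FP) / (FN/TN), algebraically equal to LR(+)/LR(-)."""
        return (self.tp / self.fp) / (self.fn / self.tn)


if __name__ == "__main__":
    t = TwoByTwo(tp=40, fp=30, fn=10, tn=120)  # hypothetical counts
    print(round(t.sensitivity, 3), round(t.specificity, 3))  # 0.8 0.8
    print(round(t.lr_pos, 2), round(t.lr_neg, 2))            # 4.0 0.25
    print(round(t.dor, 2), round(t.lr_pos / t.lr_neg, 2))    # 16.0 16.0
```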
Results
Literature search
Figure 1 shows the flow of studies through the review process. MEDLINE and BIOSYS searches yielded 1058 articles. Review of the abstracts eliminated all but 38 articles, which were then reviewed in depth. Three additional articles were identified while abstracting data from these selected articles, two of which were included in the in-depth review. Ultimately, 40 articles were subjected to intensive review, of which 26 were excluded and 14 were included in the final meta-analysis, in addition to data from the authors' medical center. Among the 26 excluded papers, 11 provided insufficient data to calculate sensitivity and specificity, five had problems with the timing of preablation Tg measurement, three provided no follow-up testing, three had highly restricted populations (only patients with metastatic disease or only patients with undetectable preablation Tg), three included data presented by the same authors in included studies, and one was performed solely in pediatric patients.
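The study-flow counts can be checked with a few lines of bookkeeping. The sketch below uses only the numbers reported in the paragraph above; the variable names and the short labels for the exclusion reasons are illustrative.

```python
# Study-flow bookkeeping using the counts reported in the text.
EXCLUSION_REASONS = {
    "insufficient data for sensitivity/specificity": 11,
    "timing of preablation Tg measurement": 5,
    "no follow-up testing": 3,
    "highly restricted population": 3,
    "data duplicated in an included study": 3,
    "pediatric patients only": 1,
}

after_abstract_screen = 38
found_in_references = 3
added_from_references = 2  # of the 3 found, two went on to in-depth review
assert added_from_references <= found_in_references

reviewed_in_depth = after_abstract_screen + added_from_references
excluded = sum(EXCLUSION_REASONS.values())
included_published = reviewed_in_depth - excluded
total_studies = included_published + 1  # plus the WRAMC cohort

if __name__ == "__main__":
    print(reviewed_in_depth, excluded, included_published, total_studies)  # 40 26 14 15
```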
![The flow of articles through the review process.](https://cdn.statically.io/img/oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jcem/97/8/10.1210_jc.2012-1533/2/m_zeg0081290660001.jpeg?Expires=1724449128&Signature=iOLu92IEHyz-JUxWPukIbh5vqtVDg5Ic9j3YobcrjUpGX6WgShH6r4haoSwmzzIJyRgEnHImadCtWCpKIRkoglhO5mM2lx4FUZNnOxZY4kG0jhfg90rO2n2V3NmSmehIrKD8fXnHwSPADebOJRaSAM4KAYwwQPQ0IC9YPBeDkKRFdAub-MX3Ii6Oe-zAjTYzdyTxcMLMOGPhgMI9sBB1CzCyYZvzgoqfF5bSVY7Ea7mL52GzimGYQcsWOCfiBzs5noQEUQjurBW8MWRgM~dGF2~h4rtg3lWfKu9IHShEdjcDfKUiNMKQw6KGnOBe9Ja5JjmDbNZiw8w9xiMz4QqlCw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Quality assessment
Figure 2 shows a summary of the QUADAS-2 study quality analysis. Among the 15 studies included, three (20%) were rated as being at risk for patient selection bias because they did not include all stages of DTC in the analysis. Similarly, five studies (33%) were classified as raising higher levels of concern regarding applicability because the study site was a university hospital rather than a community practice. The QUADAS-2 procedure requires a determination of whether the index test (preablation Tg) was known to the authors at the time the reference standard (persistence and recurrence status) was determined (24). Because none of the included studies stated that this assessment was made independently, all but one were rated as having an unclear (intermediate) risk of bias for this subsection. Omitting this question from the quality analysis would have resulted in a rating of low risk of bias in the reference standard subsection for all studies.
![QUADAS-2 tool for analysis of included study quality. The risk of bias for the included articles and the level of concern regarding applicability of included articles are shown. A, Percentage of articles at low, unclear (intermediate), or high risk for bias is shown in blue, green, and red, respectively; B, percentage of articles with low, unclear (intermediate), or high levels of concern regarding applicability is shown.](https://cdn.statically.io/img/oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jcem/97/8/10.1210_jc.2012-1533/2/m_zeg0081290660002.jpeg?Expires=1724449128&Signature=BvRN1qqAkDAskT1DTQlv56Lhhg1E3TW6KkQXuGf1d59h4FagrHVWkwVStAgeN3VywoJlFKPYny-fH94dlKEo~G7MnFckXtjs81mZiSR~dRZ47TI1GyKGI3uaCGefPSRDwHwO-dxLeggUn-9kagIZh5TNPkhFPDdIGRKGXJEkRXHWChV-v9HDPDB49c706bWlR-suIuFYFpyXYFPAipDdQME9J468SCvb~k-0GLkGoUUgg65qFRghqgDdghQY708M4dEKIAyeik2xy7tt1bo8k2Oenq-iJ4nAZr4wtiCpSCOXLVzir4o8Zhq2G-bLiHGQvy00cqI09ealPw8wBK-iLw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
QUADAS-2 tool for analysis of included study quality. The risk of bias for the included articles and the level of concern regarding applicability of included articles are shown. A, Percentage of articles at low, unclear (intermediate), or high risk for bias is shown in blue, green, and red, respectively; B, percentage of articles with low, unclear (intermediate), or high levels of concern regarding applicability is shown.
Data analysis
Summary of included studies
Table 1 provides a summary of the included articles. Articles included were derived from diverse geographical regions, with eight of 15 studies performed in Europe, three in North America, two in Asia, one in South America, and one in the Middle East. PTC accounted for 75–100% of cases, and women accounted for 71–85% of subjects, with mean ages ranging from 40.6–49.2 yr. The duration of follow-up varied from 0.6 yr (first surveillance testing) to as long as 16 yr. Seven studies included data pertaining to both biochemical persistence and structural evidence of recurrence (categories B and C, respectively), three studies analyzed data pertaining to category C, and five studies presented data only on category B, including two studies also providing data in category A (metastatic disease on posttreatment scan).
Author, year (Ref.) | Country | Number of patients reported (included)ᵃ | Mean age (yr) | Timing of preablation Tg (d) | Outcome typeᵇ (A, B, C) | Follow-up (yr) (mean or range) | Tg cutoff (ng/ml) |
---|---|---|---|---|---|---|---|
Bernier, 2005 (6) | Germany | 407 | 46.0 | 30 | B | 0.61 | 5.0 |
Familiar, 2009 (7) | Spain | 63 | 41.0 | NS | B, C | 5.0 | 10.0 |
Giovanella, 2005 (8) | Switzerland | 140 | 46.0 | 28 | A, B, C | 1.0 | 3.2 |
Heemstra, 2007 (9) | Netherlands | 222 | 48.0 | NS | C | 8.3 | 27.5 |
Kim, 2005 (10) | South Korea | 268 | 44.4 | 35 | B, C | 5.7 | 10.0 |
Lin, 2002 (11) | China | 847 (654) | 40.8 | 30 | B, C | 3.5–6.3 | 10.0 |
Oyen, 2000 (12) | Netherlands | 206 | 45.0 | 28 | C | 2.7 | 6.6 |
Pelttari, 2010 (13) | Finland | 495 (391) | 40.6 | 28 | B, C | 10–24 | 10.0 |
Polachek, 2011 (14) | Israel | 420 | 49.2 | 28 | B, C | 5.1 | 10.0 |
Ronga, 1999 (15) | Italy | 334 | 41.6 | 40 | C | 4.0–16.0 | 11.1 |
Rosario, 2011 (16) | Brazil | 237 | 43.0 | 90 | A, B, C | 0.7–1.0 | 10.0 |
Sawka, 2008 (17) | Canada | 141 | 43.7 | 84 | B | 1.2–7.0 | 10.0 |
Tamilia, 2011 (18) | Canada | 193 | 45.5 | 63 | B | 1.0–1.5 | 10.0 |
Toubeau, 2004 (19) | France | 212 (208) | 47.0 | 28 | B, C | 1.0–12 | 30.0 |
Webb, 2011 (20) | United States | 75 (63) | 40.7 | 28 | B, C | 6.6 | 10.0 |
ᵃ Included patients were those with sufficient data for the meta-analysis. Total patients = 4260 (3947 included). NS, not stated.
ᵇ A, metastatic activity on posttreatment whole-body scan; B, surveillance stimulated Tg testing; C, recurrent structural disease.
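The totals in the Table 1 footnote, and the count of studies using a 10 ng/ml cutoff, can be reproduced from the table rows. The sketch below transcribes the patient counts and cutoffs; where no parenthetical "included" figure is given, it assumes all reported patients were included, which is consistent with the stated totals.

```python
# (patients reported, patients included, Tg cutoff in ng/ml), from Table 1.
# ASSUMPTION: no parenthetical figure means all reported patients were included.
TABLE_1 = {
    "Bernier 2005": (407, 407, 5.0),
    "Familiar 2009": (63, 63, 10.0),
    "Giovanella 2005": (140, 140, 3.2),
    "Heemstra 2007": (222, 222, 27.5),
    "Kim 2005": (268, 268, 10.0),
    "Lin 2002": (847, 654, 10.0),
    "Oyen 2000": (206, 206, 6.6),
    "Pelttari 2010": (495, 391, 10.0),
    "Polachek 2011": (420, 420, 10.0),
    "Ronga 1999": (334, 334, 11.1),
    "Rosario 2011": (237, 237, 10.0),
    "Sawka 2008": (141, 141, 10.0),
    "Tamilia 2011": (193, 193, 10.0),
    "Toubeau 2004": (212, 208, 30.0),
    "Webb 2011": (75, 63, 10.0),
}

total_reported = sum(reported for reported, _, _ in TABLE_1.values())
total_included = sum(included for _, included, _ in TABLE_1.values())
n_cutoff_10 = sum(1 for _, _, cutoff in TABLE_1.values() if cutoff == 10.0)

if __name__ == "__main__":
    print(total_reported, total_included, n_cutoff_10)  # 4260 3947 9
```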
Exclusion criteria used by included studies are summarized in Table 2. Three studies excluded patients with microcarcinomas (6, 16, 17), and two studies included only patients with T1 (8) or T1+T2 tumors (16). Two studies excluded patients with T4 lesions (13, 16), two excluded patients with local lymph node metastases (16, 19), and four excluded patients with distant metastases at baseline (10, 13, 16, 19). A single study did not exclude patients with positive TgAb from the analysis (11). Given the quality of this study, as reflected in its large sample size, and because inclusion of patients with positive TgAb would be expected to decrease rather than inflate the NPV, it was retained in the final analysis.
Author, year (Ref.) | TgAb (+) | Lobe only | T1a | >T1 | >T2 | T4 | N1 | M1 | Positive post-¹³¹I scan |
---|---|---|---|---|---|---|---|---|---|
Bernier, 2005 (6) | x | x | x | ||||||
Familiar, 2009 (7) | x | x | |||||||
Giovanella, 2005 (8) | x | x | x | x | |||||
Heemstra, 2007 (9) | x | x | |||||||
Kim, 2005 (10) | x | x | x | x | |||||
Lin, 2002 (11) | x | ||||||||
Oyen, 2000 (12) | x | x | |||||||
Pelttari, 2010 (13) | x | x | x | x | |||||
Polachek, 2011 (14) | x | x | |||||||
Ronga, 1999 (15) | x | x | |||||||
Rosario, 2011 (16) | x | x | x | x | x | x | x | ||
Sawka, 2008 (17) | x | x | x | ||||||
Tamilia, 2011 (18) | x | ||||||||
Toubeau, 2004 (19) | x | x | x | x | |||||
Webb, 2011 (20) | x | x |
N1, positive cervical lymph node metastases; M1, positive distant metastases; x, exclusion criterion applied.
Sensitivity, specificity, positive predictive value (PPV), and NPV of included studies
Figure 3 shows the forest plots for sensitivity and specificity derived from the included articles as well as additional data provided by four of the primary authors (7, 13, 16, 18) for the purpose of this analysis. The tabulated summary to the left of the forest plots indicates the data used to calculate sensitivity and specificity, the Tg cutoff used in each of the included studies, and the calculated values for sensitivity, specificity, PPV, and NPV. Of note, the NPV was greater than 90% in all but two studies, and the overall NPV was 94.2% (95% CI = 92.8–95.3%). The majority of studies (nine of 15) provided data pertaining to a preablation Tg cutoff of 10 ng/ml (7, 10, 11, 13, 14, 16–18, 20). Other cutoffs used were 3.2 ng/ml (8), 5.0 ng/ml (6), 6.6 ng/ml (12), 11.1 ng/ml (15), 27.5 ng/ml (9), and 30 ng/ml (19). Six studies, including our own (8, 9, 14, 15, 18, 20), used an ROC analysis to define the cutoff; most of the remaining studies selected the cutoff empirically. In each of the included studies, patients with evidence of persistent or recurrent disease at any time point during follow-up were considered not disease free, irrespective of disease status after additional therapy.
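For readers reconstructing the per-study PPV and NPV in Figure 3, the sketch below shows the calculation from 2 × 2 counts, plus the standard expression for NPV in terms of sensitivity, specificity, and prevalence. The function names and all counts are hypothetical; the second function illustrates why NPV can differ between studies with similar test accuracy but different disease prevalence.

```python
def predictive_values(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (PPV, NPV) for one study's 2 x 2 table.

    A negative test is a preablation Tg below the study's cutoff, so the NPV
    is the share of below-cutoff patients who remained disease free.
    """
    return tp / (tp + fp), tn / (tn + fn)


def npv_at_prevalence(sens: float, spec: float, prevalence: float) -> float:
    """NPV implied by a given sensitivity, specificity, and disease prevalence."""
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    ppv, npv = predictive_values(tp=40, fp=30, fn=10, tn=120)  # hypothetical
    print(f"PPV {ppv:.3f}, NPV {npv:.3f}")  # PPV 0.571, NPV 0.923
    # Same hypothetical test (sens = spec = 0.8) at two prevalences
    print(round(npv_at_prevalence(0.8, 0.8, 0.10), 3))  # 0.973
    print(round(npv_at_prevalence(0.8, 0.8, 0.40), 3))  # 0.857
```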
![Forest plot showing sensitivity and specificity for the included studies. The tabular summary shows values for true positives (TP), false negatives (FN), false positives (FP), true negatives (TN), preablation Tg cutoff used (TG), sensitivity (SN), specificity (SP), PPV, and NPV for each of the included studies.](https://cdn.statically.io/img/oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jcem/97/8/10.1210_jc.2012-1533/2/m_zeg0081290660003.jpeg?Expires=1724449128&Signature=BnITlv0QOxxTEmeVZNkj-AUp~ZyqFx5G34OjTEgQvrdV-5LJV2LTo1PpboyIbwZlhv6cTFNbuBXNn1GDzbJ08amZlsYG65CYI-2OA3kQz51nXpcsgVSFsTjIs5ifjHwCOYu7~jza2a-bwDtI5FWRGRhDO8-4Ym0A7nVpxDsSJaI-kaaxwj0h7Oo4ZLdcs3r-Blb1UI1A8tblisnp9kaJc5dgA61HX6MZayVi480GSx7Ac3mJq9HAHu5JrSYJrvTyn9eL4X~JfJXoUCnYJsADN3bl9TR8fJiaSJc-UhET~kkyFFzKmO8gC9oOvqi67NzmitH4bKQQ3b0aqCbBLrQn3A__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Forest plot showing sensitivity and specificity for the included studies. The tabular summary shows values for true positives (TP), false negatives (FN), false positives (FP), true negatives (TN), preablation Tg cutoff used (TG), sensitivity (SN), specificity (SP), PPV, and NPV for each of the included studies.
Summary ROC analysis
Figure 4 shows the sROC curve for the 15 included studies. The mean threshold sensitivity and specificity, which are the best estimates of the true accuracy of a given test as estimated from the sROC analysis, were 76.1% (95% CI = 69.4–81.8%) and 85.2% (95% CI = 79.1–89.8%), respectively. The area under the sROC curve was 0.87 (95% CI = 0.84–0.89), indicating excellent discrimination by preablation Tg. Studies using a preablation Tg cutoff of 10 ng/ml (closed circles) clustered near the apex of the curve at the mean threshold point. The I² statistic, including data from all 15 studies, was 98% (95% CI = 97–99%), indicating a high degree of heterogeneity. This value fell to 93% when the analysis was limited to studies using a preablation Tg cutoff of 10 ng/ml. The summary DOR obtained from the bivariate model was 18.4 (95% CI = 12.8–26.5), with a positive likelihood ratio of 5.16 (95% CI = 3.74–7.11) and a negative likelihood ratio of 0.28 (95% CI = 0.22–0.35).
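The summary figures are internally consistent, which can be checked with simple arithmetic. The sketch below recomputes the likelihood ratios from the rounded summary sensitivity and specificity, the DOR from the reported likelihood ratios, and the heterogeneity category from the I² cut points in the Methods. Recomputing LR(+) from rounded point estimates gives about 5.14 rather than the model's 5.16; the small gap is expected, since the bivariate model estimates these quantities jointly.

```python
def lr_positive(sens: float, spec: float) -> float:
    return sens / (1 - spec)


def lr_negative(sens: float, spec: float) -> float:
    return (1 - sens) / spec


def classify_i2(i2_percent: float) -> str:
    """Heterogeneity category using the cut points given in the Methods."""
    if i2_percent < 25:
        return "low"
    if i2_percent > 75:
        return "high"
    return "moderate"


if __name__ == "__main__":
    sens, spec = 0.761, 0.852  # summary point from the sROC analysis
    print(round(lr_positive(sens, spec), 2))  # 5.14 (model estimate: 5.16)
    print(round(lr_negative(sens, spec), 2))  # 0.28
    print(round(5.16 / 0.28, 1))              # 18.4, matching the summary DOR
    print(classify_i2(98), classify_i2(93))   # high high
```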
![sROC curve showing a clustering of studies using a preablation Tg cutoff of 10 ng/ml near the summary point sensitivity and specificity indicated by a + sign. Each point shown represents results from an individual study. Adjacent to each point is text showing the preablation Tg cutoff and the study reference number.](https://cdn.statically.io/img/oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jcem/97/8/10.1210_jc.2012-1533/2/m_zeg0081290660004.jpeg?Expires=1724449128&Signature=JUQ8OhBtWSOC-xV4788vBwadS9u2ZExOgdzUBTlXk53TPNo0aWoyqxQBkRmtaN0VBDD8vOWzBWA02KFovG5MW0usa~QC-WikEKqu1JHwseMZDt2ibnCIIsocRFtpAOHqUmn5gyWykXd3b~joOGYSBcaKhz7pAoZCQmEYYW1o7~yzK0tJJqsKIJyNX1y7iumdmotX91fKRNlK-TepcqnW6Vb~OSl7VoubALw6gO7lopIWhH8tfwK-O7hUM9y~nKtuz5dbfciAwXy15e6qAcfQC6FhQxy3bDztEh7-d4g7jeRWXWbHoXMTcdv~9EVkbjV16wAyFfKxm4gG~bv8Wg-viQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
sROC curve showing a clustering of studies using a preablation Tg cutoff of 10 ng/ml near the summary point sensitivity and specificity indicated by a + sign. Each point shown represents results from an individual study. Adjacent to each point is text showing the preablation Tg cutoff and the study reference number.
Bayesian analysis
Figure 5 shows the effect of a negative (below-threshold) preablation Tg on the likelihood of disease persistence or recurrence. As shown in the inset, a negative preablation Tg result reduces the probability of persistent or recurrent disease from 18% (the pretest probability for the included studies) to 5.8% (95% CI = 4.7–7.2%), a 3.1-fold decrease in risk. The upper arm of the Bayesian analysis graph is omitted from this figure.
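The posttest probability can be reproduced from the formula in the Methods using the reported pretest probability (18%) and summary LR(−) (0.28). The sketch below does this and also shows the equivalent odds form of Bayes' theorem. Note that 1 minus the posttest probability after a negative test is the NPV at that prevalence; it comes out at 94.2%, in line with the pooled NPV, although this is an arithmetic check rather than how the pooled NPV was estimated.

```python
def posttest_probability(pretest: float, lr: float) -> float:
    """P(D) = (LR x P) / [(1 - P) + (LR x P)], as in the Methods."""
    return (lr * pretest) / ((1 - pretest) + lr * pretest)


def via_odds(pretest: float, lr: float) -> float:
    """Equivalent odds form: posttest odds = LR x pretest odds."""
    odds = lr * pretest / (1 - pretest)
    return odds / (1 + odds)


PRETEST = 0.18  # pretest probability for the included studies
LR_NEG = 0.28   # summary LR(-) from the bivariate model

if __name__ == "__main__":
    post = posttest_probability(PRETEST, LR_NEG)
    print(f"{post:.1%}")             # 5.8%
    print(round(PRETEST / post, 1))  # 3.1 (fold reduction)
    print(f"{1 - post:.1%}")         # 94.2%, the NPV at this prevalence
```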
Figure 5. Bayesian analysis comparing pretest with posttest probability in the presence of a preablation Tg value lower than threshold. The inset magnifies the region near the graph origin and shows the effect of a low preablation Tg on the likelihood of persistent or recurrent disease: a pretest probability of 18% (the pretest probability for the included studies) falls to 5.8% in the presence of a preablation Tg lower than threshold.
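The posttest probability in Figure 5 follows from Bayes' theorem in odds form: posttest odds = pretest odds × likelihood ratio. A minimal Python sketch, assuming the point estimates given in the text (pretest probability 18%, negative likelihood ratio 0.28) and ignoring their confidence intervals; the function name `posttest_probability` is hypothetical. With these inputs it reproduces the reported posttest probability of 5.8% and the 3.1-fold decrease in risk.

```python
def posttest_probability(pretest: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form (hypothetical helper name).

    Converts a pretest probability to odds, multiplies by the likelihood
    ratio of the observed test result, and converts back to a probability.
    """
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


PRETEST = 0.18   # pretest probability for the included studies
LR_NEG = 0.28    # negative likelihood ratio from the bivariate model

p_neg = posttest_probability(PRETEST, LR_NEG)
print(f"posttest probability after a below-threshold Tg: {p_neg:.1%}")
print(f"fold decrease in risk: {PRETEST / p_neg:.1f}")
```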
Walter Reed patient outcomes
Among 323 consecutive patients included in the WRAMC PTC database, preablation Tg levels were available for 129 patients. These patients received care between 1983 and 2003. We excluded patients with positive TgAb (n = 24), follow-up shorter than 6 months (n = 11), thyroid lobectomy alone (n = 7), no RRA (n = 5), missing follow-up data (n = 6), or inadequate TSH elevation (n = 2). The analysis included 74 subjects (53 females and 21 males) with a mean age of 41 ± 13 yr. All patients underwent total or near-total thyroidectomy, and 21 (28.4%) underwent concurrent dissection of clinically involved lymph nodes. All patients received RRA with a mean administered activity of 129.7 ± 55.4 mCi (median, 150 mCi; range, 24.7–271 mCi). Mean follow-up was 6.6 ± 5.5 yr (range, 0.7–26.6 yr). Persistent or recurrent disease was noted in 28 patients (37.8%), manifested by biochemical persistence alone in 11 patients (14.9%) and by both biochemical persistence and structural disease in 13 patients (17.6%). Six patients whose Tg rose transiently above 2 ng/ml at initial surveillance and then fell spontaneously below 2 ng/ml without further therapy were considered free of disease for this analysis. Nine patients underwent repeat operations for disease recurrence. Twenty-three patients (31.1%) received multiple radioiodine treatments, including six who required three or more treatments for persistent or recurrent disease. Two patients developed late distant metastases, and one patient died of rapidly progressive disease within 3 months of diagnosis. Eleven patients with a mild Tg elevation immediately after ablation and no subsequent Tg values on record were excluded from the final analysis.
Among 38 patients with preablation Tg values below 10 ng/ml, seven had persistent disease, including four with structural disease and three with Tg elevation alone. Among 25 patients with preablation Tg of at least 10 ng/ml, 21 patients had persistent disease, including 13 with structural disease and eight with Tg elevation alone.
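As an illustration, the counts in the preceding paragraph can be arranged into a 2 × 2 table, with a "positive" test defined as a preablation Tg of at least 10 ng/ml, and the usual accuracy measures computed. This is a minimal Python sketch; the class name `TwoByTwo` is hypothetical, and the derived values (sensitivity 75.0%, specificity 88.6%, PPV 84.0%, NPV 81.6%) are our own arithmetic on the reported counts rather than figures stated for this cohort.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """2x2 table for a dichotomized test (hypothetical helper).

    "Positive" means preablation Tg >= cutoff; "disease" means persistent
    or recurrent disease during follow-up.
    """
    tp: int  # test positive, persistent/recurrent disease
    fp: int  # test positive, disease-free
    fn: int  # test negative, persistent/recurrent disease
    tn: int  # test negative, disease-free

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Walter Reed counts at a 10 ng/ml cutoff:
# 25 patients >= 10 ng/ml (21 with persistent disease),
# 38 patients < 10 ng/ml (7 with persistent disease).
walter_reed = TwoByTwo(tp=21, fp=25 - 21, fn=7, tn=38 - 7)

for measure in ("sensitivity", "specificity", "ppv", "npv"):
    print(f"{measure}: {getattr(walter_reed, measure)():.1%}")
```

The single-center NPV is lower than the pooled value of approximately 94%. This is partly what one would expect arithmetically, because persistent disease was present in 28 of the 63 analyzed patients (44%), well above the 18% pooled pretest probability; the small cohort also makes these estimates imprecise.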
Discussion
Measurement of serum Tg has emerged as the most accurate means of detecting persistent or recurrent DTC (27, 28). The accuracy of stimulated Tg exceeds that of radionuclide imaging in detecting persistent disease (29–31). As the sensitivity of Tg assays has improved, it has become apparent that most patients once thought to have recurrent DTC are better classified as having persistent disease that has progressed to the limits of detection of structural or functional imaging. Because the aggressiveness of early intervention and the pace of surveillance are ideally linked to an estimate of the likelihood of persistent disease, better tools are needed to identify the one in five patients who will develop structural correlates of persistent disease.
Our study, which includes data from nearly 4000 patients across a broad spectrum of disease, clearly demonstrates that preablation Tg measurement has the potential to serve as a useful negative predictor of persistent and recurrent DTC. Specifically, a patient with a postoperative preablation Tg value of less than 10 ng/ml has only about a 6% likelihood of having persistent disease. Furthermore, our data show that patients with a low preablation Tg are by no means rare, representing 70% of individuals in the studies included in this meta-analysis and 71% of patients in studies using a preablation Tg cutoff of 10 ng/ml. Used alongside traditional risk-assessment tools such as the ATA classification system (4), preablation Tg can assist in planning initial therapy and in setting the intensity of surveillance testing.
In contrast to the high NPV, the PPV of a preablation Tg over 10 ng/ml is poor, calculated in our study to be 47%. This is a direct result of residual normal thyroid tissue left in situ at thyroidectomy, which may continue to synthesize and release Tg. However, given that a thyroid remnant is present in the vast majority of patients undergoing near-total thyroidectomy (16, 32), it is somewhat surprising that two thirds of patients have stimulated preablation Tg values of 10 ng/ml or less. It is possible that the devascularized normal thyroid remnant, despite having a greater absolute mass than lymph node micrometastases, is less capable of producing and releasing Tg. In this regard, metastatic thyroid carcinoma has been shown to express angiogenic factors such as vascular endothelial growth factor, which support nutrient delivery to, and growth of, neoplastic follicular cells (33). In an effort to increase the PPV of preablation Tg, several studies have attempted to correct for the normal thyroid remnant using a variety of techniques: the ratio of preablation Tg to radioactive iodine uptake (22, 34), sonographic measurement of remnant volume (35), the T4 to Tg ratio at the time of ablation (36), and the rate of change in serum Tg after remnant ablation, with rapid decreases favoring a remnant source (6). However, none of these techniques has significantly improved the PPV of preablation Tg; therefore, the value of preablation Tg lies solely in its role as a negative predictor of persistent and recurrent disease when the value is low.
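The asymmetry between NPV and PPV can be illustrated by applying the summary sensitivity (76.1%) and specificity (85.2%) to different pretest probabilities. This Python sketch is illustrative only: it treats the summary estimates as fixed, ignores their confidence intervals, and the function names are hypothetical. At the 18% pretest probability of the included studies it gives an NPV of about 94.2% but a PPV of only about 53%. That PPV is not identical to the 47% reported above, which was calculated separately, but the qualitative contrast is the same.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem (hypothetical helper)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value via Bayes' theorem (hypothetical helper)."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


SENS, SPEC = 0.761, 0.852  # summary estimates from the sROC analysis

# Hold sensitivity and specificity fixed and vary the pretest probability.
for prevalence in (0.10, 0.18, 0.30):
    print(
        f"pretest {prevalence:.0%}: "
        f"PPV = {ppv(SENS, SPEC, prevalence):.1%}, "
        f"NPV = {npv(SENS, SPEC, prevalence):.1%}"
    )
```

With sensitivity and specificity held fixed, PPV rises and NPV falls as the pretest probability increases.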
Previous authors have advocated using preablation Tg to select patients for RRA (16, 37, 38). The 2012 National Comprehensive Cancer Network thyroid cancer guidelines recommend withholding RRA in patients with a postoperative stimulated Tg below 1 ng/ml and negative radioiodine imaging for metastatic disease (39). Rosario and colleagues (16) suggested that neck ultrasound could augment the predictive value of the preablation Tg level. Although none of their patients with a preablation Tg below 1 ng/ml had disease detected by ultrasound, four (1.8%) of 217 patients with a preablation Tg below 10 ng/ml were found to have lymph node metastases. Although our meta-analysis demonstrates the predictive value of a preablation Tg below 10 ng/ml, lower cutoffs such as 5 or 1 ng/ml would likely yield an even higher NPV, albeit for a progressively smaller group of patients.
The studies included in this meta-analysis measured preablation Tg exclusively under THW rather than rhTSH stimulation. Because rhTSH-stimulated Tg values tend to be lower than those obtained under THW (40), lower Tg cutoffs would likely be required to apply this analysis to rhTSH-stimulated testing. Furthermore, rhTSH-stimulated RRA is generally performed on d 3, after two consecutive daily doses of rhTSH on d 1 and 2; this is before serum Tg peaks (on d 5) in many patients undergoing rhTSH-stimulated Tg testing. Therefore, separate diagnostic and therapeutic courses of rhTSH would be required if preablation Tg results were to be used to guide the decision to administer RRA.
Our study has several important limitations. First, four of the 15 included studies provided less than 3 yr of follow-up data. In these studies, a negative stimulated Tg value and, if reported, absence of structural disease during surveillance were taken to indicate NED status. In support of this approach, the initial surveillance stimulated Tg result has been shown to be highly predictive of disease-free status, with an NPV of 98% (3, 41). Second, there was a high degree of heterogeneity among the included studies, as assessed by the I² statistic. Despite this heterogeneity, the large number of patients drawn from a broad spectrum of disease and the consistent use of standard measures for detecting persistent or recurrent disease mitigate the impact of these differences. Third, despite the central role of Tg measurement in the surveillance of patients with DTC, the Achilles' heel of this assay remains interference by TgAb, which occurs during follow-up in up to 25% of patients (42). Any measured Tg value, including preablation Tg, is therefore only as reliable as the sensitivity of the accompanying TgAb assay, because virtually all immunometric Tg assays report falsely low Tg values in the presence of TgAb (42). Notably, failure to detect an elevated Tg level because of TgAb interference would reduce the NPV of the preablation Tg, yet our NPV remained high at 94%, suggesting that undetected TgAb had minimal impact on our results. Only a test with a 100% NPV would provide absolute assurance against disease recurrence; therefore, additional measures are needed to risk-stratify the approximately 6% of patients who will be misclassified on the basis of a low preablation Tg alone. This is illustrated by a study by Phan and colleagues (32), in which two of 94 patients with an undetectable preablation Tg subsequently developed a detectable Tg and an additional three patients developed positive TgAb.
Finally, the sensitivity of the techniques used to follow patients with thyroid cancer, such as Tg measurement and neck ultrasound, improved steadily over the time periods encompassed by the included studies. Therefore, patients judged to have NED at their last visit might subsequently have been found to have persistent disease with more sensitive assays and techniques. This limitation is shared by any longitudinal study in which patient data are collected sequentially over time.
In summary, our meta-analysis shows a high NPV for a preablation Tg value of less than 10 ng/ml in patients with DTC. For a low-risk patient in whom RRA is being considered on the basis of the pretest probability of persistent disease, preablation Tg is a readily available and inexpensive tool to assist in this decision. Additional prospective studies are required to determine whether a low preablation Tg can be used to select patients in whom RRA can be avoided.
Acknowledgments
We thank the authors of included studies who graciously provided additional data, clarification, or reprints for this analysis, including Dr. Hanna Pelttari (13), Dr. Carlos A. Benbassat (14), Dr. Cristina Familiar (7), Dr. Michael Tamilia and Dr. Jen-Der Lin (11), and Dr. Pedro Rosario (16).
The views expressed in this manuscript are those of the authors and do not reflect the official policy of the Department of the Army or Navy, the Department of Defense, or the U.S. Government. We are military service members (or employees of the U.S. Government). This work was prepared as part of our official duties. Title 17 U.S.C. 105 provides that the “Copyright protection under this title is not available for any work of the United States Government.” Title 17 U.S.C. 101 defines a U.S. Government work as a work prepared by a military service member or employee of the U.S. Government as part of that person's official duties. We certify that all individuals who qualify as authors have been listed; each has participated in the conception and design of this work, the analysis of data (when applicable), the writing of the document, and/or the approval of the submission of this version; that the document represents valid work; that if we used information derived from another source, we obtained all necessary approvals to use it and made appropriate acknowledgments in the document; and that each takes public responsibility for it.
Disclosure Summary: The authors have nothing to disclose.
Abbreviations
- CI: confidence interval
- DOR: diagnostic odds ratio
- DTC: differentiated thyroid cancer
- NED: no evidence of disease
- NPV: negative predictive value
- PPV: positive predictive value
- PTC: papillary thyroid cancer
- QUADAS-2: Quality Assessment of Studies of Diagnostic Accuracy Included in Systematic Reviews-2
- rhTSH: recombinant human TSH
- RRA: radioiodine remnant ablation
- sROC: summary receiver operating characteristic
- Tg: thyroglobulin
- TgAb: anti-Tg antibodies
- THW: thyroid hormone withdrawal
References