Abstract

Timing of human immunodeficiency virus-1 (HIV-1) reservoir formation is important for informing HIV cure efforts. It is unclear how much of the variability seen in dating reservoir formation is due to sampling and gene-specific differences. We used a Bayesian extension of root to tip regression (bayroot) to reestimate formation date distributions in participants from Swedish and South African cohorts, and assessed the impact of variable timing, frequency, and depth of sampling on these estimates. Significant shifts in formation date distributions were only observed with use of faster-evolving genes, while timing, frequency, and depth of sampling had minor or no significant effect on estimates.

Human immunodeficiency virus-1 (HIV-1) integrates into the genome of host cells and can evade immune clearance by persisting in a nonproductive latent state despite antiretroviral therapy (ART), precluding a cure [1–3]. Collectively, these cells are called the latent HIV-1 reservoir (LVR). The timing of LVR formation is important as it may inform strategies for limiting or reversing LVR formation, particularly if the majority of the LVR is formed around the time of ART initiation, as some studies have suggested [4–6]. It is clear that reservoir formation begins almost immediately after infection, as viremia will rebound if ART is stopped, even in early treated individuals [7, 8]. The LVR is also augmented throughout viremia; however, there are varying data on the rate and level that latently infected cells are accrued throughout viremia [9, 10], or if ART initiation significantly alters the host environment to favor establishment of the majority of the reservoir [4–6, 11]. The latter of these patterns could be caused by an increase in short-lived reservoir cells (eg, half-life 2 weeks) that transition to long-lived cells upon ART exposure [4]. Interestingly, some studies where continuous reservoir establishment was more common include a few individuals whose reservoir largely dates to ART initiation [10]. Another study where the reservoir largely dated to ART initiation also includes a few individuals whose reservoir largely dates before ART [4].

While the above variations suggest underlying biological differences, it is unclear how much of the variability in reservoir formation dates is due to methodological differences, as opposed to differences in the populations examined in the different studies. For example, while some studies sequenced single HIV gene regions, others sequenced 2 regions, multiple discontinuous regions, or multiple continuous regions [4–6, 9, 11]. These gene regions experience different selection pressures and rates of evolution [12]. Also, participants were sampled over different timespans before and during ART, with different timing, frequency, and depth. Therefore, it is important to measure the impact of these sampling variations on reservoir formation dates estimated using the same analytical method. Typically, reservoir dating studies either map reservoir DNA sequences to pre-ART RNA sequences to generate discrete date distributions, or calibrate a molecular clock (regression) from RNA sequences and use it to predict continuous formation date estimates of reservoir DNA sequences whose evolution stops upon effective ART initiation. Here, we report the impact of using different gene regions to estimate reservoir formation dates, and the impact of variable timing, frequency, and depth of sampling before and during ART.

METHODS

Participants

We selected data from participants from 2 cohorts with estimated dates of seroconversion and multiple, well-distributed pre-ART sampling—a Swedish cohort as published in Brodin et al, 2016 [6], and the CAPRISA cohort in South Africa as published in Abrahams et al, 2019 [4]. In addition, participants in the Swedish cohort had 2 to 3 reservoir samples collected between 2 and 18 years during ART, while participants in the CAPRISA cohort had 1 reservoir sample collected between 4 and 6 years during ART initiation (Supplementary Figure 1).

Estimation of Formation Dates

We applied the R package bayroot to estimate reservoir formation dates [13]. Briefly, bayroot is a Bayesian extension of root to tip regression which enables the user to incorporate prior information about the time of infection and start of ART, and generate a continuous distribution of reservoir formation date estimates [13]. We ran bayroot with a lognormal prior for the molecular clock (initial rate = 0.01) and a uniform prior for the root location. We ran a total of 1e5 estimation chain steps and sampled every 100th step after discarding the first 10 steps as burn-in. We previously validated this method in the CAPRISA cohort (including running replicate chain samples to assess repeatability) and found predominant reservoir establishment around ART initiation, similar to what the authors reported but different from continuous reservoir establishment observed in most participants from the Rakai cohort in Uganda [10].

Assessing the Impact of Gene Region on Formation Date Estimates

We compared formation dates estimated from a fast- versus a moderate-evolving gene region (ie, C2C3 region of gp120 vs the C4C5 region of gp120 with the hypervariable loop V5 trimmed); fast- versus a slow-evolving gene region (ie, C2C3 vs the p17 region of gag); and a moderate- versus a relatively slow-evolving gene region (eg, C4C5 vs p17). In each case (fast vs moderate, fast vs slow, moderate vs slow), we included participants who had sequences for both gene regions. Differences in evolution rates across the HIV genome have previously been reported by our group and others [10, 12].

Assessing the Impact of Timing, Frequency, and Depth of Sampling on Formation Date Estimates

We estimated dates using C2C3 sequences of participants from the CAPRISA cohort and the p17 sequences of participants from the Swedish cohort. Participants with C2C3 sequences were selected for the CAPRISA cohort as this gene region has the highest evolution of all regions sequenced in this cohort [10], while all participants in the Swedish cohort had p17 sequences. A generalized estimating equation was used to estimate the impact of timing of sampling (time between infection and first pre-ART sample, time between last pre-ART sample and ART initiation, and time between ART and first sample during ART); frequency of sampling (average samples per year before ART, and average samples per year during ART); and depth of sampling (number of sequences in a pre-ART sample). These variables were assessed to rule out collinearity prior to inclusion in the model. We adjusted for multiple estimates in the same individual and for gene region used or cohort effect. Where statistical significance was observed in the model, we further performed secondary analyses at individual level, creating counterfactual datasets by omitting some pre-ART or post-ART samples or sequences and the resulting distributions were compared to distributions from complete datasets for each participant.

RESULTS

Impact of Gene Region on HIV Formation Date Estimates

Six CAPRISA cohort participants had estimates from both the sequences of C2C3 (fast-evolving region) and C4C5 (moderate-evolving region), five had estimates from both the sequences of C2C3 and p17 (relatively slow-evolving region), and 5 had estimates from both the sequences of C4C5 and p17. Using a fast-evolving gene region (C2C3) significantly shifted date estimates much closer to ART initiation compared to using a moderate-evolving (C4C5) or slow-evolving gene region (p17; Figure 1).

Distribution of HIV-1 reservoir formation dates estimated using the bayroot package in R, applied to sequences for a gene region with high versus moderate evolution (C2C3 vs C4C5 regions of gp120), fast versus slow evolution (C2C3 vs p17 region of gag in the matrix), and moderate versus slow evolution (C4C5 vs p17). Each stripchart represents pooled formation dates for reservoir sequences for all participants included in each case, with each reservoir sequence contributing100 bayroot estimates to the overall formation date distribution. The overlying boxplots show medians and interquartile ranges, and the whiskers around the boxplots show Tukey's thresholds for outlier values. Time -1 represents time of infection while time 0 represents ART initiation. Participants included in this analysis were from the CAPRISA cohort and the IDs of participants included in each pairwise comparison are shown below the respective headers. The P values were estimated using the non-parametric Wilcoxon rank sum test for medians. Abbreviations: ART, antiretroviral therapy; HIV-1, human immunodeficiency virus-1.
Figure 1.

Distribution of HIV-1 reservoir formation dates estimated using the bayroot package in R, applied to sequences for a gene region with high versus moderate evolution (C2C3 vs C4C5 regions of gp120), fast versus slow evolution (C2C3 vs p17 region of gag in the matrix), and moderate versus slow evolution (C4C5 vs p17). Each stripchart represents pooled formation dates for reservoir sequences for all participants included in each case, with each reservoir sequence contributing100 bayroot estimates to the overall formation date distribution. The overlying boxplots show medians and interquartile ranges, and the whiskers around the boxplots show Tukey's thresholds for outlier values. Time -1 represents time of infection while time 0 represents ART initiation. Participants included in this analysis were from the CAPRISA cohort and the IDs of participants included in each pairwise comparison are shown below the respective headers. The P values were estimated using the non-parametric Wilcoxon rank sum test for medians. Abbreviations: ART, antiretroviral therapy; HIV-1, human immunodeficiency virus-1.

Impact of Timing of Sampling on Formation Date Estimates

Eight participants from the CAPRISA cohort (CAP188, CAP217, CAP257, CAP287, CAP288, CAP302, CAP316, and CAP336), and all 10 participants from the Swedish cohort were included. Missing early pre-ART viremic samples or early post-ART reservoir samples resulted in only a small shift (<20%) of date estimates closer to the time of infection (Figure 2). In secondary analyses at an individual level, we similarly observed only small changes in formation date distributions (2 of 18 individuals) when some pre-ART samples were omitted (Supplementary Figure 2).

Percent shift in the distribution of HIV-1 reservoir formation dates due to change in the timing, frequency, or depth of sampling. The coefficients and confidence intervals were obtained from a generalized estimating equation, adjusting for the gene region sequenced or cohort effect. In the model, integration dates estimated from the bayroot package in R were scaled between 0 and 1, representing the time of infection and time of ART initiation, respectively. Similarly, numeric independent variables were scaled between 0 and 1, representing the minimum and maximum values for that variable, respectively. Scaling was used to compare across individuals with different durations of viremia and to achieve a more stable model. Abbreviations: ART, antiretroviral therapy; HIV-1, human immunodeficiency virus-1.
Figure 2.

Percent shift in the distribution of HIV-1 reservoir formation dates due to change in the timing, frequency, or depth of sampling. The coefficients and confidence intervals were obtained from a generalized estimating equation, adjusting for the gene region sequenced or cohort effect. In the model, integration dates estimated from the bayroot package in R were scaled between 0 and 1, representing the time of infection and time of ART initiation, respectively. Similarly, numeric independent variables were scaled between 0 and 1, representing the minimum and maximum values for that variable, respectively. Scaling was used to compare across individuals with different durations of viremia and to achieve a more stable model. Abbreviations: ART, antiretroviral therapy; HIV-1, human immunodeficiency virus-1.

Impact of Frequency of Sampling on Formation Date Estimates

The above 8 participants from the CAPRISA cohort and 10 participants from the Swedish cohort were included. The frequency of pre- or post-ART sampling showed no significant effects on date estimates (Figure 2). In the Swedish cohort where most participants had 2 or 3 reservoir samples, using sequences from single reservoir samples sometimes (not always) resulted in deviations in the distributions compared to using sequences from all available reservoir samples (P value < .05). But qualitatively, the deviations were small (fairly good overlap with estimates from multiple reservoir samples) and not unidirectional (Supplementary Figure 4).

Impact of Depth of Sampling on Formation Date Estimates

The same 8 participants from the CAPRISA cohort and 10 participants from the Swedish cohort were included. Deep sampling (more pre-ART sequences) resulted in only a small shift (<20%) of date estimates closer to ART initiation (Figure 2). In secondary analyses at an individual level, we similarly observed only small changes in formation date distributions (4 out of 18 individuals) when a proportion of pre-ART sequences at each time point were omitted (Supplementary Figure 3).

DISCUSSION

Ideally, estimation of HIV reservoir formation dates requires regular and deep sampling during viremia to capture viral evolution, and regular and deep sampling of the reservoir during ART as cells harboring integrated proviruses are rare. However, the nature of cohorts means that sampling times and regularity will vary greatly so understanding the influences of these on reservoir dating is critical. For example, some cohorts had regular pre-ART sampling but limited sampling of the reservoir [4, 5, 11], while others had more frequent reservoir sampling but less pre-ART sampling, for example, the Rakai cohort [10] (Supplementary Figure 1). While additional sampling of the reservoir is possible, cohorts with frequent pre-ART sampling are rare in the era of immediate treatment upon diagnosis. We therefore need to be certain if and when less intense sampling (especially pre-ART) can still be useful. Previously, Jones et al demonstrated that formation date estimation was possible using root to tip regression from as few as 2 well-spaced viremic samples [9].

Our findings demonstrate that although the timing and depth of sampling impact formation date estimates, the shifts are small and unlikely to result in qualitative changes of conclusions. We therefore believe that useful insights may still be drawn from cohorts with less intense, but well-spaced sampling, especially during viremia. However, as reservoir sequences in a single peripheral blood sample are usually few and often identical (not informative in formation date estimation), repeat reservoir sampling may improve formation date estimates. But if substantial unique sequences can be recovered in a single reservoir sample, these can predict reservoir formation quite well as demonstrated in the Swedish cohort (Supplementary Figure 4). Otherwise, there is currently no reliable analytical remedy for insufficient sampling of the HIV reservoir.

Lastly, in the absence of full-length HIV genomes or sequences of multiple gene regions (eg, as in the CAPRISA cohort [4]), most reservoir dating studies rely on 1 or 2 short HIV gene regions to represent evolution in the full genome from which they were derived. Our findings suggest that using gene regions that evolve at a rate much faster than the average rate for the whole genome would significantly bias formation date estimates closer to ART initiation. Use of codon-aware models or models using select codon positions would be a potential solution, but in the faster-evolving regions like env this is still limited by frequent selective sweeps, which cause saturation of diversity in chronic infection and loss of the temporal signal, irrespective of codon position used [14]. One remedy is to use the direction and magnitude of potential bias reported here for interpretation or refining estimates, or to sequence gene regions with more moderate rates of evolution as these may yield more representative estimates.

Supplementary Data

Supplementary materials are available at The Journal of Infectious Diseases online (http://jid.oxfordjournals.org/). Supplementary materials consist of data provided by the author that are published to benefit the reader. The posted materials are not copyedited. The contents of all supplementary data are the sole responsibility of the authors. Questions or messages regarding errors should be addressed to the author.

Notes

Acknowledgments. We thank the CAPRISA cohort investigators for providing more processed viral sequences and further descriptions of their metadata.

Author contributions. E. N. K. contributed conceptualization, analysis, and writing. A. D. R. contributed conceptualization and review. A. F. Y. P. contributed software and review. T. C. Q., L. W. C., and J. L. P. contributed review.

Data availability. The original viral sequences from the CAPRISA cohort are available via Genbank accession numbers MN097551MN097697 and additional metadata are available in the original paper [4]. Viral sequences from the Swedish cohort are available via the European Nucleotide Archive (study accession number PRJEB13841; sample accession numbers ERS1138001–ERS1138025) and additional metadata are available via github and in the original paper [6, 15].

Financial support. This work was supported in part by the Division of Intramural Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health and by grant number UM1 AI164565, and Research Enterprise to Advance a Cure for HIV, Martin Delaney Collaboratories for HIV Cure Research grant number UM1 AI164565; and by the Canadian Institutes of Health Research (grant number PJT-155990). E. K. received training and salary support from amfAR, The Foundation for AIDS Research (grant number 109844); and the Fogarty International Center (grant number D43TW010557). J. L. P. was supported by the Canada Research Chairs Program, the Canada Foundation for Innovation John R. Evans Leaders Fund; and the Ministry of Colleges and Universities Ontario Research Fund (Small Infrastructure Fund).

References

1

Chun
 
T-W
,
Finzi
 
D
,
Margolick
 
J
,
Chadwick
 
K
,
Schwartz
 
D
,
Siliciano
 
RF
.
In vivo fate of HIV-1-infected T cells: quantitative analysis of the transition to stable latency
.
Nat Med
 
1995
;
1
:
1284
90
.

2

Finzi
 
D
,
Hermankova
 
M
,
Pierson
 
T
, et al.   
Identification of a reservoir for HIV-1 in patients on highly active antiretroviral therapy
.
Science
 
1997
;
278
:
1295
300
.

3

Wong
 
JK
,
Hezareh
 
M
,
Günthard
 
HF
, et al.   
Recovery of replication-competent HIV despite prolonged suppression of plasma viremia
.
Science
 
1997
;
278
:
1291
5
.

4

Abrahams
 
M-R
,
Joseph
 
SB
,
Garrett
 
N
, et al.   
The replication-competent HIV-1 latent reservoir is primarily established near the time of therapy initiation
.
Sci Transl Med
 
2019
;
11
:
eaaw5589
.

5

Pankau
 
MD
,
Reeves
 
DB
,
Harkins
 
E
, et al.   
Dynamics of HIV DNA reservoir seeding in a cohort of superinfected Kenyan women
.
PLoS Pathog
 
2020
;
16
:
e1008286
.

6

Brodin
 
J
,
Zanini
 
F
,
Thebo
 
L
, et al.   
Establishment and stability of the latent HIV-1 DNA reservoir
.
eLife
 
2016
;
5
:
e18889
.

7

Persaud
 
D
,
Patel
 
K
,
Karalius
 
B
, et al.   
Age at virologic control influences peripheral blood HIV reservoir size and serostatus in perinatally-infected adolescents
.
JAMA Pediatr
 
2014
;
168
:
1138
46
.

8

Chavez
 
L
,
Calvanese
 
V
,
Verdin
 
E
.
HIV latency is established directly and early in both resting and activated primary CD4 T cells
.
PLoS Pathog
 
2015
;
11
:
e1004955
.

9

Jones
 
BR
,
Kinloch
 
NN
,
Horacsek
 
J
, et al.   
Phylogenetic approach to recover integration dates of latent HIV sequences within-host
.
Proc Natl Acad Sci U S A
 
2018
;
115
:
E8958
67
.

10

Kankaka
 
EN
,
Redd
 
AD
,
Khan
 
A
, et al.   
Dating reservoir formation in virologically suppressed people living with HIV-1 in Rakai, Uganda
.
Virus Evol
 
2023
;
9
:
vead046
.

11

Brooks
 
K
,
Jones
 
BR
,
Dilernia
 
DA
, et al.   
HIV-1 variants are archived throughout infection and persist in the reservoir
.
PloS Pathog
 
2020
;
16
:
e1008378
.

12

Alizon
 
S
,
Fraser
 
C
.
Within-host and between-host evolutionary rates across the HIV-1 genome
.
Retrovirology
 
2013
;
10
:
49
.

13

Ferreira
 
R-C
,
Wong
 
E
,
Poon
 
AFY
.
Bayroot: Bayesian sampling of HIV-1 integration dates by root-to-tip regression
.
Virus Evol
 
2023
;
9
:
veac120
.

14

Puller
 
V
,
Neher
 
R
,
Albert
 
J
.
Estimating time of HIV-1 infection from next-generation sequence diversity
.
PloS Comput Biol
 
2017
;
13
:
e1005775
.

15

Neherlab
. neherlab/HIVEVO_reservoir. 2023. https://github.com/neherlab/HIVEVO_reservoir. Accessed 27 January 2024.

Author notes

Potential conflicts of interest. All authors: No reported conflicts of interest. All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Conflicts that the editors consider relevant to the content of the manuscript have been disclosed.

This work is written by (a) US Government employee(s) and is in the public domain in the US.

Supplementary data