Key points for decision makers

As the efficacy of triple-therapy antiretroviral regimens remains consistently high, patient well-being (e.g., patient-reported outcomes [PROs]) has become an important differentiator between regimens.

We evaluated PROs in two studies comparing co-formulated bictegravir, emtricitabine, and tenofovir alafenamide (B/F/TAF) with co-formulated abacavir, dolutegravir, and lamivudine (ABC/DTG/3TC), where treatment differences were noted if prevalence was statistically significantly different at two or more time points in the adjusted logistic regression model or at one time point in the adjusted logistic regression model and in the longitudinal model.

Among treatment-naïve participants, initiating B/F/TAF was associated with lower prevalence of fatigue/loss of energy, dizzy/lightheadedness, nausea/vomiting, difficulty sleeping, and loss of appetite compared with ABC/DTG/3TC.

For virologically suppressed participants, switching to B/F/TAF was associated with lower prevalence of dizzy/lightheadedness, nausea/vomiting, sad/down/depressed, nervous/anxious, difficulty sleeping, and loss of appetite compared with remaining on ABC/DTG/3TC.

In both patient populations, no symptom had a greater prevalence with B/F/TAF compared with ABC/DTG/3TC.

1 Introduction

Over the past decade, antiretroviral treatments for HIV infection have demonstrated high potency, yielding viral suppression rates above 90%. Integrase strand transfer inhibitors (INSTIs; e.g., bictegravir, dolutegravir, elvitegravir, and raltegravir) in combination with two nucleos(t)ide reverse transcriptase inhibitors (NRTIs) are currently recommended as initial treatment for HIV in many international guidelines [1,2,3]. As the success of antiretroviral therapy largely depends on patient adherence, fixed-dose combinations (FDCs) have emerged as preferred treatment regimens and are associated with less treatment failure [4]. The impact of FDCs on an individual’s overall health, including sense of well-being, has also become an important consideration for clinicians when prescribing HIV treatment.

Bictegravir (BIC, B) and dolutegravir (DTG) are INSTIs that do not require the co-administration of a pharmaco-enhancer (i.e., cobicistat or ritonavir) [5]. Both have been co-formulated with NRTIs as FDCs: BIC with emtricitabine (FTC, F) and tenofovir alafenamide (TAF) as B/F/TAF and DTG with abacavir (ABC) and lamivudine (3TC) as ABC/DTG/3TC. ABC/DTG/3TC is currently recommended as first-line antiretroviral therapy (ART) [1,2,3] and represents a logical, standard-of-care comparator for new INSTI-based FDCs. ABC requires HLA B*5701 testing prior to use, and the ABC/DTG/3TC co-formulation does not provide adequate treatment for hepatitis B in individuals co-infected with HIV [1, 3]. The US AIDS Clinical Trials Group (ACTG) 5202 study found a significantly shorter time to treatment modification and a greater incidence of nausea with the ABC/3TC NRTI backbone than with FTC/tenofovir disoproxil fumarate [6]. Furthermore, some reports have emerged showing associations between DTG and neuropsychiatric adverse events, while others have not [7,8,9,10,11,12,13].

BIC has a high barrier to resistance and a low potential for drug–drug interactions [14]. In four studies reported to date (two in treatment-naïve adults and two in virologically suppressed participants), the B/F/TAF FDC has shown similar high efficacy and tolerability to standard-of-care comparators, with no treatment-emergent resistance [15,16,17,18]. The primary outcome of all studies demonstrated non-inferiority of B/F/TAF to either ABC/DTG/3TC, DTG + F/TAF, or boosted protease-inhibitor (PI) regimens. Switching to B/F/TAF from ABC/DTG/3TC specifically has the potential to maintain high rates of suppression while avoiding limitations of ABC, such as potential cardiovascular risk [19,20,21], as well as central nervous system adverse effects and discontinuations that have been reported more frequently with DTG in clinical practice and cohort studies than in published results of clinical trials [9, 22, 23]. B/F/TAF also provides two NtRTIs with hepatitis B activity and the tablet is less than half the size of co-formulated ABC/DTG/3TC, which may improve acceptability amongst individuals and ability to use in pediatric populations [24,25,26, data on file]. B/F/TAF has been approved in the US and EU for treatment of HIV-1 infection in adults and adolescents and is now also recommended as first-line ART in current treatment guidelines [3].

Health-related quality-of-life (HRQL) outcomes have long been regarded as important in the evaluation and differentiation of treatment strategies [27]. To better understand the two FDCs of B/F/TAF and ABC/DTG/3TC, patient-reported outcomes (PROs), including one specifically designed for HIV-1-infected participants, were measured in both treatment-naïve and HIV-1-suppressed adults using previously validated questionnaires in comparative trials of these regimens. For treatment-naïve individuals, characterization of PROs provided the opportunity to assess the two regimens directly after initiating treatment. Comparison of the treatments in virologically suppressed individuals who were presumably tolerating their baseline regimen of ABC/DTG/3TC provided further delineation of how changes to medication may alter quality-of-life outcomes. We aimed to achieve a better understanding of the relationship between these treatments and bothersome HIV symptoms and/or adverse events that impact adherence and persistence.

2 Methods

2.1 Study Design

Study design and participant recruitment for both trials have been previously described [15, 17]. Briefly, both were international, double-blind, randomized, phase III studies that compared the efficacy and safety of B/F/TAF (50/200/25 mg) with ABC/DTG/3TC (600/50/300 mg) in adults who were either treatment naïve (GS-US-380-1489, ClinicalTrials.gov NCT02607930) or virologically suppressed (GS-US-380-1844, ClinicalTrials.gov NCT02603120). For both studies, post-baseline study visits occurred at weeks 4, 8, 12, 24, 36, and 48, with PRO measures administered at baseline and weeks 4, 12, and 48. Participants and investigators were masked to treatment allocation in both studies. Translated versions of all PRO tools were administered in the participant’s preferred language.

2.2 Baseline Demographics and Characteristics

Demographics (sex, age, race, and ethnicity) and clinical characteristics (serious mental illness, CD4 cell count, asymptomatic HIV infection status, years since HIV diagnosis, years since first antiretroviral therapy use [GS-US-380-1844 only], the Veterans Aging Cohort Study [VACS] Risk Index, and Fibrosis [FIB]-4 score) were collected or calculated. Serious mental illness was defined as a history, recorded as existing medical history at study entry and based on medical chart review, of one or more of the following conditions: major depression, anxiety, schizophrenia, bipolar disorder, post-traumatic stress disorder, or other serious mental illness. The VACS Index, calculated to quantify the overall mortality risk associated with HIV, is a summary score based on age, CD4 count, HIV-1 RNA, the FIB-4 score, creatinine, and hepatitis C virus coinfection; it is used to predict all-cause and cause-specific mortality and other outcomes in those living with HIV infection [28]. The FIB-4 score, computed using age, platelet count, and aspartate and alanine transaminase values, provides an estimate of the degree of liver fibrosis in HIV and hepatitis C virus co-infected patients [29].
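As an illustration of how the FIB-4 score combines these four inputs, the sketch below uses the commonly cited form of the index, (age × AST) / (platelet count × √ALT). The text does not state the exact computation used in the trials, so the function name, argument units, and the example participant are assumptions made for illustration only.

```python
import math


def fib4_score(age_years: float, ast_u_per_l: float, alt_u_per_l: float,
               platelets_1e9_per_l: float) -> float:
    """Commonly cited FIB-4 form: (age x AST) / (platelets x sqrt(ALT)).

    Assumed units: age in years, AST and ALT in U/L, platelets in 10^9/L.
    """
    if alt_u_per_l <= 0 or platelets_1e9_per_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_u_per_l) / (platelets_1e9_per_l * math.sqrt(alt_u_per_l))


# Hypothetical participant, not taken from the study data.
score = fib4_score(age_years=40, ast_u_per_l=30, alt_u_per_l=25,
                   platelets_1e9_per_l=200)
print(round(score, 2))
```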

2.3 Patient-Reported Outcomes (PROs)

2.3.1 HIV Symptom Distress Module (HIV-Symptom Index [HIV-SI])

The dependent variable in this study was the HIV-Symptom Index (HIV-SI). The HIV-SI is a validated PRO instrument that assesses the burden of 20 common symptoms associated with HIV treatment or disease [30]. The instrument is considered to be the gold standard in contemporary HIV symptom research [31]. Respondents are asked about their experience with each of 20 symptoms during the past 4 weeks using a 5-point, Likert-type scale. Response options and scores are as follows: (0) “I don’t have this symptom;” (1) “I have this symptom and it doesn’t bother me;” (2) “I have this symptom and it bothers me a little;” (3) “I have this symptom and it bothers me;” (4) “I have this symptom and it bothers me a lot.”

The 20 symptoms comprising the HIV-SI are fatigue/loss of energy, difficulty sleeping, nervous/anxious, diarrhea/loose bowels, changes in body composition, feeling sad/down/depressed, bloating/pain/gas in stomach, muscle aches/joint pain, problems with sex, trouble remembering, headaches, pain/numbness/tingling in hands/feet, skin problems/rash/itching, cough/trouble breathing, fever/chills/sweats, dizzy/lightheadedness, weight loss/wasting, nausea/vomiting, hair loss/changes, and loss of appetite/food taste.

Consistent with prior analyses by Edelman and colleagues [32], we dichotomized symptoms into not bothersome (scores of 0 or 1) or bothersome (scores of 2, 3 and 4). The overall bothersome symptom count at baseline was generated by counting the number of individual symptoms scored as bothersome and used as a covariate in regression analyses and longitudinal modeling.
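The dichotomization and baseline symptom count described above can be sketched as follows. This is a minimal illustration rather than the study's own code: the function names and the example responses are hypothetical, and the handling of missing items is not described in the text and is therefore not modeled.

```python
from typing import Mapping

# HIV-SI scores of 2, 3, or 4 are treated as bothersome; 0 or 1 as not bothersome.
BOTHERSOME_THRESHOLD = 2


def is_bothersome(score: int) -> bool:
    """Dichotomize a single HIV-SI item score (0 to 4)."""
    if score not in range(5):
        raise ValueError(f"HIV-SI item scores must be 0-4, got {score!r}")
    return score >= BOTHERSOME_THRESHOLD


def bothersome_symptom_count(responses: Mapping[str, int]) -> int:
    """Count how many items in one questionnaire are scored as bothersome."""
    return sum(is_bothersome(score) for score in responses.values())


# Hypothetical baseline responses for a subset of the 20 items.
baseline = {
    "fatigue/loss of energy": 3,
    "difficulty sleeping": 1,
    "nausea/vomiting": 0,
    "headaches": 2,
}
count = bothersome_symptom_count(baseline)
print(count)
```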

2.3.2 Other Descriptive PRO Measures

Other PRO instruments used to describe HRQL included the Short Form (SF)-36, the Pittsburgh Sleep Quality Index (PSQI), and the Work Productivity and Activity Impairment (WPAI): General Health questionnaire. The SF-36 is an instrument supported by extensive evidence of good psychometric properties in a range of therapeutic areas [33, 34], including HIV-infected individuals [35]. SF-36 scores range from 0 to 100, with a higher score indicating better functioning. For our analyses, we further transformed the physical component summary (PCS) and mental component summary (MCS) scores into norm-based scoring, with a mean of 50 and a standard deviation of 10 using the QualityMetric Health Outcomes™ Scoring Software 4.5 (QualityMetric Incorporated [now part of OPTUM], Lincoln, RI, USA). The PSQI is a validated, 19-item instrument that assesses sleep quality and disturbances over a 1-month time interval; the total score is dichotomized into poor or good sleep quality [36]. The total score ranges from 0 to 21, with a lower score indicating better sleep quality. The WPAI: General Health questionnaire is a 6-item tool that measures health-related work productivity loss for the employed population in terms of the amount of absenteeism, presenteeism, and daily activity impairment attributable to general health [37]. For the PSQI and WPAI, a lower value indicates a better outcome. Additionally, the PCS and MCS from SF-36 were used as covariates in regression and longitudinal analyses of the HIV-SI.
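Norm-based scoring of this kind is a linear rescaling so that a reference population has a mean of 50 and a standard deviation of 10. The sketch below shows only that final rescaling step; the reference mean and standard deviation are hypothetical placeholders, and the proprietary scoring software applies its own aggregation and reference values that are not reproduced here.

```python
def norm_based_score(raw: float, ref_mean: float, ref_sd: float) -> float:
    """Rescale a raw summary score so the reference population has mean 50, SD 10."""
    if ref_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    z = (raw - ref_mean) / ref_sd  # standardized distance from the reference mean
    return 50.0 + 10.0 * z


# Hypothetical raw score one reference SD above the reference mean.
t_score = norm_based_score(raw=0.5, ref_mean=0.0, ref_sd=0.5)
print(t_score)
```

Under this scaling, a score of 60 lies one standard deviation above the reference mean and a score of 40 lies one below.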

2.4 Statistical Analysis

Analyses were performed using SAS v9.4M3 (SAS Institute, Cary, NC, USA). If multiple responses were provided for an item on the HIV-SI, the most severe (i.e., largest value) of the responses was used.

Baseline demographic and clinical characteristics were summarized using descriptive statistics. For between-treatment comparisons, we used the Fisher exact test for categorical data or a 2-sided Wilcoxon rank sum test for continuous data. Changes from baseline in SF-36 PCS/MCS scores and WPAI scores were summarized using descriptive statistics, and a 2-sided Wilcoxon rank sum test was used for between-treatment comparisons. The prevalence (i.e., number and percentage of participants) of reported poor sleep quality on the PSQI was also summarized, and the Fisher exact test was used for between-treatment comparison. Unadjusted and adjusted analyses at weeks 4, 12, and 48 were performed to evaluate the relationship between treatment assignment and the probability of experiencing an HIV-SI symptom, without and with additional covariates, respectively. Specifically, HIV-SI symptoms were modeled as binary outcomes using logistic regression. Descriptive variables were evaluated for multicollinearity, and models were fitted with treatment and a subset of clinical and demographic covariates as independent variables. This subset was selected in two steps. First, for each symptom/time point, stepwise model selection was used to identify statistically significant covariates from all potential covariates (i.e., all the baseline demographic and clinical characteristics listed in Table 1, baseline HIV-1 RNA, and baseline CD4 cell count). Second, all 60 models (i.e., 20 symptoms × 3 visits) were reviewed and a single set of covariates was selected across all models to simplify the model and ease interpretation. The same model was generally used for both studies, except that years since HIV diagnosis was included for Study 380-1844 only.
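For intuition about the unadjusted analysis, a logistic regression with treatment as the only predictor yields an odds ratio equal to the one computed directly from the 2 × 2 table of treatment by symptom status, with a Wald confidence interval matching the standard log odds ratio interval. The sketch below computes that quantity; the counts are hypothetical and are not taken from Tables 2 or 3, and the adjusted models, which require an iterative fitting routine, are not shown.

```python
import math


def odds_ratio_with_ci(events_a: int, n_a: int, events_b: int, n_b: int,
                       z: float = 1.959964) -> tuple[float, float, float]:
    """Odds ratio (arm A vs arm B) for a binary symptom, with a Wald-type 95% CI.

    events_*: participants reporting the symptom as bothersome in each arm.
    n_*: participants with a response in each arm.
    """
    a, b = events_a, n_a - events_a  # arm A: with / without the symptom
    c, d = events_b, n_b - events_b  # arm B: with / without the symptom
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimate")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of the log odds ratio
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: 20 of 100 in arm A vs 40 of 100 in arm B.
or_est, ci_low, ci_high = odds_ratio_with_ci(20, 100, 40, 100)
print(f"OR {or_est:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

An odds ratio below 1 with an interval excluding 1 would correspond to a significantly lower prevalence in arm A at that time point.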

Table 1 Patient-reported baseline demographics and clinical characteristics

Longitudinal modeling was performed using generalized mixed-effects models to show symptom patterns across the four study visits using data from the HIV-SI. The functional form of the change pattern was assessed visually from the observed prevalence in each group. Linear and quadratic patterns were tested to determine optimal fit, ultimately favoring a linear function. Post-baseline data were modeled with baseline values included as a covariate. To assess the possibility that the effect of treatment may itself vary over time, the models included an interaction between study treatment and time in addition to the indicator of study treatment group. Continuous variables were mean centered for ease of interpretation and model fit. The fit of each of the derived models was compared with a simple unadjusted model that included time and study treatment, along with a random intercept to account for the longitudinal nature of the data. The comparison was based on the Bayesian information criterion.
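A comparison by the Bayesian information criterion (BIC) can be sketched as below, using the usual definition BIC = k ln(n) − 2 ln(L), where a lower value indicates the preferred model. The log-likelihoods, parameter counts, and sample size are hypothetical, and software packages differ in what they use for n in mixed models (for example, subjects versus observations), so this should be read as an illustration of the criterion rather than a reproduction of the study output.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def preferred_model(fits: dict[str, tuple[float, int]], n_obs: int) -> str:
    """Return the name of the model with the lowest BIC.

    fits maps a model name to (log-likelihood, number of estimated parameters).
    """
    return min(fits, key=lambda name: bic(*fits[name], n_obs))


# Hypothetical fits for one symptom; values are illustrative only.
fits = {
    "unadjusted (time, treatment, random intercept)": (-520.0, 4),
    "adjusted (plus covariates and time-by-treatment)": (-480.0, 13),
}
best = preferred_model(fits, n_obs=900)
print(best)
```

The penalty term k ln(n) means the adjusted model is preferred only when its gain in log-likelihood outweighs the cost of the extra parameters.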

Poor sleep quality according to the PSQI was analyzed in a similar manner as the HIV-SI, but only used baseline sleep quality (good vs poor) and baseline SF-36 MCS as covariates.

3 Results

3.1 Baseline Characteristics

Baseline demographics and clinical characteristics were similar between treatment groups (Table 1) in each study. Most participants were White and male. Participants reported a median of three to four bothersome HIV-SI symptoms at baseline, with 17–25% reporting no symptoms. Overall, there were no significant differences at baseline between those assigned B/F/TAF compared with ABC/DTG/3TC in any PRO measure (Table 1). Among the items captured by the HIV-SI for both treatment-naïve and virologically suppressed adults, the only significant difference between treatment groups was more weight loss/wasting reported at baseline among those assigned B/F/TAF versus ABC/DTG/3TC (18.3 vs 11.3%; p = 0.017) in treatment-naïve participants (Study 380-1489) (Tables 2, 3).

Table 2 Frequency of bothersome HIV symptoms by treatment and study visit in Study GS-US-380-1489 (treatment-naïve)
Table 3 Frequency of bothersome HIV symptoms by treatment and study visit in Study GS-US-380-1844 (HIV-1 suppressed)

3.2 Changes in PRO Measures from Baseline to Week 48

In treatment-naïve adults (Study 380-1489), median [IQR] SF-36 PCS and MCS scores were comparable between the B/F/TAF and ABC/DTG/3TC groups at baseline (57.4 [52.6–60.0] vs 56.6 [52.2–59.3], p = 0.18 and 49.0 [37.7–55.2] vs 49.5 [40.0–56.3], p = 0.40, respectively). Changes from baseline at week 48 in PCS and MCS scores were also similar between groups; median [IQR] change from baseline in PCS was 0.1 [−3.3 to 3.1] versus 0.2 [−2.6 to 2.8], p = 0.85 and in MCS was 2.3 [−1.6 to 9.0] versus 2.1 [−4.0 to 7.0], p = 0.090. At baseline, 48.1 and 47.6% of participants on B/F/TAF and ABC/DTG/3TC, respectively, reported poor sleep quality on the PSQI (p = 0.93). At week 48, the proportion had decreased to 38.7 and 41.5% of participants, respectively (p = 0.50). The WPAI questionnaire did not show any differences between treatment groups at baseline or any post-baseline time point.

In virologically suppressed adults (Study 380-1844), median [IQR] SF-36 PCS and MCS scores were comparable between the B/F/TAF and ABC/DTG/3TC groups at baseline (55.5 [50.5–59.1] vs 56.6 [51.0–59.2], p = 0.31 and 51.9 [44.5–57.5] vs 53.2 [46.6–57.6], p = 0.14, respectively). At week 48, PCS and MCS scores were largely unchanged from baseline (median [IQR] PCS change: −0.4 [−3.6 to 2.7] vs 0.2 [−2.3 to 2.7], p = 0.17; MCS change: 0.3 [−3.0 to 4.6] vs 0.1 [−3.9 to 3.5], p = 0.13). At baseline, 51.9 and 46.1% of adults on B/F/TAF and ABC/DTG/3TC, respectively, reported poor sleep quality on the PSQI (p = 0.19). At week 48, the proportion had decreased to 41.8 and 45.7% of individuals, respectively (p = 0.42).

The WPAI questionnaire did not show any differences between treatment groups in any of the four measures at baseline or any post-baseline time point.

3.3 Associations Between HIV-SI Bothersome Symptoms and Treatment in Logistic Regression Models and Longitudinal Analyses

The association between treatment and each bothersome symptom from the HIV-SI was examined by logistic regression models and longitudinal analyses. In the final models, treatment group (B/F/TAF vs ABC/DTG/3TC) was the independent variable and covariates included age, sex, race (White vs non-White), baseline bothersome symptom count, VACS Index score, medical history of serious mental illness (yes vs no), baseline SF-36 PCS, baseline SF-36 MCS, and years since HIV diagnosis (for the suppressed study only).

In treatment-naïve participants, the adjusted logistic regression models showed that B/F/TAF was associated with a lower risk of experiencing four bothersome symptoms at two or more time points (three symptoms included week 48): fatigue/loss of energy, dizzy/lightheadedness, nausea/vomiting, and difficulty sleeping. In virologically suppressed adults, the adjusted logistic regression models showed that B/F/TAF was associated with a lower risk of experiencing three symptoms at two or more time points (all symptoms included week 48): nausea/vomiting, sad/down/depressed, and nervous/anxious. A complete table of the unadjusted and adjusted logistic regression models appears as Table ESM1 in the Electronic Supplementary Material.

The prevalence of bothersome symptoms over time was evaluated using mixed-effects logistic models adjusted for the same covariates as specified above and also included time and time-by-treatment in the model. In treatment-naïve adults, the adjusted longitudinal models revealed a statistically significant difference in the prevalence of three bothersome symptoms (fatigue/loss of energy, nausea/vomiting, and loss of appetite) between the B/F/TAF and ABC/DTG/3TC groups over time, all favoring the B/F/TAF group. Three additional symptoms also showed that the effect over time was dependent on treatment group. These were headaches, bloating/pain/gas in stomach, and changes in body composition, each with lower prevalence over 48 weeks for B/F/TAF. In virologically suppressed participants, the adjusted longitudinal models revealed a statistically significant difference between treatments over time (favoring B/F/TAF over ABC/DTG/3TC) in the prevalence of six symptoms: dizzy/lightheadedness, nausea/vomiting, sad/down/depressed, nervous/anxious, difficulty sleeping, and loss of appetite. No symptom favored the ABC/DTG/3TC group.

No covariate was significant in all symptom models; however, the presence of the bothersome symptoms at baseline, the HIV-SI symptom count at baseline, and SF-36 PCS score at baseline were significant for most symptom models. Each covariate was significant in at least one model. A complete table showing the coefficients, including findings for main effects and time by treatment interactions is provided in Table ESM2 of the Electronic Supplementary Material.

Tables 4 and 5 summarize the results for symptoms with statistically significant findings in the regression and/or longitudinal analyses. Fig. 1a and b show the observed prevalence of each symptom from Table 3 over time by treatment group that was statistically significantly different at two or more time points in the adjusted logistic regression model or at one time point in the adjusted logistic regression model and in the longitudinal model.
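The criterion used to flag a treatment difference for a symptom (significance at two or more time points in the adjusted logistic regression model, or at one time point plus significance in the longitudinal model) can be written as a small rule. The function and argument names below are hypothetical, and the inputs are assumed to be significance flags already derived from the fitted models.

```python
from typing import Iterable


def treatment_difference_noted(significant_weeks: Iterable[int],
                               longitudinal_significant: bool) -> bool:
    """Apply the two-part criterion for noting a treatment difference.

    significant_weeks: study weeks (e.g., 4, 12, 48) at which the adjusted
        logistic regression showed a statistically significant difference.
    longitudinal_significant: whether the longitudinal model showed a
        statistically significant treatment difference.
    """
    n_weeks = len(set(significant_weeks))
    return n_weeks >= 2 or (n_weeks >= 1 and longitudinal_significant)


# Illustrative inputs only, not results from Tables 4 or 5.
print(treatment_difference_noted([4, 48], longitudinal_significant=False))
print(treatment_difference_noted([12], longitudinal_significant=True))
print(treatment_difference_noted([12], longitudinal_significant=False))
```

Under this rule, a significant longitudinal result alone, with no significant fixed time point, would not be flagged.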

Table 4 Summary of results from adjusted logistic regression analyses at weeks 4, 12, and 48 and longitudinal analyses in Study GS-US-380-1489 (treatment-naïve)
Table 5 Summary of results from adjusted logistic regression analyses at weeks 4, 12, and 48 and longitudinal analyses in Study GS-US-380-1844 (HIV-1 suppressed)
Fig. 1 a Prevalence of bothersome symptoms over time by treatment group in Study GS-US-380-1489 (treatment-naïve). b Prevalence of bothersome symptoms over time by treatment group in Study GS-US-380-1844 (HIV-1 suppressed)

For treatment-naïve adults, the proportion of participants with poor sleep quality according to the PSQI was numerically lower for B/F/TAF than ABC/DTG/3TC at all post-baseline time points, with a statistically significant difference noted at week 12 (favoring B/F/TAF) in the adjusted logistic regression model. For virologically suppressed participants, a significant treatment difference was noted at weeks 4 and 12 in the adjusted logistic regression model as well as in the longitudinal model (all favoring B/F/TAF). Results from the PSQI are shown in Tables 4 and 5 as well as plotted in Fig. 1b.

4 Discussion

Our analyses represent the first prospective, randomized, double-blind comparison of patient-reported symptoms among HIV-1-infected adults who were either treatment naïve or virologically suppressed and randomized to receive B/F/TAF versus ABC/DTG/3TC. The results indicate that B/F/TAF was associated with fewer patient-reported bothersome symptoms over 48 weeks, including nausea and some neuropsychiatric symptoms.

Descriptive analyses showed that differences between B/F/TAF and ABC/DTG/3TC appeared as early as week 4 and were generally maintained through 48 weeks. In treatment-naïve participants, the greatest differences between the two treatment groups were seen in reports of nausea/vomiting, difficulty sleeping, fatigue/loss of energy, and dizzy/lightheadedness (Table 4). In virologically suppressed adults, nausea/vomiting and difficulty sleeping were again reported by a significantly greater proportion of those receiving ABC/DTG/3TC, as were the symptoms of feeling sad/down/depressed and nervous/anxious (Table 5). Sleep findings on the HIV-SI were supported by findings of poor sleep quality on the PSQI.

Both fixed time-point (unadjusted and adjusted) and longitudinal analyses are reported here. Logistic regression models at fixed time points are useful for assessing treatment differences at the first post-baseline visit (week 4) and at the primary analysis time point (week 48), whereas the longitudinal model assesses treatment differences over time, using each individual’s responses at all time points (and the associated variability) to provide a single assessment. The longitudinal analysis has the advantage of distilling the numerous time points into a single value, albeit from a more complicated model. Because the two approaches answer different (but related) questions, both provide insights into the data. As shown in Tables 4 and 5, some symptoms were significant in fixed time-point models but not in the longitudinal model. This is expected because the longitudinal model looks across all time points: a marginal finding at a single time point may not be significant when all time points are considered. Importantly, no symptom showed a significant treatment difference in the longitudinal model without at least one fixed time-point model also showing significance. However, it should be noted that there were no large differences over time in PCS and MCS, the aggregate measures from the SF-36, for either regimen.

The use of PRO tools within these studies provides insight into patient-reported symptoms that may be underreported by individuals during standard screening for adverse drug events [38]. Others have suggested increasing the use of PRO tools in clinical research in order to differentiate in greater detail the benefit of various regimens [31, 39, 40]. Additionally, the use of longitudinal modeling allows for a greater understanding of the prevalence of HIV symptoms over time.

As the efficacy of well tolerated antiretroviral FDC regimens in clinical trials reaches above 90%, it is important to demonstrate the comparative impact of various FDC options on HRQL measures and the degree to which they are perceived by the individual as being bothersome. Tolerability and acceptance of long-term medication affect adherence to and persistence with that therapy. Adherence to antiretroviral treatments has been shown to improve outcomes and lower healthcare costs. Results from a real-world database analysis found that individuals who were 95% or more adherent to therapy had lower hospitalization rates and associated healthcare costs compared with those who had lower adherence rates [41]. Similarly, increased cost has been associated with adverse events. Among 11 adverse events (rash, nausea or vomiting, diarrhea, dizziness, headache, sleep-related symptoms, hepatotoxicity, lipid disorders, depression, anxiety, and suicide or self-injury) observed with NRTI use, sleep-related adverse events had the highest outpatient care cost, with a mean per-episode cost of US$6438 (IQR 615–5882; median US$1785). Over a 5-year time period, sleep-related adverse events accounted for the second highest overall mean healthcare cost of US$8307, with only nausea and vomiting exceeding it with US$12,833 [42].

There are limitations to this analysis that should be considered. Although both treatment-naïve and virologically suppressed adults were evaluated, the majority of study participants were male and White, so the results may not be generalizable to other populations. Furthermore, unlike investigator-reported symptoms, those reported by study participants are not graded using a standard scheme. Therefore, the severity of participant-reported symptoms may not be clear. However, the Likert scale did provide a sense of the degree to which a symptom was considered bothersome. Arguably, any symptom perceived to be bothersome to any extent is undesirable. Lastly, the study findings should be generalized with caution because the study populations were relatively healthy with few co-morbidities, and those who switched to B/F/TAF were stably suppressed with no tolerability issues with their antiretroviral therapy at baseline.

5 Conclusions

Overall, these studies demonstrated that over approximately 1 year of follow-up after initiating or switching antiretroviral treatments, B/F/TAF was associated with fewer bothersome symptoms, especially nausea and vomiting as well as sleep difficulties, than ABC/DTG/3TC. Patient-reported symptoms may be an important consideration when selecting among highly efficacious options for the treatment of HIV infection and should continue to be studied in clinical trials.

Data Availability Statement

The datasets generated and/or analyzed during the current study are not publicly available because they are the proprietary intellectual property of the Sponsor (Gilead Sciences), but are available from the corresponding author on reasonable request.