Subjective well-being (SWB) refers to being satisfied with one's life, having positive affect and having little negative affect. We may understand it as a subjective definition of good life, or in colloquial terms “happiness”, and it has been associated with several important benefits such as lower mortality. In the last decades, several randomized controlled trials (RCT) have investigated the efficacy of several interventions in increasing SWB in the general population but results from different disciplines have not been integrated.
Methods
We conducted an umbrella review of systematic reviews and meta-analyses of RCT that assessed the efficacy of any kind of intervention in increasing SWB in the general population, including both positive psychology interventions (PPI) and other interventions. We (re)calculated the meta-analytic statistics needed to objectively assess the quality of the evidence of the efficacy of each type of intervention in improving each component of SWB according to the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach.
Results
There was moderate-quality evidence that PPI might induce small decreases of negative affect, and low-quality evidence that they might induce moderate increases of positive affect. We found similar results for those PPI specifically consisting of conducting acts of kindness (especially spending money on or giving items to others), for which there was low-quality evidence that they might induce small increases of life satisfaction, but not for PPI specifically consisting of practicing gratitude. The quality of the evidence of efficacy for the other interventions included in the umbrella review (yoga, resilience training, physical activity, leisure, control enhancement, psychoeducation, and miscellaneous) was very low.
Conclusion
There is some evidence that PPI, and especially conducting acts of kindness such as spending money on others, may increase the SWB of the general population. The quality of the evidence of efficacy for other interventions (e.g., yoga, physical activity, or leisure) is still very low.
Registration number: PROSPERO CRD42020111681.
Ed Diener defined subjective well-being (SWB) as an overall satisfaction with one's life, having many pleasant emotions and moods (positive affect), and having few unpleasant ones (negative affect).1 It may be understood as a subjective definition of good life, or colloquially as “happiness”, but we will avoid these terms because they may have other meanings that are out of the scope of the present work. In the following we will refer to the definition of SWB presented above, known as the Diener tripartite model.
A high SWB has been associated with several important positive outcomes, such as a substantially lower risk of exercise stress-induced myocardial ischemia (logistic regression-derived odds ratio=0.55)2 or lower mortality (hazard ratio=0.92 when comparing the group with highest SWB with the group with lowest SWB).3 SWB has also been linked to higher self-esteem and self-efficacy, and it is a relevant component of mental health.4–7 Some abnormalities of SWB (e.g., depressed mood and anxiety as extreme forms of few pleasant emotions and many unpleasant emotions) are core symptoms of mood/anxiety disorders (the most common mental/psychiatric disorders),8–10 and a component of many other mental/psychiatric disorders (e.g., schizophrenia is characterized by decreased ability to experience pleasure11). Lower SWB has indeed been associated with higher suicidal ideation and behavior.12,13
SWB consists of three main components: positive affect, negative affect and life satisfaction.1 Positive affect refers to pleasant emotions and moods, such as interest, enthusiasm or pride, and negative affect to unpleasant ones, such as guilt, irritability or fear.14 Life satisfaction refers to thinking that one's life is close to one's ideal, that the important things one wants in life have already been achieved.15 Positive and negative affect are sometimes combined in an “affective” component to distinguish them from the “cognitive” component (life satisfaction). All components show a weak to moderate correlation,16,17 so the improvement of one component (e.g., having more positive emotions) may be accompanied by improvements of other components (e.g., feeling more satisfied). However, this should not be taken for granted. There are common exceptions where individuals with “many” pleasant emotions also have “many” unpleasant emotions or feel unsatisfied. Similarly, individuals with a depressive episode (associated with high negative affect and low positive affect) may also feel “high” (which should be considered as positive affect), a presentation known as mixed features.11 It is thus more desirable to study each component separately.1
Individuals with mental disorders could receive interventions that increase SWB,18 but the promotion of mental health should also include interventions that increase SWB in healthy individuals. Note that the aims of mental health prevention interventions include targeting risk factors and strengthening abilities to prevent the development of one or more conditions, while mental health promotion interventions aim to promote psychological wellbeing, increase the ability to achieve developmental milestones, strengthen abilities to adapt to adversity and build resilience and competence.19
The major predictor of SWB is by far personality, especially neuroticism (strongly correlated with negative affect and moderately inversely correlated with life satisfaction), and extraversion (moderately correlated with positive affect).20 Thus, interventions able to accurately “modulate” our personality could potentially increase our SWB quite permanently. However, while personality does change with time,21 interventions that effectively modulate personality are still scarce.22
Another kind of intervention that could increase our SWB would be the improvement of our circumstances, e.g., a salary increase.23 However, circumstances cannot always be improved, and when they are, their effect on SWB fades with time. A few months after being fired or promoted, our SWB returns mostly to normal.24 We tend to think that an achievement will make us very happy, or a misfortune very unhappy, yet when it happens we realize that our prediction was wrong and that our happiness has changed less than expected.25 To integrate this evidence, Headey and Wearing26 proposed, in their dynamic equilibrium model, that people have levels of SWB determined by their personality, and that changes in circumstances produce increases or decreases in SWB, but over time individuals tend to return to their baseline SWB. That said, this adaptation to circumstances is not total, e.g., richer people are still slightly happier than poorer people.27
Beyond changes in personality and circumstances, various intervention frameworks have investigated whether there are other ways to increase individual SWB. Unfortunately, these interventions have traditionally been scattered across different disciplines, which may be broadly divided into those within the umbrella of positive psychology,28 and those with other global aims but with interventions that may still increase SWB, such as mindfulness,29 physical activity30 or even diet.31 Positive psychology is a branch of psychology that, instead of focusing on treating mental disorders, focuses on the improvement of SWB and other valued subjective experiences such as optimism, as well as on positive traits (e.g., the capacity for love) and virtues (e.g., altruism).32 To improve SWB, positive psychologists have created a series of varied interventions such as conducting acts of kindness (e.g., prosocial purchases), thinking about positive experiences, practicing gratitude, cultivating sacred moments or savoring the moment, to cite some.33
Unfortunately, while great synthesis work exists within the field of PPI, to our knowledge there are no wider, umbrella syntheses that combine both PPI and other interventions, even though an intervention could increase SWB regardless of whether it belongs to positive psychology. In addition, many meta-analyses did not assess the efficacy of these interventions separately for positive affect, negative affect, and life satisfaction.33
We present here an umbrella review of the systematic reviews and meta-analyses of randomized controlled trials (RCT) that have assessed the efficacy of any kind of interventions in increasing SWB, as compared to control conditions, in the general population. The greatest advantage of umbrella reviews is that they summarize and systematically assess and grade the existing evidence on a specific topic only including the highest level of evidence, namely other systematic reviews and meta-analyses.34 This systematic integration of evidence from multiple meta-analyses is necessary35,36 because when there are many types of intervention to choose from, a meta-analysis typically assesses only one type of intervention (e.g., positive psychology), and because different meta-analyses use different methods so that two meta-analyses on the same intervention may reach different conclusions even when published within the same year. An integrated, systematic, umbrella review is thus necessary to provide an objective picture of the wide range of interventions from different disciplines that might potentially increase SWB.
The aim of this review was thus to synthesize the evidence on the different interventions that might improve SWB.
Materials and methods
We pre-registered the study protocol with the International Prospective Register of Systematic Reviews (PROSPERO; CRD42020111681) and we completed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist37 (tables available upon request).
Search strategy and eligibility criteria
We searched PubMed, Web of Science and Scopus from inception to August 20, 2018 for systematic reviews and meta-analyses of RCT of any kind of intervention assessing the increase of any of the following measures of SWB: positive affect, negative affect, and life satisfaction. The search terms were [(“systematic review” or “meta-analysis”) and “subjective well-being”], and we did not restrict them to appear in a specific section (e.g., in the title). With the aim of conducting a balanced umbrella review of both PPI and other interventions of any nature, we did not use search terms that would detect only interventions of a specific discipline (e.g., “optimism” for PPI).
The inclusion criteria for the individual RCT were: (a) they were published in peer-reviewed journals; (b) they were conducted in non-clinical populations; (c) they assessed the efficacy of interventions in increasing positive affect, negative affect or life satisfaction; and (d) the effects were compared to control groups (e.g., waiting list or psychological placebo interventions). Conversely, we excluded non-systematic reviews, studies not using a control group, studies not randomizing individuals to the control and intervention arms, and studies on patients or caregivers. We did not impose language restrictions. Two investigators performed the search independently (either AAE, AS and/or JR; we distributed the work in different combinations of peers), and we resolved disagreements by consensus.
We initially intended to include pooled measures of SWB, but we later decided to exclude them because they were too heterogeneous. For example, Lyubomirsky et al.38 created a composite of life satisfaction, happiness, pleasant affect and unpleasant affect by averaging their z-scores (after reverse-coding unpleasant affect), Page et al.39 summed life satisfaction and positive affect and subtracted negative affect, and King et al.,40 Sheldon et al.,41 Aknin et al.42 and Donnelly et al.43 subtracted negative affect from positive affect (i.e., the affect balance). Similarly, we initially also intended to include measures of subjective “happiness”, but this outcome was discarded during the review process because the multiple meanings of this word might lead to confusion. In any case, we still included these studies if they reported separate scores for positive affect, negative affect, and/or life satisfaction.
Data extraction and selection
We used a systematic approach to extract and select the data. First, we checked the inclusion criteria for each systematic review or meta-analysis. Second, we checked the inclusion criteria for each individual study in the included systematic reviews and meta-analyses. Third, we extracted the following data (from the systematic review or meta-analysis or, when not reported, from the individual RCT): reference, type of intervention and control group, time of assessment (e.g., 1-month follow-up), specific population under investigation, number of participants in each group, age, measure of SWB (positive affect, negative affect, or life satisfaction), assessment instrument (e.g., Positive and Negative Affect Schedule14), effect size (Hedges’ g) of the comparison between intervention and control groups and the corresponding 95% confidence interval or data to estimate them, and any potential study limitation (e.g., unclear blinding or substantial loss to follow-up). Fourth, we rated the quality of the systematic review or meta-analysis using the Assessment of Multiple Systematic Reviews (AMSTAR 2) tool,44 and assessed the risk of bias of the studies included using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE).45 Two investigators (either AAE, AS and/or JR; we distributed the work in different combinations of peers) conducted these steps independently, and we resolved disagreements by consensus.
If the RCT had used more than one control condition, we preferred the more emotionally neutral one and discarded the others. For example, Froh et al.46 compared a group conducting a gratitude intervention with two groups: one instructed to recall hassles and burdens, and one instructed only to complete the measurements; we preferred the latter and discarded the hassles condition. If the RCT reported more than one effect size (e.g., at different follow-up times), we averaged them. For example, Froh et al.46 reported two measurements of negative affect after their gratitude vs. control intervention: one at post-test and one at 3-week follow-up; we calculated the standardized mean difference between groups at each time point (post-test: 0.09, follow-up: 0.14), and then we averaged them (0.11). This averaged effect size simply represents the mean effect size; it is not the effect size of a combination of the different measurements. To interpret it, note that differences between the reported effect sizes are due not only to potential differences between post-test and 3-week follow-up but also to measurement error. In other words, even if there were no true differences between post-test and 3-week follow-up, the two effect sizes could differ just by chance. The average of the two effect sizes may thus represent the effect size at a time point between post-test and 3-week follow-up, but it also represents an effect size with lower measurement error. If the sample sizes used to estimate the two reported effect sizes differed slightly (e.g., due to a few losses to follow-up before the second measurement), we selected the largest. We preferred the comparison of the changes in measures of SWB from pre- to post-intervention (or the slope in linear models); for studies that did not report such data, we used the effect size of the comparison between post-intervention scores. If a manuscript reported more than one SWB measure (e.g., life satisfaction and positive affect), we analyzed them separately.
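The effect-size extraction and averaging described above can be illustrated with a minimal Python sketch. It is not the authors' code (they report using R). The function names are hypothetical, and the Hedges' g formula is the standard pooled-SD version with the usual small-sample correction, which is an assumption about how the standardized mean differences were derived. With the rounded values printed in the text, the mean of the two time points is 0.115; the 0.11 reported presumably reflects unrounded inputs.

```python
import math

def hedges_g(mean_i, sd_i, n_i, mean_c, sd_c, n_c):
    """Standardized mean difference (intervention minus control)
    with the usual small-sample correction factor."""
    pooled_sd = math.sqrt(
        ((n_i - 1) * sd_i ** 2 + (n_c - 1) * sd_c ** 2) / (n_i + n_c - 2)
    )
    d = (mean_i - mean_c) / pooled_sd
    correction = 1 - 3 / (4 * (n_i + n_c) - 9)  # approximate Hedges' J
    return correction * d

def average_effect(effects):
    """Plain mean of several effect sizes reported by the same trial."""
    return sum(effects) / len(effects)

# Rounded values quoted in the text for negative affect
# (post-test and 3-week follow-up).
froh_negative_affect = [0.09, 0.14]
mean_g = average_effect(froh_negative_affect)
```

The plain mean treats both time points as equally informative, which matches the description in the text; a precision-weighted mean would be a different design choice.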
Statistical analyses and grading of the evidence
For each included systematic review (or meta-analysis) and measure, we conducted a random-effects meta-analysis to estimate the summary effect size of the comparison between groups, its 95% confidence interval, and the between-study heterogeneity I² statistic (values >50% might indicate high heterogeneity, and values >75% very high).47 The use of a random-effects model with its estimation of heterogeneity accommodates potential differences between studies, e.g., related to the use of different scales. We also conducted a random-effects meta-regression by the standard error to detect potential publication bias in small studies. We conducted the calculations with in-house umbrella-review scripts for R48 and the packages “meta”49 and “metansue”.50 The latter allows the unbiased inclusion of studies that report “non-statistically significant results” without reporting any statistic; note that excluding these studies would inflate the effect size, whereas including them assuming a null effect size would cause a bias toward zero.51 When reporting the results, we refer to Hedges’ g<0.3 as small, g=0.3–0.4 as small-to-medium, g=0.4–0.6 as medium, g=0.6–0.7 as medium-to-large, and g>0.7 as large. This classification is based on Cohen's recommendations (0.2: small; 0.5: medium; 0.8: large).52
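As an illustration of the kind of computation described above, the following Python sketch pools effect sizes with a random-effects model and reports I². It uses the DerSimonian-Laird estimator of between-study variance, which is an assumption: the text does not state which estimator the R packages were configured to use, and the "metansue" handling of unreported statistics is not reproduced. The function names and the assignment of boundary values (e.g., exactly 0.3) to the upper category are also hypothetical choices.

```python
import math

def random_effects_meta(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 (assumed)."""
    k = len(effects)
    w = [1.0 / v for v in variances]              # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # heterogeneity, in %
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)  # normal-based 95% CI
    return {"g": pooled, "ci": ci, "tau2": tau2, "i2": i2}

def label_effect(g):
    """Verbal label for |g| following the cut-offs given in the text."""
    a = abs(g)
    if a < 0.3:
        return "small"
    if a < 0.4:
        return "small-to-medium"
    if a < 0.6:
        return "medium"
    if a <= 0.7:
        return "medium-to-large"
    return "large"

# Two hypothetical studies with opposite findings and equal precision.
example = random_effects_meta([0.0, 1.0], [0.01, 0.01])
```

In this toy example the two studies disagree strongly, so nearly all of the observed variability is attributed to heterogeneity rather than sampling error.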
We graded the evidence of the efficacy of each intervention in improving each measure according to the GRADE guidelines.53 Specifically, we assessed the risk of bias, inconsistency, indirectness, imprecision, and publication bias, and derived the quality of the evidence from these assessments. To assess the risk of bias we looked for limitations of the included studies such as lack of blinding, significant loss to follow-up or lack of intention-to-treat analysis. Note that we only included trials that had randomized the participants (or clusters) to different arms, thus minimizing other sources of bias such as lack of allocation concealment; for this reason, we did not consider any study to have very serious limitations. We assessed inconsistency with the I² heterogeneity statistic complemented with a description of the proportion of studies reporting contradictory findings. For example, we only rated very serious inconsistency when heterogeneity was very high and the findings were contradictory (i.e., more than 10% of studies showing results in the opposite direction from that of most studies). When there was only one study, we could not rate inconsistency, but we conservatively scored the quality of the evidence as if there was serious inconsistency. Indirect evidence would refer to that from trials conducted in special population groups or from trials that had not measured SWB but surrogates thereof. However, we had decided not to include any special population group or any surrogate of SWB; we had only included direct measures of SWB (positive affect, negative affect, and life satisfaction) in the general population. Meta-analyses were rated as imprecise if the 95% confidence interval included both null and large effects (or could include them if unknown), or if they did not meet the optimal information size (i.e., the sample size required to detect small or medium effects with 80% statistical power). Specifically, we rated serious imprecision if the confidence interval included both null and large effects (or could include them if unknown), or if the overall number of individuals in each arm was below 394. This number corresponds to the sample size required to detect small effects (g=0.2) with 80% statistical power according to the standard formula (R function “power.t.test”). We rated very serious imprecision if the overall number of individuals in each arm was below 394 and the confidence interval included both null and large effects, or if the overall number of individuals in each arm was below 64. This number corresponds to the sample size required to detect medium effects (g=0.5) with 80% statistical power. Finally, we rated publication bias as likely when the Egger test p-value was <0.1. Again, when there were too few studies to conduct the test, we conservatively scored the quality of the evidence as if there was likely publication bias.
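The two sample-size thresholds and the imprecision rule can be sketched in Python as follows. This is a hedged illustration rather than the authors' procedure. The per-arm sample size uses a normal approximation with a z²/4 correction instead of the noncentral t computation in R's “power.t.test”; it happens to reproduce 394 and 64 for the two effect sizes in the text. Taking the smaller arm as "the number of individuals in each arm" and defining a large effect as |g|>0.7 are assumptions, as are all function and constant names.

```python
import math
from statistics import NormalDist

def n_per_arm(d, power=0.80, alpha=0.05):
    """Approximate per-arm n for a two-sided, two-sample t-test
    (normal approximation plus a z^2/4 small-sample correction)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 / d ** 2 + z_a ** 2 / 4)

N_SMALL = n_per_arm(0.2)   # threshold for detecting small effects
N_MEDIUM = n_per_arm(0.5)  # threshold for detecting medium effects

def rate_imprecision(n_intervention, n_control, ci=None, large=0.7):
    """Return 'not serious', 'serious' or 'very serious' imprecision."""
    n = min(n_intervention, n_control)  # assumption: the smaller arm decides
    if ci is None:
        wide = True  # unknown interval: could include null and large effects
    else:
        low, high = ci
        wide = low <= 0 <= high and max(abs(low), abs(high)) > large
    if n < N_MEDIUM or (n < N_SMALL and wide):
        return "very serious"
    if n < N_SMALL or wide:
        return "serious"
    return "not serious"
```

Applied to the numbers of participants and confidence intervals listed in Table 1, this sketch returns the same imprecision ratings as the table.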
To summarize and grade the quality of the evidence,53 we first gave each intervention four pluses, and subsequently we subtracted one plus if there were serious limitations, one plus if there was serious inconsistency or this could not be assessed, two pluses if there was very serious inconsistency, one plus if there was serious imprecision, two pluses if there was very serious imprecision, and one plus if there was likely publication bias or this could not be assessed. If the final score was lower than one plus, we gave the intervention a final score of one plus.
We would have considered interventions with four pluses to have high-quality evidence (i.e., we would be very confident that the true effect is approximately the effect reported here).53 We considered interventions with three pluses to have moderate-quality evidence (the true effect is likely to be like the effect reported here, but there is a possibility that it is substantially different). We considered interventions with two pluses to have low-quality evidence (the effect may be substantially different from the effect reported here). Finally, we considered interventions with one plus to have very low-quality evidence (the true effect is likely to be substantially different from the effect reported here).
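The scoring rule above is simple enough to write down directly. The following Python sketch is one possible encoding, with hypothetical names and string labels; `None` stands for a domain that could not be assessed (for example, inconsistency with a single study, or publication bias with too few studies).

```python
LABELS = {4: "high", 3: "moderate", 2: "low", 1: "very low"}
PENALTY = {"not serious": 0, "serious": 1, "very serious": 2}

def grade_quality(limitations, inconsistency, imprecision, publication_bias):
    """Start from four pluses and subtract one or two per domain.

    limitations:      "not serious" or "serious"
    inconsistency:    "not serious", "serious", "very serious", or None
    imprecision:      "not serious", "serious", or "very serious"
    publication_bias: "undetected", "suspected", or None
    """
    pluses = 4
    pluses -= 1 if limitations == "serious" else 0
    # A domain that cannot be assessed is penalized like a serious problem.
    pluses -= 1 if inconsistency is None else PENALTY[inconsistency]
    pluses -= PENALTY[imprecision]
    pluses -= 0 if publication_bias == "undetected" else 1
    pluses = max(pluses, 1)  # the final score never drops below one plus
    return pluses, LABELS[pluses]

# Overall PPI, positive affect: serious limitations, serious inconsistency,
# no serious imprecision, publication bias undetected.
ppi_positive_affect = grade_quality("serious", "serious", "not serious", "undetected")
```

With the domain ratings reported in Table 1 for the overall PPI analysis, this yields two pluses (low) for positive affect, three (moderate) for negative affect and one (very low) for life satisfaction.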
For the sake of completeness, we also report an analysis of all PPI combined, an analysis of all other interventions combined, and an analysis of all interventions (PPI and other).
Results
The initial search yielded 132 manuscripts. Of these, we discarded 51 because they were in clinical populations or their caregivers, 34 because they did not include RCT, 21 because they were not systematic reviews or meta-analyses, 15 because they did not report separate measures of SWB, and three because they used duplicated datasets (Fig. 1). Some of these excluded manuscripts met more than one exclusion criterion. We finally included eight systematic reviews or meta-analyses. Their AMSTAR 2 score ranged from 6.5 to 14.5. They contained 136 RCT but we excluded 17 because they were in clinical populations or their caregivers, six because the randomization was unclear, 44 because they did not report separate measures of SWB, 12 because they had not been published in a scientific journal, and 12 because they were old studies without the required data to derive effect sizes (Fig. 1 and Supplement for details). We therefore included 45 RCT, of which one54 had been included in two reviews.33,55
With few exceptions,56–60 all the RCT that had assessed differences in positive or negative affect used the Positive and Negative Affect Schedule (PANAS)14,61 to ask for these affects currently,62–64 in the past day,46 week,65,66 2 weeks67 or few weeks.68–72 Burton et al.,56 Spence et al.,57 Layous et al.,58 Aknin et al.,59 and Martela et al.60 asked the participants the degree to which they felt each of a number of adjectives (e.g., joyful, pleased or upset), analogous to the PANAS. Finally, the assessment instruments used for measuring life satisfaction included the Satisfaction with Life Scale (SWLS),15 the Life Satisfaction Index A (LSIA)73 and the Brief Multidimensional Students’ Life Satisfaction Scale (BMSLSS).74
We grouped the interventions as they were presented in the included systematic reviews or meta-analyses. For this reason, we first present a meta-analysis of the overall efficacy of PPI,33 and subsequently we present separate meta-analyses of the efficacy of PPI consisting of conducting acts of kindness55 and the efficacy of PPI consisting of practicing gratitude.75 Similarly, we divide the interventions in the meta-analysis by Okun et al.76 according to the groups those authors defined: control enhancement, psychoeducational, social activity and miscellaneous.
Positive psychology
The meta-analysis by Bolier et al.33 of the overall efficacy of PPI included 39 RCT, of which 17 met our inclusion criteria.54,56,57,62–68,77–83 The interventions included were rather diverse: best possible self, positive writing, solution-focused coaching, life coaching and attainment of goals, writing about positive experiences, thinking about positive life experiences, doing acts of kindness, an optimism and gratitude exercise, practicing gratitude by counting one's blessings, using own strengths in a new way, savoring the moment, and cultivating sacred moments. Participants in the control groups maintained their lifestyle (e.g., waiting list) or conducted psychological placebo interventions. We found moderate-quality evidence that PPI might induce small decreases of negative affect, and low-quality evidence that they might induce medium increases of positive affect (Table 1). The quality of the evidence for potential increases in life satisfaction was very low.
Table 1. GRADE evidence profile for interventions aimed to increase individual subjective wellbeing (SWB), separately for each systematic review or meta-analysis.
Group | Intervention | Measure (N of RCT) | Limitations | Inconsistency | Indirectness | Imprecision | Publication bias | Participants (intervention) | Participants (control) | Hedges’ g (95% CI) | Quality
---|---|---|---|---|---|---|---|---|---|---|---
Positive psychology interventions | Overall | PA (12) | Serious limitations (some unclear blinding and losses to follow-up analyzed per protocol) | Serious inconsistency (I²=76%, but only ∼8% Hedges’ g<0) | No serious indirectness | No serious imprecision | Undetected | 652 | 581 | 0.44 (0.19, 0.68) | ⊕⊕⊝⊝ Low
Positive psychology interventions | Overall | NA (11) | Serious limitations (some unclear blinding and losses to follow-up analyzed per protocol) | No serious inconsistency (I²=0%) | No serious indirectness | No serious imprecision | Undetected | 572 | 544 | −0.19 (−0.31, −0.06) | ⊕⊕⊕⊝ Moderate
Positive psychology interventions | Overall | LS (8) | Serious limitations (some unclear blinding and losses to follow-up analyzed per protocol) | No serious inconsistency (I²=20%) | No serious indirectness | Serious imprecision (small overall sample size) | Suspected (funnel plot asymmetry) | 385 | 327 | 0.22 (0.05, 0.39) | ⊕⊝⊝⊝ Very low
Positive psychology interventions | Conducting acts of kindness | PA (4) | Serious limitations (some unclear blinding and losses to follow-up analyzed per protocol) | Very serious inconsistency (I²=81%, with 25% Hedges’ g<0) | No serious indirectness | Very serious imprecision (small overall sample size and large effect not excluded) | Suspected (funnel plot asymmetry) | 355 | 362 | n.s. (−0.05, 0.77) | ⊕⊝⊝⊝ Very low
Positive psychology interventions | Conducting acts of kindness | NA (1) | Serious limitations (unclear blinding and losses to follow-up analyzed per protocol) | – | No serious indirectness | Very serious imprecision (very small overall sample size and large effect not excluded) | – | 34 | 42 | n.s. (−0.87, 0.04) | ⊕⊝⊝⊝ Very low
Positive psychology interventions | Conducting acts of kindness | LS (5) | Serious limitations (some unclear blinding and losses to follow-up analyzed per protocol) | No serious inconsistency (I²=36%) | No serious indirectness | No serious imprecision | Suspected (funnel plot asymmetry) | 394 | 402 | 0.23 (0.02, 0.43) | ⊕⊕⊝⊝ Low
Positive psychology interventions | Practicing gratitude | PA (3) | Serious limitations (unclear blinding) | Serious inconsistency (I²=54%) | No serious indirectness | Serious imprecision (small overall sample size) | Undetected | 164 | 148 | n.s. (−0.14, 0.57) | ⊕⊝⊝⊝ Very low
Positive psychology interventions | Practicing gratitude | NA (3) | Serious limitations (unclear blinding) | No serious inconsistency (I²=0%) | No serious indirectness | Serious imprecision (small overall sample size) | Undetected | 164 | 148 | n.s. (−0.09, 0.36) | ⊕⊕⊝⊝ Low
Positive psychology interventions | Practicing gratitude | LS (2) | Serious limitations (unclear blinding) | No serious inconsistency (I²=7%) | No serious indirectness | Serious imprecision (small overall sample size) | – | 120 | 103 | n.s. (−0.46, 0.26) | ⊕⊝⊝⊝ Very low
Other interventions | Yoga | PA (1) | Serious limitations (unclear blinding) | – | No serious indirectness | Very serious imprecision (very small overall sample size) | – | 36 | 15 | n.s. (−0.54, 0.66) | ⊕⊝⊝⊝ Very low
Other interventions | Yoga | NA (1) | Serious limitations (unclear blinding) | – | No serious indirectness | Very serious imprecision (very small overall sample size and large effect not excluded) | – | 36 | 15 | n.s. (−0.87, 0.34) | ⊕⊝⊝⊝ Very low
Other interventions | Resilience training | PA (1) | Serious limitations (unclear blinding) | – | No serious indirectness | Very serious imprecision (very small overall sample size and large effect not excluded) | – | 25 | 25 | n.s. (−0.33, 0.79) | ⊕⊝⊝⊝ Very low
Other interventions | Resilience training | NA (1) | Serious limitations (unclear blinding) | – | No serious indirectness | Very serious imprecision (very small overall sample size and large effect not excluded) | – | 25 | 25 | n.s. (−0.72, 0.39) | ⊕⊝⊝⊝ Very low
Other interventions | Physical activity | LS (1) | Serious limitations (unclear blinding) | – | No serious indirectness | Very serious imprecision (small overall sample size and large effect not excluded) | – | 85 | 89 | n.s. (−?, ?) | ⊕⊝⊝⊝ Very low
Other interventions | Leisure | LS (1) | Serious limitations (unclear blinding and losses to follow-up analyzed per protocol) | – | No serious indirectness | Very serious imprecision (very small overall sample size) | – | 13 | 15 | 1.43 (0.58, 2.27) | ⊕⊝⊝⊝ Very low
Other interventions | Control enhancement | LS (2) | Serious limitations (unclear blinding and losses to follow-up analyzed per protocol) | No serious inconsistency (I²=36%) | No serious indirectness | Very serious imprecision (very small overall sample size) | – | 34 | 34 | 0.788 (0.15, 1.43) | ⊕⊝⊝⊝ Very low
Other interventions | Psychoeducational | LS (4) | Serious limitations (some unclear blinding and losses to follow-up analyzed per protocol) | Very serious inconsistency (I²=88%, with ∼25% Hedges’ g<0) | No serious indirectness | Very serious imprecision (small overall sample size and large effect not excluded) | Undetected | 81 | 81 | n.s. (−0.65, 1.28) | ⊕⊝⊝⊝ Very low
Other interventions | Miscellaneous (part-time work) | LS (1) | Serious limitations (unclear blinding and losses to follow-up analyzed per protocol) | – | No serious indirectness | Very serious imprecision (very small overall sample size and large effect not excluded) | – | 23 | 23 | n.s. (−0.03, 1.15) | ⊕⊝⊝⊝ Very low
LS: life satisfaction; NA: negative affect; n.s.: non-statistically significant; PA: positive affect; RCT: randomized controlled trials
The meta-analysis by Curry et al.55 of the efficacy of PPI consisting of conducting acts of kindness included 26 RCT, of which 12 met our inclusion criteria.43,54,58–60,84–90 The interventions included prosocial purchases, social recycling, benevolence, and other acts of kindness. Participants in the control groups maintained their lifestyle or conducted psychological placebo interventions. We found low-quality evidence that acts of kindness might induce small to medium increases in life satisfaction (Table 1). The quality of the evidence for potential changes in positive or negative affect was very low.
The meta-analysis by Renshaw and Steeves75 of the efficacy of PPI consisting of practicing gratitude in youth included six RCT, of which three met our inclusion criteria.46,69,72 Participants in the control groups maintained their lifestyle or conducted psychological placebo interventions. The evidence for a change in life satisfaction with gratitude interventions was of low quality, and the change was not statistically significant. The quality of the evidence for potential changes in positive or negative affect was very low.
Finally, when we combined all PPI from these meta-analyses, the results were similar to those of the meta-analysis of the overall efficacy of PPI.33 We found moderate-quality evidence that PPI might induce small decreases in negative affect, and low-quality evidence that they might induce small increases in life satisfaction (Table 2). The quality of the evidence for potential increases in positive affect was very low.
Table 2. GRADE evidence profile for interventions aimed to increase individual subjective wellbeing (SWB), combining all systematic reviews and meta-analyses.
Intervention | Measure (N of RCT) | Limitations | Inconsistency | Indirectness | Imprecision | Publication bias | Participants, intervention (n) | Participants, control (n) | Hedges’ g (95% CI) | Quality
---|---|---|---|---|---|---|---|---|---|---
Positive psychology interventions | PA (19) | Serious limitations (some unclear blinding and losses to follow-up analyzed per protocol) | Very serious inconsistency (I²=75%, with ∼11% Hedges’ g<0) | No serious indirectness | No serious imprecision | Suspected (funnel plot asymmetry) | 1170 | 1092 | 0.38 (0.20, 0.56) | ⊕○○○ Very low
Positive psychology interventions | NA (15) | Serious limitations (some unclear blinding and losses to follow-up analyzed per protocol) | No serious inconsistency (I²=17%) | No serious indirectness | No serious imprecision | Undetected | 770 | 734 | −0.12 (−0.24, −0.002) | ⊕⊕⊕○ Moderate
Positive psychology interventions | LS (14) | Serious limitations (some unclear blinding and losses to follow-up analyzed per protocol) | No serious inconsistency (I²=0%) | No serious indirectness | No serious imprecision | Suspected (funnel plot asymmetry) | 871 | 804 | 0.14 (0.04, 0.23) | ⊕⊕○○ Low
Other interventions | PA (2) | Serious limitations (unclear blinding) | No serious inconsistency (I²=0%) | No serious indirectness | Very serious imprecision (very small overall sample size) | – | 61 | 40 | n.s. (−0.26, 0.56) | ⊕○○○ Very low
Other interventions | NA (2) | Serious limitations (unclear blinding) | No serious inconsistency (I²=0%) | No serious indirectness | Very serious imprecision (very small overall sample size) | – | 61 | 40 | n.s. (−0.62, 0.20) | ⊕○○○ Very low
Other interventions | LS (9) | Serious limitations (some unclear blinding and losses to follow-up analyzed per protocol) | Very serious inconsistency (I²=79%, but only 25% Hedges’ g<0) | No serious indirectness | Very serious imprecision (small overall sample size and large effect not excluded) | Undetected | 236 | 242 | n.s. (−0.02, 0.85) | ⊕○○○ Very low
All interventions | PA (21) | Serious limitations (some unclear blinding and losses to follow-up analyzed per protocol) | Serious inconsistency (I²=72%) | No serious indirectness | No serious imprecision | Undetected | 1231 | 1132 | 0.36 (0.20, 0.53) | ⊕⊕○○ Low
All interventions | NA (17) | Serious limitations (some unclear blinding and losses to follow-up analyzed per protocol) | No serious inconsistency (I²=11%) | No serious indirectness | No serious imprecision | Undetected | 831 | 774 | −0.13 (−0.24, −0.02) | ⊕⊕⊕○ Moderate
All interventions | LS (23) | Serious limitations (some unclear blinding and losses to follow-up analyzed per protocol) | Serious inconsistency (I²=70%) | No serious indirectness | No serious imprecision | Suspected (funnel plot asymmetry) | 1107 | 1046 | 0.27 (0.09, 0.44) | ⊕○○○ Very low
LS: life satisfaction; NA: negative affect; n.s.: non-statistically significant; PA: positive affect; RCT: randomized controlled trials.
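For readers less familiar with the effect-size metric reported in the tables, the following is a minimal sketch of how Hedges’ g and an approximate 95% CI can be computed from group summary statistics. The function name and the input values are hypothetical and are not taken from any included RCT, and the variance formula is a common large-sample approximation that is not necessarily the one used in this review.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, ci_low, ci_high) for two independent groups.

    Hypothetical helper for illustration only.
    """
    df = n_t + n_c - 2
    # Pooled standard deviation of the two groups
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled
    # Small-sample correction factor that turns Cohen's d into Hedges' g
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Large-sample approximation of the sampling variance
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    half_width = 1.96 * math.sqrt(j**2 * var_d)
    return g, g - half_width, g + half_width


# Hypothetical small trial: 20 participants per arm, half an SD apart
g, low, high = hedges_g(5.5, 1.0, 20, 5.0, 1.0, 20)
print(f"g = {g:.2f} (95% CI {low:.2f}, {high:.2f})")
```

In this hypothetical example the interval includes zero, which corresponds to the entries reported as "n.s." in the tables: with very small samples, even a moderate point estimate is compatible with no effect.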
Yoga: The systematic review by Mansfield et al.91 of the efficacy of sport and dance participation among healthy young people included 11 RCT, of which one met our inclusion criteria.70 It was about the efficacy of yoga, and participants in the control group conducted usual physical education. The yoga program consisted of physical postures, breathing exercises, relaxation, and meditation for 30 min, two or three times a week for ten weeks.70 The quality of the evidence for potential changes of positive or negative affect with yoga was very low (Table 1).
Resilience training: The systematic review by Robertson et al.92 of the efficacy of work-based resilience training included eight RCT, of which one met our inclusion criteria.71 Participants in the control group maintained their lifestyle (waiting list). The quality of the evidence for potential changes of positive or negative affect with resilience training was very low (Table 1).
Physical activity: The systematic review by Zhang and Chen93 of the efficacy of physical activity interventions included six RCT, of which two met our inclusion criteria.94,95 They were conducted in older adults, and participants in the control groups did moderate or stretching and toning exercise. The quality of the evidence for potential changes in life satisfaction with physical activity was very low (Table 1).
Leisure: The meta-analysis by Kuykendall et al.96 of the efficacy of leisure interventions included six RCT, of which one met our inclusion criteria.97 It was conducted in older adults, and participants in the control group maintained their lifestyle. Leisure interventions consisted of a variety of activities such as discussion exercises, paper and pencil exercises, role playing, and recreation activity participation.97 The quality of the evidence for a potential increase in life satisfaction with leisure was very low (Table 1).
Control enhancement, psychoeducation, and part-time work: The meta-analysis by Okun et al.76 of the efficacy of several interventions for older adults included 34 RCT, grouped in control-enhancement, psychoeducational, social activity, and miscellaneous interventions. We could include three RCT about the efficacy of control enhancement interventions (e.g., offering the responsibility for taking care of bird feeders, education about stress management, nutritional awareness, immediate environment, self-responsibility, physical fitness and spirituality),98–100 four about the efficacy of psychoeducational interventions (e.g., increasing knowledge and skills)101–104 and one in the miscellaneous category (part-time work).105 Participants in the control groups maintained their lifestyle or conducted psychological placebo interventions. The quality of the evidence for a potential increase in life satisfaction with control enhancement interventions, or potential changes in life satisfaction with psychoeducational or part-time work interventions, was very low (Table 1).
When we combined all interventions from these meta-analyses, the quality of the evidence for potential changes in positive affect, negative affect or life satisfaction was very low (Table 2).
All interventions combined
When we combined all PPI and other interventions, we found moderate-quality evidence that they might induce small decreases in negative affect, and low-quality evidence that they might induce small-to-medium increases in positive affect (Table 2). The quality of the evidence for potential increases in life satisfaction was very low.
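The quality ratings in Table 2 follow the usual GRADE logic for RCT evidence: start at high quality and move down one level for each serious concern and two levels for each very serious concern. The sketch below illustrates that logic only. The function name, the string labels, and the treatment of suspected publication bias as a one-level downgrade are our assumptions, and actual GRADE ratings involve judgment rather than a mechanical sum.

```python
LEVELS = ["Very low", "Low", "Moderate", "High"]
PENALTY = {"none": 0, "serious": 1, "very serious": 2}


def grade_quality(limitations, inconsistency, indirectness, imprecision,
                  publication_bias_suspected=False):
    """Return a GRADE-style quality label for a body of RCT evidence.

    Hypothetical helper: each argument is "none", "serious" or "very serious".
    """
    downgrade = sum(
        PENALTY[domain]
        for domain in (limitations, inconsistency, indirectness, imprecision)
    )
    # Assumption: suspected publication bias costs one level
    if publication_bias_suspected:
        downgrade += 1
    # RCT evidence starts at "High" (index 3) and cannot fall below "Very low"
    return LEVELS[max(0, 3 - downgrade)]


# Domain judgments as listed in Table 2 for PPI and negative affect
print(grade_quality("serious", "none", "none", "none"))
```

Applied to the domain judgments listed in Table 2, this rule gives the same labels as the table, for example "Moderate" for PPI and negative affect (serious limitations only) and "Very low" for PPI and positive affect (serious limitations, very serious inconsistency, and suspected publication bias).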
Discussion
This umbrella review systematically assessed the evidence of the efficacy of any kind of intervention in increasing individual SWB, including both PPI and other interventions. We aimed to provide a picture of the wide range of interventions from different disciplines that might potentially increase SWB, and thus we included any intervention as long as the inclusion criteria were met.
The main finding was that there is low- to moderate-quality evidence that PPI might increase positive affect and decrease negative affect. The larger effect size was the increase in positive affect (Hedges’ g∼0.4), but the quality of its evidence was low due to serious inconsistency (i.e., heterogeneity between studies). Indeed, effect sizes ranged from very large increases to even a (small) decrease depending on the RCT. The decrease in negative affect associated with PPI was more consistent but also smaller (Hedges’ g∼−0.2).
A potential source of the heterogeneity in the increase in positive affect may be the different types of PPI, although this was not apparent when looking at the specific PPI associated with different effect sizes. Specifically, PPI associated with very large increases in positive affect (Hedges’ g 0.9–1.3) consisted of conducting acts of kindness, writing about an intensely positive experience, a solution-focused life coaching group program, and writing about oneself in the future imagining that everything has gone as well as it possibly could.56,59,63,68 PPI associated with moderate increases in positive affect (Hedges’ g 0.3–0.7) consisted again of conducting acts of kindness, other solution-focused life coaching programs, similar writing about oneself in the future imagining that everything has gone as well as it possibly could, education about the appraisal of benefit exchanges, and writing things for which one could feel grateful.57,60,64,66,72,77,83,84 Finally, PPI associated with weak or null increases in positive affect (Hedges’ g 0.2 or lower) consisted once again of conducting acts of kindness, writing about oneself in the future imagining that everything has gone as well as it possibly could, writing things for which one could feel grateful, savoring the moment (e.g., smiling while enjoying time well spent with friends), writing about happy experiences, and practicing one's character strengths (e.g., creativity, social intelligence, or humor).46,58,62,65,67,69,80 That said, two additional meta-analyses showed different effects on subjective wellbeing for two of these types of PPI: we found low-quality evidence that PPI consisting of conducting acts of kindness might increase life satisfaction, whereas we did not find statistically significant effects for PPI consisting of practicing gratitude.
A possible beneficial effect of conducting acts of kindness on SWB might be associated with increased social support as opposed to loneliness.106 Other factors such as the duration of the PPI might play a role in the heterogeneity. In their meta-analysis of PPI, Bolier et al.33 did not find statistically significant moderator effects for this, but those analyses may have been under-powered. Future evidence syntheses may provide larger evidence databases that would allow a better-powered exploration of the sources of heterogeneity.
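The heterogeneity discussed above is summarized in the tables by the I² statistic. As a minimal sketch of how I² and a random-effects pooled estimate can be obtained from study-level effect sizes, the code below assumes the DerSimonian-Laird estimator, a common choice that the text does not confirm was used here. The function name and the input values are hypothetical.

```python
import math


def random_effects(effects, variances):
    """Return (pooled, se, tau2, i2) with the DerSimonian-Laird estimator.

    Hypothetical helper: `effects` are study effect sizes (e.g., Hedges' g)
    and `variances` are their sampling variances.
    """
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I2: percentage of total variability attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, se, tau2, i2


# Hypothetical effect sizes from four trials of the same intervention type
pooled, se, tau2, i2 = random_effects([1.1, 0.5, 0.1, -0.1],
                                      [0.04, 0.03, 0.02, 0.05])
print(f"pooled g = {pooled:.2f}, I2 = {i2:.0f}%")
```

When effect sizes range from large increases to small decreases, as in the hypothetical input, most of the observed variability is attributed to between-study differences rather than to sampling error, which is what a high I² conveys.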
Besides the inconsistency, there are some other reasons why the quality of the evidence of the PPI included in this umbrella review is relatively low. First, most studies had unclear blinding, which may be difficult to implement in these settings. Second, most studies used a per-protocol analysis instead of an intention-to-treat analysis, a study limitation that may exaggerate the effects, introduce bias, and even violate the principle of randomization.107
The quality of the evidence of the studies assessing other interventions (e.g., leisure, physical activity, or yoga) was in general very low, mainly because of small overall sample sizes. Great caution is therefore warranted until the overall sample sizes are larger and the quality of the evidence is higher. Indeed, some interventions were assessed in only one study. Future studies might provide evidence that these interventions do increase SWB. In this regard, we want to highlight the important difference between quality of the evidence and effect size. The quality of the evidence tells us how much we can trust the overall results, regardless of whether the results indicate that the interventions are efficacious or not. Therefore, we may have high-quality evidence that an intervention has no effect, or we may have low-quality evidence that an intervention has huge beneficial effects. In addition, here we only studied the effects on SWB. It is well-known, for example, that physical activity has other important positive outcomes.108
Most individuals from highly developed countries report that their SWB is in the upper range, e.g., that they already feel very satisfied with life.109 This may result in a ceiling effect because an intervention will hardly increase the life satisfaction of an individual who already feels very satisfied with life. Therefore, it is possible that the effect size of the interventions would be larger in individuals who reported that their SWB is in the middle or lower ranges. One observation supporting this possibility is the finding that the efficacy of antidepressants is higher in patients with more severe depression,110 although this observation has been questioned.111 A related issue is that one could even question whether it is sensible to try to increase SWB in individuals with already high SWB. When comparing individuals with high and very high SWB, the latter are more successful in close relationships and volunteer work, but less successful in income, education, and political participation.112
We would like to highlight some limitations of our review. First, umbrella reviews do not include those studies that have not been included in a systematic review or meta-analysis. The possibility that there are no systematic reviews or meta-analyses on the effects of a given intervention seems unlikely, because meta-analyses are nowadays performed in very large numbers.113 However, published systematic reviews and meta-analyses might not include recent trials. Second, our search strategy was designed to find interventions from any discipline, but it was probably sub-optimal for finding interventions of a specific discipline. For example, we probably failed to include some meta-analyses of PPI that we could possibly have identified using positive psychology terms such as “optimism” or “gratitude”. However, we considered that including additional positive psychology terms would bias the inclusion of studies toward PPI to the detriment of other interventions. We refer the reader to a recently published synthesis in the field of positive psychology28 that may have included some works that we failed to include. We would also like to note that for any kind of intervention for which we were able to find a meta-analysis, we should theoretically have been able to include any RCT published until the date of the systematic search of the meta-analysis. Third, for studies that did not report pre-intervention scores or pre-post statistics, we used the effect size of the comparison between post-intervention scores. While suboptimal, this is equivalent to the effect size of the comparison between pre-post score differences under the general assumptions that pre-intervention mean scores are similar between the two groups, variances are similar, and pre-post correlation is about 0.5. Fourth, when a study reported several post-intervention and follow-up effect sizes, we averaged them.
We considered follow-up information to be very relevant, given that the interest of these interventions is that the increase in SWB lasts for some time, and thus we even considered using only the last effect size. However, effect size usually decreases with follow-up, and it would have been unfair to include the last, usually smaller effect size from studies conducting a long follow-up but the initial, usually larger effect size from studies not conducting any follow-up after the intervention. To balance the situation, we preferred a middle ground consisting of averaging the effect sizes of the different follow-up points. In addition, as we noted earlier, such an average has lower measurement error. Fifth, we focused on the common assessments of SWB, but other assessments are possible and may indeed have some advantages. For example, experience-sampling methods or ecological momentary assessments ask participants about their SWB at random moments of their everyday lives, potentially circumventing memory biases. Sixth, even when measuring the same components of SWB (positive affect, negative affect, and life satisfaction), the studies used different scales, which may have introduced heterogeneity in the analysis. Seventh, PPI were rather varied, a factor that may have increased heterogeneity. Finally, as we noted earlier, some other interventions included only one RCT and their results should thus be taken with more caution.
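The equivalence invoked in the third limitation can be made explicit with a short worked derivation. It is a sketch under the three assumptions stated in the text plus one assumption of ours: that the change-score effect size is standardized by the standard deviation of the change scores. The symbols (μ for group means, σ for standard deviations, r for the pre-post correlation, T and C for the intervention and control groups) are our notation.

```latex
\begin{align*}
\Delta_{\text{change}}
  &= \bigl(\mu^{T}_{\text{post}} - \mu^{T}_{\text{pre}}\bigr)
   - \bigl(\mu^{C}_{\text{post}} - \mu^{C}_{\text{pre}}\bigr)
   = \mu^{T}_{\text{post}} - \mu^{C}_{\text{post}}
  && \text{if } \mu^{T}_{\text{pre}} = \mu^{C}_{\text{pre}}, \\
\sigma^{2}_{\text{change}}
  &= \sigma^{2}_{\text{pre}} + \sigma^{2}_{\text{post}}
   - 2 r \, \sigma_{\text{pre}} \sigma_{\text{post}}
   = 2 \sigma^{2} (1 - r)
  && \text{if } \sigma_{\text{pre}} = \sigma_{\text{post}} = \sigma, \\
d_{\text{change}}
  &= \frac{\Delta_{\text{change}}}{\sigma_{\text{change}}}
   = \frac{\mu^{T}_{\text{post}} - \mu^{C}_{\text{post}}}{\sigma \sqrt{2 (1 - r)}}
   = \frac{\mu^{T}_{\text{post}} - \mu^{C}_{\text{post}}}{\sigma}
   = d_{\text{post}}
  && \text{if } r = 0.5 .
\end{align*}
```

Under the same standardization, a pre-post correlation above 0.5 would make the change-score effect size larger in magnitude than the post-score effect size, and a correlation below 0.5 would make it smaller, which is why the approach is described as suboptimal rather than exact.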
Future umbrella syntheses with more studies may be able to better model the heterogeneity between interventions. This may be achieved by stratifying the interventions or, in a more sophisticated way, by conducting meta-regressions that model the relevant characteristics of each intervention. Additionally, there have been few studies in population subgroups scoring low in specific SWB components but not in others (which should benefit more from these interventions),114 and little research on potential moderators.115,116 The study of subgroups and moderators when there are enough data may be of great interest, as it is entirely plausible that an intervention works for one individual but not for another. Finally, given the importance of personality for SWB, the creation of interventions that modulate personality (e.g., decreasing neuroticism and increasing extraversion) could also be promising.
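To illustrate what a meta-regression of this kind might look like, the following is a minimal sketch of an inverse-variance weighted regression of effect size on a single study-level moderator. It is a fixed-effect simplification (no between-study variance term), the function name is hypothetical, and the moderator (intervention duration in weeks) and all input values are invented for illustration.

```python
def weighted_meta_regression(effects, variances, moderator):
    """Return (intercept, slope) of an inverse-variance weighted regression.

    Hypothetical helper: a fixed-effect simplification of meta-regression
    with one moderator.
    """
    w = [1 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    if sxx == 0:
        raise ValueError("moderator must vary across studies")
    sxy = sum(
        wi * (xi - x_bar) * (yi - y_bar)
        for wi, xi, yi in zip(w, moderator, effects)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: effect size of four trials and their duration in weeks
intercept, slope = weighted_meta_regression(
    effects=[0.15, 0.35, 0.30, 0.55],
    variances=[0.02, 0.03, 0.01, 0.04],
    moderator=[2, 4, 6, 8],
)
print(f"g = {intercept:.2f} + {slope:.3f} * weeks")
```

A full analysis would typically add a between-study variance component and standard errors for the coefficients, and it would need many more trials than the handful available for most interventions in this review.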
In conclusion, despite its limitations, this umbrella review shows that there is low- to moderate-quality evidence that PPI might increase SWB. Conversely, the quality of the evidence for other interventions (e.g., yoga, physical activity, or leisure) is still very low.
Acknowledgements
This work was supported by Miguel Servet Research Contract (CP14/00041 and CPII19/00009) to J.R., PFIS Predoctoral Contract FI16/00311 to A.A.E. and Research Projects PI14/00292 and PI19/00394 from the Plan Nacional de I+D+i 2013–2016, the Instituto de Salud Carlos III-Subdirección General de Evaluación y Fomento de la Investigación and the European Regional Development Fund (FEDER, ‘Investing in your future’). The funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Conflict of interest
The authors have no interests to declare in relation to this manuscript.