Elsevier

Health & Place

Volume 35, September 2015, Pages 136-146
Exploring the forest instead of the trees: An innovative method for defining obesogenic and obesoprotective environments

https://doi.org/10.1016/j.healthplace.2015.08.002

Highlights

  • Obesogenic environments (OGE) consist of a diverse cluster of spatially co-occurring risk factors.

  • We used a machine learning algorithm to identify the risk factors that matter most for childhood obesity.

  • OGEs are best described by a diverse set of social, physical activity, and food features.

  • Social features are of particular importance.

  • Obesogenic and obesoprotective environments are best defined by different sets of risk factors.

Abstract

Past research has assessed the association of single community characteristics with obesity, ignoring the spatial co-occurrence of multiple community-level risk factors. We used conditional random forests (CRF), a non-parametric machine learning approach, to identify the combination of community features that are most important for the prediction of obesogenic and obesoprotective environments for children. After examining 44 community characteristics, we identified 13 features of the social, food, and physical activity environment that in combination correctly classified 67% of communities as obesoprotective or obesogenic using mean BMI-z as a surrogate. Social environment characteristics emerged as the most important classifiers and might provide leverage for intervention. CRF allows consideration of the neighborhood as a system of risk factors.

Introduction

The concept of the “obesogenic environment” was first proposed in the late 1990s (Hill and Peters, 1998, Poston and Foreyt, 1999, Swinburn et al., 1999) as a framework for understanding the joint impact of multiple dimensions of place on obesity risk. Through their physical, institutional, or social features, obesogenic environments impede healthy energy balance-related behaviors by promoting inactivity and excess caloric intake. Since the concept was proposed, a rich body of research has linked numerous environmental characteristics with obesity in a variety of populations.

Multilevel studies have shown associations of several features of the built environment such as land use mix and population density (Frank et al., 2007, Franzini et al., 2009, Rundle et al., 2009, Schwartz et al., 2011b), as well as food establishments (Casey et al., 2008, Cummins and Macintyre, 2006, Drewnowski, 2004, Fleischhacker et al., 2011, Franco et al., 2008, Fraser et al., 2012, Giskes et al., 2011, Inagami et al., 2006, Lake and Townshend, 2006, Mehta and Chang, 2008, Michimi and Wimberly, 2010, Morland et al., 2006) and physical activity features (Gordon-Larsen et al., 2006, Kipke et al., 2007) with body composition and obesity at the individual level. Some studies control for social and economic characteristics as potential confounders (Meyer et al., 2015). We followed the socio-ecological literature and conceptualized and modeled social and economic community characteristics as key features of the risk landscape for physical inactivity and caloric over-consumption as well as risk-regulators that influence the likelihood of exposure to other obesogenic features of environments (Block et al., 2004, Greves Grow et al., 2010, Janssen et al., 2006, Larson et al., 2009, Nau et al., 2015).

Despite the recognition that obesogenic environments represent a diverse cluster of spatially co-occurring features, many studies presume that there are separate environments for food, social factors and physical activity-related features; this assumption has not been tested. Furthermore, most studies have used regression analysis to assess the independent effect of each “exposure” in isolation. Individuals, however, experience their community environment as a unified ensemble of features that may act jointly to affect health. A variable-by-variable approach risks what the sociologist Gordon called the partialling fallacy (Gordon, 1968). That is, the effect of the obesogenic context cannot be fully determined because it involves multiple variables that measure different dimensions of the same construct. Many studies have found small effects across a wide range of community risk factors when examined in isolation. None captures the totality of the impact of the obesogenic environment because that impact represents what Marini and Singer call a conjunctive plurality of causes that cannot be represented by the independent effects detected in linear additive regression models (Marini and Singer, 1988). Further, many studies adjust for related features of environments that are facets of the spatially co-occurring structures that shape obesogenic environments. This exacerbates the partialling fallacy and biases observed associations toward the null. While theorists posit obesogenic environments as a complex multidimensional construct, standard regression analysis does not permit us to identify the combination of interacting risk factors that render an environment obesogenic. Researchers have begun to measure multiple environmental risk factors using factor analysis and latent class analysis (Adams et al., 2011, Meyer et al., 2015, Wall et al., 2012).

We expand this new body of work by demonstrating an innovative approach that allows us to identify from a large set of theoretically plausible risk factors those community features that are most important for rendering a community obesogenic. Our method allows us to reorient the focus from understanding if a particular risk factor matters to identifying the set of risk factors that matter most. We implement a method called conditional random forests (CRF) (Strobl et al., 2008) to analyze and identify community characteristics that together can predict observed rates of obesity at an ecological level. CRF is a supervised machine learning algorithm that has been used in biomedical research to identify, for example, the set of proteins associated with the presence or absence of a particular cancer (Izmirlian, 2004) or the combination of genetic and dietary factors that jointly increase the risk for the development of Metabolic Syndrome (de Edelenyi et al., 2008). Random Forests (RF) and CRF have also been applied in engineering (Kaur and Malhotra, 2008), geography (Pal, 2005) and ecology. To our knowledge, Basu and Siddiqi (2014) are the only authors to date who have used RF to identify risk environments by identifying features of geographic regions with high mortality.

We define obesogenic and obesoprotective environments for children as communities that fall into the highest or lowest quartile, respectively, of the community-level mean BMI z-score distribution. We use community-level BMI-z because obesogenic and obesoprotective environments are ecological constructs that cannot be reduced to individual characteristics. Our approach is ecological, with the strengths and limits that this approach implies. We adopt an ecological approach because our goal is primarily the identification of constellations of community features. Obesogenicity and obesoprotectiveness are much like other community features, such as “walkability” or deprivation: they are properties of larger aggregate ecologies, not a function of the individuals who reside in them. In this study, we limit the scope of our inferences to relations between community constructs and average BMI among children in a community. We avoid the ecological fallacy by restricting our inferences to the group level (Schwartz, 1994). Further, this stage of the analysis is a proof of concept for a new method. We plan a subsequent analysis using a multilevel analytic framework to address how CRF can inform variation in individual risk for obesity. CRF considers the coordinated effect of a large set of community features in order to classify communities. The CRF algorithm ranks the classifying variables in terms of their contribution to classification success. It identifies the combination of risk factors that matter most for differentiating obesogenic from obesoprotective communities.
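The quartile-based definition above can be illustrated with a minimal sketch. It is not the authors' code: the function name, the input layout, the quantile method and the handling of values that fall exactly on a cut point are all assumptions made for illustration. The 50-child minimum is taken from the Results snippet.

```python
from statistics import mean, quantiles

MIN_CHILDREN = 50  # minimum community size mentioned in the Results snippet


def label_communities(bmi_z_by_community):
    """Return community id -> 'obesogenic', 'obesoprotective', or None.

    bmi_z_by_community: dict mapping a community id to the list of BMI
    z-scores of its children (hypothetical input layout).
    """
    # Community-level mean BMI-z, keeping only communities with enough children.
    means = {
        cid: mean(scores)
        for cid, scores in bmi_z_by_community.items()
        if len(scores) >= MIN_CHILDREN
    }
    # Quartile cut points of the community-level distribution.
    # The "inclusive" method is an assumption, not taken from the paper.
    q1, _, q3 = quantiles(means.values(), n=4, method="inclusive")
    labels = {}
    for cid, m in means.items():
        if m >= q3:
            labels[cid] = "obesogenic"       # highest quartile
        elif m <= q1:
            labels[cid] = "obesoprotective"  # lowest quartile
        else:
            labels[cid] = None               # middle quartiles are not classified
    return labels
```

Only the two outer quartiles receive a label, so a classifier trained on these labels contrasts the extremes of the community distribution rather than modelling the full range of BMI-z.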

The present study uses data from a large electronic health record of measured height and weight on children geo-coded to a diverse set of 1288 communities in 37 counties in Pennsylvania. We assembled a large dataset of community features from secondary data sources that have been linked to obesity in prior research. This set consists of 44 characteristics that cover multiple domains of community risk for obesity including social factors, food availability, and physical activity-related features including land use characteristics and physical activity establishments.

Using CRF, we are able to examine the joint, spatially co-occurring pattern of features that may constitute obesogenic and obesoprotective environments. Results of these analyses allow us to: (1) identify the combination of features that are most important in rendering an environment obesogenic; (2) determine the relative importance of particular environmental features; (3) identify factors that do not improve classification accuracy; and (4) characterize environmental features that may be targets for policy intervention.
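To make goals (2) and (3) concrete, the sketch below shows plain permutation importance: each feature is shuffled in turn and the resulting drop in classification accuracy is recorded. This is a simpler, unconditional relative of the conditional importance used with CRF (Strobl et al., 2008), not the authors' procedure. The fixed rule `toy_rule` is a hypothetical stand-in for a fitted forest, and the two toy features are invented for illustration.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows whose predicted label matches the observed label."""
    hits = sum(predict(row) == y for row, y in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j across communities, leaving other columns intact.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_rule(row):
    """Hypothetical classifier: uses feature 0 only and ignores feature 1."""
    return "obesogenic" if row[0] > 0.5 else "obesoprotective"


# Toy data: feature 0 determines the label, feature 1 is unrelated noise.
rows = [[float(i % 2), float(i % 3)] for i in range(40)]
labels = [toy_rule(row) for row in rows]
scores = permutation_importance(toy_rule, rows, labels)
```

In this toy setup the noise feature scores exactly zero because the rule never reads it, which mirrors goal (3): a feature whose permutation does not reduce accuracy does not improve classification. With correlated community features, unconditional shuffling can overstate importance, which is the motivation for the conditional scheme.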

Section snippets

Data and measures

For the outcome of interest, we used electronic health records from the Geisinger Health System on measured height and weight of children ages 10–18 during 2010 (N=22,497). Data were drawn from a dataset that provides information from the Geisinger Health System from 2001 to 2012 for children ages 0–18. Geisinger is the largest healthcare provider in Pennsylvania, serving patients in 37 counties in central and northeastern Pennsylvania. The study population has been found to be representative of

Results

The analysis was based on prediction of 50 high-obesity and 49 low-obesity communities (communities in the upper and lower quartile of the BMI-z distribution of communities with at least 50 children). The average community BMI-z across all communities was 0.69 (standard deviation (SD)=0.18); the highest quartile had an average BMI z-score of 0.92 (SD=0.1) while the lowest quartile had an average BMI z-score of 0.47 (SD=0.09). Table 2 shows descriptive characteristics of the four quartiles of

Discussion

This study combined a theory-driven approach to candidate variable selection with a data-driven analysis strategy. We believe this is the first study to use conditional random forests (CRF), a machine learning technique, to consider the risk landscape of obesogenic environments as a diverse set of spatially co-occurring risk factors. Our results suggest that environments represent a unified risk topography that is not easily divided along dimensional lines. We identified 13 variables that

Conflict of interest

None.

Acknowledgments

The project described was supported by Grant number U54HD070725 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). The project is co-funded by the NICHD and the Office of Behavioral and Social Sciences Research (OBSSR). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NICHD or OBSSR. Dr. Nau was supported by the training core of the Johns Hopkins Global Obesity Prevention Center.

References (70)

  • H.M. Greves Grow et al.

    Child obesity associated with social disadvantage of children’s neighborhoods

    Soc. Sci. Med.

    (2010)
  • D.R. Holtgrave et al.

    Is social capital a protective factor against obesity and diabetes? Findings from an exploratory study

    Ann. Epidemiol.

    (2006)
  • S. Inagami et al.

    You are where you shop: grocery store locations, weight, and neighborhoods

    Am. J. Prev. Med.

    (2006)
  • I. Janssen et al.

    Influence of individual-and area-level measures of socioeconomic status on obesity, unhealthy eating, and physical inactivity in Canadian adolescents

    Am. J. Clin. Nutr.

    (2006)
  • M.D. Kipke et al.

    Food and park environments: neighborhood-level risks for childhood obesity in east Los Angeles

    J. Adolesc. Health

    (2007)
  • P. Kitsantas et al.

    Risk profiles for overweight/obesity among preschoolers

    Early Hum. Dev.

    (2010)
  • A.Y. Liu et al.

    The contextual influence of coal abandoned mine lands in communities and type 2 diabetes in Pennsylvania

    Health Place

    (2013)
  • J. Lopez-Zetina et al.

    The link between obesity and the built environment. Evidence from an ecological analysis of obesity and vehicle miles of travel in California

    Health Place

    (2006)
  • R.J. Marshall

    The use of classification and regression trees in clinical epidemiology

    J. Clin. Epidemiol.

    (2001)
  • N.K. Mehta et al.

    Weight status and restaurant availability a multilevel analysis

    Am. J. Prev. Med.

    (2008)
  • K.A. Meyer et al.

    Combined measure of neighborhood food and physical activity environments and weight-related outcomes: the CARDIA study

    Health Place

    (2015)
  • K. Morland et al.

    Supermarkets, other food stores, and obesity: the atherosclerosis risk in communities study

    Am. J. Prev. Med.

    (2006)
  • W.S. Poston et al.

    Obesity is an environmental issue

    Atherosclerosis

    (1999)
  • D.D. Reidpath et al.

    An ecological study of the relationship between social and environmental determinants of obesity

    Health Place

    (2002)
  • B.S. Schwartz et al.

    Body mass index and the built and social environments in children and adolescents using electronic health records

    Am. J. Prev. Med.

    (2011)
  • B. Swinburn et al.

    Dissecting obesogenic environments: the development and application of a framework for identifying and prioritizing environmental interventions for obesity

    Prev. Med.

    (1999)
  • M. Tseng et al.

    Is neighbourhood obesogenicity associated with body mass index in women? Application of an obesogenicity index in socioeconomically disadvantaged neighbourhoods

    Health Place

    (2014)
  • D. Vandegrift et al.

    Obesity rates, income, and suburban sprawl: an analysis of US states

    Health Place

    (2004)
  • M.M. Wall et al.

    Patterns of obesogenic neighborhood features and adolescent weight: a comparison of statistical approaches

    Am. J. Prev. Med.

    (2012)
  • M.A. Adams et al.

    Patterns of neighborhood environment attributes related to physical activity across 11 countries: a latent class analysis

    Int. J. Behav. Nutr. Phys. Act.

    (2013)
  • S. Basu et al.

    Geographic disparities in US mortality: “hot-spotting” large databases

    Epidemiology

    (2014)
  • L. Breiman

    Random forests

    Mach. Learn.

    (2001)
  • L. Breiman

    Statistical modeling: The two cultures

    Stat. Sci.

    (2001)
  • L. Breiman et al.

    Classification and Regression Trees

    (1984)
  • CDC, 2014a. Cut-offs to define outliers in the 2000 CDC Growth Charts, in: Control, C.f.D....