Exploring the forest instead of the trees: An innovative method for defining obesogenic and obesoprotective environments
Introduction
The concept of the “obesogenic environment” was first proposed in the late 1990s (Hill and Peters, 1998, Poston and Foreyt, 1999, Swinburn et al., 1999) as a framework for understanding the joint impact of multiple dimensions of place on obesity risk. Through their physical, institutional, or social features, obesogenic environments impede healthy energy balance-related behaviors by promoting inactivity and excess caloric intake. Since the concept was proposed, a rich body of research has linked numerous environmental characteristics with obesity in a variety of populations.
Multilevel studies have linked several features of the built environment, such as land use mix and population density (Frank et al., 2007, Franzini et al., 2009, Rundle et al., 2009, Schwartz et al., 2011b), as well as food establishments (Casey et al., 2008, Cummins and Macintyre, 2006, Drewnowski, 2004, Fleischhacker et al., 2011, Franco et al., 2008, Fraser et al., 2012, Giskes et al., 2011, Inagami et al., 2006, Lake and Townshend, 2006, Mehta and Chang, 2008, Michimi and Wimberly, 2010, Morland et al., 2006) and physical activity features (Gordon-Larsen et al., 2006, Kipke et al., 2007), to body composition and obesity at the individual level. Some studies control for social and economic characteristics as potential confounders (Meyer et al., 2015). Following the socio-ecological literature, we conceptualized and modeled social and economic community characteristics both as key features of the risk landscape for physical inactivity and caloric overconsumption and as risk regulators that influence the likelihood of exposure to other obesogenic features of environments (Block et al., 2004, Greves Grow et al., 2010, Janssen et al., 2006, Larson et al., 2009, Nau et al., 2015).
Despite the recognition that obesogenic environments represent a diverse cluster of spatially co-occurring features, many studies presume that there are separate environments for food, social factors, and physical activity-related features; this assumption has not been tested. Furthermore, most studies have used regression analysis to assess the independent effect of each “exposure” in isolation. Individuals, however, experience their community environment as a unified ensemble of features that may act jointly to affect health. A variable-by-variable approach risks what the sociologist Gordon called the partialling fallacy (Gordon, 1968): when multiple variables measure different dimensions of the same construct, the effect of the obesogenic context as a whole cannot be recovered from the effects of its parts estimated one at a time. Many studies have found small effects across a wide range of community risk factors examined in isolation. No single factor captures the total impact of the obesogenic environment, because that impact represents what Marini and Singer call a conjunctive plurality of causes, which cannot be represented by the independent effects detected in linear additive regression models (Marini and Singer, 1988). Further, many studies adjust for related environmental features that are themselves facets of the spatially co-occurring structures that shape obesogenic environments. This exacerbates the partialling fallacy and biases observed associations toward the null. While theorists posit obesogenic environments as a complex multidimensional construct, standard regression analysis does not permit us to identify the combination of interacting risk factors that render an environment obesogenic. Researchers have begun to measure multiple environmental risk factors jointly using factor analysis and latent class analysis (Adams et al., 2011, Meyer et al., 2015, Wall et al., 2012).
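The bias toward the null can be made concrete with a toy simulation. It assumes, purely for illustration, that a single latent construct z (standing in for obesogenicity) drives the outcome y, and that x1 and x2 are two noisy, correlated facets of z. Under these assumed settings Var(x1) = 1.25 and Cov(x1, y) = 1, so regressing y on x1 alone gives a slope of 1/1.25 = 0.8. Adding x2 as a "control" splits the effect: solving the normal equations with covariance matrix [[1.25, 1], [1, 1.25]] gives each facet a slope of 1/2.25 ≈ 0.44, although the construct's total effect is unchanged. The data-generating process and all function names are hypothetical and are not part of the study.

```python
import random


def simulate(n=20000, noise_var=0.25, seed=1):
    """Toy data (assumed): latent construct z drives y; x1 and x2 are noisy facets of z."""
    rng = random.Random(seed)
    sd = noise_var ** 0.5
    x1, x2, y = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x1.append(z + rng.gauss(0, sd))
        x2.append(z + rng.gauss(0, sd))
        y.append(z + rng.gauss(0, 1))
    return x1, x2, y


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def simple_slope(x, y):
    """OLS slope of y on a single predictor x."""
    return cov(x, y) / cov(x, x)


def joint_slopes(x1, x2, y):
    """OLS slopes of y on x1 and x2 together, via the 2x2 normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    c1, c2 = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * c1 - s12 * c2) / det, (s11 * c2 - s12 * c1) / det
```

With the default settings, `simple_slope(x1, y)` should land near 0.8 while each slope from `joint_slopes(x1, x2, y)` should land near 0.44: adjusting for a second facet of the same construct shrinks the apparent effect of the first by nearly half.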
We expand this new body of work by demonstrating an innovative approach that allows us to identify, from a large set of theoretically plausible risk factors, the community features that are most important for rendering a community obesogenic. Our method shifts the focus from asking whether a particular risk factor matters to identifying the set of risk factors that matter most. We implement a method called conditional random forests (CRF) (Strobl et al., 2008) to identify community characteristics that together predict observed rates of obesity at an ecological level. CRF is a supervised machine learning algorithm that has been used in biomedical research to identify, for example, the set of proteins associated with the presence or absence of a particular cancer (Izmirlian, 2004) or the combination of genetic and dietary factors that jointly increase the risk of developing metabolic syndrome (de Edelenyi et al., 2008). Random forests (RF) and CRF have also been applied in engineering (Kaur and Malhotra, 2008), geography (Pal, 2005), and ecology. To our knowledge, Basu and Siddiqi (2014) are the only authors to date who have used RF to identify risk environments, by identifying features of geographic regions with high mortality.
We define obesogenic and obesoprotective environments for children as communities that fall into the highest or lowest quartile, respectively, of the distribution of community-level mean BMI z-scores. We use community-level BMI-z because obesogenic and obesoprotective environments are ecological constructs that cannot be reduced to individual characteristics. Our approach is ecological, with the strengths and limits that this approach implies. We adopt it because our primary goal is to identify constellations of community features. Like other community features such as “walkability” or deprivation, obesogenicity and obesoprotectiveness are properties of larger aggregate ecologies, not a function of the individuals who reside in them. In this study, we limit the scope of our inferences to relations between community constructs and average BMI among children in a community, and we avoid the ecological fallacy by restricting our inferences to the group level (Schwartz, 1994). Further, this stage of the analysis is a proof of concept for a new method; we plan a subsequent analysis using a multilevel framework to address how CRF can inform our understanding of variation in individual risk for obesity. CRF considers the coordinated effect of a large set of community features in order to classify communities. The algorithm ranks the classifying variables by their contribution to classification success, identifying the combination of risk factors that matter most for differentiating obesogenic from obesoprotective communities.
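The case definition above can be sketched as follows, assuming children's BMI z-scores have already been grouped by community. The function name and input layout are hypothetical, the quartile cut-points come from Python's `statistics.quantiles` (default "exclusive" method), which may differ slightly from the software the authors used, and the 50-child minimum mirrors the restriction reported in the Results. Communities in the two middle quartiles receive no label and drop out of the classification task.

```python
import statistics


def label_communities(bmi_z_by_community, min_children=50):
    """Label top-quartile communities 'obesogenic' and bottom-quartile ones 'obesoprotective'.

    bmi_z_by_community maps a community id to a list of children's BMI z-scores.
    Communities with fewer than min_children measured children are excluded first.
    """
    means = {
        c: statistics.mean(zs)
        for c, zs in bmi_z_by_community.items()
        if len(zs) >= min_children
    }
    # Quartile cut-points of the distribution of community mean BMI-z.
    q1, _, q3 = statistics.quantiles(means.values(), n=4)
    labels = {}
    for c, m in means.items():
        if m >= q3:
            labels[c] = "obesogenic"
        elif m <= q1:
            labels[c] = "obesoprotective"
        # Middle-quartile communities are left unlabeled.
    return labels
```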
The present study uses measured height and weight from a large electronic health record database of children geocoded to a diverse set of 1288 communities in 37 counties in Pennsylvania. From secondary data sources, we assembled a large dataset of community features that have been linked to obesity in prior research. This set consists of 44 characteristics covering multiple domains of community risk for obesity, including social factors, food availability, and physical activity-related features such as land use characteristics and physical activity establishments.
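The outcome is built from measured height and weight. This section does not state which growth reference was used to standardize BMI; a common choice for US children is the CDC 2000 reference, which supplies age- and sex-specific L (skewness), M (median), and S (coefficient of variation) parameters for Cole's LMS transformation, z = ((BMI/M)^L − 1)/(L·S), with z = ln(BMI/M)/S when L = 0. The sketch below assumes that approach; real L, M, S values must be looked up in the reference table for each child's age and sex, and any values used here are hypothetical.

```python
import math


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def lms_z(x, L, M, S):
    """LMS z-score for measurement x given reference parameters L, M, S (assumed method)."""
    if L == 0:
        # Limiting case of the Box-Cox transform.
        return math.log(x / M) / S
    return ((x / M) ** L - 1) / (L * S)
```

For example, with hypothetical parameters L = 1, M = 20, and S = 0.1, a BMI of 25 gives z = (1.25 − 1)/0.1 = 2.5.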
Using CRF, we are able to examine the joint, spatially co-occurring pattern of features that may constitute obesogenic and obesoprotective environments. Results of these analyses allow us to: (1) identify the combination of features that are most important in rendering an environment obesogenic; (2) determine the relative importance of particular environmental features; (3) identify factors that do not improve classification accuracy; and (4) characterize environmental features that may be targets for policy intervention.
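Aims (2) and (3) rest on permutation importance: shuffle one community feature, re-score the classifier, and record how much accuracy falls. The dependency-free sketch below illustrates the idea under two stated substitutions. First, it uses plain (marginal) permutation, whereas Strobl et al. (2008) permute each variable conditionally on correlated covariates so that correlated predictors are not credited with inflated importance. Second, a hypothetical nearest-centroid classifier stands in for the forest; `permutation_importance` accepts any predict function. In a fitted forest these scores are usually computed on out-of-bag observations; the sketch scores whatever rows it is given.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows whose predicted label matches the observed label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled.

    Plain (marginal) permutation: each column is shuffled on its own,
    ignoring correlations with other features. Scores near zero flag
    features that do not improve classification accuracy.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    scores = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with feature j replaced by its shuffled value.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        scores.append(sum(drops) / n_repeats)
    return scores


def fit_nearest_centroid(rows, labels):
    """Hypothetical stand-in classifier: assign each row to the closest class mean."""
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        members = [r for r, y in zip(rows, labels) if y == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    def predict(row):
        return min(classes,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

    return predict
```

With one informative feature and one pure-noise feature, the informative feature should receive a clearly positive score and the noise feature a score near zero, which is the pattern aim (3) looks for.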
Data and measures
For the outcome of interest, we used electronic health records from the Geisinger Health System on measured height and weight of children ages 10–18 during 2010 (N=22,497). These data were drawn from a dataset covering children ages 0–18 in the Geisinger Health System from 2001 to 2012. Geisinger is the largest healthcare provider in Pennsylvania, serving patients in 37 counties in central and northeastern Pennsylvania. The study population has been found to be representative of
Results
The analysis was based on prediction of 50 high-obesity and 49 low-obesity communities (communities in the upper and lower quartile of the BMI-z distribution of communities with at least 50 children). The average community BMI-z across all communities was 0.69 (standard deviation (SD)=0.18); the highest quartile had an average BMI z-score of 0.92 (SD=0.1) while the lowest quartile had an average BMI z-score of 0.47 (SD=0.09). Table 2 shows descriptive characteristics of the four quartiles of
Discussion
This study combined a theory-driven approach to candidate variable selection with a data-driven analysis strategy. We believe this is the first study to use conditional random forests (CRF), a machine learning technique, to consider the risk landscape of obesogenic environments as a diverse set of spatially co-occurring risk factors. Our results suggest that environments represent a unified risk topography that is not easily divided along dimensional lines. We identified 13 variables that
Conflict of interest
None.
Acknowledgments
The project described was supported by Grant number U54HD070725 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). The project is co-funded by the NICHD and the Office of Behavioral and Social Sciences Research (OBSSR). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NICHD or OBSSR. Dr. Nau was supported by the training core of the Johns Hopkins Global Obesity Prevention Center.
References
Neighborhood environment profiles related to physical activity and weight status: a latent profile analysis. Prev. Med. (2011).
Fast food, race/ethnicity, and income: a geographic analysis. Am. J. Prev. Med. (2004).
Impact of the food environment and physical activity environment on behaviors and weight status in rural U.S. communities. Prev. Med. (2008).
Obesity and the food environment: dietary energy density and diet costs. Am. J. Prev. Med. (2004).
Disparities in obesity rates: analysis by ZIP code area. Soc. Sci. Med. (2007).
Neighborhood characteristics and availability of healthy foods in Baltimore. Am. J. Prev. Med. (2008).
Stepping towards causation: do built environments or neighborhood and travel preferences explain physical activity, driving, and obesity? Soc. Sci. Med. (2007).
Fast food and obesity: a spatial analysis in a large United Kingdom population of children aged 13–15. Am. J. Prev. Med. (2012).
Behavioral science at the crossroads in public health: extending horizons, envisioning the future. Soc. Sci. Med. (2006).
Multicontextual correlates of adolescent leisure-time physical activity. Am. J. Prev. Med. (2014).
Child obesity associated with social disadvantage of children’s neighborhoods. Soc. Sci. Med.
Is social capital a protective factor against obesity and diabetes? Findings from an exploratory study. Ann. Epidemiol.
You are where you shop: grocery store locations, weight, and neighborhoods. Am. J. Prev. Med.
Influence of individual- and area-level measures of socioeconomic status on obesity, unhealthy eating, and physical inactivity in Canadian adolescents. Am. J. Clin. Nutr.
Food and park environments: neighborhood-level risks for childhood obesity in east Los Angeles. J. Adolesc. Health.
Risk profiles for overweight/obesity among preschoolers. Early Hum. Dev.
The contextual influence of coal abandoned mine lands in communities and type 2 diabetes in Pennsylvania. Health Place.
The link between obesity and the built environment. Evidence from an ecological analysis of obesity and vehicle miles of travel in California. Health Place.
The use of classification and regression trees in clinical epidemiology. J. Clin. Epidemiol.
Weight status and restaurant availability: a multilevel analysis. Am. J. Prev. Med.
Combined measure of neighborhood food and physical activity environments and weight-related outcomes: the CARDIA study. Health Place.
Supermarkets, other food stores, and obesity: the atherosclerosis risk in communities study. Am. J. Prev. Med.
Obesity is an environmental issue. Atherosclerosis.
An ecological study of the relationship between social and environmental determinants of obesity. Health Place.
Body mass index and the built and social environments in children and adolescents using electronic health records. Am. J. Prev. Med.
Dissecting obesogenic environments: the development and application of a framework for identifying and prioritizing environmental interventions for obesity. Prev. Med.
Is neighbourhood obesogenicity associated with body mass index in women? Application of an obesogenicity index in socioeconomically disadvantaged neighbourhoods. Health Place.
Obesity rates, income, and suburban sprawl: an analysis of US states. Health Place.
Patterns of obesogenic neighborhood features and adolescent weight: a comparison of statistical approaches. Am. J. Prev. Med.
Patterns of neighborhood environment attributes related to physical activity across 11 countries: a latent class analysis. Int. J. Behav. Nutr. Phys. Act.
Geographic disparities in US mortality: “hot-spotting” large databases. Epidemiology.
Random forests. Mach. Learn.
Statistical modeling: The two cultures. Stat. Sci.
Classification and Regression Trees.