Data aggregation impacts on built environment-mode share models around public transit stations

Seyed Sajjad Abdollahpour

Virginia Polytechnic Institute and State University

Huyen T. K. Le

The Ohio State University

Ralph Buehler

Virginia Polytechnic Institute and State University

Steve Hankey

Virginia Polytechnic Institute and State University

DOI: https://doi.org/10.5198/jtlu.2025.2676

Keywords: Travel behavior, Land use, Urban form, Zoning and scale effects, Modifiable areal unit problem


Abstract

This study examines how data aggregation influences the relationship between the built environment (BE) and mode share around 2,794 rail and bus rapid transit (BRT) stations in the United States, using both inferential and machine learning methods. The results indicate that data aggregation affects the outcomes of BE-mode share models regardless of the analysis approach. Models using network buffers are less affected by data aggregation than those using circular buffers, Thiessen polygons, or administrative boundaries (block groups). The buffer sizes that best capture BE effects while minimizing sensitivity to data aggregation for active and public transit modes are 800 meters for BRT stations and 1000 meters for rail stations, while 1200 meters is effective for private vehicle mode share at both rail and BRT stations. Furthermore, key BE features in commuting mode share models, such as employment density, jobs per household, intersection density, residential density, distance from the central business district, job accessibility (active), and regional population density, remain robust against data aggregation. We recommend that urban and transportation planners account for aggregation biases and apply multiple methods when evaluating the BE's impact on mode share around public transit stations, so that the resulting policy recommendations are more effective.

