Sydney’s residential relocation landscape: Machine learning and feature selection methods unpack the whys and whens

Maryam Bostanara

Research Center for Integrated Transport Innovation (rCITI), School of Civil and Environmental Engineering, University of New South Wales, Sydney

Amarin Siripanich

Research Center for Integrated Transport Innovation (rCITI), School of Civil and Environmental Engineering, University of New South Wales, Sydney

Milad Ghasri

School of Engineering and Information Technology (SEIT), University of New South Wales, Canberra

Taha Hossein Rashidi

Research Center for Integrated Transport Innovation (rCITI), School of Civil and Environmental Engineering, University of New South Wales, Sydney

DOI: https://doi.org/10.5198/jtlu.2024.2440

Keywords: Residential relocation, Machine learning, Survival analysis, Residential self-selection, Accessibility


Abstract

This study investigates the timing of household residential relocation, an aspect vital for transport and urban planning. Analyzing a high-dimensional dataset of 1,024 relocations in Sydney, Australia, the research contrasts ten machine learning survival techniques with three classical survival models. Results indicate that when classical models are paired with tree-based automated feature selectors, their performance aligns closely with that of the machine learning models. Notably, the gradient boosting machine (GBM), XGBoost, and Random Forest models emerge as standout performers. The study provides a comprehensive comparison between automatic and manual feature selection, shedding light on the variables that influence households’ duration of stay. Stacked ensemble modeling, which leverages predictions from various models, is used to enhance accuracy, but the improvements are marginal. This points to inherent modeling challenges, particularly the recurring issue of specific pairs of households being misclassified in the concordance index measure. A thorough feature analysis highlights homeownership as the foremost predictor and shows the importance of recent life events and accessibility features in relocation decisions. The research emphasizes the importance of considering the accessibility of both current and future homes in relocation models, as these features account for 20% of feature significance in model outcomes. Building on these foundational insights, the study paves the way for a deeper understanding of individual decision-making processes in sustainable urban planning.
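The concordance index mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the toy data, and the choice to skip pairs with tied durations are all assumptions made for illustration. The sketch counts, among comparable pairs of households, how often the household with the shorter observed stay also received the higher predicted relocation risk, and it returns the pairs that were ordered incorrectly.

```python
from itertools import combinations


def concordance_index(durations, events, risk_scores):
    """Harrell-style concordance index for right-censored data.

    durations   : observed length of stay for each household
    events      : 1 if the relocation was observed, 0 if censored
    risk_scores : predicted risk (higher means an earlier move is expected)

    Returns the index and the list of discordant (misordered) pairs.
    Pairs with tied durations are skipped (a simplifying assumption).
    """
    concordant = 0.0
    comparable = 0
    discordant_pairs = []

    for i, j in combinations(range(len(durations)), 2):
        if durations[i] == durations[j]:
            continue
        # a is the household with the shorter observed stay
        a, b = (i, j) if durations[i] < durations[j] else (j, i)
        if not events[a]:
            # The shorter stay is censored, so the true order is unknown
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
        else:
            discordant_pairs.append((a, b))

    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable, discordant_pairs


# Hypothetical toy data: four households, the third one censored
durations = [2, 5, 7, 10]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.6, 0.1]

c_index, discordant = concordance_index(durations, events, risk_scores)
print(c_index, discordant)
```

In this toy example, five pairs are comparable and one of them is misordered. Inspecting the returned list of discordant pairs is one way to see whether the same households are repeatedly misordered across different models.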

