Sydney’s residential relocation landscape: Machine learning and feature selection methods unpack the whys and whens
Maryam Bostanara
Research Center for Integrated Transport Innovation (rCITI), School of Civil and Environmental Engineering, University of New South Wales, Sydney
Amarin Siripanich
Research Center for Integrated Transport Innovation (rCITI), School of Civil and Environmental Engineering, University of New South Wales, Sydney
Milad Ghasri
School of Engineering and Information Technology (SEIT), University of New South Wales, Canberra
Taha Hossein Rashidi
Research Center for Integrated Transport Innovation (rCITI), School of Civil and Environmental Engineering, University of New South Wales, Sydney
DOI: https://doi.org/10.5198/jtlu.2024.2440
Keywords: Residential relocation, Machine learning, Survival analysis, Residential self-selection, Accessibility
Abstract
This study investigates the timing of household residential relocation, an aspect vital for transport and urban planning. Analyzing a high-dimensional dataset of 1,024 relocations in Sydney, Australia, the research compares ten machine learning survival techniques with three classical survival models. Results indicate that when classical models are paired with tree-based automated feature selectors, their performance aligns closely with that of the machine learning models. Notably, the GBM, XGBoost, and Random Forest models emerge as the standout performers. The study also provides a comprehensive comparison between automatic and manual feature selection, shedding light on the variables that influence households’ duration of stay. Stacked ensemble modeling, which combines predictions from several models, is used to enhance accuracy, but the improvements are marginal. This underscores inherent modeling challenges, particularly the recurring misranking of specific pairs of households under the concordance index. A thorough feature analysis identifies homeownership as the foremost predictor and highlights the importance of recent life events and accessibility features in relocation decisions. The research emphasizes the need to consider the accessibility of both current and future homes in relocation models, with accessibility features accounting for 20% of feature significance in model outcomes. Building on these insights, the study paves the way for a deeper understanding of individual decision-making processes in sustainable urban planning.
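The abstract attributes the limited gains from stacking partly to specific pairs of households that models keep ranking incorrectly under the concordance index. The sketch below shows how such pairs arise in Harrell's C-index. It assumes the standard Harrell formulation (a higher risk score means an earlier expected move, tied predictions earn half credit, and pairs with tied durations are skipped). The exact C-index variant and tie handling used in the study are not stated here, and the function name `harrell_c_index` and the toy data are hypothetical.

```python
from itertools import combinations


def harrell_c_index(durations, events, risk_scores):
    """Return Harrell's C-index and the list of discordant (misranked) pairs.

    durations   : observed duration of stay for each household
    events      : 1 if the household was observed to relocate, 0 if censored
    risk_scores : model output, where a higher score means an earlier expected move
    """
    concordant = 0.0
    comparable = 0
    discordant_pairs = []
    for i, j in combinations(range(len(durations)), 2):
        # Order the pair so that `a` has the shorter observed duration.
        a, b = (i, j) if durations[i] < durations[j] else (j, i)
        # A pair is comparable only if the shorter duration ends in an observed move;
        # pairs with tied durations are skipped in this sketch.
        if durations[a] == durations[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # correctly ranked: the earlier mover has the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions receive half credit
        else:
            discordant_pairs.append((a, b))  # misranked pair
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable, discordant_pairs


# Toy example (illustrative values, not data from the study).
c, misranked = harrell_c_index(
    durations=[2, 5, 3, 8],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.4, 0.7, 0.5],
)
print(c, misranked)
```

In this toy case, four pairs are comparable, and household 1 (which moved at time 5) receives a lower risk score than household 3 (which moved at time 8), so that single misranked pair pulls the index down to 0.75. Because stacking averages risk scores across models, it cannot fix a pair that every base model orders the wrong way.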