Advances in Consumer Research
Issue 4 : 676-687
Original Article
Predicting Hospital Length of Stay Using Machine Learning and Spatial Analytics: A Large Open Health Dataset
1 PhD Scholar, IIC University of Technology, Cambodia (Enrolment No. FNR210604).
2 Professor, Prin. L. N. Welingkar Institute of Management Development and Research (PGDM), Mumbai.
3 Geomatics Scientist
4 London School of Management Education
Abstract

This mixed-methods study develops and validates machine learning models for hospital length of stay (LOS) and cost prediction, integrates Geographic Information System (GIS) spatial analytics for population-level healthcare governance, and examines cross-national transferability through qualitative validation with Indian healthcare professionals. On 1,048,575 inpatient discharge records from the New York State SPARCS database, Random Forest regression achieved R² = 0.41 for LOS prediction (MAE = 3.18 days, 71.3% of predictions within ±3 days) and R² = 0.79 for cost prediction. Clinical classification variables, principally APR-DRG severity, accounted for 74.61% of feature importance versus 8.21% for demographics, a ratio of roughly nine to one that identifies classification infrastructure as the binding constraint on prediction capability. Extreme-severity patients had stays 5.11 times as long as minor cases (15.58 vs. 3.05 days), males had 19% longer LOS than females (6.30 vs. 5.28 days, p < 0.0001), and patients aged 50+ had 68% longer stays. GIS integration extended individual-level predictions to spatial governance, identifying high-burden ZIP code hotspots (~120,000 cases in ZIP 112), a 4.7-fold severity-stratified LOS gradient across geographic units, and a 2-fold county-level cost disparity (New York County ~$41,000 vs. Clinton County ~$20,000), revealing equity gaps invisible to individual-level models. Cost-effectiveness analysis yielded a dominant strategy with a negative ICER of –$2,499.55 per bed-day avoided (ROI: 561,637%). Qualitative validation found that none of the seven Indian healthcare professionals interviewed was familiar with APR-DRG, exposing a structural classification gap with cascading consequences for reimbursement fairness, hospital benchmarking, and spatial resource allocation.
The findings support phased adoption of severity-adjusted classification frameworks adapted to India’s disease burden, integrated with spatial analytics for district-level healthcare governance...
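
The headline LOS metrics quoted in the abstract (R², MAE, and the share of predictions within ±3 days) can be illustrated with a minimal sketch. The function name `los_metrics`, its `tolerance_days` parameter, and the toy arrays are assumptions made for illustration only; they are not the authors' code or SPARCS data. The last lines simply recompute the ratios from the figures stated in the abstract.

```python
from statistics import fmean


def los_metrics(observed, predicted, tolerance_days=3.0):
    """Return R^2, MAE, and the share of predictions within +/- tolerance_days.

    Hypothetical helper for illustration; not the study's implementation.
    """
    errors = [o - p for o, p in zip(observed, predicted)]
    mean_obs = fmean(observed)
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    hits = [1.0 if abs(e) <= tolerance_days else 0.0 for e in errors]
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": fmean([abs(e) for e in errors]),
        "within_tolerance": fmean(hits),
    }


# Toy values in days (assumed, not taken from the dataset).
observed = [2, 5, 3, 12, 7]
predicted = [3, 4, 6, 8, 7]
metrics = los_metrics(observed, predicted)

# Ratios recomputed from the figures reported in the abstract.
severity_fold = 15.58 / 3.05      # extreme vs. minor severity LOS
importance_ratio = 74.61 / 8.21   # clinical vs. demographic feature importance
sex_gap = 6.30 / 5.28 - 1.0       # relative LOS difference, male vs. female
```

The "within ±3 days" figure is a tolerance-based accuracy: it counts a prediction as correct when its absolute error does not exceed the tolerance, which is why it can be high (71.3%) even when R² is moderate (0.41).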

 

Volume 3, Issue 4
© Copyright Advances in Consumer Research