| Author: Liuan LU | Date: 2026-04-28 |
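Because the data contain “right-censored” records (customers who had not churned by the end of the observation period), traditional regression models fall short: they must either drop those customers or treat their observed tenure as a completed lifetime, and both choices bias the estimate. In this project, we implemented a complete survival analysis pipeline: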
* We parsed subscription_start and subscription_end and calculated the subscription duration in days. We encoded the event status as is_churn = 1 (churn event occurred) or 0 (right-censored).
* We used monthly_fee, auto_renew, and age as covariates to calculate the Hazard Ratio (HR), quantifying how specific variables accelerate or decelerate the churn rate.

Analysis Record: The Kaplan-Meier (K-M) survival curve shows a steady decline in retention probability over time. During the first 200 days, the decline is relatively gentle, and the retention rate stays above 80%. Reading the curve at the median threshold (50% survival rate), we found a Median Survival Time of approximately 420 days. This means half of the customer base is expected to have churned by about day 420 of their subscription.
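To make the duration encoding and the median reading concrete, here is a minimal sketch of a Kaplan-Meier estimate in plain Python. The sample rows are hypothetical toy data, not the project's dataset. The sketch assumes that for right-censored customers subscription_end holds the last observed date. The helper names `kaplan_meier` and `median_survival` are illustrative only and are not taken from the project code.

```python
from datetime import date

# Hypothetical toy rows: (subscription_start, subscription_end, is_churn).
# Assumption: for censored customers (is_churn = 0), subscription_end is
# the last date on which the customer was observed.
rows = [
    ("2024-01-01", "2024-04-10", 1),
    ("2024-01-01", "2024-07-19", 0),
    ("2024-01-01", "2024-10-27", 1),
    ("2024-01-01", "2025-02-04", 1),
    ("2024-01-01", "2025-05-15", 0),
]

# Duration in days between start and end for every customer.
durations = [
    (date.fromisoformat(end) - date.fromisoformat(start)).days
    for start, end, _ in rows
]
events = [is_churn for _, _, is_churn in rows]


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time with at least one churn."""
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        churned = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        leaving = sum(1 for d in durations if d == t)  # churned + censored at t
        if churned:
            survival *= 1 - churned / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # censored customers leave the risk set silently
    return curve


def median_survival(curve):
    """First time at which the survival estimate drops to 50% or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the curve never reaches 50% within the observation window


curve = kaplan_meier(durations, events)
median = median_survival(curve)
```

Censored customers still count in the risk set up to their last observed day, which is how the estimator uses them without treating them as churned.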
Analysis Record:

* Model Evaluation: The Concordance Index is 0.52, barely above the 0.5 expected from random ranking, indicating limited discriminative ability of the selected covariates in the current sample distribution.
* The p-values for monthly_fee (0.06), auto_renew (0.44), and age (0.82) are all > 0.05, meaning none of them reaches conventional statistical significance.
* The exp(coef) values (the hazard ratios) are extremely close to 1.00, i.e. almost no change in churn hazard per unit of each covariate. Overall, in this dataset, monthly fee, auto-renew status, and age do not have a statistically significant accelerating or decelerating impact on the survival cycle.
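For intuition about the Concordance Index reported under Model Evaluation, the following is a simplified sketch of Harrell's C in plain Python. It is not the routine used in the project: the function name `concordance_index`, the toy inputs, and the convention that a higher risk score means earlier expected churn are assumptions made for illustration. Pairs with tied durations are skipped for brevity.

```python
def concordance_index(durations, events, risk_scores):
    """Fraction of comparable pairs whose risk ordering matches the outcome.

    A pair (i, j) is comparable when customer i churned (event = 1) and did so
    strictly earlier than customer j was last observed. Assumption: a higher
    risk score means the model expects earlier churn.
    """
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored customer cannot anchor a comparable pair
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # model ranked the earlier churner as riskier
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the sample")
    return concordant / comparable


# Hypothetical toy data: four customers, the third one right-censored.
toy_durations = [100, 200, 300, 400]
toy_events = [1, 1, 0, 1]

perfect = concordance_index(toy_durations, toy_events, [4, 3, 2, 1])      # 1.0
uninformative = concordance_index(toy_durations, toy_events, [1, 1, 1, 1])  # 0.5
```

A model whose scores carry no information lands at 0.5, which is why a value of 0.52 is read as close to random ranking.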