Interpreting Cox Proportional Hazards Model Using Colon Dataset in R

The Cox proportional hazards model is used to identify significant predictors of time-to-event outcomes. It is especially relevant in disciplines such as oncology, where outcomes are usually time-to-event (e.g. overall survival and disease-free survival). Because a time-to-event outcome involves censoring and combines a continuous component (the time) with a categorical one (whether the event occurred), the model can be difficult to interpret at first. Hence, as in my previous article on logistic regression, I will walk through how to interpret R output for the Cox proportional hazards model, as well as a test of the proportional hazards assumption.

Glossary of statistical terms used in the later sections:

Overall survival: This is usually defined as the time from randomization to death from any cause, or to the last follow-up if no event was observed during the study period (in which case the observation is censored).
Hazard ratio: Just as logistic regression works with odds, the Cox proportional hazards model works with hazards, i.e. the instantaneous rate at which the event occurs among those still at risk. The hazard ratio compares the hazard in one group with the hazard in the reference group (e.g. the experimental regimen vs. the standard treatment). It is obtained by exponentiating the coefficients of the Cox proportional hazards model.
Log-rank test: This tests for an overall difference in survival between the groups being compared.
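To make the last two glossary entries concrete, here is a minimal sketch in R. It assumes the survival package is installed, uses treatment (rx) as the only predictor purely for illustration, and relies on the censoring indicator being stored in the status column of the package's colon data. The object names (deaths, fit, hr, lr) are my own and not part of any API.

```r
library(survival)

# colon holds two rows per patient (recurrence and death);
# etype == 2 keeps the death records only.
deaths <- subset(colon, etype == 2)

# Cox model with treatment as the only predictor.
# "Obs" (observation) is the reference level of rx.
fit <- coxph(Surv(time, status) ~ rx, data = deaths)

# Hazard ratios are the exponentiated model coefficients.
hr <- exp(coef(fit))
print(hr)

# Log-rank test for an overall difference in survival
# across the three treatment arms.
lr <- survdiff(Surv(time, status) ~ rx, data = deaths)
print(lr)
```

A hazard ratio below 1 for a treatment arm would suggest a lower hazard of death than under observation, while the log-rank p-value speaks only to whether the arms differ overall, not to which arm differs.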

For the dataset, I will be using the colon dataset from the _survival_ package. The data were collected in a clinical trial that tested adjuvant chemotherapy regimens (Levamisole and Levamisole + 5-FU) in patients with colon cancer. While there are several variables in the dataset, we will focus on the following to build the Cox proportional hazards model:

Treatment (rx): Whether the patient was under Observation, Levamisole or Levamisole + 5-FU
Differentiation of the tumor (differ): Whether the tumor was considered as well differentiated, moderately differentiated or poorly differentiated
Nodal involvement (node4): Whether more than 4 positive lymph nodes were involved
Sex (sex): Whether the patient was male or female
Time to event (time): Time in days until the event (in this case death) or censoring
Censoring status (status): Whether the event was observed or censored
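As a rough sketch of how these variables could be pulled out for modelling, the snippet below keeps one row per patient and labels the numerically coded columns. The labels follow the coding described in the package's ?colon help page, the censoring indicator is the column named status in the package data, and the names keep and colon_os are hypothetical choices of mine.

```r
library(survival)

# Columns discussed above; the censoring indicator is stored as `status`.
keep <- c("rx", "differ", "node4", "sex", "time", "status")

# colon has one recurrence row and one death row per patient;
# etype == 2 selects the death endpoint.
colon_os <- colon[colon$etype == 2, keep]

# Label the numerically coded categorical variables (coding per ?colon).
colon_os$differ <- factor(colon_os$differ, levels = 1:3,
                          labels = c("well", "moderate", "poor"))
colon_os$node4 <- factor(colon_os$node4, levels = 0:1,
                         labels = c("4 or fewer", "more than 4"))
colon_os$sex <- factor(colon_os$sex, levels = 0:1,
                       labels = c("female", "male"))

# Quick look at the structure and at the split between
# censored (0) and observed (1) events.
str(colon_os)
table(colon_os$status)
```

Converting the coded columns to factors up front means R will treat them as categorical predictors with a clear reference level, rather than as numeric scores.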

