The c-index and Its Application in Survival Analysis
The concordance index, or c-index, is a metric for evaluating how well a predictive model ranks outcomes. In survival analysis, it is commonly used for tasks such as cancer prognosis, risk stratification, and time-to-event prediction, where the model needs to rank who is likely to experience an event earlier or later.
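In Python, you can compute it with the concordance_index function from the lifelines package (imported from lifelines.utils). It is called as concordance_index(event_times, predicted_scores, event_observed=None).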
Setup
Install lifelines first:
pip install lifelines
Then run the Python examples below.
The key idea behind the c-index
The most important point is that the c-index measures whether the predictions put subjects in the correct order, not whether the predicted numbers are numerically close to the true values.
As a practical rule of thumb:
- 1.0 means perfect ranking
- 0.5 is close to random ranking
- 0.0 means the order is completely reversed (values near 0.0 mean it is almost completely reversed)
That is why the c-index is best understood as a ranking metric rather than an error metric such as mean absolute error.
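Under the hood, the c-index is a pairwise count. The following is a minimal pure-Python sketch, assuming no censoring (every survival time is an observed event) and that a higher prediction means a longer expected survival. For every pair of patients with different actual times, it checks whether the patient who lived longer also received the higher prediction; tied predictions count as half a concordant pair. The function name `naive_c_index` is hypothetical and is not part of lifelines.

```python
from itertools import combinations


def naive_c_index(actual, predicted):
    """Pairwise c-index sketch (hypothetical helper): no censoring assumed."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(actual)), 2):
        if actual[i] == actual[j]:
            continue  # tied actual times carry no ordering information in this sketch
        comparable += 1
        # Order the pair so that `short` had the event before `long_`.
        short, long_ = (i, j) if actual[i] < actual[j] else (j, i)
        if predicted[short] < predicted[long_]:
            concordant += 1.0  # predicted order matches the actual order
        elif predicted[short] == predicted[long_]:
            concordant += 0.5  # a tied prediction counts as a coin flip
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


survive = [1, 6, 12, 24, 36, 60]
print(naive_c_index(survive, [1, 6, 12, 24, 36, 60]))  # 1.0
print(naive_c_index(survive, [1, 12, 6, 36, 24, 60]))  # 0.8666666666666667
print(naive_c_index(survive, [60, 36, 24, 12, 6, 1]))  # 0.0
```

For these inputs the sketch gives the same values as the lifelines calls in the examples of this post.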
Let’s look at a concrete example. Suppose we have six patients whose actual survival times are 1, 6, 12, 24, 36, and 60 months (from 1 month up to 5 years). If the predictions exactly match the actual values, every pair of patients is ordered correctly and the c-index is 1.0, the best possible score.
# Import necessary packages
import pandas as pd
from lifelines.utils import concordance_index
# Define the data
df = pd.DataFrame({
"name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
"survive": [1, 6, 12, 24, 36, 60],
"predicted": [1, 6, 12, 24, 36, 60]
})
c_index = concordance_index(df.survive, df.predicted)
print(df)
print(c_index)
# Output:
# name survive predicted
# 0 Zhang San 1 1
# 1 Li Si 6 6
# 2 Wang Wu 12 12
# 3 Zhao Er 24 24
# 4 Ma Zi 36 36
# 5 someone 60 60
# 1.0
In fact, the c-index does not depend on the predicted values themselves, only on their ordering. Like Spearman’s correlation, it is a rank-based (non-parametric) measure; because it counts correctly ordered pairs, it is most closely related to Kendall’s tau. If we change the predicted values but preserve their order, the c-index stays at 1.0.
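The link to Kendall’s tau can be made exact. With no censoring and no ties, Kendall’s tau is (concordant − discordant) / pairs while the c-index is concordant / pairs, so the c-index equals (tau + 1) / 2. Below is a minimal pure-Python sketch that checks this on the prediction lists from Examples 1 to 4 of this post; it assumes untied, uncensored data, and the helper name `pair_counts` is hypothetical.

```python
from itertools import combinations


def pair_counts(actual, predicted):
    """Count concordant and discordant pairs (hypothetical helper; no ties, no censoring)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(actual)), 2):
        product = (actual[i] - actual[j]) * (predicted[i] - predicted[j])
        if product > 0:
            concordant += 1  # both lists order this pair the same way
        elif product < 0:
            discordant += 1  # the predicted order is reversed for this pair
        else:
            raise ValueError("ties are not handled in this sketch")
    return concordant, discordant


survive = [1, 6, 12, 24, 36, 60]
examples = {
    "Example 1": [1, 1.1, 1.2, 2.4, 3.6, 6],
    "Example 2": [1, 60, 120, 240, 360, 600],
    "Example 3": [1, 12, 6, 36, 24, 60],
    "Example 4": [60, 36, 24, 12, 6, 1],
}

for label, predicted in examples.items():
    c, d = pair_counts(survive, predicted)
    c_index = c / (c + d)
    kendall_tau = (c - d) / (c + d)
    # The last two columns agree (up to floating-point rounding).
    print(label, (c, d), c_index, (kendall_tau + 1) / 2)
```

Examples 1 and 2 rescale the predictions without changing their order, so their pair counts, and therefore their c-index, are identical.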
Example 1:
df = pd.DataFrame({
"name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
"survive": [1, 6, 12, 24, 36, 60],
"predicted": [1, 1.1, 1.2, 2.4, 3.6, 6]
})
c_index = concordance_index(df.survive, df.predicted)
print(df)
print(c_index)
# 1.0
Example 2:
df = pd.DataFrame({
"name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
"survive": [1, 6, 12, 24, 36, 60],
"predicted": [1, 60, 120, 240, 360, 600]
})
c_index = concordance_index(df.survive, df.predicted)
print(df)
print(c_index)
# 1.0
However, if some pairs of patients are ranked in the wrong order, the c-index falls below 1.0, and the more pairs are misordered, the lower it goes.
Example 3:
df = pd.DataFrame({
"name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
"survive": [1, 6, 12, 24, 36, 60],
"predicted": [1, 12, 6, 36, 24, 60]
})
c_index = concordance_index(df.survive, df.predicted)
print(df)
print(c_index)
# Output: 0.8666666666666667
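Where does 0.8666… come from? Six patients form 6 × 5 / 2 = 15 pairs. Swapping the predictions of Li Si and Wang Wu, and of Zhao Er and Ma Zi, misorders exactly two of those pairs, so the c-index is 13/15. The short pure-Python sketch below (no censoring assumed, no lifelines needed) lists the misordered pairs:

```python
from itertools import combinations

names = ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"]
survive = [1, 6, 12, 24, 36, 60]
predicted = [1, 12, 6, 36, 24, 60]

pairs = list(combinations(range(len(names)), 2))
# A pair is discordant when the predicted order disagrees with the actual order.
discordant = [
    (names[i], names[j])
    for i, j in pairs
    if (survive[i] - survive[j]) * (predicted[i] - predicted[j]) < 0
]
print(len(pairs))  # 15
print(discordant)  # [('Li Si', 'Wang Wu'), ('Zhao Er', 'Ma Zi')]
print((len(pairs) - len(discordant)) / len(pairs))  # 0.8666666666666667
```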
Example 4 (reversed order):
df = pd.DataFrame({
"name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
"survive": [1, 6, 12, 24, 36, 60],
"predicted": [60, 36, 24, 12, 6, 1]
})
c_index = concordance_index(df.survive, df.predicted)
print(df)
print(c_index)
# Output: 0.0
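Example 4 also points to a common pitfall. In the lifelines examples above, the predictions are survival times, so a higher value means a longer expected survival. Many models instead output a risk score, where a higher value means an earlier event; passing such scores unchanged reverses the ranking and drives the c-index toward 0. The usual remedy is to negate the risk scores first. The pure-Python sketch below illustrates this under the same convention, assuming no ties and no censoring; the helper `c_index_no_censoring` and the `risk` values are hypothetical.

```python
from itertools import combinations


def c_index_no_censoring(actual, scores):
    """Fraction of pairs where the longer survivor has the higher score (no ties/censoring assumed)."""
    pairs = list(combinations(range(len(actual)), 2))
    # True counts as 1 when the two differences share a sign (a concordant pair).
    good = sum((actual[i] - actual[j]) * (scores[i] - scores[j]) > 0 for i, j in pairs)
    return good / len(pairs)


survive = [1, 6, 12, 24, 36, 60]
# Hypothetical risk scores: higher = the model expects an earlier event.
risk = [0.9, 0.7, 0.5, 0.3, 0.2, 0.1]

print(c_index_no_censoring(survive, risk))                # 0.0: the model looks useless
print(c_index_no_censoring(survive, [-r for r in risk]))  # 1.0: correct once the sign is flipped
```

A c-index well below 0.5 is often a sign-convention bug rather than a bad model.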
Summary
The concordance index (c-index) is a useful metric in survival analysis for evaluating predictive models. It depends only on the ranking of the predictions, so any transformation that preserves their order leaves it unchanged. This makes it especially suitable for assessing models where getting the order right matters more than predicting exact values.
If you are building survival models in practice, it is better to read c-index together with calibration checks and risk-group plots rather than relying on a single metric alone.
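In practice, survival data are usually right-censored: some patients are still alive, or lost to follow-up, when the study ends, so only a lower bound on their survival time is known. This is what the optional event_observed argument of lifelines’ concordance_index is for. The sketch below shows the basic idea in pure Python as a simplified Harrell-style rule, not lifelines’ exact code: a pair counts only when the patient with the shorter time actually had the event. It skips all tied times, whereas lifelines compares some tied-time pairs, and the function name and censoring flags are hypothetical.

```python
from itertools import combinations


def censored_c_index(times, scores, observed):
    """Simplified Harrell-style c-index sketch; higher score = longer expected survival."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # this sketch skips tied times entirely
        short, long_ = (i, j) if times[i] < times[j] else (j, i)
        if not observed[short]:
            continue  # censored first: we cannot tell who really had the event first
        comparable += 1
        if scores[short] < scores[long_]:
            concordant += 1.0
        elif scores[short] == scores[long_]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


survive = [1, 6, 12, 24, 36, 60]
predicted = [1, 12, 6, 36, 24, 60]  # the Example 3 predictions
# Hypothetical flags: the patients at 6 and 24 months were censored.
observed = [True, False, True, False, True, True]

print(censored_c_index(survive, predicted, [True] * 6))  # 0.8666666666666667
print(censored_c_index(survive, predicted, observed))    # 1.0
```

With these flags both misordered pairs start with a censored patient, so they drop out and only 9 comparable pairs remain. This is one reason a c-index should be read alongside the number of comparable pairs.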
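- Original author: 春江暮客
- Original link: https://www.bobobk.com/en/592.html
- Copyright notice: This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Non-commercial reposts must credit the source (author and original link); for commercial reposts, contact the author for permission.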