春江暮客

Personal learning and sharing site of 春江暮客

The c-index and Its Application in Survival Analysis

2021-12-23 Technology

The concordance index, or c-index, is a metric for evaluating how well a predictive model ranks outcomes. In survival analysis, it is commonly used for tasks such as cancer prognosis, risk stratification, and time-to-event prediction, where the model needs to rank who is likely to experience an event earlier or later.

In Python, you can compute it with the concordance_index function from lifelines.utils, which is part of the lifelines package.

Setup

Install lifelines first:

pip install lifelines

Then run the Python examples below.

The key idea behind c-index

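The most important point is that the c-index measures whether the predicted ranking is correct, not whether the predicted numbers are numerically close to the true values. For every pair of patients whose outcomes can be compared, it checks whether the patient predicted to survive longer actually did; the c-index is the fraction of such pairs that are ordered correctly.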

As a practical rule of thumb:

  1. 1.0 means perfect ranking
  2. 0.5 is what random ranking achieves on average
  3. 0.0 means the order is completely reversed

That is why c-index is best understood as a ranking metric rather than a direct error metric.
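To make the ranking idea concrete, here is a minimal sketch of the pairwise computation. It assumes there is no censoring (every survival time is actually observed). The name pairwise_c_index is hypothetical and not part of lifelines, and counting a tied prediction as half-correct is an assumption of this sketch.

```python
from itertools import combinations


def pairwise_c_index(times, scores):
    """Fraction of comparable pairs whose predicted order matches the observed order.

    Sketch only: assumes every time is an observed event (no censoring).
    """
    concordant = 0.0
    comparable = 0
    for (t_i, s_i), (t_j, s_j) in combinations(zip(times, scores), 2):
        if t_i == t_j:
            continue  # tied survival times carry no ordering information
        comparable += 1
        if s_i == s_j:
            concordant += 0.5  # assumption: a tied prediction counts as half-correct
        elif (t_i < t_j) == (s_i < s_j):
            concordant += 1.0  # both lists order this pair the same way
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


survive = [1, 6, 12, 24, 36, 60]
print(pairwise_c_index(survive, survive))             # 1.0
print(pairwise_c_index(survive, [5, 5, 5, 5, 5, 5]))  # 0.5
```

A constant prediction carries no ranking information, which is why it lands at 0.5 in this sketch.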

Let’s look at a concrete example to understand its meaning. Suppose we have six patients with actual survival times of 1 month, 6 months, 12 months, 2 years, 3 years, and 5 years. If the predictions exactly match the actual values, every pair of patients is ordered correctly and the c-index is 1.0.

# Import necessary packages
import pandas as pd

from lifelines.utils import concordance_index

# Define the data
df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
    "survive": [1, 6, 12, 24, 36, 60],
    "predicted": [1, 6, 12, 24, 36, 60]
})
c_index = concordance_index(df.survive, df.predicted)

print(df)
print(c_index)

# Output:
#         name  survive  predicted
# 0  Zhang San        1          1
# 1      Li Si        6          6
# 2    Wang Wu       12         12
# 3    Zhao Er       24         24
# 4      Ma Zi       36         36
# 5    someone       60         60
# 1.0

In fact, the c-index depends only on the ordering of the predictions, not on their actual values. In this respect it resembles Spearman’s rank correlation, which is also a non-parametric measure. If we change the predicted values while preserving their order, the c-index remains 1.0.
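The link to rank correlation can be made precise for a closely related measure, Kendall's tau, under the simplifying assumptions of no censoring and no ties. The symbols $n_c$ (number of concordant pairs) and $n_d$ (number of discordant pairs) are notation introduced here for illustration:

```latex
C = \frac{n_c}{n_c + n_d}, \qquad
\tau = \frac{n_c - n_d}{n_c + n_d}
\quad\Longrightarrow\quad
C = \frac{\tau + 1}{2}
```

Under these assumptions, C = 1 corresponds to τ = 1, C = 0.5 to τ = 0, and C = 0 to τ = -1.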

Example 1:

df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
    "survive": [1, 6, 12, 24, 36, 60],
    "predicted": [1, 1.1, 1.2, 2.4, 3.6, 6]
})
c_index = concordance_index(df.survive, df.predicted)

print(df)
print(c_index)

# Output: 1.0

Example 2:

df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
    "survive": [1, 6, 12, 24, 36, 60],
    "predicted": [1, 60, 120, 240, 360, 600]
})
c_index = concordance_index(df.survive, df.predicted)

print(df)
print(c_index)

# Output: 1.0
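The same invariance holds for any strictly increasing transformation of the predictions, not only rescaling. The sketch below checks a few such transformations with a small hand-written helper. The helper c_index is hypothetical, and it assumes no censoring and no ties; it is not the lifelines implementation.

```python
import math
from itertools import combinations


def c_index(times, scores):
    # Hypothetical helper: share of pairs ordered the same way by both lists
    # (assumes no censoring and no ties).
    pairs = list(combinations(zip(times, scores), 2))
    agree = sum((t_i < t_j) == (s_i < s_j) for (t_i, s_i), (t_j, s_j) in pairs)
    return agree / len(pairs)


survive = [1, 6, 12, 24, 36, 60]
predicted = [1, 1.1, 1.2, 2.4, 3.6, 6]

# Strictly increasing transformations keep every pairwise order intact.
transforms = {
    "identity": lambda x: x,
    "times 100": lambda x: 100 * x,
    "log": math.log,
    "square": lambda x: x ** 2,  # increasing here because all values are positive
}
results = {
    name: c_index(survive, [f(p) for p in predicted])
    for name, f in transforms.items()
}
for name, value in results.items():
    print(name, value)  # every transformation gives 1.0
```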

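However, if some pairs are ordered incorrectly, the c-index drops below 1.0. In the next example, two neighbouring pairs of patients are swapped, so 13 of the 15 possible pairs are still ranked correctly and the c-index is 13/15 ≈ 0.867.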

Example 3:

df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
    "survive": [1, 6, 12, 24, 36, 60],
    "predicted": [1, 12, 6, 36, 24, 60]
})
c_index = concordance_index(df.survive, df.predicted)

print(df)
print(c_index)

# Output: 0.8666666666666667
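To see where 0.8667 comes from, the pairs can be counted by hand. The sketch below lists the discordant pairs for the data in Example 3, assuming no censoring; the variable names are illustrative only.

```python
from itertools import combinations

survive = [1, 6, 12, 24, 36, 60]
predicted = [1, 12, 6, 36, 24, 60]

# All index pairs (i, j) with i < j: 6 patients give 15 pairs.
pairs = list(combinations(range(len(survive)), 2))

# A pair is discordant when the two lists order its members in opposite ways.
discordant = [
    (survive[i], survive[j])
    for i, j in pairs
    if (survive[i] < survive[j]) != (predicted[i] < predicted[j])
]

print(len(pairs))   # 15
print(discordant)   # [(6, 12), (24, 36)]
print((len(pairs) - len(discordant)) / len(pairs))  # 0.8666666666666667
```

Only the two swapped neighbours are misordered, so 13 of the 15 pairs remain concordant.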

Example 4 (reverse order):

df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
    "survive": [1, 6, 12, 24, 36, 60],
    "predicted": [60, 36, 24, 12, 6, 1]
})
c_index = concordance_index(df.survive, df.predicted)

print(df)
print(c_index)

# Output: 0.0
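A c-index near 0 often signals a sign convention problem rather than a useless model. If a model outputs a risk score where a higher value means an earlier event, the scores rank patients in the opposite direction to survival time, and negating them restores the intended order. The sketch below illustrates this with made-up risk scores and a hypothetical helper c_index that assumes no censoring and no ties.

```python
from itertools import combinations


def c_index(times, scores):
    # Hypothetical helper (no censoring, no ties): share of correctly ordered pairs.
    pairs = list(combinations(zip(times, scores), 2))
    agree = sum((t_i < t_j) == (s_i < s_j) for (t_i, s_i), (t_j, s_j) in pairs)
    return agree / len(pairs)


survive = [1, 6, 12, 24, 36, 60]
risk = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]  # made-up risk scores: higher = earlier event

as_is = c_index(survive, risk)                  # risk ranks patients in reverse
negated = c_index(survive, [-r for r in risk])  # flip the sign to rank by survival

print(as_is, negated)  # 0.0 1.0
```

When there are no ties, negating the scores turns a c-index of c into 1 - c, so it is worth checking which direction a scoring function expects before interpreting a low value.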

Summary

The concordance index (c-index) is a useful metric in survival analysis for evaluating the performance of predictive models. It is sensitive to the ranking order of predictions, but insensitive to the specific numerical values. This makes it especially suitable for assessing models where rank accuracy is more important than exact value prediction.

If you are building survival models in practice, it is better to read c-index together with calibration checks and risk-group plots rather than relying on a single metric alone.
