How to calculate Harrell’s c-index to evaluate multivariate model with EXCEL VBA?

Pocket

You may use Akaike information criterion (AIC) to evaluate fitting of multivariate model. You could use c-index that Harrell have proposed. Although it seems to evaluate fitting of present data set, it seems not to consider about future data set, it might result in overfitting to present data set.

  1. Make pair from data set with proportional hazard analysis
  2. Calculate risk score
  3. Compare risk score and survival time between pairs
  4. Calculate c-index

1. Make pair from data set with proportional hazard analysis

It’s assumed that sample size is N, the number of pairs could be calculated following formula.

\displaystyle _{N}C_{2} = \frac{N!}{(N-2)!2!}

It’s assumed that worksheet’s structure follows the list below.

  • The 1st line is title.
  • The 1st column is survival time, the 2nd is outcome, the 3rd is a risk score of model 1, the 4th is a risk score of model 2 and the 5th is a risk score of model 3, respectively.
  • All data type is numerical.
  • In outcome, 0 is death and 1 is censored, respectively.
Option Explicit

Sub C_Statistics()
    Dim i   As Long
    Dim j   As Long
    Dim k   As Long
    Dim Rng As Range
    Dim Ar  As Variant
    k = 0
    Set Rng = ActiveSheet.UsedRange
    Set Rng = Rng.Resize(Rng.Rows.Count - 1).Offset(1)
    Ar = Rng
    For i = LBound(Ar) To UBound(Ar) - 1
        For j = i + 1 To UBound(Ar)
            k = k + 1
        Next j
    Next i
    Debug.Print "k= " & k
End Sub

2. Calculate risk score

Risk score (R) is calculated as following formula.

\displaystyle R = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n \displaystyle S(t, X) = S_0(t)^{\exp(R-R_0)}

A point estimated of effect size in COX proportional hazard analysis is hazard ratio (Exp(Β)) and regression coefficient of covariate is logarithm of hazard ratio (Β). It’s assumed that risk score has been calculated.

3. Compare risk score and survival time between both of pair

It’s important that “If both of pair was censored or one of pair was censored and survival time of censored is short, they were classified as unknown”. In other words,

  • Accept the pair both of it is death
  • If one of pair is death and the survival time of death is shorter than the censored, accept it.

It’s as following VBA code. It’s assumed that it doesn’t includes equal sign if both survival time of pair were equal.

            Select Case Ar(i, 2) + Ar(j, 2)
            Case 0
                k = k + 1
            Case 1
                If (Ar(i, 1) - Ar(j, 1)) * (Ar(i, 2) - Ar(j, 2)) > 0 Then
                    k = k + 1
                End If
            End Select

Furthermore, you would compare risk score and survival time between both of pair and evaluate the sign of product of the differentiation of risk score and the differentiation of survival time, respectively. It means that whether the magnitude of risk score and the length of survival time are consistent or not. It’s assumed that lower risk score means longer survival time.

Option Explicit

Sub C_Statistics()
    Dim i   As Long
    Dim j   As Long
    Dim k   As Long
    Dim n1  As Long
    Dim n2  As Long
    Dim n3  As Long
    Dim Rng As Range
    Dim Ar  As Variant
    
    k = 0
    n1 = 0
    n2 = 0
    n3 = 0
    Set Rng = ActiveSheet.UsedRange
    Set Rng = Rng.Resize(Rng.Rows.Count - 1).Offset(1)
    Ar = Rng
    For i = LBound(Ar) To UBound(Ar) - 1
        For j = i + 1 To UBound(Ar)
            Select Case Ar(i, 2) + Ar(j, 2)
            Case 0
                If (Ar(i, 1) - Ar(j, 1)) * (Ar(i, 3) - Ar(j, 3))  0 Then
                    If (Ar(i, 1) - Ar(j, 1)) * (Ar(i, 3) - Ar(j, 3)) 

The sign of 35th line is larger than 0, it's assumed that censor is 1 and death is 0, would be reversed if censor was 0 and death was 1. The signs of 24th, 27th, 30th, 36th, 39th and 42nd would be reversed if it was assumed that higher risk score means longer survival time.

4. Calculate c-index

n1/k, n2/k and n3/k are c-index of model 1, model 2 and model 3, respectively. c-index ranges between 0 and 1. If c-index is 0.5, it means that the model doesn't fit at all. If it's closer to 0 or 1, it means that the model fits better.

Draw a pair of patients and determine which patient lived longer from his baseline evaluation. Survival times can be validly compared either when both patients have died, or when one has died and the other's followup time has exceeded the survival time of the first. If both patients are still alive, which will live longer is not known, and that pair of patients is not used in the analysis. Otherwise, it can be determined whether the patient with the higher prognostic score (ie, the weighted combination of baseline and test variables used to predict survival) also had the longer survival time. The process is repeated until all possible pairs of patients have been examined. Of the pairs of patients for which the ordering of survival time s could be inferred, the fraction of pairs such that the patient with the higher score had the longer survival time will be denoted by c.

The index c estimates the probability that, of two randomly chosen patients, the patient with the higher prognostic score will outlive the patient with the lower prognostic score. Values of c near .5 indicate that the prognostic score is no better than a coin-flip in determining which patient will live longer. Values of c near 0 or 1 indicate the baseline data virtually always determine which patient has a better prognosis. The c index measures a probability; many clinicians are more used to dealing with a correlation index that ranges from -1 to +1. A Kendall or Goodman-Kruskal type of correlation index can easily be constructed by calculating γ = 2(c - .5), where γ is the estimated probability that the prognostic score correctly orders prognosis for a pair of patients minus the probability that it incorrectly orders prognosis. When the prognostic score is unrelated to survival time, gamma is zero. When gamma = .5, the relationship between the prognostic score and survival time is halfway between a random relationship and a perfect relationship, and the corresponding c value is .75.

References:
Frank E. Harrell Jr, et al: Evaluating the Yield of Medical Tests. JAMA. 1982; 247 (18): 2543 - 2546
Morizane Toshio: Multivariate model, International Medical Information Center 2008; 29 (3): 8 - 12

Pocket

投稿者: admin

趣味:写真撮影とデータベース. カメラ:TOYO FIELD, Hasselblad 500C/M, Leica M6. SQL Server 2008 R2, MySQL, Microsoft Access.

コメントを残す

メールアドレスが公開されることはありません。 が付いている欄は必須項目です