Problem with scikit-learn metrics in K-fold cross validation

  • Context: Python 
  • Thread starter: BRN
  • Tags: Cross
SUMMARY

The forum discussion addresses discrepancies between training accuracy and the average accuracy calculated with Scikit-Learn metrics during K-Fold Cross Validation of a model based on InceptionV3. The user implemented K-Fold Cross Validation using Scikit-Learn's KFold class and reported an average accuracy of 0.51, despite training accuracies ranging from 0.70 to 0.90. The discrepancy arises because the two numbers measure different things: the accuracy Keras prints per epoch is computed on the training data itself, while the Scikit-Learn metrics are computed on the held-out test fold, so they reflect performance on unseen data and are typically lower.

PREREQUISITES
  • Familiarity with K-Fold Cross Validation in machine learning
  • Understanding of Scikit-Learn metrics, specifically accuracy_score
  • Knowledge of TensorFlow and model training using InceptionV3
  • Basic understanding of performance metrics like MSE, MAE, and AUC
NEXT STEPS
  • Explore the differences between training accuracy and validation accuracy in machine learning models
  • Learn how to implement K-Fold Cross Validation using Scikit-Learn's KFold class (see the sketch after this list)
  • Investigate the impact of hyperparameters on model performance metrics
  • Study the use of TensorFlow for model evaluation and metric calculation
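As a minimal sketch of the KFold API mentioned above, with hypothetical placeholder file names, kfold.split yields index arrays that are meant to select the samples of each fold:

[CODE lang="python" title="KFold sketch"]from sklearn.model_selection import KFold

# Hypothetical stand-in for a real list of image paths
paths = [f'img_{i}.png' for i in range(10)]

kfold = KFold(n_splits = 5, shuffle = True, random_state = 42)

for fold, (train_index, test_index) in enumerate(kfold.split(paths)):
    # The returned indices, not skip()/take(), pick each fold's samples
    train_paths = [paths[i] for i in train_index]
    test_paths = [paths[i] for i in test_index]
    print(f'fold {fold + 1}: {len(train_paths)} train, {len(test_paths)} test')[/CODE]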
USEFUL FOR

Data scientists, machine learning practitioners, and anyone involved in model evaluation and performance optimization using Scikit-Learn and TensorFlow.

BRN
Hello everyone,
In my implementation of K-Fold Cross Validation, I see a difference between the accuracy reported during training for each fold and the average accuracy calculated with the Scikit-Learn metric functions.

This is my code for the K-Fold Cross Validation and for the calculation of metrics.
[CODE lang="python" title="K-Fold cross validation"]def kf_validation(images_path_list):

kfold = KFold(n_splits = NUM_FOLDS, shuffle = True, random_state = 42)

model = inceptionV3()

acc_list = []
mse_list = []
mae_list = []
auc_list = []

for fold, (train_index, test_index) in enumerate(kfold.split(images_path_list)):

print('==================================================================')
print(f'-----------------------FOLD {fold + 1}, -------------------------')
print('==================================================================')

dataset = get_dataset(images_path_list, split_dataset = False)
train_ds = dataset.skip(len(test_index)).batch(BATCH_SIZE)
test_ds = dataset.skip(len(train_index)).take(len(test_index)).batch(BATCH_SIZE)

model.fit(train_ds, epochs = NUM_EPOCHS, validation_data = test_ds, verbose = 1)

y_true = [label for _, label in test_ds]
y_true = merge_tensors(y_true)
y_pred = model.predict(test_ds)
y_pred = tf.argmax(y_pred, axis = 1)
y_true = tf.argmax(y_true, axis = 1)

results = calc_metrics(y_true, y_pred)
acc_list, mse_list, mae_list, auc_list = zip([(results[0], results[1], results[2], results[3]) for _ in range(4)])

print('----------------AVERAGES METRICS AFTER ', NUM_FOLDS,' FOLDS---------------------')
print(f'average ACC: {np.mean(acc_list):.3f}')
print(f'average MSE: {np.mean(mse_list):.3f}')
print(f'average MAE: {np.mean(mae_list):.3f}')
print(f'average AUC: {np.mean(auc_list):.3f}')
print('--------------------------------------------------------------------------------')

return y_true, y_pred[/CODE]
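For readers trying to run the snippet: get_dataset, inceptionV3, and merge_tensors are the poster's own helpers and are not shown. A minimal sketch of what merge_tensors plausibly does, assuming the labels arrive as a list of batched one-hot tensors:

[CODE lang="python" title="merge_tensors sketch (assumption)"]import tensorflow as tf

def merge_tensors(tensor_list):
    # Concatenate per-batch label tensors into one (num_samples, num_classes) tensor
    return tf.concat(tensor_list, axis = 0)[/CODE]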

[CODE lang="python" title="scikit-learn metrics"]def calc_metrics(y_true, y_pred):

acc = accuracy_score(y_true, y_pred, normalize = True)
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)

fpr, tpr, thresholds = roc_curve(y_true, y_pred)
auc_val = auc(fpr, tpr)

print(f'-- ACC={acc}, MSE={mse}, MAE={mae}, AUC={auc_val}, --')

return acc, mse, mae, auc_val[/CODE]
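As an aside, roc_curve is being fed the arg-maxed class labels here, so the resulting ROC curve has only one interior point. A minimal sketch of the more common pattern, assuming a two-class softmax output where column 1 holds the positive-class probability (the function name is illustrative):

[CODE lang="python" title="AUC from probabilities (sketch)"]from sklearn.metrics import roc_auc_score

# y_prob would be model.predict(test_ds) *before* tf.argmax; with a two-unit
# softmax head, y_prob[:, 1] is the positive-class score
def calc_auc_from_probs(y_true, y_prob):
    return roc_auc_score(y_true, y_prob[:, 1])[/CODE]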

For example, for the first fold, the result is this:

[CODE title="first fold"]==================================================================
-----------------------FOLD 1, -------------------------
==================================================================
Epoch 1/10
19/19 [==============================] - 55s 486ms/step - loss: 1.7101 - accuracy: 0.7681 - val_loss: 8.6331 - val_accuracy: 0.4371
Epoch 2/10
19/19 [==============================] - 9s 355ms/step - loss: 0.4822 - accuracy: 0.7729 - val_loss: 8.0062 - val_accuracy: 0.4780
Epoch 3/10
19/19 [==============================] - 9s 379ms/step - loss: 0.4996 - accuracy: 0.8312 - val_loss: 7.9097 - val_accuracy: 0.4843
Epoch 4/10
19/19 [==============================] - 9s 357ms/step - loss: 0.3882 - accuracy: 0.8738 - val_loss: 7.9579 - val_accuracy: 0.4811
Epoch 5/10
19/19 [==============================] - 9s 355ms/step - loss: 0.6962 - accuracy: 0.9085 - val_loss: 5.0147 - val_accuracy: 0.5283
Epoch 6/10
19/19 [==============================] - 9s 359ms/step - loss: 1.1334 - accuracy: 0.8896 - val_loss: 2.6646 - val_accuracy: 0.7453
Epoch 7/10
19/19 [==============================] - 9s 349ms/step - loss: 0.3623 - accuracy: 0.9401 - val_loss: 4.8177 - val_accuracy: 0.6792
Epoch 8/10
19/19 [==============================] - 9s 354ms/step - loss: 0.3827 - accuracy: 0.9401 - val_loss: 3.4249 - val_accuracy: 0.7767
Epoch 9/10
19/19 [==============================] - 8s 347ms/step - loss: 0.3808 - accuracy: 0.9180 - val_loss: 4.9683 - val_accuracy: 0.7107
Epoch 10/10
19/19 [==============================] - 9s 361ms/step - loss: 0.3999 - accuracy: 0.9148 - val_loss: 9.7832 - val_accuracy: 0.3396
10/10 [==============================] - 4s 68ms/step
-- ACC=0.5125786163522013, MSE=0.48742138364779874, MAE=0.48742138364779874, AUC=0.5153103611979271, [/CODE]

How is it possible to obtain an average accuracy of 0.51 when the training accuracy is in the range 0.70 to 0.90?

Does anyone have an explanation?

Thanks.
 
Maybe I explained myself poorly. I am wondering why the accuracy calculated with the Scikit-Learn metrics is not comparable to the one displayed during training.
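One way to see that the two numbers measure different things is to evaluate the already-trained model on the held-out fold with Keras and with scikit-learn side by side. A minimal sketch, reusing the names from the code above and assuming the model was compiled with metrics=['accuracy']:

[CODE lang="python" title="comparing Keras and scikit-learn accuracy (sketch)"]import tensorflow as tf
from sklearn.metrics import accuracy_score

# Keras accuracy on the held-out fold (not the per-epoch training accuracy)...
test_loss, test_acc = model.evaluate(test_ds, verbose = 0)

# ...should be close to scikit-learn's accuracy_score on the same fold;
# both are computed on unseen data, so both sit below the training accuracy
y_prob = model.predict(test_ds)
y_pred = tf.argmax(y_prob, axis = 1)
print(f'Keras: {test_acc:.3f}  scikit-learn: {accuracy_score(y_true, y_pred):.3f}')[/CODE]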