Problem with scikit-learn metrics in K-fold cross validation

  • Context: Python 
  • Thread starter: BRN
  • Tags: Cross
SUMMARY

The forum discussion addresses discrepancies between training accuracy and the average accuracy calculated with Scikit-Learn metrics during K-Fold Cross Validation of a model based on InceptionV3. The user implemented K-Fold Cross Validation using Scikit-Learn's KFold class and reported an average accuracy of 0.51, despite training accuracies ranging from 0.70 to 0.90. The discrepancy arises because the two numbers measure different things: the accuracy Keras prints per epoch is computed on the training data itself, while the Scikit-Learn metrics are computed on the held-out test fold, so they reflect performance on unseen data and are typically lower.

PREREQUISITES
  • Familiarity with K-Fold Cross Validation in machine learning
  • Understanding of Scikit-Learn metrics, specifically accuracy_score
  • Knowledge of TensorFlow and model training using InceptionV3
  • Basic understanding of performance metrics like MSE, MAE, and AUC
NEXT STEPS
  • Explore the differences between training accuracy and validation accuracy in machine learning models
  • Learn how to implement K-Fold Cross Validation using Scikit-Learn's KFold class (see the sketch after this list)
  • Investigate the impact of hyperparameters on model performance metrics
  • Study the use of TensorFlow for model evaluation and metric calculation
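As a minimal sketch of the KFold API mentioned above, with hypothetical placeholder file names, kfold.split yields index arrays that are meant to select the samples of each fold:

[CODE lang="python" title="KFold sketch"]from sklearn.model_selection import KFold

# Hypothetical stand-in for a real list of image paths
paths = [f'img_{i}.png' for i in range(10)]

kfold = KFold(n_splits = 5, shuffle = True, random_state = 42)

for fold, (train_index, test_index) in enumerate(kfold.split(paths)):
    # The returned indices, not skip()/take(), pick each fold's samples
    train_paths = [paths[i] for i in train_index]
    test_paths = [paths[i] for i in test_index]
    print(f'fold {fold + 1}: {len(train_paths)} train, {len(test_paths)} test')[/CODE]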
USEFUL FOR

Data scientists, machine learning practitioners, and anyone involved in model evaluation and performance optimization using Scikit-Learn and TensorFlow.

BRN
Hello everyone,
In my implementation of K-Fold Cross Validation, I see a difference between the accuracy reported during training for each fold and the average accuracy calculated with the Scikit-Learn metric functions.

This is my code for the K-Fold Cross Validation and for the calculation of metrics.
[CODE lang="python" title="K-Fold cross validation"]def kf_validation(images_path_list):

kfold = KFold(n_splits = NUM_FOLDS, shuffle = True, random_state = 42)

model = inceptionV3()

acc_list = []
mse_list = []
mae_list = []
auc_list = []

for fold, (train_index, test_index) in enumerate(kfold.split(images_path_list)):

print('==================================================================')
print(f'-----------------------FOLD {fold + 1}, -------------------------')
print('==================================================================')

dataset = get_dataset(images_path_list, split_dataset = False)
train_ds = dataset.skip(len(test_index)).batch(BATCH_SIZE)
test_ds = dataset.skip(len(train_index)).take(len(test_index)).batch(BATCH_SIZE)

model.fit(train_ds, epochs = NUM_EPOCHS, validation_data = test_ds, verbose = 1)

y_true = [label for _, label in test_ds]
y_true = merge_tensors(y_true)
y_pred = model.predict(test_ds)
y_pred = tf.argmax(y_pred, axis = 1)
y_true = tf.argmax(y_true, axis = 1)

results = calc_metrics(y_true, y_pred)
acc_list, mse_list, mae_list, auc_list = zip([(results[0], results[1], results[2], results[3]) for _ in range(4)])

print('----------------AVERAGES METRICS AFTER ', NUM_FOLDS,' FOLDS---------------------')
print(f'average ACC: {np.mean(acc_list):.3f}')
print(f'average MSE: {np.mean(mse_list):.3f}')
print(f'average MAE: {np.mean(mae_list):.3f}')
print(f'average AUC: {np.mean(auc_list):.3f}')
print('--------------------------------------------------------------------------------')

return y_true, y_pred[/CODE]
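For readers trying to run the snippet: get_dataset, inceptionV3, and merge_tensors are the poster's own helpers and are not shown. A minimal sketch of what merge_tensors plausibly does, assuming the labels arrive as a list of batched one-hot tensors:

[CODE lang="python" title="merge_tensors sketch (assumption)"]import tensorflow as tf

def merge_tensors(tensor_list):
    # Concatenate per-batch label tensors into one (num_samples, num_classes) tensor
    return tf.concat(tensor_list, axis = 0)[/CODE]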

[CODE lang="python" title="scikit-learn metrics"]def calc_metrics(y_true, y_pred):

acc = accuracy_score(y_true, y_pred, normalize = True)
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)

fpr, tpr, thresholds = roc_curve(y_true, y_pred)
auc_val = auc(fpr, tpr)

print(f'-- ACC={acc}, MSE={mse}, MAE={mae}, AUC={auc_val}, --')

return acc, mse, mae, auc_val[/CODE]
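As an aside, roc_curve is being fed the arg-maxed class labels here, so the resulting ROC curve has only one interior point. A minimal sketch of the more common pattern, assuming a two-class softmax output where column 1 holds the positive-class probability (the function name is illustrative):

[CODE lang="python" title="AUC from probabilities (sketch)"]from sklearn.metrics import roc_auc_score

# y_prob would be model.predict(test_ds) *before* tf.argmax; with a two-unit
# softmax head, y_prob[:, 1] is the positive-class score
def calc_auc_from_probs(y_true, y_prob):
    return roc_auc_score(y_true, y_prob[:, 1])[/CODE]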

For example, for the first fold, the result is this:

[CODE title="first fold"]==================================================================
-----------------------FOLD 1, -------------------------
==================================================================
Epoch 1/10
19/19 [==============================] - 55s 486ms/step - loss: 1.7101 - accuracy: 0.7681 - val_loss: 8.6331 - val_accuracy: 0.4371
Epoch 2/10
19/19 [==============================] - 9s 355ms/step - loss: 0.4822 - accuracy: 0.7729 - val_loss: 8.0062 - val_accuracy: 0.4780
Epoch 3/10
19/19 [==============================] - 9s 379ms/step - loss: 0.4996 - accuracy: 0.8312 - val_loss: 7.9097 - val_accuracy: 0.4843
Epoch 4/10
19/19 [==============================] - 9s 357ms/step - loss: 0.3882 - accuracy: 0.8738 - val_loss: 7.9579 - val_accuracy: 0.4811
Epoch 5/10
19/19 [==============================] - 9s 355ms/step - loss: 0.6962 - accuracy: 0.9085 - val_loss: 5.0147 - val_accuracy: 0.5283
Epoch 6/10
19/19 [==============================] - 9s 359ms/step - loss: 1.1334 - accuracy: 0.8896 - val_loss: 2.6646 - val_accuracy: 0.7453
Epoch 7/10
19/19 [==============================] - 9s 349ms/step - loss: 0.3623 - accuracy: 0.9401 - val_loss: 4.8177 - val_accuracy: 0.6792
Epoch 8/10
19/19 [==============================] - 9s 354ms/step - loss: 0.3827 - accuracy: 0.9401 - val_loss: 3.4249 - val_accuracy: 0.7767
Epoch 9/10
19/19 [==============================] - 8s 347ms/step - loss: 0.3808 - accuracy: 0.9180 - val_loss: 4.9683 - val_accuracy: 0.7107
Epoch 10/10
19/19 [==============================] - 9s 361ms/step - loss: 0.3999 - accuracy: 0.9148 - val_loss: 9.7832 - val_accuracy: 0.3396
10/10 [==============================] - 4s 68ms/step
-- ACC=0.5125786163522013, MSE=0.48742138364779874, MAE=0.48742138364779874, AUC=0.5153103611979271, [/CODE]

How is it possible to obtain an average accuracy of 0.51 when the training accuracy is in the range 0.70 to 0.90?

Does anyone have an explanation?

Thanks.
 
Maybe I explained myself poorly. I am wondering why the accuracy calculated with the Scikit-Learn metrics is not comparable to the one displayed during training.
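One way to see that the two numbers measure different things is to evaluate the already-trained model on the held-out fold with Keras and with scikit-learn side by side. A minimal sketch, reusing the names from the code above and assuming the model was compiled with metrics=['accuracy']:

[CODE lang="python" title="comparing Keras and scikit-learn accuracy (sketch)"]import tensorflow as tf
from sklearn.metrics import accuracy_score

# Keras accuracy on the held-out fold (not the per-epoch training accuracy)...
test_loss, test_acc = model.evaluate(test_ds, verbose = 0)

# ...should be close to scikit-learn's accuracy_score on the same fold;
# both are computed on unseen data, so both sit below the training accuracy
y_prob = model.predict(test_ds)
y_pred = tf.argmax(y_prob, axis = 1)
print(f'Keras: {test_acc:.3f}  scikit-learn: {accuracy_score(y_true, y_pred):.3f}')[/CODE]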