Python: Problem with scikit-learn metrics in K-fold cross-validation

  • Thread starter: BRN
  • Tags: Cross
AI Thread Summary
The discussion concerns a discrepancy in accuracy metrics observed during K-Fold Cross Validation with Scikit-Learn. The user implemented K-Fold Cross Validation and noted that the accuracy reported during training (roughly 0.70 to 0.90) differs significantly from the average accuracy computed after validation, which was around 0.51. This raises the question of why the Scikit-Learn metrics do not line up with the accuracy displayed while the model trains, and points to a possible confusion between how validation accuracy and training accuracy are computed. The conversation highlights the importance of distinguishing training performance from validation metrics when evaluating a machine learning model.
BRN
Hello everyone,
In my implementation of K-Fold Cross Validation, I find a difference between the accuracy calculated during training on each fold and the average accuracy calculated with the Scikit-Learn metric functions.

This is my code for the K-Fold Cross Validation and for the calculation of metrics.
[CODE lang="python" title="K-Fold cross validation"]def kf_validation(images_path_list):

kfold = KFold(n_splits = NUM_FOLDS, shuffle = True, random_state = 42)

model = inceptionV3()

acc_list = []
mse_list = []
mae_list = []
auc_list = []

for fold, (train_index, test_index) in enumerate(kfold.split(images_path_list)):

print('==================================================================')
print(f'-----------------------FOLD {fold + 1}, -------------------------')
print('==================================================================')

dataset = get_dataset(images_path_list, split_dataset = False)
train_ds = dataset.skip(len(test_index)).batch(BATCH_SIZE)
test_ds = dataset.skip(len(train_index)).take(len(test_index)).batch(BATCH_SIZE)

model.fit(train_ds, epochs = NUM_EPOCHS, validation_data = test_ds, verbose = 1)

y_true = [label for _, label in test_ds]
y_true = merge_tensors(y_true)
y_pred = model.predict(test_ds)
y_pred = tf.argmax(y_pred, axis = 1)
y_true = tf.argmax(y_true, axis = 1)

results = calc_metrics(y_true, y_pred)
acc_list, mse_list, mae_list, auc_list = zip([(results[0], results[1], results[2], results[3]) for _ in range(4)])

print('----------------AVERAGES METRICS AFTER ', NUM_FOLDS,' FOLDS---------------------')
print(f'average ACC: {np.mean(acc_list):.3f}')
print(f'average MSE: {np.mean(mse_list):.3f}')
print(f'average MAE: {np.mean(mae_list):.3f}')
print(f'average AUC: {np.mean(auc_list):.3f}')
print('--------------------------------------------------------------------------------')

return y_true, y_pred[/CODE]
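
For comparison, here is a minimal sketch of how the KFold indices are usually applied to pick out each fold's samples; the fake file names and the 5-fold setup are illustrative assumptions, not part of the code above.

[CODE lang="python" title="KFold index usage (sketch)"]import numpy as np
from sklearn.model_selection import KFold

# Toy stand-in for images_path_list: one entry per sample.
samples = np.array([f'img_{i}.png' for i in range(20)])

kfold = KFold(n_splits = 5, shuffle = True, random_state = 42)

for fold, (train_index, test_index) in enumerate(kfold.split(samples)):
    # Each fold selects disjoint subsets by index, so no test sample of a
    # given fold also appears among that fold's training samples.
    train_samples = samples[train_index]
    test_samples = samples[test_index]
    print(f'fold {fold + 1}: {len(train_samples)} train / {len(test_samples)} test')[/CODE]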

[CODE lang="python" title="scikit-learn metrics"]def calc_metrics(y_true, y_pred):

acc = accuracy_score(y_true, y_pred, normalize = True)
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)

fpr, tpr, thresholds = roc_curve(y_true, y_pred)
auc_val = auc(fpr, tpr)

print(f'-- ACC={acc}, MSE={mse}, MAE={mae}, AUC={auc_val}, --')

return acc, mse, mae, auc_val[/CODE]
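
A side note on these metrics: after the tf.argmax calls above, y_true and y_pred are 0/1 class indices (a two-class problem, which is what calling roc_curve directly on the labels implies). Every per-sample error is then either 0 or 1, so mean squared error, mean absolute error and the misclassification rate 1 - ACC all coincide. A quick check with made-up labels:

[CODE lang="python" title="MSE = MAE = 1 - ACC for 0/1 labels"]import numpy as np
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error

# Hypothetical 0/1 class indices, only to illustrate the identity.
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 0])

acc = accuracy_score(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)

# With 0/1 labels, |error| == error**2 for every sample, so MSE == MAE == 1 - ACC.
print(acc, mse, mae)  # 0.625 0.375 0.375[/CODE]

This matches the log below, where MSE and MAE are both exactly 1 - ACC.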

For example, this is the result for the first fold:

[CODE title="first fold"]==================================================================
-----------------------FOLD 1, -------------------------
==================================================================
Epoch 1/10
19/19 [==============================] - 55s 486ms/step - loss: 1.7101 - accuracy: 0.7681 - val_loss: 8.6331 - val_accuracy: 0.4371
Epoch 2/10
19/19 [==============================] - 9s 355ms/step - loss: 0.4822 - accuracy: 0.7729 - val_loss: 8.0062 - val_accuracy: 0.4780
Epoch 3/10
19/19 [==============================] - 9s 379ms/step - loss: 0.4996 - accuracy: 0.8312 - val_loss: 7.9097 - val_accuracy: 0.4843
Epoch 4/10
19/19 [==============================] - 9s 357ms/step - loss: 0.3882 - accuracy: 0.8738 - val_loss: 7.9579 - val_accuracy: 0.4811
Epoch 5/10
19/19 [==============================] - 9s 355ms/step - loss: 0.6962 - accuracy: 0.9085 - val_loss: 5.0147 - val_accuracy: 0.5283
Epoch 6/10
19/19 [==============================] - 9s 359ms/step - loss: 1.1334 - accuracy: 0.8896 - val_loss: 2.6646 - val_accuracy: 0.7453
Epoch 7/10
19/19 [==============================] - 9s 349ms/step - loss: 0.3623 - accuracy: 0.9401 - val_loss: 4.8177 - val_accuracy: 0.6792
Epoch 8/10
19/19 [==============================] - 9s 354ms/step - loss: 0.3827 - accuracy: 0.9401 - val_loss: 3.4249 - val_accuracy: 0.7767
Epoch 9/10
19/19 [==============================] - 8s 347ms/step - loss: 0.3808 - accuracy: 0.9180 - val_loss: 4.9683 - val_accuracy: 0.7107
Epoch 10/10
19/19 [==============================] - 9s 361ms/step - loss: 0.3999 - accuracy: 0.9148 - val_loss: 9.7832 - val_accuracy: 0.3396
10/10 [==============================] - 4s 68ms/step
-- ACC=0.5125786163522013, MSE=0.48742138364779874, MAE=0.48742138364779874, AUC=0.5153103611979271, [/CODE]

How is it possible to obtain an average accuracy of 0.51 when during training it is in the range 0.70 - 0.90?

Does anyone have an explanation?

Thanks.
 
Maybe I explained myself poorly. I am wondering why the accuracy calculated with the Scikit-Learn metrics is not comparable with the one displayed during training.
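
One way to check which Keras number the scikit-learn score should be compared against is to evaluate the trained model on the same held-out fold and compute accuracy_score from the same predictions. A minimal sketch, assuming a Keras classifier compiled with a single 'accuracy' metric and a batched, non-reshuffled test_ds yielding one-hot labels, as in the code above:

[CODE lang="python" title="comparing Keras and scikit-learn accuracy (sketch)"]import numpy as np
from sklearn.metrics import accuracy_score

def compare_accuracies(model, test_ds):
    # Keras-side number: the compiled 'accuracy' metric evaluated on the
    # held-out fold (this is what the val_accuracy column reports).
    keras_loss, keras_acc = model.evaluate(test_ds, verbose = 0)

    # scikit-learn-side number: argmax of the predicted probabilities against
    # argmax of the one-hot labels, collected from the same dataset.
    # (Assumes test_ds yields its batches in the same order on every pass.)
    y_true = np.concatenate([y.numpy() for _, y in test_ds])
    y_pred = model.predict(test_ds, verbose = 0)
    sk_acc = accuracy_score(np.argmax(y_true, axis = 1), np.argmax(y_pred, axis = 1))

    print(f'Keras accuracy: {keras_acc:.3f} | scikit-learn accuracy: {sk_acc:.3f}')
    return keras_acc, sk_acc[/CODE]

Called right after model.fit on the same test_ds, the two numbers should roughly agree; if they do, the remaining gap is between training accuracy and held-out accuracy rather than between the two libraries.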
 