Problem with scikit-learn metrics in K-fold cross validation

In summary, the conversation discusses an implementation of K-fold cross validation and the calculation of accuracy with Scikit-Learn metrics. The code for the cross validation and for the metric calculation is shown, and the accuracy computed with the Scikit-Learn functions turns out to be much lower than the accuracy displayed during training. The question is why the two values differ.
  • #1
BRN
Hello everyone,
In my implementation of K-fold cross validation, I find a difference between the accuracy reported during training on each fold and the average accuracy calculated afterwards with the metric functions of Scikit-Learn.

This is my code for the K-Fold Cross Validation and for the calculation of metrics.
K-Fold cross validation:
import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold

def kf_validation(images_path_list):
    
    kfold = KFold(n_splits = NUM_FOLDS, shuffle = True, random_state = 42)
    
    model = inceptionV3()
    
    acc_list = []
    mse_list = []
    mae_list = []
    auc_list = []
    
    for fold, (train_index, test_index) in enumerate(kfold.split(images_path_list)):
        
        print('==================================================================')
        print(f'-----------------------FOLD {fold + 1}, -------------------------')
        print('==================================================================')
        
        # rebuild the full dataset and take train/test portions by position
        # (only the sizes of train_index and test_index are used here, not the indices themselves)
        dataset = get_dataset(images_path_list, split_dataset = False)
        train_ds = dataset.skip(len(test_index)).batch(BATCH_SIZE)
        test_ds = dataset.skip(len(train_index)).take(len(test_index)).batch(BATCH_SIZE)
        
        model.fit(train_ds, epochs = NUM_EPOCHS, validation_data = test_ds, verbose = 1)
        
        # collect the one-hot labels from the test pipeline and the model's predictions,
        # then reduce both to class indices for the metric functions
        y_true = [label for _, label in test_ds]
        y_true = merge_tensors(y_true)
        y_pred = model.predict(test_ds)
        y_pred = tf.argmax(y_pred, axis = 1)
        y_true = tf.argmax(y_true, axis = 1)
        
        results = calc_metrics(y_true, y_pred)
        acc_list.append(results[0])
        mse_list.append(results[1])
        mae_list.append(results[2])
        auc_list.append(results[3])
    
    print('----------------AVERAGE METRICS AFTER ', NUM_FOLDS,' FOLDS---------------------')
    print(f'average ACC: {np.mean(acc_list):.3f}')
    print(f'average MSE: {np.mean(mse_list):.3f}')
    print(f'average MAE: {np.mean(mae_list):.3f}')
    print(f'average AUC: {np.mean(auc_list):.3f}')
    print('--------------------------------------------------------------------------------')

    return y_true, y_pred

scikit-learn metrics:
from sklearn.metrics import accuracy_score, auc, mean_absolute_error, mean_squared_error, roc_curve

def calc_metrics(y_true, y_pred):
    
    acc = accuracy_score(y_true, y_pred, normalize = True)
    mse = mean_squared_error(y_true, y_pred)
    mae = mean_absolute_error(y_true, y_pred)
    
    # ROC curve built from the hard class predictions rather than probabilities
    fpr, tpr, thresholds = roc_curve(y_true, y_pred)
    auc_val = auc(fpr, tpr)
    
    print(f'-- ACC={acc}, MSE={mse}, MAE={mae}, AUC={auc_val}, --')

    return acc, mse, mae, auc_val

For example, for the first fold the result is this:

first fold:
==================================================================
-----------------------FOLD 1, -------------------------
==================================================================
Epoch 1/10
19/19 [==============================] - 55s 486ms/step - loss: 1.7101 - accuracy: 0.7681 - val_loss: 8.6331 - val_accuracy: 0.4371
Epoch 2/10
19/19 [==============================] - 9s 355ms/step - loss: 0.4822 - accuracy: 0.7729 - val_loss: 8.0062 - val_accuracy: 0.4780
Epoch 3/10
19/19 [==============================] - 9s 379ms/step - loss: 0.4996 - accuracy: 0.8312 - val_loss: 7.9097 - val_accuracy: 0.4843
Epoch 4/10
19/19 [==============================] - 9s 357ms/step - loss: 0.3882 - accuracy: 0.8738 - val_loss: 7.9579 - val_accuracy: 0.4811
Epoch 5/10
19/19 [==============================] - 9s 355ms/step - loss: 0.6962 - accuracy: 0.9085 - val_loss: 5.0147 - val_accuracy: 0.5283
Epoch 6/10
19/19 [==============================] - 9s 359ms/step - loss: 1.1334 - accuracy: 0.8896 - val_loss: 2.6646 - val_accuracy: 0.7453
Epoch 7/10
19/19 [==============================] - 9s 349ms/step - loss: 0.3623 - accuracy: 0.9401 - val_loss: 4.8177 - val_accuracy: 0.6792
Epoch 8/10
19/19 [==============================] - 9s 354ms/step - loss: 0.3827 - accuracy: 0.9401 - val_loss: 3.4249 - val_accuracy: 0.7767
Epoch 9/10
19/19 [==============================] - 8s 347ms/step - loss: 0.3808 - accuracy: 0.9180 - val_loss: 4.9683 - val_accuracy: 0.7107
Epoch 10/10
19/19 [==============================] - 9s 361ms/step - loss: 0.3999 - accuracy: 0.9148 - val_loss: 9.7832 - val_accuracy: 0.3396
10/10 [==============================] - 4s 68ms/step
-- ACC=0.5125786163522013, MSE=0.48742138364779874, MAE=0.48742138364779874, AUC=0.5153103611979271,

How is it possible to obtain an accuracy of 0.51 from the Scikit-Learn metrics when during training it is in the range 0.70 - 0.90?

Does anyone have an explanation?

Thanks.
 
  • #2
Maybe I explained myself poorly. What I am wondering is why the accuracy calculated with the Scikit-Learn metrics is not comparable to the one displayed during training.
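
To make the comparison concrete, here is a minimal sketch of how the two numbers could be put side by side on one fold. It assumes the model and test_ds built in the code above, one-hot encoded labels, and a model compiled with an 'accuracy' metric (as the training log suggests); it also only makes sense if test_ds yields the examples in the same order on every pass.

import numpy as np
from sklearn.metrics import accuracy_score

# Keras' own evaluation of the held-out fold
# (assumes the model was compiled with metrics = ['accuracy'])
keras_loss, keras_acc = model.evaluate(test_ds, verbose = 0)

# scikit-learn accuracy from hard predictions on the same batches; this only lines up
# with the labels if test_ds is not reshuffled between iterations
y_true = np.concatenate([np.argmax(labels.numpy(), axis = 1) for _, labels in test_ds])
y_pred = np.argmax(model.predict(test_ds, verbose = 0), axis = 1)
sk_acc = accuracy_score(y_true, y_pred)

print(f'Keras accuracy: {keras_acc:.3f} vs scikit-learn accuracy: {sk_acc:.3f}')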
 

FAQ: K-fold cross validation and scikit-learn metrics

1. What is K-fold cross validation?

K-fold cross validation is a technique used to evaluate the performance of a machine learning model. It involves splitting the data into k subsets, using k-1 subsets for training and the remaining subset for testing. This process is repeated k times, with each subset being used as the test set exactly once. The results are then averaged to give a single estimate of the model's performance.
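
As a minimal sketch of the procedure (a synthetic dataset and a simple scikit-learn classifier are used purely as placeholders):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = make_classification(n_samples = 200, n_features = 10, random_state = 42)
kfold = KFold(n_splits = 5, shuffle = True, random_state = 42)

scores = []
for train_index, test_index in kfold.split(X):
    model = LogisticRegression(max_iter = 1000)
    model.fit(X[train_index], y[train_index])              # train on k-1 subsets
    y_pred = model.predict(X[test_index])                  # test on the held-out subset
    scores.append(accuracy_score(y[test_index], y_pred))

print(f'per-fold accuracy: {np.round(scores, 3)}')
print(f'average accuracy over {kfold.get_n_splits()} folds: {np.mean(scores):.3f}')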

2. What is the purpose of using K-fold cross validation?

The purpose of using K-fold cross validation is to get a more accurate estimate of a model's performance by reducing the bias that can occur when using a single training and testing set. It also helps to make the most out of the available data by using all of it for both training and testing.
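
scikit-learn can also do the splitting, fitting, and averaging in a single call; a short sketch with the same kind of placeholder data:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples = 200, n_features = 10, random_state = 42)

# cross_val_score runs the k fits and returns one score per fold
scores = cross_val_score(LogisticRegression(max_iter = 1000), X, y, cv = 5, scoring = 'accuracy')
print(f'mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f} across folds)')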

3. What are the potential problems with using scikit-learn metrics in K-fold cross validation?

One potential problem is that the default scoring metrics used in scikit-learn may not be appropriate for a specific dataset or problem. Another issue is that K-fold cross validation can be computationally expensive, especially with large datasets and complex models.

4. How can I address these problems?

To address the first problem, it is important to carefully consider which metrics are most relevant for the specific problem at hand and pass them explicitly (for example through the scoring argument) instead of relying on the defaults. If the classes are imbalanced, a stratified K-fold cross validation, where the data is split so that each subset keeps the same class distribution, usually gives a more reliable estimate. The computational cost can be reduced by using fewer folds, a simpler model, or by running the folds in parallel.
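
As an illustration of both points (placeholder data again; the metric here is only an example, swap in whatever matters for your problem):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_val_score

# imbalanced two-class toy problem
X, y = make_classification(n_samples = 300, weights = [0.8, 0.2], random_state = 42)

# StratifiedKFold keeps the 80/20 class ratio in every fold
cv = StratifiedKFold(n_splits = 5, shuffle = True, random_state = 42)

# pass an explicit, problem-appropriate metric instead of the default scorer
scorer = make_scorer(balanced_accuracy_score)
scores = cross_val_score(LogisticRegression(max_iter = 1000), X, y, cv = cv, scoring = scorer)
print(f'balanced accuracy per fold: {scores.round(3)}')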

5. Are there any alternatives to K-fold cross validation for evaluating a model's performance?

Yes, there are other techniques such as holdout validation, random subsampling validation, and leave-one-out cross validation. Each of these methods has its own advantages and disadvantages, and the choice of which one to use will depend on the specific dataset and problem being addressed.
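
For instance, a single holdout split and leave-one-out cross validation look like this in scikit-learn (placeholder data; leave-one-out refits the model once per sample, so it is only practical for small datasets):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split

X, y = make_classification(n_samples = 100, random_state = 42)

# holdout validation: one fixed train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 42)
holdout_acc = LogisticRegression(max_iter = 1000).fit(X_train, y_train).score(X_test, y_test)

# leave-one-out: k equals the number of samples
loo_scores = cross_val_score(LogisticRegression(max_iter = 1000), X, y, cv = LeaveOneOut())

print(f'holdout accuracy: {holdout_acc:.3f}, leave-one-out accuracy: {loo_scores.mean():.3f}')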
