Training YOLOv9 on custom dataset

  • Context: Comp Sci 
  • Thread starter: falyusuf
SUMMARY

This discussion covers training the YOLOv9 model on Kaggle with a custom dataset obtained through the Roboflow API. The user cloned the YOLOv9 repository, installed its dependencies, and downloaded pre-trained weights. Training then failed in a DataLoader worker with "IndexError: list index out of range", raised inside copy_paste in utils/augmentations.py at the line that indexes labels[j], boxes[j], and segments[j]. The index j is therefore out of range for at least one of those lists, so the dataset's labels should be checked for consistency.

PREREQUISITES
  • Familiarity with YOLOv9 model architecture and training procedures
  • Understanding of Python programming and package management with pip
  • Experience with Roboflow API for dataset management
  • Knowledge of PyTorch DataLoader and indexing mechanisms
NEXT STEPS
  • Verify the integrity and structure of the dataset labels and images in the Roboflow project
  • Learn about debugging DataLoader issues in PyTorch, focusing on IndexError handling
  • Explore YOLOv9 training parameters and their impact on dataset requirements
  • Investigate the use of data augmentation techniques in YOLOv9 to enhance training data
USEFUL FOR

Machine learning practitioners, computer vision engineers, and data scientists interested in training YOLOv9 on custom datasets and troubleshooting related issues.

falyusuf
Homework Statement
I want to use Python on Kaggle to train YOLOv9 on a custom dataset containing three folders: train, test, and valid. Each folder has two subfolders: images and labels. The dataset was exported from Roboflow in the YOLOv9 format. However, I encountered an error during training.

Below, I have attached my code and the error message.
Relevant Equations
-
[CODE lang="python" title="training YOLOv9"]!git clone https://github.com/SkalskiP/yolov9.git
%cd yolov9
!pip3 install -r requirements.txt -q

import os
Home = os.getcwd()
print(Home)

!mkdir -p /kaggle/working/yolov9/weights
!wget -P /kaggle/working/yolov9/weights -q https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-c.pt
!wget -P /kaggle/working/yolov9/weights -q https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-e.pt
!wget -P /kaggle/working/yolov9/weights -q https://github.com/WongKinYiu/yolov9/releases/download/v0.1/gelan-c.pt
!wget -P /kaggle/working/yolov9/weights -q https://github.com/WongKinYiu/yolov9/releases/download/v0.1/gelan-e.pt

%cd /kaggle/working/yolov9

!pip install roboflow

from roboflow import Roboflow
rf = Roboflow(api_key="#my_api")
project = rf.workspace("egat-43h2x").project("parking_lot-1lqk2")
version = project.version(1)
dataset = version.download("yolov9")

[/CODE]
The dataset has been uploaded.

[CODE lang="python" title="training YOLOv9"]%cd /kaggle/working/yolov9

!python train.py \
--batch 16 --epochs 50 --img 640 --min-items 0 --close-mosaic 15 \
--data /kaggle/input/parkingspacesyolov9/data.yaml \
--weights /kaggle/working/yolov9/weights/gelan-c.pt \
--cfg /kaggle/working/yolov9/models/detect/gelan-c.yaml \
--hyp hyp.scratch-high.yaml
[/CODE]

I am getting this error:
[CODE title="error message"]Logging results to runs/train/exp
Starting training for 50 epochs...

Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
0%| | 0/84 00:00
Traceback (most recent call last):
  File "/kaggle/working/yolov9/train.py", line 634, in <module>
    main(opt)
  File "/kaggle/working/yolov9/train.py", line 528, in main
    train(opt.hyp, opt, device, callbacks)
  File "/kaggle/working/yolov9/train.py", line 277, in train
    for i, (imgs, targets, paths, _) in pbar: # batch -------------------------------------------------------------
  File "/opt/conda/lib/python3.10/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/kaggle/working/yolov9/utils/dataloaders.py", line 656, in __getitem__
    img, labels = self.load_mosaic(index)
  File "/kaggle/working/yolov9/utils/dataloaders.py", line 791, in load_mosaic
    img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
  File "/kaggle/working/yolov9/utils/augmentations.py", line 248, in copy_paste
    l, box, s = labels[j], boxes[j], segments[j]
IndexError: list index out of range
[/CODE]

How can I fix this?
 
falyusuf said:
Traceback (most recent call last):
<snip>
File "/kaggle/working/yolov9/utils/augmentations.py", line 248, in copy_paste
l, box, s = labels[j], boxes[j], segments[j]
IndexError: list index out of range
The above is the last part of the traceback you showed, so it represents the most recent call. It looks to me like the index j is out of range for one or more of the lists.
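To make that concrete, here is a minimal sketch of the failure mode with made-up data. It is not the YOLOv9 code. The helper name first_bad_index and the sample lists are hypothetical, and the only assumption is that the three lists can end up with different lengths, so an index that is valid for one list is not valid for another.

```python
def first_bad_index(indices, *sequences):
    """Return (index, lengths) for the first index that is out of range
    for at least one of the sequences, or None if every index is valid."""
    lengths = [len(seq) for seq in sequences]
    for j in indices:
        if any(j >= n or j < -n for n in lengths):
            return j, lengths
    return None


# Made-up data: three labelled objects, but only one of them has a polygon.
labels = [[0, 10, 10, 50, 50], [1, 20, 20, 60, 60], [0, 5, 5, 30, 30]]
boxes = [[10, 10, 50, 50], [20, 20, 60, 60], [5, 5, 30, 30]]
segments = [[(10, 10), (50, 10), (50, 50)]]

# Indices taken from the range of `labels` are not all valid for `segments`.
result = first_bad_index(range(len(labels)), labels, boxes, segments)
print(result)
```

Printing the three lengths just before the failing line in augmentations.py would tell you which list is the short one.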
 
Mark44 said:
The above is the last part of the traceback you showed, so it represents the most recent call. It looks to me like the index j is out of range for one or more of the lists.
I have copied the code snippet for uploading the dataset directly from Roboflow for the YOLOv9 model. I installed YOLOv9 and its requirements as mentioned on the official website. Where might the problem be?
 
falyusuf said:
I have copied the code snippet for uploading the dataset directly from Roboflow for the YOLOv9 model. I installed YOLOv9 and its requirements as mentioned on the official website. Where might the problem be?
Mark just told you. Have you checked the indices and their ranges yet?
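One way to start that check is to look at the label files themselves. The failing call is copy_paste, which takes both labels and segments, so one possible cause (a guess that the traceback alone does not confirm) is that the dataset mixes bounding-box rows with polygon rows, leaving fewer segments than labels. The sketch below counts both kinds of rows in a labels folder. It assumes the usual YOLO text format, where a box row is "class x_center y_center width height" and a polygon row is "class x1 y1 x2 y2 ...". The function name summarize_labels is hypothetical, and the path in the usage comment is inferred from the data.yaml location in the training command.

```python
from collections import Counter
from pathlib import Path


def summarize_labels(labels_dir):
    """Count label rows by shape in a folder of YOLO-format .txt files."""
    counts = Counter()
    for txt in sorted(Path(labels_dir).glob("*.txt")):
        rows = [line.split() for line in txt.read_text().splitlines() if line.strip()]
        if not rows:
            counts["empty_file"] += 1  # image with no annotated objects
        for row in rows:
            if len(row) == 5:
                counts["box"] += 1  # class + 4 box values
            elif len(row) > 5 and len(row) % 2 == 1:
                counts["polygon"] += 1  # class + (x, y) pairs
            else:
                counts["malformed"] += 1  # anything else needs a closer look
    return counts


# Usage (path is an assumption, adjust it to the actual dataset location):
# print(summarize_labels("/kaggle/input/parkingspacesyolov9/train/labels"))
```

If both "box" and "polygon" come back nonzero, the labels are mixed. In that case, re-exporting the dataset so that every row has the same shape, or training with a hyp file in which copy_paste is 0, are two things worth trying. Any "malformed" rows should be inspected by hand.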
 
