We are using YOLO (Darknet) for object detection with Python 3, TensorFlow 1.0, NumPy and OpenCV 3, and the `yolo.weights` file for detection, as described in the link below: https://github.com/thtrieu/darkflow#cameravideo-file-demo
When we run it on a video, it simultaneously detects all objects, most of which we do not need.
Please guide us on how to detect only a specific class name.
Thanks
If you just follow the steps that @JP Kim has mentioned, you will get a video out with just your labels; however, it would also output other objects as one of your labels.
There's a specific section of the darkflow repo that explains exactly what to do if you wish to have a different output. TL;DR - you should retrain your model. They show this with an example of 3 classes.
But let me walk you through the process anyway. Let's say you have a video and you just need to track all the people in it. So, we only need to track one type of object: 'person'.
We make a copy of the `tiny-yolo-voc.cfg` file in the `cfg` directory. Let's follow their convention and name it `tiny-yolo-voc-1c.cfg`, where the suffix `1c` represents the number of classes. The reason for choosing `tiny-yolo-voc` and not some other config as our base model is that it's a smaller network, trainable on smaller GPUs. From what I've observed, other configs require 10 GB+ of graphics memory and they used to make my machine run out of memory.
We'll make the required changes in the `tiny-yolo-voc-1c.cfg` file:

- Edit the `classes` variable in the `[region]` section to `classes=1`.
- In the last `[convolutional]` section just before `[region]`, change the `filters` variable to `5 * (num_class + 5) = 5 * (1 + 5) = 30`. So, set `filters=30`.
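The `filters` arithmetic above generalises to any number of classes; here is a small sketch of the YOLO v2 formula (the helper name is my own, not darkflow API):

```python
# Sketch of the YOLO v2 output-layer size rule used above.
def yolo_v2_filters(num_classes, num_anchors=5):
    """filters = num_anchors * (num_classes + 5): each anchor box
    predicts 4 box coordinates + 1 objectness score + one score
    per class."""
    return num_anchors * (num_classes + 5)

print(yolo_v2_filters(1))   # 30  -> our single 'person' class
print(yolo_v2_filters(20))  # 125 -> the stock 20-class VOC value
```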
We'll edit the `labels.txt` file in the darkflow source directory so it has only one line inside it, which says `person`, since we need only one type of label.
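This step is a one-liner if you prefer to script it; a minimal sketch, assuming your current working directory is the darkflow source root where `labels.txt` lives:

```python
# Overwrite labels.txt so darkflow knows about a single class.
# Assumes the current working directory is the darkflow source root.
with open("labels.txt", "w") as fh:
    fh.write("person\n")
```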
Now we need to train our model. However, for training, we first need a dataset.
If your label is one of the existing labels in the VOC or COCO datasets, you can just download one of those datasets. In our case, `person` is the type of object we need to track, and that is already a class in the VOC dataset. So, we'll use the VOC dataset.
However, if you wish to use YOLO to classify and track a new type of object, then you need to prepare your own dataset and annotations. For this custom-object purpose, you could follow parts 5-8 of this YouTube video series. Those videos showcase an example of how to use YOLO to track and classify a `fidget_spinner`.
Download the VOC dataset, because it contains enough data and annotations for our type of object, `person`:
# Download the Pascal VOC dataset:
curl -O https://pjreddie.com/media/files/VOCtest_06-Nov-2007.tar
tar xf VOCtest_06-Nov-2007.tar
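As a sanity check that the downloaded annotations actually contain our class: each VOC annotation is an XML file with one `<object><name>…</name></object>` entry per labelled object. A small sketch that counts `person` objects in one annotation (the XML string here is a hand-made stand-in for a real file from `Annotations/`):

```python
import xml.etree.ElementTree as ET

# Stand-in for one file from VOCdevkit/VOC2007/Annotations/
sample = """
<annotation>
  <filename>000001.jpg</filename>
  <object><name>person</name></object>
  <object><name>dog</name></object>
</annotation>
"""

root = ET.fromstring(sample)
names = [obj.findtext("name") for obj in root.iter("object")]
print(names.count("person"))  # 1
```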
We're not going to train from scratch. Instead, we're going to load the weights for the `tiny-yolo-voc` model and start re-training from there for our specific use case (just the `person` class). For this, we must have the weights for `tiny-yolo-voc` downloaded. You can find the weights for YOLO v2 here. We will download the weights for Tiny YOLO for the VOC dataset. Move the file to the `/darkflow/bin/` directory after downloading.
Once we've downloaded this, the base model's config file and the weights file need to have the same name. Since renaming the config is not a good idea, we will rename the weights we've downloaded from `yolov2-tiny-voc.weights` to `tiny-yolo-voc.weights`. This is required because, when we train, we provide the weights file, and darkflow tries to pick up the corresponding config file as a reference for training the new model.
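The naming rule is easy to check yourself: strip the `.weights` extension and the config with the same stem must exist under `cfg/`. A sketch (the paths mirror the layout used in this answer; none of this is darkflow API):

```python
from pathlib import Path

# The weights file we pass with --load, after renaming.
weights = Path("bin/tiny-yolo-voc.weights")

# darkflow will look for a cfg with the same stem in cfg/.
expected_cfg = Path("cfg") / (weights.stem + ".cfg")
print(expected_cfg.as_posix())  # cfg/tiny-yolo-voc.cfg
```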
This is also mentioned on the darkflow repo page:

> When darkflow sees you are loading tiny-yolo-voc.weights it will look for tiny-yolo-voc.cfg in your cfg/ folder and compare that configuration file to the new one you have set with --model cfg/tiny-yolo-voc-1c.cfg. In this case, every layer will have the same exact number of weights except for the last two, so it will load the weights into all layers up to the last two because they now contain different number of weights.
Now we can train our model. You can remove the `--gpu 0.9` part if you do not have a GPU to train on.
# Train the net on the Pascal dataset:
flow --model cfg/tiny-yolo-voc-1c.cfg --load bin/tiny-yolo-voc.weights --train --dataset "~/VOCdevkit/VOC2007/JPEGImages" --annotation "~/VOCdevkit/VOC2007/Annotations" --gpu 0.9
Hit Ctrl+C to end training when you think the loss is no longer decreasing. Usually, a good loss / average loss is 1 or below.
You will have noticed that darkflow keeps saving checkpoints in the `ckpt/` directory every 250 steps. Once you stop training, you can use any of these checkpoints to test your model.
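Checkpoint files are named after the model and the training step (e.g. `tiny-yolo-voc-1c-1500.meta`). A sketch that picks the highest saved step out of a directory listing, to pass to `--load` (the file list below is illustrative, not read from disk):

```python
import re

# Illustrative ckpt/ listing; real names follow <model>-<step>.<ext>.
files = ["tiny-yolo-voc-1c-250.meta", "tiny-yolo-voc-1c-500.meta",
         "tiny-yolo-voc-1c-1500.meta", "checkpoint"]

# Pull the step number off every .meta file and sort ascending.
steps = sorted(int(m.group(1)) for f in files
               if (m := re.search(r"-(\d+)\.meta$", f)))
print(steps[-1])  # 1500 -> use as: --load 1500
```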
We will run it on a video of people and let it save a new video with bounding-box predictions. Let's use the checkpoint from step 1500 for this example.
flow --model cfg/tiny-yolo-voc-1c.cfg --load 1500 --demo video-input.mp4 --gpu 0.9 --saveVideo
When you run this, it will show the FPS at which the model can process your video. This can vary depending on your machine. Depending on the FPS and the length of the video, it could take some time to finish. Once the process is done, you'll have a `video.avi` created in the `darkflow/` directory.
This should have only `person`-type objects detected in the video.
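As a lighter-weight alternative to retraining, you can also keep the full model and drop unwanted labels yourself at prediction time: darkflow's Python API (`TFNet.return_predict`) returns one dict per detection with `label` and `confidence` keys. A sketch of such post-filtering over mocked-up detections (the dicts below imitate that shape; no darkflow install is needed to follow the idea):

```python
# Mocked-up detections in the shape darkflow's TFNet.return_predict
# returns (each real dict also carries topleft/bottomright boxes).
detections = [
    {"label": "person", "confidence": 0.91},
    {"label": "car", "confidence": 0.88},
    {"label": "person", "confidence": 0.34},
]

# Keep only confident 'person' hits; tune the threshold to taste.
wanted = [d for d in detections
          if d["label"] == "person" and d["confidence"] > 0.5]
print(len(wanted))  # 1
```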
If the output isn't great, you can train your model further and/or vary the thresholds or other parameters to get better output.
Hope this helps.