Vision Transformers (ViT)

Transformers have become the de facto standard for NLP tasks, and various pretrained models are available for translation, text generation, summarization and more. The models can be downloaded and fine-tuned in your deep learning framework of choice, as they work well with TensorFlow, PyTorch and JAX. Transformers aren’t just for text any more: they can handle a […]

Spatial Transformers

The spatial transformer module consists of neural network layers that can spatially transform an image. These spatial transformations include cropping, scaling, rotation, and translation. CNNs perform poorly when the input data contains this much variation. One solution is the max-pooling layer. But then again, max-pooling layers do […]

YoloV3 – Training Custom Dataset

Recently, while exploring computer vision, we got a chance to train YOLOv3 on a custom dataset for object detection. We custom-trained YOLOv3 to detect the following classes:
– hardhat
– vest
– mask
– boots
Below is a short video demonstrating how well the model detects these objects. Code for this can be found here.