Portfolio

Residual Network (ResNet) for Identifying the Digit Represented by a Hand Sign

One of the challenges of training a very deep neural network is vanishing and exploding gradients. As the network gets deeper, it becomes difficult for the added layers to learn even the identity function, so performance degrades with depth. This problem can be significantly reduced by using Residual Networks (ResNets). ResNets use skip connections that take the activation from one layer and feed it to a layer much deeper in the network. This mitigates the vanishing gradient problem, and the deeper layers can easily learn the identity function, which ensures that performance does not degrade as the network grows deeper.
The project is about classifying the digit represented by a hand sign. The dataset consists of 1080 hand images for training and 120 images for validation.

Visual representation of the different classes. Credit: DeepLearning.AI
I created the necessary building blocks to train a ResNet-50 model from scratch. Here is a summary of the model architecture:

Credit: DeepLearning.AI, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun - Deep Residual Learning for Image Recognition (2015)
The convolutional and identity blocks each have three convolution layers, with the convolutional block adding an extra convolution layer in its shortcut path so that the dimensions match up for the later stages. I trained the model for eight epochs and it achieved around 94% validation accuracy! Here is the training history:
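To illustrate, here is a rough sketch of the two block types using the Keras functional API. The filter counts and layer details are placeholders rather than my exact implementation, but the structure, in particular the skip connection in each block, is the idea described above.

```python
# Illustrative sketch of the two ResNet-50 building blocks (not the exact
# project code): filter sizes and initializers are placeholders.
from tensorflow.keras import layers

def identity_block(x, filters):
    """Three conv layers whose output is added back to the unchanged input."""
    f1, f2, f3 = filters
    shortcut = x                                         # skip connection
    x = layers.Conv2D(f1, 1)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(f2, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(f3, 1)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([x, shortcut])                      # dimensions already match
    return layers.Activation('relu')(x)

def convolutional_block(x, filters, stride=2):
    """Same main path, plus an extra conv on the shortcut so the shapes match."""
    f1, f2, f3 = filters
    shortcut = layers.Conv2D(f3, 1, strides=stride)(x)   # projection shortcut
    shortcut = layers.BatchNormalization()(shortcut)
    x = layers.Conv2D(f1, 1, strides=stride)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(f2, 3, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Conv2D(f3, 1)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([x, shortcut])
    return layers.Activation('relu')(x)
```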


I used the model to classify an image of a hand sign!

A Deep Convolutional Neural Network (CNN) to Classify Photos of Dogs and Cats

Using a dataset from a Kaggle machine learning competition, I wrote an algorithm to classify whether an image contains a dog or a cat. The competition provided 25,000 labeled photos: 12,500 of dogs and an equal number of cats. The dataset was originally developed by Microsoft. I used the Keras API and took two approaches: (1) training a model from scratch, and (2) using transfer learning with a pretrained model. For the first approach, I followed the general architectural principles of the VGG models. I stacked three convolutional layers with 3×3 filters followed by a max-pooling layer to create each block and repeated the process to build a three-block network. Other important settings included 'same' padding, a learning rate of 0.001, and binary cross-entropy as the loss function. For the second approach, I loaded a VGG-16 model, removed its fully connected layers, froze the weights of all the convolutional layers, and trained new fully connected layers on the Kaggle dataset. The second approach produced higher accuracy in less training time.
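As a rough sketch of the second approach, the transfer-learning setup in Keras looks something like this. The input size, head layer sizes, and optimizer settings here are illustrative rather than my exact configuration.

```python
# Sketch of transfer learning with a frozen VGG-16 base and a new classifier
# head (illustrative hyperparameters, not the exact project values).
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models, optimizers

base = VGG16(include_top=False, weights='imagenet', input_shape=(224, 224, 3))
base.trainable = False                        # freeze all convolutional layers

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(128, activation='relu'),     # new fully connected layers
    layers.Dense(1, activation='sigmoid'),    # dog vs. cat
])
model.compile(optimizer=optimizers.SGD(learning_rate=0.001, momentum=0.9),
              loss='binary_crossentropy',
              metrics=['accuracy'])
```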

Training history for the two approaches.

I classified photos of some of my co-worker's pets using the second model! Credit: Kaggle, Machine Learning Mastery, Microsoft

Object Detection Using the YOLO Model

YOLO (You Only Look Once) is a state-of-the-art object detection model that is fast and highly accurate. The algorithm was trained on 608x608 images and requires only one forward pass through the network to make predictions. In this project, I used pre-trained weights from the YOLOv2 model to detect cars in a dataset. The YOLO architecture runs each input image through a deep convolutional neural network, which returns all of the predicted boxes for that image. The boxes were first filtered by thresholding on the product of object and class confidence to remove low-probability boxes. A second filter used Intersection over Union (IoU) thresholding to remove overlapping boxes. The final output was one bounding box per object, with a predicted score and class.
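The two filtering stages can be sketched in TensorFlow as follows. The tensor shapes, thresholds, and function name are illustrative; the project itself used the YOLOv2 decoding utilities rather than this simplified version.

```python
# Simplified sketch of YOLO box filtering: score thresholding followed by
# IoU-based non-max suppression (illustrative shapes and thresholds).
import tensorflow as tf

def filter_yolo_boxes(boxes, box_confidence, class_probs,
                      score_threshold=0.6, iou_threshold=0.5, max_boxes=10):
    """boxes: (N, 4); box_confidence: (N, 1); class_probs: (N, num_classes)."""
    scores = box_confidence * class_probs                # object score x class prob
    best_scores = tf.reduce_max(scores, axis=-1)         # best class score per box
    best_classes = tf.argmax(scores, axis=-1)

    keep = best_scores >= score_threshold                # stage 1: score threshold
    boxes = tf.boolean_mask(boxes, keep)
    best_scores = tf.boolean_mask(best_scores, keep)
    best_classes = tf.boolean_mask(best_classes, keep)

    # Stage 2: IoU thresholding (non-max suppression) removes overlapping boxes.
    idx = tf.image.non_max_suppression(boxes, best_scores, max_boxes,
                                       iou_threshold=iou_threshold)
    return (tf.gather(boxes, idx),
            tf.gather(best_scores, idx),
            tf.gather(best_classes, idx))
```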

I used the model to detect cars in my driveway! Credit: Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi - You Only Look Once: Unified, Real-Time Object Detection (2015), Allan Zelener

Measuring Urban Surface Reflectivity and Heat Mitigation Potential at High-Resolution with Remote Sensing and Machine Learning

At WRI, I worked on a project to identify the surface reflectivity of roofs and pavements in urban areas using machine learning and remote sensing. We built on the methods developed in Ban-Weiss et al. 2015a & 2015b and scaled them with cloud computing and machine learning. We used official footprint data from the City of Los Angeles, Microsoft building footprints, and the OpenStreetMap/SharedStreets API to get the geometries of roofs and streets. Using open-source satellite imagery from the National Agriculture Imagery Program (NAIP), ground-truth measurements collected through project partners, and regression-based machine learning, we created high-resolution maps of surface reflectivity for multiple urban areas in the United States. The resulting data and maps provide an estimate of existing surface reflectivity at the building and street-segment scale, which can be overlaid with heat vulnerability, green infrastructure, urban morphology, and urban heat data. The tool helps cities develop and evaluate urban heat island reduction strategies and promotes broader adoption of urban heat mitigation programs.
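As a much-simplified illustration of the modelling step, the regression could be sketched like this. The file name, feature columns, and estimator are assumptions made for illustration; the real pipeline ran at scale in the cloud with a more careful feature set and validation.

```python
# Hypothetical, simplified sketch: predict surface albedo per roof/street
# polygon from NAIP band statistics with a regression model.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# One row per footprint: mean NAIP reflectance per band plus measured albedo
# (hypothetical file and column names).
df = pd.read_csv('footprint_band_stats.csv')
features = ['red_mean', 'green_mean', 'blue_mean', 'nir_mean']
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df['measured_albedo'], test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print('MAE:', mean_absolute_error(y_test, model.predict(X_test)))

# The fitted model can then be applied to every roof/street polygon to map albedo.
df['predicted_albedo'] = model.predict(df[features])
```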

Prediction of mean albedo for every roof/street in LA city between 2009 and 2018. Credit: WRI/Microsoft AI for Earth/Global Cool Cities Alliance/City of Los Angeles/George Ban-Weiss/Sika AG/Federal Highway Administration Albedo Study/James E Alleman/Michael Heitzman.