Graduate student researcher at the Robotics Institute, Carnegie Mellon University. Passionate about applications of Computer Vision in Augmented and Virtual Reality, Telepresence, Hand Gesture Recognition, Robotics, Artificial Intelligence, Machine Learning, and Deep Learning. Keen interest in Data Analytics, Big Data, IoT, Machine Learning, and Cryptocurrencies.
Bringing state-of-the-art segmentation to Microsoft Teams.
Led the transition to temporal segmentation, reducing error rates by up to 20%.
Blind augmentation and normalization in the input domain do not significantly help make face encoders robust to real-world variations such as identity, appearance, lighting, and headset assembly. Hence, we use analysis-by-synthesis to exploit the information available at test time: the target image (with its lighting and appearance information).
Explored metric learning to learn a generic feature space that can be used to refine predictions at run time. This is done with fully convolutional networks (so that features remain pixel-aligned), and comparison in this feature space minimizes the distance between the current predictions and the reference results.
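The project itself is not public; as a purely illustrative sketch (all names here are hypothetical), comparing pixel-aligned feature maps from a shared fully convolutional encoder can be as simple as a per-pixel L2 distance:

```python
import numpy as np

def refinement_loss(pred_feats, ref_feats):
    """Illustrative per-pixel embedding distance between the current
    prediction's feature map and the reference's feature map.
    Because the encoder is fully convolutional, features at the same
    (y, x) location correspond to the same pixel, so a per-pixel L2
    distance is meaningful. Both inputs are (C, H, W) arrays produced
    by the same (hypothetical) encoder."""
    diff = pred_feats - ref_feats
    # L2 distance at each pixel, then averaged over the image.
    per_pixel = np.sqrt((diff ** 2).sum(axis=0))
    return per_pixel.mean()
```

Minimizing this quantity over the prediction pulls it toward the reference in the learned feature space, which is the refinement idea described above.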
Working on high-fidelity bidirectional telepresence communication in VR using photorealistic avatars. This involves solving problems such as generating 3D avatar animation via self-supervised multi-view image translation using a limited number of IR sensors.
Developed a Variational Auto-Encoder to predict texture and geometry from IR images, and experimented with input-level augmentation techniques such as adding UV maps to make the model robust to variations in identity, appearance, and lighting. This resulted in an overall improvement of 6% over existing methods.
Worked at the intersection of Computer Vision, Deep Learning, and Augmented Reality. Explored the possibility of porting powerful deep-learning models to commodity smartphones to solve problems in the domain of AR.
Proposed a new state of the art in 2D temporal hand gesture recognition for egocentric videos. Our DrawInAir framework uses a CNN to detect hands and a DSNT layer to regress fingertip coordinates, which are then tracked by a Bi-LSTM to classify gestures.
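The DSNT (differentiable spatial-to-numeric transform) step can be sketched generically: it turns a heatmap into coordinates by normalizing it into a probability map and taking the expected (x, y) location, keeping the whole pipeline differentiable. A minimal NumPy version (an illustration of the general technique, not DrawInAir's actual implementation):

```python
import numpy as np

def dsnt(heatmap):
    """Soft-argmax over a 2D heatmap: normalize via softmax, then take
    the expectation of the (x, y) coordinate grids, normalized to
    [-1, 1]. Returns (x, y) as floats."""
    h, w = heatmap.shape
    # Softmax over all pixels (subtract the max for numerical stability).
    p = np.exp(heatmap - heatmap.max())
    p /= p.sum()
    # Pixel-center coordinate grids scaled to [-1, 1].
    xs = (2.0 * np.arange(w) + 1.0) / w - 1.0
    ys = (2.0 * np.arange(h) + 1.0) / h - 1.0
    x = (p.sum(axis=0) * xs).sum()  # marginal over columns
    y = (p.sum(axis=1) * ys).sum()  # marginal over rows
    return x, y
```

Unlike a hard argmax, this expectation is differentiable, so coordinate regression can be trained end-to-end before the Bi-LSTM consumes the fingertip trajectories.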
Worked on memory-efficient Deep Neural Network architectures that enable on-device hand gesture recognition for frugal HMDs.
Designed a Deep Neural Network architecture to recognize complex HoloLens-like, markerless, 3D temporal hand gestures in real time using monocular RGB input, without any depth information.
Worked as the founder and product developer of AirZen, which measures pollution levels in and around people's homes using a Raspberry Pi with various sensors and gives suitable health advice. Also built an Android app and a server implementation for the same. Project mentored by Ms. Jyoti Vashishtha Sinha and incubated by the Incubation Cell, IIITD.
Worked in the domain of Augmented Reality, exploring the possibility of leveraging deep learning to recognize 3D, markerless, temporal hand gestures in real time using the monocular camera input of a smartphone. This involves estimating a 3D hand pose and running a classification network.
Worked on a startup that aims to deliver the benefits of ‘Road Rationing’ and reduce overall congestion, commuting time, and pollution by distributing road usage evenly across all commuters.
Worked under Ms. Jyoti Vashishtha Sinha to determine the influence of various pollutants such as PM, CO, CO2, NO2, and O3 on pre-existing diseases and conditions such as asthma and bronchitis.
Working as the Google Student Ambassador for the college.