The CVAIR Lab conducts research in computer vision, deep learning, and artificial intelligence. We aim to develop impactful AI solutions for real-world problems.
This project focuses on accurately identifying anatomical landmarks on 3D dental scans using convolutional neural networks (CNNs). Automated tooth landmark localization is a critical task in orthodontic planning and forensic identification. Our approach improves precision and consistency in landmark detection, reducing reliance on manual annotations and enabling large-scale dental morphometric analysis.
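As a rough illustration of the heatmap-regression approach commonly used for this kind of task (a minimal sketch, not the lab's actual model; the layer sizes, 8-landmark count, and module names are assumptions), a small 3D CNN can predict one heatmap per landmark from a voxelized scan and read coordinates off the argmax:

```python
# Minimal sketch (not the lab's actual model): a small 3D CNN regresses one
# heatmap per landmark from a voxelized dental scan; landmark coordinates are
# read off as the argmax of each predicted heatmap.
import torch
import torch.nn as nn

class LandmarkHeatmapNet3D(nn.Module):
    def __init__(self, num_landmarks: int = 8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
        )
        # One output channel per landmark heatmap.
        self.head = nn.Conv3d(32, num_landmarks, 1)

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        # volume: (B, 1, D, H, W) voxelized scan -> (B, L, D, H, W) heatmaps
        return self.head(self.backbone(volume))

def heatmap_to_coords(heatmaps: torch.Tensor) -> torch.Tensor:
    # Hard argmax per landmark; returns (B, L, 3) voxel indices (z, y, x).
    b, l, d, h, w = heatmaps.shape
    flat = heatmaps.view(b, l, -1).argmax(dim=-1)
    z = flat // (h * w)
    y = (flat % (h * w)) // w
    x = flat % w
    return torch.stack([z, y, x], dim=-1)

if __name__ == "__main__":
    model = LandmarkHeatmapNet3D(num_landmarks=8)
    scan = torch.randn(1, 1, 32, 32, 32)     # toy voxel grid
    coords = heatmap_to_coords(model(scan))  # (1, 8, 3)
    print(coords.shape)
```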
This project develops a multimodal system that analyzes speech and text inputs to detect signs of swimming distress, and incorporates gesture recognition to enhance safety monitoring in aquatic environments.
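A minimal late-fusion sketch of how such a multimodal detector could be wired (purely illustrative; the feature dimensions, encoder outputs, and class labels are assumptions, not the deployed system):

```python
# Minimal late-fusion sketch (illustrative only, not the deployed system):
# per-modality feature vectors for speech, text, and gesture are concatenated
# and passed to a small classifier that flags possible distress.
import torch
import torch.nn as nn

class DistressFusionClassifier(nn.Module):
    def __init__(self, speech_dim=128, text_dim=256, gesture_dim=64):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(speech_dim + text_dim + gesture_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 2),   # classes: {normal, distress}
        )

    def forward(self, speech_feat, text_feat, gesture_feat):
        fused = torch.cat([speech_feat, text_feat, gesture_feat], dim=-1)
        return self.fc(fused)

if __name__ == "__main__":
    model = DistressFusionClassifier()
    logits = model(torch.randn(4, 128), torch.randn(4, 256), torch.randn(4, 64))
    print(logits.shape)  # (4, 2)
```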
This project focuses on reconstructing 3D human mesh representations from single 2D images, guided by textual prompts. By integrating natural language processing with vision-based models, it aims to enhance mesh recovery for personalized healthcare, rehabilitation, and human-computer interaction.
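A minimal sketch of the text-guided regression idea (names and dimensions are assumptions; the 72-dimensional pose and 10-dimensional shape vectors follow the common SMPL convention, not necessarily this project's choice):

```python
# Minimal sketch (assumptions, not the project's actual pipeline): image and
# text features are fused to regress SMPL-style parameters (72 axis-angle pose
# values + 10 shape coefficients) for mesh recovery.
import torch
import torch.nn as nn

class TextGuidedMeshRegressor(nn.Module):
    def __init__(self, img_dim=512, txt_dim=512, pose_dim=72, shape_dim=10):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 256), nn.ReLU(),
        )
        self.pose_head = nn.Linear(256, pose_dim)
        self.shape_head = nn.Linear(256, shape_dim)

    def forward(self, img_feat, txt_feat):
        h = self.fuse(torch.cat([img_feat, txt_feat], dim=-1))
        return self.pose_head(h), self.shape_head(h)

if __name__ == "__main__":
    model = TextGuidedMeshRegressor()
    pose, shape = model(torch.randn(2, 512), torch.randn(2, 512))
    print(pose.shape, shape.shape)  # (2, 72) (2, 10)
```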
This project introduces a hybrid AI framework combining Vision Transformers and diffusion models for accurate 3D human pose estimation from 2D images. The transformer captures global spatial features, while the diffusion model iteratively refines pose predictions through probabilistic denoising, improving robustness in complex scenes.
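A schematic sketch of the conditioning idea (names and sizes are assumptions, and the loop below is a simplified iterative refinement rather than a faithful DDPM sampler): Vision Transformer features condition a denoiser that progressively refines a noisy set of 3D joints.

```python
# Schematic sketch (assumed names; simplified refinement loop, not a full DDPM
# sampler): transformer image features condition a denoiser that iteratively
# refines a noisy set of 17 3D joints.
import torch
import torch.nn as nn

NUM_JOINTS = 17  # assumed COCO-style joint count

class ConditionalPoseDenoiser(nn.Module):
    def __init__(self, cond_dim=768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_JOINTS * 3 + cond_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, NUM_JOINTS * 3),
        )

    def forward(self, noisy_pose, cond, t):
        # noisy_pose: (B, 17*3), cond: (B, cond_dim) ViT features, t: (B, 1)
        return self.net(torch.cat([noisy_pose, cond, t], dim=-1))

def refine_pose(denoiser, cond, steps=10):
    # Start from Gaussian noise and repeatedly predict a cleaner pose estimate.
    pose = torch.randn(cond.shape[0], NUM_JOINTS * 3)
    for step in reversed(range(steps)):
        t = torch.full((cond.shape[0], 1), step / steps)
        pose = denoiser(pose, cond, t)
    return pose.view(-1, NUM_JOINTS, 3)

if __name__ == "__main__":
    denoiser = ConditionalPoseDenoiser()
    vit_features = torch.randn(2, 768)  # stand-in for Vision Transformer output
    print(refine_pose(denoiser, vit_features).shape)  # (2, 17, 3)
```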
This project explores the use of generative adversarial networks (GANs) and diffusion models to synthesize high-quality biomedical images. Synthetic image generation is crucial for augmenting training datasets in domains where data is limited or annotations are costly. Our research emphasizes fidelity, clinical realism, and diversity in generated outputs, supporting applications in training, anomaly detection, and privacy-preserving AI.
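As a minimal illustration of the generative side (layer sizes and names are assumptions, not the project's architecture), a DCGAN-style generator maps a latent vector to a synthetic grayscale image of the kind that could augment a scarce training set:

```python
# Minimal GAN-generator sketch (illustrative; layer sizes are assumptions):
# maps a latent noise vector to a synthetic 64x64 grayscale image.
import torch
import torch.nn as nn

class SyntheticImageGenerator(nn.Module):
    def __init__(self, latent_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 128, 4, 1, 0), nn.ReLU(),  # 4x4
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),          # 8x8
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),           # 16x16
            nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU(),           # 32x32
            nn.ConvTranspose2d(16, 1, 4, 2, 1), nn.Tanh(),            # 64x64
        )

    def forward(self, z):
        # z: (B, latent_dim) -> (B, 1, 64, 64) image in [-1, 1]
        return self.net(z.view(z.size(0), -1, 1, 1))

if __name__ == "__main__":
    gen = SyntheticImageGenerator()
    fake = gen(torch.randn(4, 100))
    print(fake.shape)  # (4, 1, 64, 64)
```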
We explore the integration of multimodal AI models, particularly vision-language models (VLMs) and large language models (LLMs), for semantic segmentation of colorectal cancer from histopathology and radiology images. This research aims to bridge textual medical knowledge with visual analysis, enabling explainable and robust segmentation under low-data conditions. By leveraging models like CLIP, Segment Anything, and GPT-4, the project pushes toward zero-shot and few-shot learning in medical imaging.
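A self-contained sketch of the zero-shot idea (random tensors stand in for real CLIP features; the grid size and threshold are arbitrary assumptions): comparing a text-prompt embedding against per-patch image embeddings gives a coarse similarity mask that could then seed a promptable segmenter such as Segment Anything.

```python
# Self-contained sketch (random tensors stand in for real CLIP features):
# cosine similarity between a text-prompt embedding and per-patch image
# embeddings yields a coarse patch-level mask for a zero-shot segmentation seed.
import torch
import torch.nn.functional as F

def coarse_mask_from_similarity(patch_embeds, text_embed, grid=14, thresh=0.2):
    # patch_embeds: (grid*grid, D) per-patch features; text_embed: (D,)
    sims = F.cosine_similarity(patch_embeds, text_embed.unsqueeze(0), dim=-1)
    return (sims > thresh).float().view(grid, grid)  # coarse patch-level mask

if __name__ == "__main__":
    D = 512
    patch_embeds = torch.randn(14 * 14, D)  # stand-in for CLIP patch features
    text_embed = torch.randn(D)             # stand-in for a "tumor region" prompt
    print(coarse_mask_from_similarity(patch_embeds, text_embed).shape)  # (14, 14)
```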