Projects
- Master’s Thesis – Generative Model for Video Montage Creation
This research proposes a generative approach to video montage creation that aligns textual input with video representations. The model uses Unmasked Teacher (UMT) for encoding and a GPT-based autoregressive model to predict video embeddings that are semantically consistent with the text. It achieved state-of-the-art IoU, UMS, and SMS scores on the VSPD dataset.
Technologies Used:
- Python, PyTorch, HuggingFace Transformers, SLURM
- Training on Noisy Real vs. Generated Images for Attribute Classification
This project compared OpenCLIP models trained on noisy real images with those trained on synthetic images generated using Stable Diffusion. We built attribute-specific datasets for material, pattern, group, and color classification by retrieving and downloading CLIP-matched samples from the LAION dataset and by generating synthetic counterparts with Stable Diffusion. The study shows that training on noisy real data leads to better generalization than training on synthetic data.
Technologies Used:
- PyTorch, Stable Diffusion, OpenCLIP, Python