Research Experience

University of Hamburg, Germany

Research Assistant

Under Prof. Chris Biemann and Florian Schneider

State-of-the-Art Multi-Modal LLMs for Text-Video Retrieval
- Duration: May 2023 - Present
- In this ongoing project, we are implementing a Retrieval-Augmented Generation system for zero-shot text-video retrieval, using Video-LLaMA as the language model. The project explores how textual and visual data can be combined for efficient information retrieval.
Video Retrieval Application
- Duration: Feb 2023 - Apr 2023
- The goal was to develop a video-text retrieval application on the MSR-VTT dataset that lets users retrieve relevant video scenes from text queries. We used the encoders of the X-CLIP model to generate embeddings for both text and videos, and explored three encoder variants: “microsoft/xclip-base-patch32”, “microsoft/xclip-base-patch16”, and “microsoft/xclip-large-patch14”. The project provided practical experience in applying state-of-the-art models to multimodal information retrieval.
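
The retrieval step described above can be illustrated with a minimal sketch. It assumes the text and video embeddings have already been computed by an encoder and only shows how videos might be ranked against a query by cosine similarity. The function names, the toy three-dimensional vectors, and the video identifiers are hypothetical placeholders, not the application's actual code or data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_videos(query_embedding, video_embeddings, top_k=3):
    """Return the top_k (video_id, score) pairs, best match first."""
    scored = [
        (video_id, cosine_similarity(query_embedding, embedding))
        for video_id, embedding in video_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; real encoder outputs have hundreds of dimensions.
video_embeddings = {
    "video_a": [0.9, 0.1, 0.0],
    "video_b": [0.1, 0.9, 0.2],
    "video_c": [0.5, 0.5, 0.5],
}
query_embedding = [1.0, 0.2, 0.0]

results = rank_videos(query_embedding, video_embeddings, top_k=2)
for video_id, score in results:
    print(f"{video_id}: {score:.3f}")
```

In a setup like this, the video embeddings can be computed once and stored, so only the text query needs to be encoded at search time.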

KU Leuven, Belgium

Visiting Scholar

Under Prof. Hugo Van Hamme

Relating Output Symbol Probabilities With Confidence in an End-to-End Speech Recognizer

Duration: July 2022 - Sep 2022

During my time at KU Leuven, we evaluated how well the probabilities predicted by an end-to-end speech recognition network reflect its actual accuracy. We found that the network was unexpectedly uncertain: it often assigned low probabilities to predictions that turned out to be correct. To address this, we introduced prediction patterns to a purpose-built model, with the aim of improving the network’s per-character probability estimates and, ultimately, the overall performance of the speech recognition system. The project covered both theoretical and practical aspects of speech recognition.
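
As an illustration of how a mismatch between predicted probabilities and accuracy can be quantified, the sketch below computes the expected calibration error (ECE), a standard calibration measure. This is an assumption chosen for illustration; the measure actually used in the project is not stated here. The function name and the toy per-character confidences are hypothetical.

```python
def expected_calibration_error(confidences, correct, num_bins=5):
    """Weighted average gap between accuracy and mean confidence per bin."""
    total = len(confidences)
    bins = [[] for _ in range(num_bins)]
    for confidence, is_correct in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin.
        index = min(int(confidence * num_bins), num_bins - 1)
        bins[index].append((confidence, is_correct))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_confidence = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_confidence)
    return ece


# Hypothetical toy case: every character is correct, yet confidence is low.
confidences = [0.30, 0.35, 0.40, 0.45]
correct = [True, True, True, True]

ece = expected_calibration_error(confidences, correct)
print(f"ECE: {ece:.3f}")
```

A large value on data like this indicates an underconfident model: accuracy is high while the predicted probabilities stay low.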

University of Tehran, Iran

Bachelor’s Thesis

Under Prof. Bagher BabaAli

Investigating the Relationship Between a Person’s Written Text and Their Suicidal Tendencies Using Text Mining and Machine Learning Methods

Duration: April 2023 - July 2023

For my Bachelor’s thesis at the University of Tehran, I studied the link between an individual’s written text and their susceptibility to suicidal tendencies. Using text mining and machine learning techniques, I developed a model for identifying text indicative of suicidal behavior, using data from the social media platform Reddit. I applied a range of machine learning and deep learning methods, including Support Vector Machines, Random Forests, and Neural Networks, which achieved high accuracy in detecting patterns associated with suicidal tendencies. This research could support early identification and intervention in suicide prevention efforts.

Overview of My Research Journey

My research experience spans multimodal information retrieval, confidence estimation in speech recognition systems, and the link between written text and suicidal tendencies. Each project has expanded my knowledge and deepened my understanding of complex technological challenges and how to approach them.

Feel free to explore the details of each project, which reflect my dedication to advancing computer vision, natural language processing, and multimodal learning. I look forward to continuing this work in research and technology.

Narges Baba Ahmadi