About me

Data Scientist | AI/ML Engineer | ENFJ | Insights Discovery Profiles [1], [2]

A well-rounded, technically proficient Data Scientist and out-of-the-box thinker with strong leadership skills. I specialize in solving complex problems and generating actionable insights, with expertise in advanced data science techniques including ML and AI, and mastery of modelling and simulation.

I am particularly interested in applying these skills to Life Sciences, Health, Genetics and Pharmaceuticals, with a focus on Immuno-Oncology, Computer Vision, Drug and Vaccine Discovery, Precision Medicine and Chem/Bioinformatics, to drive meaningful societal impact with a people-first approach.

Living in diverse locations such as Kabul, Glasgow, London, Surrey, and Japan has shaped me into a highly adaptable and resilient individual. This journey has honed my ability to build strong relationships quickly and fostered a deep passion for learning and embracing new experiences.

Strong attention to detail, with the ability to understand both the high-level and low-level aspects of software and model development, ensuring alignment with business objectives throughout the data science project lifecycle.

I recently refined my skills in Machine Learning and AI through the University of Cambridge's career accelerator.

Data Science projects showcasing my skills

Each project has been carefully selected to showcase my skills across the entire spectrum of Data Science - from simple to complex methodologies, allowing me to show how I can dive into the detail and uncover insights.
The projects cover linear regression, clustering, classification, supervised and unsupervised learning, neural networks, deep learning, fine-tuning, pre-training, computer vision, and work with both structured and unstructured data.
Feel free to browse through my projects and witness the transformative power of data science in action!

🛍️ 12hr Project: Customer Segmentation with Clustering

🔍 Data-Driven Insights for Retail Success | K-means Clustering | Elbow Method | Silhouette Score

This project applies critical thinking and machine learning to design clustering models that segment customers, driving smarter marketing strategies and better customer experiences.
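As an illustration of the silhouette-based choice of cluster count named above, here is a minimal, dependency-free sketch. It is not the project's code: the toy customer data, the function names, and the deterministic farthest-first seeding (used instead of random or k-means++ initialisation so the result is reproducible) are all assumptions made for the example.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first point, then repeatedly add the point farthest from all centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=20):
    """Lloyd's algorithm; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid wins.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                             # cohesion
        b = min(sum(d) / len(d) for d in dists.values())    # separation
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: (annual spend, store visits per month).
customers = [
    (1, 1), (1, 2), (2, 1), (2, 2),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 1), (20, 2), (21, 1), (21, 2),
]
scores = {k: silhouette(customers, kmeans(customers, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)  # k with the highest mean silhouette
```

On real data a library implementation such as scikit-learn's `KMeans` and `silhouette_score` would normally replace these hand-written helpers, and the elbow method (plotting within-cluster sum of squares against k) can be used as a cross-check.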

🚢 12hr Project: Detecting Anomalous Ship Engine Activity

🛠️ Unsupervised Anomaly Detection | No Ground Truth | SVM | Isolation Forest

This project applies statistical and ML-based anomaly detection approaches to flag anomalous ship engine activity, helping to ensure optimal engine performance.
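A statistical baseline for unlabelled anomaly detection can be as simple as the interquartile range (IQR) rule. The sketch below is illustrative only: the choice of the IQR rule, the `iqr_outliers` helper, the 1.5 multiplier, and the RPM readings are assumptions, not details taken from the project.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical engine RPM readings with one obvious spike.
engine_rpm = [700, 710, 705, 698, 702, 708, 695, 704, 1500, 701]
flagged = iqr_outliers(engine_rpm)  # indices of readings treated as anomalous
```

A univariate rule like this is useful as a sanity check against model-based detectors such as Isolation Forest, which can pick up anomalies that only appear when several features are considered together.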

📚 36hr Project: Book Sales Forecasting with Time Series and Machine Learning

📈 Sales Prediction | Time Series Analysis | ARIMA | XGBoost | LSTM | ACF, PACF, Ljung-Box

This project utilizes time series data and advanced ML models to forecast book sales, enabling data-driven decisions on inventory and marketing strategies.
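For readers unfamiliar with the diagnostics listed above, the following sketch computes sample autocorrelations (ACF) and the Ljung-Box Q statistic, Q = n(n+2) Σ ρ_k² / (n − k), from scratch. The sales figures and function names are hypothetical; in practice the test is usually run on the residuals of a fitted model, and a library routine such as the one in statsmodels would be the normal choice.

```python
def autocorrelations(series, max_lag):
    """Sample autocorrelation for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    rho = autocorrelations(series, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))


# Hypothetical sales series with a repeating four-period pattern.
sales = [10, 12, 15, 12] * 6

# 95% critical value of the chi-square distribution with 5 degrees of freedom.
CHI2_95_DF5 = 11.07

q_stat = ljung_box_q(sales, max_lag=5)
has_autocorrelation = q_stat > CHI2_95_DF5  # reject "white noise" if True
```

A Q value above the critical value indicates that the series (or the residuals being tested) still contains autocorrelation that a forecasting model has not captured.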

💼 Employer project: Analysing Quarterly Results of G-SIBs with Advanced Language Models

🧠 RAG | Sentiment Analysis | Topic Modelling | NLP | LLMs & SLMs

This project aimed to enhance the Bank of England and Prudential Regulation Authority's risk assessments by applying advanced language models to analyse quarterly earnings call transcripts from financial institutions. This approach helps proactively assess firms' stability, identify high-risk behavior, and improve oversight, potentially preventing future financial crises.

This was a 6-week team project in partnership with the Bank of England through the University of Cambridge, in which I served as the Data Science Team Lead.

🎓 12hr Project: Predicting Student Dropout with Advanced Machine Learning

🧠 Supervised Learning | Classification | XGBoost | Neural Networks

In this project, I used advanced supervised learning techniques to predict student dropouts, aiming to reduce high dropout rates that can negatively impact an institution’s finances, reputation, and student satisfaction.

For confidentiality reasons, I cannot show the original data or graphical output; only the code and general, anonymised summaries are kept.

🎓 19hr Project: Topic Modelling with NLP in a Real-World Context

🧠 NLP | BERTopic | Topic Modelling | Emotion Analysis

In this project, I applied advanced NLP techniques, including BERTopic and emotion analysis, to uncover themes in customer reviews, helping a gym chain enhance its customer experience.

For confidentiality reasons, I cannot show the original data or graphical output; only the code and general, anonymised summaries are kept.

Coming Soon

AI Driven Nanobody Discovery for SARS-CoV-2

Coming Soon

🧬💊🤖 AI for Drug Discovery | Computational Biology | Protein NLP | Labelled Binding Data

This project leverages advanced AI techniques in drug discovery to accelerate nanobody development for SARS-CoV-2. Using labelled data from the immunization of two alpacas, it focuses on fine-tuning a pre-trained protein (VHH) language model, predicting nanobody-antigen binding, and validating and benchmarking the model's performance.

Pre-Training LMs for Generating VHH sequences

Coming Soon

🦙🔬💡 Immunoinformatics | Machine Learning, Structural Biology | Alpaca Unlabelled Amino Acid Sequences

This project focuses on developing and enhancing large language models (LLMs) for VHH sequence generation. By training on unlabelled alpaca amino acid sequences, it aims to produce diverse, high-quality nanobody candidates that can later be assessed for binding ability. This work aims to refine nanobody discovery pipelines by improving the generative foundation of VHH modeling.

Applying Computer Vision and Multi-Agent LLMs to Detect Stenosis in Heart X-rays

Coming Soon

❤️📸💡 Computer Vision | Deep Learning, Medical Imaging | Cardiovascular Stenosis Detection with Multi-Agent LLMs

This project combines deep learning and computer vision techniques with multi-agent systems, where different large language models (LLMs) are used to collaborate on tasks like identifying, segmenting, and localizing plaques and stenosis in heart X-rays.

Integrating LLMs as agents that assist with image interpretation, decision-making, and result analysis is intended to enhance the accuracy of automated detection systems, while also allowing the performance of various LLM approaches in medical image diagnostics to be compared.

This innovative approach aims to improve early diagnosis and clinical decision-making in cardiology, paving the way for better patient outcomes.