dawoodsarfraz[dot]cs[@]gmail[dot]com   |   dawoodsarfraz0346[@]gmail[dot]com

About

Welcome! I am a Machine Learning Engineer at OrbytLabs, working on computer vision, natural language processing, and generative AI projects. During my BSCS at FAST National University of Computer and Emerging Sciences, I worked as a Research Assistant on medical image processing, time series analysis, and recommendation system projects.

Research

My main research interests lie in Natural Language Processing, Computer Vision, 3D Computer Vision, Computer Graphics, and Generative Models. I am passionate about exploring how these areas intersect to solve challenging problems in AI and real-world applications. A complete and up-to-date list of my publications and project reports can be found on Google Scholar or further below.

Most of my projects and contributions can be found on GitHub.

Education

FAST National University of Computer and Emerging Sciences

Bachelor of Science in Computer Science

September 2020 – September 2024 | Pakistan

Specialization: Artificial Intelligence, Machine Learning, Deep Learning, Digital Image Processing, Data Science, Computer Vision, Natural Language Processing

Experience

OrbytLabs

Co-Founder | Machine Learning Engineer

June 2024 – Present | Remote

  • Developed a vehicle price prediction system, applying data validation, feature engineering, and model optimization. Iteratively improved the model to reach 98% prediction accuracy.
  • Built object detection and tracking systems using YOLOv10/YOLOv11, integrating tracking algorithms for real-time visual understanding in smart security solutions. Applied post-training quantization (PTQ) and quantization-aware training (QAT) to reduce model size and latency for real-time inference on edge devices.
  • Implemented intrusion detection systems by combining motion-based region analysis and AI-driven object localization to automatically detect and flag unauthorized entries or restricted zone violations in real time.
  • Implemented OCR pipelines using Tesseract and EasyOCR to extract and structure information from scanned documents and ID cards, enabling automated data extraction and document digitization.
  • Created and deployed LLM-based chatbots and document understanding systems using OpenAI, Claude, and Gemini APIs with RAG pipelines to automate customer support and business operations. Fine-tuned large language models (LLMs) for domain-specific use cases, enhancing contextual understanding, accuracy, and efficiency.
  • Optimized deep learning pipelines using transfer learning, quantization, pruning, and model optimization techniques to improve accuracy and efficiency during experimental evaluations.

Deutics Global

Associate Machine Learning Engineer

Jan 2025 – July 2025 | On-site, Pakistan

  • Developed and optimized a real-time video analytics system by converting RTSP streams to WebRTC for efficient live video processing and reduced latency.
  • Implemented advanced tracking algorithms for real-time object tracking across frames, enabling movement monitoring and direction estimation.
  • Built and integrated OCR pipelines using Tesseract, EasyOCR, and custom deep learning models to extract license plate numbers from images and video streams.
  • Designed and optimized algorithms for wait time estimation, queue detection, speed calculation, and traffic light violation detection, along with advanced modules for eye state detection (drowsiness monitoring), seat belt detection, and driver attention monitoring to enhance traffic efficiency and enforce road safety regulations.
  • Implemented line and zone intrusion detection systems to monitor restricted areas, strengthen surveillance, and enforce access control.

FAST NUCES

Research Assistant

Sep 2023 – Dec 2024 | On-site

  • Project: Multi-Class Skin Cancer Classification
  • Developed a multi-class skin cancer classification system using the HAM10000 dataset to classify various types of skin lesions.
  • Custom CNN achieved 92% accuracy, 92% precision, 92% recall, and a 92% F1 score.
  • NASNet achieved 93% accuracy, 94% precision, 93% recall, and a 93% F1 score.
  • ShuffleNet improved computational efficiency through grouped convolutions and channel shuffling, achieving 87% accuracy, 87% precision, 87% recall, and an 87% F1 score.
  • Optimized data pipelines and applied augmentation strategies to improve robustness and reduce overfitting.

Anonymous Tree

Machine Learning Engineer (Intern)

Jun 2023 – Aug 2023 | Remote, Pakistan

  • Project: Personalized Recommendation System Development
  • Collected and processed user interaction data (e.g., purchases, ratings) along with item metadata (e.g., product categories, descriptions, attributes) to build a robust dataset. Cleaned, normalized, and structured the data for model training, using imputation to handle missing values and feature scaling to improve the reliability of model predictions.
  • Developed a content-based recommendation system that uses item metadata (e.g., text-based features) to recommend similar products to users. Implemented user-based and item-based collaborative filtering, and combined the approaches into a hybrid recommendation system for improved accuracy.
  • Tuned hyperparameters using Grid Search and Randomized Search. Evaluated models using Precision, Recall, F1 Score, and Mean Squared Error (MSE). Applied cross-validation to check generalization and reduce overfitting.
  • Integrated matrix factorization (e.g., SVD) and neighborhood-based (KNN) methods for collaborative filtering. Used TF-IDF and Cosine Similarity for content-based filtering to identify similar items.
  • Monitored real-time recommendation performance and ensured alignment with business goals. Incorporated user feedback into model retraining and improvement cycles. Analyzed system performance over time and tested new features for better accuracy.

Projects

I believe the best way to learn is by doing. I have completed many projects; the most notable ones are listed below in reverse chronological order. For more information, please visit GitHub.

Highlighted Projects

  1. Skin Cancer Classification using NasNet and ShuffleNet

    Dawood Sarfraz

    PyTorch

  2. Product Recommender System

  3. Duplicate Question Pairs

    PyTorch

  4. Text Generation using LSTMs

    Python, NumPy, Pandas, TensorFlow, Keras

  5. Collaborative AI System for Task Research and Analysis

    Python, Ollama, Llama 3.2, CrewAI, and Serper

  6. Stock Market Prediction using LSTM

    Python, NumPy, Pandas, Keras, TensorFlow

  7. AI Research Assistant for Real-Time Information Discovery

    Python, Ollama, Llama 3.2, CrewAI, and Serper

  8. Twitter Sentiment Analysis using Machine Learning

    Python (version 3.7+), Pandas, NumPy, Scikit-learn, SVM, KNN, DT, XGBoost, Random Forest, Logistic Regression

  9. Text Classification using Classical NLP and RoBERTa

    Dawood Sarfraz

    Python, TensorFlow, RoBERTa, VADER lexicon, NLTK

  10. Intelligent Academic Assistant for Text and Image Tasks

    Python, Llama 3.2 3B, LangChain, Streamlit, Ollama

  11. Smart Literature Review and Research Analysis Assistant

    Python, RAG, Mistral 7B, Ollama

  12. Russian Language Sentiment Analysis

    Python (version 3.7+), Pandas, NumPy, NLTK ('punkt', stopwords, SnowballStemmer), TfidfVectorizer, scikit-learn (MultinomialNB)

  13. Text, Image, Audio and Video Steganography

    Python 3, OpenCV, NumPy, Matplotlib, ImageIO, SciPy, tqdm, pypng, wave, Linux

  14. Cyber Attacks Classification using Machine Learning

    Dawood Sarfraz

    Python (version 3.7+), Pandas, NumPy, Scikit-learn, SVM, KNN, DT, XGBoost, Random Forest, Logistic Regression

  15. Pakistan Food Price Analysis

    Python (version 3.7+), NumPy, Pandas, Matplotlib, Plotly, Seaborn, scikit-learn (SVM, MLPRegressor, RandomForestRegressor, AdaBoostRegressor, DecisionTreeRegressor)

  16. Mart Sales Prediction using XGBoost

    Python (version 3.7+), Pandas, NumPy, Scikit-learn, XGBoost

  17. Gold Price Prediction using Random Forest Regressor

    Python (version 3.7+), Pandas, NumPy, Scikit-learn

Publications

Papers are listed in reverse chronological order. For more information, please visit Google Scholar.

Papers

  1. Skin Cancer Classification using Deep Learning

    Dawood Sarfraz

    arXiv, 2025

Skills

Programming Languages: Python, C++, JavaScript
Full-Stack Web Technologies: HTML, CSS, Bootstrap, React, FastAPI, Django, Flask, Streamlit, Gradio
Tools & Platforms: Git, Docker, AWS, GCP, Azure, Linux
Databases & Vector Stores: MySQL, PostgreSQL, MongoDB, Redis, FAISS, Pinecone, Weaviate
ML/DL Frameworks: Scikit-learn, PyTorch, TensorFlow, Keras, YOLO, Hugging Face Transformers, OpenCV
NLP Libraries: spaCy, NLTK
Data Science & Visualization: NumPy, Pandas, SciPy, Matplotlib, Seaborn
Architectures & Models: Transformers, BERT, RoBERTa, Vision Transformers (ViTs), CLIP, CNNs, RNNs, LSTMs, GANs, Autoencoders
OCR & Document Processing: EasyOCR, Tesseract OCR
Generative AI Libraries & Tools: LangChain, LangGraph, LlamaIndex, CrewAI
Large Language Models: LLaMA, DeepSeek, Mistral, Falcon, Phi, Qwen, Granite, Gemma
Speech Models: Whisper
Techniques: Retrieval-Augmented Generation (RAG), Prompt Engineering, Fine-tuning, LoRA, QLoRA, Quantization, Model Distillation, PEFT

Miscellaneous

News

  1. X Y, 2025, News, “See Here”

Blogs

  1. X Y, 2025, JSON vs TOON, “Read Blog Here”

Selected Talks

  1. X Y, 2025, Selected Talk, “Watch Talk Here”

Workshops

  1. X Y, 2025, Workshops, “Attend Workshop Here”

Tutorials

  1. X Y, 2025, Tutorials, “Watch Tutorial Here”