Prasad Thete

Hello, I'm Prasad Thete

AI/ML Engineer @ HonchoMinds, Pune | B.E. in AI & Data Science (2025) | MET Institute of Engineering, Nashik

01. About Me

AI/ML Engineer specializing in Generative AI and Large Language Model (LLM) systems, with a B.E. in Artificial Intelligence & Data Science (SPPU, CGPA: 8.14, First Class with Distinction).

I design and deploy production-grade Retrieval-Augmented Generation (RAG) architectures, fine-tuned LLM pipelines, and scalable AI systems built for real-world performance and reliability. My experience spans LLM fine-tuning (LoRA, QLoRA, 4-bit quantization), vector databases (Qdrant/FAISS), large-scale document processing, and end-to-end AI deployment.

I have hands-on expertise in:

• Architecting multi-stage RAG systems with intelligent retrieval, confidence handling, and fallback logic
• Optimizing LLM inference and GPU utilization for efficient large-model training and serving
• Designing scalable data ingestion and embedding pipelines for high-volume document datasets
• Building robust, containerized backend APIs for AI-powered production applications

02. Education & Experience

Education

B.E. in Artificial Intelligence & Data Science

MET Institute of Engineering, Nashik

2021 – 2025

CGPA: 8.14 / 10.00 — First Class with Distinction

Experience

AI/ML Engineer

HonchoMinds — Pune, Maharashtra

Full-time | On-site
Jan 2025 – Present
BharatNyay.ai — AI-Powered Legal Intelligence Platform | bharatnyay.ai
  • Engineered a production-grade RAG system for large-scale legal document search and contextual Q&A.
  • Built automated data crawling, OCR extraction, and end-to-end data pipelines (cleaning, chunking, embeddings) for legal datasets (1950–2020).
  • Implemented multi-stage retrieval strategies (semantic search, metadata filtering, re-ranking) using vector databases.
  • Performed instruction tuning and advanced prompt engineering on OpenAI models for domain-aligned responses.
  • Designed hallucination mitigation and confidence-based fallback mechanisms for reliable outputs.
  • Developed backend services using FastAPI, integrating LLM inference and retrieval pipelines.
  • Deployed and optimized GPU-based AI workloads on Linux VM environments.
BharatNyay RAG Automation & Ingestion Platform | Internal AI Operations System
  • Architected an end-to-end RAG automation platform to eliminate manual extraction, cleaning, and vector DB updates.
  • Built automated data ingestion pipelines (PDF/JSON/CSV) with OCR processing, embedding generation, and vector database integration.
  • Implemented staging-to-production vector DB synchronization pipelines for controlled data deployment.
  • Developed a RAG monitoring and log management system for tracking vector database performance.
  • Engineered automated government data crawling pipelines and dynamic dataset update workflows.
  • Designed backend services and a dashboard using FastAPI, covering user management, subscription tracking, token monitoring, and Linux VM infrastructure control.
Agentice — LLM-Powered ITSM Automation Platform (Ivanti Integration) | Enterprise AI Automation System
  • Built an enterprise LLM-based chatbot automation system for Ivanti ITSM, automating Knowledge, Incident, Problem, Change, and Request modules via structured function calling.
  • Processed and cleaned custom function-calling datasets, performing advanced data preprocessing and formatting for instruction tuning.
  • Fine-tuned LLaMA 3 (8B) and Mistral 7B Instruct using LoRA-based PEFT techniques on custom function-calling data.
  • Trained models on an NVIDIA A100 (80 GB) GPU with optimized fine-tuning workflows and quantization-aware configurations.
  • Validated and merged LoRA adapters into standalone production-ready models.
  • Deployed fine-tuned models using Ollama and vLLM, integrating inference APIs into enterprise backend systems.
  • Engineered structured JSON function-calling enforcement and response validation mechanisms for reliable tool execution.

03. Projects

04. Technical Skills

LLM & Generative AI

Large Language Models (LLMs) | Generative AI System Design | Retrieval-Augmented Generation (RAG) | Agentic RAG Architectures | Fine-Tuning (LoRA, PEFT) | Instruction Tuning | Prompt Engineering & Context Engineering | Function Calling & Tool Integration | Embedding Models & Vector Search | Quantization (QLoRA / 4-bit / 8-bit) | vLLM | Hybrid Search | Re-ranking | RAG Evaluation | Hallucination Mitigation | LLM Evaluation & Optimization | Open-Source LLM Deployment (Ollama / vLLM) | System Architecture Design (LLM/RAG)
Hands-on Models Worked With
Mistral 7B Instruct | LLaMA 3 8B | Vision-Language Models (VLM) | GPT-based Models (API Integration)

Databases

PostgreSQL | SQL | MongoDB | Vector Databases (Qdrant, FAISS) | Redis | Embedding Indexing

Data Collection

Web Scraping | Selenium | BeautifulSoup | Requests | API-based Data Collection | Automated Crawling Systems

Deployment & DevOps

Docker | Kubernetes | Jenkins (CI/CD Pipelines) | Model Deployment & Monitoring | Containerized ML Applications | Production LLM Serving | GPU Optimization | Linux Server Management

Machine Learning & Deep Learning

ML Model Development | Supervised & Unsupervised Learning | Feature Engineering | Model Evaluation & Optimization | CNN Architectures | Transfer Learning | ResNet (ResNet-18, ResNet-50) | EfficientNet | Image Classification | Computer Vision | OpenCV | PyTorch | TensorFlow | Scikit-learn | NumPy | Pandas
Core AI Libraries & Frameworks
LangChain | Transformers (Hugging Face) | PEFT | BitsAndBytes | TRL | PyMuPDF

Computer Vision & Multimodal AI

Convolutional Neural Networks (CNN) | Transfer Learning (ResNet-18, ResNet-50, EfficientNet) | OpenCV | Image Classification Pipelines | Vision-Language Models (VLM) | Multimodal Data Extraction (Text + Image)
OCR Pipelines
Tesseract OCR | PaddleOCR | EasyOCR | Layout-aware PDF Extraction | Unstructured.io | Docling

Data Engineering & Pipelines

End-to-End AI/ML Pipelines | Data Extraction & Cleaning | Data Preprocessing & Normalization | Chunking & Tokenization | Embedding Generation | RAG Data Ingestion Pipelines | Vector Index Creation | JSONL Dataset Generation | Large-Scale PDF Processing | ETL Workflows

Backend & Web Frameworks

FastAPI | Django | Flask | REST API Development | Async APIs | Authentication & Authorization | SSH & Secure Server Deployment

05. Certifications & Training

SQL Certification

Professional Certification

Python Libraries for Data Science

Professional Certification

06. Leadership & Initiatives

First President

Student Association of Artificial Intelligence & Data Science (AISA)

2024 – 2025 | MET Institute of Engineering, Nashik | BE AI & DS | Batch 2025
  • First President & Founding Member of the AI & DS Student Association at MET IOE
  • Led the AISA committee of 40+ students from the department, coordinating all departmental activities
  • Organized technical workshops, AI seminars, and hackathons for the department
  • Coordinated with faculty and industry mentors for knowledge-sharing sessions
  • Significantly increased student participation in AI/ML activities across the college
  • Represented the department in academic and technical events

07. Get In Touch

I design and deploy production-grade AI systems across Generative AI, RAG, and Industrial Automation.

If you’re interested in collaboration, technical discussions, or innovative AI solutions, feel free to connect.
