Tree-of-Thought Breadth-First Search for Real-Time LLM Guardrail Evaluation
Siva Prasad Nidamanuri. Citi, Technical Report, 2025
Weighted Factual Scoring for LLM Alignment
Siva Prasad Nidamanuri. Citi, Technical Report, 2025
Multi-Agent Orchestration with Model Context Protocol
Siva Prasad Nidamanuri. Citi, Technical Report, 2025
CitiKG: Graph-Augmented Knowledge Retrieval with Precision Citations
Siva Prasad Nidamanuri. Citi, Technical Report, 2024
Exploring the Frontiers of Large Language Models: Architecture, Training, and Security
Siva Prasad Nidamanuri. University of Michigan, 2023
Driver Pre-Accident Behaviour Pattern Recognition Using Deep Learning Ensembles
Siva Prasad Nidamanuri, Sai Tarun Teja Surapaneni, Ruthvik Pangalore Sankar. University of Michigan, EECS 545, 2023 [code]
Optimizing Neural Network Components for Emergency Vehicle Identification
Siva Prasad Nidamanuri. University of Michigan, EECS 504, 2023
The core idea: instead of fine-tuning models for safety, evaluate outputs through a tree structure that expands breadth-first. Each level processes all nodes before going deeper, enabling parallel evaluation across safety dimensions.
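The level-by-level expansion can be sketched in a few lines. The dimension names and the stub evaluator below are illustrative placeholders, not the production checks:

```python
from collections import deque

# Hypothetical evaluation tree: Level 1 = safety dimensions,
# Level 2 = per-dimension checks (names are illustrative).
TREE = {
    "root": ["toxicity", "factuality", "policy"],
    "toxicity": ["slur_check", "threat_check"],
    "factuality": ["claim_check"],
    "policy": ["pii_check"],
}

def evaluate(node, output):
    # Stub evaluator: a real deployment would call a dimension-specific model.
    return 1.0 if "violation" not in output else 0.0

def tot_bfs(output):
    """Expand the evaluation tree breadth-first; all nodes in a level are
    independent of one another, so each level can be scored in parallel."""
    scores = {}
    frontier = deque(TREE["root"])
    while frontier:
        # Snapshot the current level so siblings are processed together.
        for _ in range(len(frontier)):
            node = frontier.popleft()
            scores[node] = evaluate(node, output)
            frontier.extend(TREE.get(node, []))
    return scores

print(tot_bfs("a clean answer"))
```

Because each level completes before the next begins, a violation found at Level 1 can short-circuit the deeper, more expensive checks.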
Traditional RLHF requires training a separate reward model and using PPO for policy optimization, which is computationally expensive and unstable. My work explores Direct Preference Optimization (DPO) and its variants, which eliminate the need for explicit reward modeling.
DPO directly optimizes the policy without a separate reward model, leading to more stable and efficient training.
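The standard DPO loss is simple enough to state directly: it pushes the policy's log-probability margin between the chosen and rejected completions above the reference model's margin, through a sigmoid. A minimal sketch:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss: -log sigmoid(beta * (policy margin - reference margin)).
    Inputs are sequence log-probabilities under the policy and reference."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When policy == reference the margin is 0 and the loss is log 2.
print(dpo_loss(-1.0, -2.0, -1.0, -2.0))
```

As the policy's preference for the chosen completion grows relative to the reference, the margin increases and the loss falls below log 2, which is exactly the gradient signal a reward model plus PPO would otherwise have to provide.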
I developed an extension to Group Relative Policy Optimization (GRPO) that incorporates adaptive λ-weighting for different preference categories. The key insight is that not all preferences are equally important—safety-critical preferences should be weighted higher than stylistic preferences.
where λ_i is the learned per-category weight and α controls the strength of the KL penalty
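The equation this caption refers to did not survive extraction. One plausible shape for a λ-weighted policy objective consistent with the caption (a reconstruction, not the report's exact formula) is:

```latex
\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\,y)\sim\mathcal{D}}
  \Big[\, \lambda_{c(x)}\, \hat{A}(x, y)\, \log \pi_\theta(y \mid x) \,\Big]
  \;+\; \alpha\, D_{\mathrm{KL}}\!\big(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\big)
```

where c(x) denotes the preference category of the example and Â the group-relative advantage, so that safety-critical categories with larger λ contribute proportionally more gradient signal.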
Combined with QLoRA (4-bit quantization + Low-Rank Adaptation), this approach enables fine-tuning 70B parameter models on a single A100 GPU while maintaining alignment quality comparable to full RLHF.
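The single-A100 claim follows from back-of-envelope memory arithmetic: the frozen base weights are stored in 4 bits while only the small LoRA adapters are trained in higher precision. A rough estimate, with illustrative adapter dimensions:

```python
def qlora_memory_gb(n_params_b=70, lora_rank=16, n_layers=80, hidden=8192):
    """Rough QLoRA memory estimate (illustrative numbers, not exact)."""
    # Frozen base model quantized to 4 bits = 0.5 bytes per parameter.
    base = n_params_b * 1e9 * 0.5 / 1e9
    # LoRA adds two rank-r fp16 matrices per adapted projection; assume
    # 4 adapted projections per layer (e.g. the attention projections).
    adapters = n_layers * 4 * 2 * lora_rank * hidden * 2 / 1e9
    return base, adapters

base, adapters = qlora_memory_gb()
print(f"base ~{base:.0f} GB, adapters ~{adapters:.2f} GB")
```

A 70B base at 4 bits is roughly 35 GB, and the trainable adapters (plus their optimizer state) add well under 1 GB, leaving headroom on an 80 GB A100 for activations and paged optimizer buffers.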
Designed and deployed a scalable multi-agent ecosystem for customer service automation. The system uses MCP to standardize tool interfaces and context passing between agents, with a multi-model stack (Llama 3.1 70B, SBERT, Gemini Flash) for specialized tasks.
Built a hybrid retrieval pipeline combining sparse (BM25) and dense (bi-encoder) retrieval with GraphRAG on Neo4j for structured knowledge. The system implements two-stage ranking with cross-encoder reranking for optimal precision.
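One simple way to merge the sparse and dense candidate lists before reranking is reciprocal-rank fusion; the source does not specify the fusion method, so RRF here is an assumption used for illustration:

```python
def rrf_fuse(sparse_ranking, dense_ranking, k=60):
    """Reciprocal-rank fusion of BM25 and dense-retriever rankings.
    Each ranking is a list of doc ids, best first; k dampens rank impact."""
    scores = {}
    for ranking in (sparse_ranking, dense_ranking):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# The fused candidates would then go to the cross-encoder reranker (stage 2).
print(rrf_fuse(["d1", "d2", "d3"], ["d2", "d4", "d1"]))
```

Rank-based fusion avoids calibrating BM25 scores against cosine similarities, which live on incompatible scales.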
High-impact production system: Architected and deployed CitiKG, a novel graph-augmented retrieval architecture for enterprise article management. The system enables customer service agents to query thousands of PDFs, policy documents, and internal articles with sub-2-second response times and word-level citation accuracy—providing exact page numbers, section references, and contextual snippets during live customer calls.
CitiKG's dual-level retrieval paradigm: local entity-centric retrieval captures document-specific context, while global relation traversal discovers cross-document knowledge patterns. The precision citation linker provides word-level source attribution.
Multimodal Document Understanding: Llama 4 Vision extracts structured information from images, tables, and diagrams embedded in PDFs—converting visual content into searchable text with positional metadata (page, bounding box, section hierarchy).
Word-Level Citation Indexing: Unlike traditional RAG systems that cite at chunk level, our system maintains a fine-grained index mapping each extracted entity and relation back to exact source locations (document → page → section → paragraph → word offset), enabling agents to provide precise citations during live customer interactions.
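The index described above amounts to a mapping from each extracted entity to a list of exact source spans. A minimal sketch of such a structure, with hypothetical field and document names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceSpan:
    """One fine-grained source location: document -> page -> section
    -> paragraph -> word offset (field names are illustrative)."""
    document: str
    page: int
    section: str
    paragraph: int
    word_offset: int

# Maps each extracted entity/relation to every place it was sourced from.
citation_index: dict[str, list[SourceSpan]] = {}

def cite(entity: str, span: SourceSpan) -> None:
    citation_index.setdefault(entity, []).append(span)

cite("wire-transfer limit",
     SourceSpan("policy_204.pdf", page=12, section="3.1",
                paragraph=2, word_offset=57))
print(citation_index["wire-transfer limit"][0])
```

Keeping the span immutable and hashable means the same structure can double as a deduplication key when the same fact is extracted from multiple chunks.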
Dual-Mode Retrieval: CitiKG implements a novel dual-retrieval strategy combining local (entity neighborhood traversal) and global (cross-document relation paths) modes with adaptive fusion based on query complexity—achieving comprehensive coverage while maintaining sub-2-second latency for real-time customer support.
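The adaptive fusion step can be sketched as a complexity-weighted blend of the two retrieval modes. The complexity heuristic below is a deliberately crude stand-in (the production system presumably uses a learned signal):

```python
def query_complexity(query: str) -> float:
    """Crude proxy: longer, multi-hop-looking queries score higher.
    A real system would use a learned classifier here."""
    hops = sum(query.lower().count(w)
               for w in ("and", "between", "compare", "across"))
    return min(1.0, len(query.split()) / 20.0 + 0.3 * hops)

def fuse(local_score: float, global_score: float, query: str) -> float:
    """Adaptive fusion: complex queries lean on global relation paths,
    simple lookups lean on the local entity neighborhood."""
    w = query_complexity(query)
    return (1.0 - w) * local_score + w * global_score

print(fuse(1.0, 0.0, "card fee"))  # simple query -> dominated by local score
```

Blending scores rather than hard-switching modes means borderline queries still benefit from both retrieval paths.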
My flagship research contribution: I invented and deployed a novel guardrail evaluation framework using Tree-of-Thought Breadth-First Search (ToT-BFS) combined with Weighted Factual Scoring. Unlike traditional approaches requiring expensive model pre-training or preference tuning (RLHF/DPO), my method evaluates LLM outputs through a hierarchical decision tree that expands breadth-first, enabling parallel evaluation of safety dimensions while maintaining real-time latency requirements.
My ToT-BFS architecture expands evaluation breadth-first: Level 1 evaluates safety dimensions in parallel, Level 2 performs deep analysis per dimension, Level 3 verifies individual facts. Weighted scoring aggregates results without requiring model retraining.
Traditional RLHF and DPO require expensive preference data collection and model fine-tuning. My Weighted Factual Scoring (WFS) approach instead uses a scoring function that prioritizes factual accuracy over stylistic preferences, evaluated at inference time:
WFS(y) = w_fact · s_fact(y) + w_safe · s_safe(y) + w_coh · s_coh(y), where w_fact = 0.45, w_safe = 0.35, w_coh = 0.20 (weights learned from production data)
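A minimal sketch of the weighted aggregate with the stated weights; the sub-scores are assumed to lie in [0, 1] and would come from the per-dimension evaluators:

```python
def wfs(s_fact, s_safe, s_coh, w_fact=0.45, w_safe=0.35, w_coh=0.20):
    """Weighted Factual Scoring: an inference-time aggregate over
    factuality, safety, and coherence sub-scores; no retraining needed."""
    assert abs(w_fact + w_safe + w_coh - 1.0) < 1e-9, "weights must sum to 1"
    return w_fact * s_fact + w_safe * s_safe + w_coh * s_coh

# Factually strong, safe, middling coherence.
print(wfs(0.9, 1.0, 0.6))
```

Because the weights are plain function arguments, they can be re-tuned per use case at request time, which is the real-time adjustability claimed below.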
Key advantages over RLHF/DPO: (1) No model retraining required—works with any base LLM; (2) Weights adjustable in real-time based on use case; (3) 10x faster to deploy; (4) Transparent decision-making through interpretable scores.
Citi, Irving, TX
Federal Soft Systems (DoorDash Contract), Remote
eClinical Solutions, Bangalore, India
University of Michigan
Coursework: Deep Learning, Machine Learning, Computer Vision, Natural Language Processing, Intelligent Systems, Applied Statistics
Andhra University