Siva Prasad Nidamanuri

Senior Machine Learning Engineer, Citi
M.S. Artificial Intelligence, University of Michigan '23

I work on LLM training, preference optimization, and alignment. Interested in SFT, DPO/RLHF, efficient fine-tuning, and building safe production LLM systems.


Publications

Tree-of-Thought Breadth-First Search for Real-Time LLM Guardrail Evaluation
Siva Prasad Nidamanuri. Citi, Technical Report, 2025

Weighted Factual Scoring for LLM Alignment
Siva Prasad Nidamanuri. Citi, Technical Report, 2025

Multi-Agent Orchestration with Model Context Protocol
Siva Prasad Nidamanuri. Citi, Technical Report, 2025

CitiKG: Graph-Augmented Knowledge Retrieval with Precision Citations
Siva Prasad Nidamanuri. Citi, Technical Report, 2024

Academic

Exploring the Frontiers of Large Language Models: Architecture, Training, and Security
Siva Prasad Nidamanuri. University of Michigan, 2023

Driver Pre-Accident Behaviour Pattern Recognition Using Deep Learning Ensembles
Siva Prasad Nidamanuri, Sai Tarun Teja Surapaneni, Ruthvik Pangalore Sankar. University of Michigan, EECS 545, 2023 [code]

Optimizing Neural Network Components for Emergency Vehicle Identification
Siva Prasad Nidamanuri. University of Michigan, EECS 504, 2023

Tree-of-Thought Guardrail Architecture

The core idea: instead of fine-tuning models for safety, evaluate outputs through a tree structure that expands breadth-first. Each level processes all nodes before going deeper, enabling parallel evaluation across safety dimensions.
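
The breadth-first expansion can be sketched as a level-order traversal; the tree structure and node names below are illustrative, not the production evaluator:

```python
from collections import deque

def bfs_levels(tree, root):
    """Return nodes grouped by depth: each level is fully processed
    (and can be evaluated in parallel) before going one level deeper."""
    levels, queue = [], deque([root])
    while queue:
        level = list(queue)  # snapshot of the current depth
        levels.append(level)
        # enqueue every child of every node at this depth
        queue = deque(child for node in level for child in tree.get(node, []))
    return levels

# Hypothetical guardrail tree: safety dimensions at level 1, claims at level 2
tree = {
    'output': ['safety', 'factuality', 'coherence'],
    'factuality': ['claim_1', 'claim_2'],
}
# bfs_levels(tree, 'output')
# → [['output'], ['safety', 'factuality', 'coherence'], ['claim_1', 'claim_2']]
```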

Technical Deep Dive: Preference Optimization

Traditional RLHF requires training a separate reward model and using PPO for policy optimization, which is computationally expensive and unstable. My work explores Direct Preference Optimization (DPO) and its variants, which eliminate the need for explicit reward modeling.

Figure 1: Direct Preference Optimization vs. RLHF Pipeline
[Diagram: the traditional RLHF pipeline maps preference data (x, y_w, y_l) to a reward model r_θ(x, y), then trains policy π_θ with PPO under the KL constraint D_KL(π||π_ref) to produce the aligned LLM. DPO maps the same preference data directly to the aligned LLM via the loss L_DPO = -log σ(β(log π_θ(y_w|x) - log π_θ(y_l|x))). Claimed benefits: no reward model needed, stable training, 30% faster convergence, 4x memory reduction with QLoRA.]

DPO directly optimizes the policy without a separate reward model, leading to more stable and efficient training.
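
As a concrete sketch, the per-pair DPO loss can be computed directly from sequence log-probabilities under the policy and the frozen reference model; the function name and default β below are illustrative:

```python
import math

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-pair DPO loss from sequence log-probabilities.

    policy_logp_w / policy_logp_l: log pi_theta(y_w|x) and log pi_theta(y_l|x)
    ref_logp_w / ref_logp_l: the same quantities under the frozen reference.
    """
    # Implicit reward margin: beta times the difference of policy-vs-reference
    # log-ratios for the preferred (y_w) and dispreferred (y_l) completions
    margin = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    # -log sigmoid(margin), written as log1p(exp(-margin)) for stability
    return math.log1p(math.exp(-margin))
```

When policy and reference agree the margin is zero and the loss is log 2; widening the preferred completion's log-probability gap drives it toward zero, which is how the reference log-ratios act as the implicit KL anchor without a separate reward model.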

Lambda-GRPO: My Approach to Efficient Preference Optimization

I developed an extension to Group Relative Policy Optimization (GRPO) that incorporates adaptive λ-weighting for different preference categories. The key insight is that not all preferences are equally important—safety-critical preferences should be weighted higher than stylistic preferences.

L_{λ-GRPO} = Σ_i λ_i · L_DPO(x_i, y_w, y_l) + α · D_KL(π_θ || π_ref)

where λ_i is a learned per-category weight and α controls the KL penalty

Combined with QLoRA (4-bit quantization + Low-Rank Adaptation), this approach enables fine-tuning 70B parameter models on a single A100 GPU while maintaining alignment quality comparable to full RLHF.
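
The λ-weighted objective above can be sketched as a simple aggregation over per-pair losses; the function name, category labels, and weight values are illustrative placeholders, not the learned production weights:

```python
def lambda_grpo_loss(pair_losses, categories, lambdas, kl_estimate, alpha=0.05):
    """Category-weighted preference loss plus a KL regularizer.

    pair_losses: per-preference-pair losses (e.g. from a DPO-style objective)
    categories: category label for each pair ('safety', 'style', ...)
    lambdas: per-category weights (learned in the full method; fixed here)
    kl_estimate: an estimate of D_KL(pi_theta || pi_ref)
    """
    # Safety-critical pairs carry larger lambda, so their gradient dominates
    weighted = sum(lambdas[cat] * loss for loss, cat in zip(pair_losses, categories))
    return weighted + alpha * kl_estimate
```

The point of the weighting is that it changes nothing about the underlying pairwise loss: upweighting a safety-critical pair only rescales its contribution to the total.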

Selected Projects

Multi-Agent Orchestration with Model Context Protocol (MCP)

Designed and deployed a scalable multi-agent ecosystem for customer service automation. The system uses MCP to standardize tool interfaces and context passing between agents, with a multi-model stack (Llama 3.1 70B, SBERT, Gemini Flash) for specialized tasks.

Multi-Agent Architecture with MCP
[Diagram: a user query flows to a Llama 3.1 70B orchestrator (ReAct prompting, task decomposition), which coordinates specialists over the Model Context Protocol: SBERT for retrieval, Gemini Flash for reasoning, API/DB tool use, and a context-cache memory backed by a Neo4j GraphRAG vector store, before the response is composed.]
Tags: Multi-Agent · MCP · Llama 3.1 · SBERT · Neo4j · AutoGen

Hybrid RAG with GraphRAG for Enterprise Knowledge Retrieval

Citi | 2024 | Production System

Built a hybrid retrieval pipeline combining sparse (BM25) and dense (bi-encoder) retrieval with GraphRAG on Neo4j for structured knowledge. The system implements two-stage ranking with cross-encoder reranking for optimal precision.
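
The fusion step between the sparse, dense, and graph retrievers can be sketched with reciprocal rank fusion; k = 60 is the constant commonly used in the literature, and the function name is an illustrative placeholder:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids.

    Each list contributes 1 / (k + rank) per document, so a document
    ranked highly by multiple retrievers (BM25, bi-encoder, GraphRAG)
    rises above one ranked highly by only a single retriever.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; these candidates then go to the cross-encoder
    return sorted(scores, key=scores.get, reverse=True)
```

The fused list is only a candidate set: the second-stage cross-encoder reranker makes the final precision-oriented ordering.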

Hybrid RAG Architecture with Two-Stage Ranking
[Diagram: the query fans out to BM25 sparse retrieval, a bi-encoder dense retriever, and GraphRAG for Stage 1 candidate retrieval; candidates are fused via RRF/score fusion, reranked in Stage 2 by an SBERT ms-marco cross-encoder, and passed with context to Llama 3.1 for answer generation. Reported metrics: NDCG@10 0.89, Precision@5 0.92.]
Tags: RAG · GraphRAG · Neo4j · BM25 · Cross-Encoder · FAISS

📚 CitiKG: Graph-Augmented Knowledge Retrieval with Precision Citations

Citi | 2025 | Production System for Live Customer Support

High-impact production system: Architected and deployed CitiKG, a novel graph-augmented retrieval architecture for enterprise article management. The system enables customer service agents to query thousands of PDFs, policy documents, and internal articles with sub-2-second response times and word-level citation accuracy—providing exact page numbers, section references, and contextual snippets during live customer calls.

CitiKG: Hierarchical Knowledge Graph Architecture
[Diagram: document ingestion (PDFs, articles) runs through Llama 4 Vision image/table extraction and entity-relation extraction to build a page/section/word index and a Neo4j + GraphRAG knowledge graph with chunk, entity, and relation vector indexes. Agent queries during live calls trigger dual-mode retrieval (local entity-neighbor context plus global cross-document relations), fused via RRF reranking; the citation linker (page:section:word) then grounds Llama 3.1 70B response generation with inline citations, source verification, and <2s latency.]

CitiKG's dual-level retrieval paradigm: local entity-centric retrieval captures document-specific context, while global relation traversal discovers cross-document knowledge patterns. The precision citation linker provides word-level source attribution.

Key Innovations

Multimodal Document Understanding: Llama 4 Vision extracts structured information from images, tables, and diagrams embedded in PDFs—converting visual content into searchable text with positional metadata (page, bounding box, section hierarchy).

Word-Level Citation Indexing: Unlike traditional RAG systems that cite at chunk level, our system maintains a fine-grained index mapping each extracted entity and relation back to exact source locations (document → page → section → paragraph → word offset), enabling agents to provide precise citations during live customer interactions.
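
A minimal sketch of such a fine-grained index; the class, field, and identifier names are hypothetical, chosen only to illustrate the document → page → section → word-offset mapping:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Citation:
    """Exact source location of an extracted entity or relation."""
    doc_id: str
    page: int
    section: str
    word_offset: int

class CitationIndex:
    """Maps entity/relation ids back to their source locations."""

    def __init__(self):
        self._index = {}

    def add(self, item_id, citation):
        # One item can be grounded in several source locations
        self._index.setdefault(item_id, []).append(citation)

    def cite(self, item_id):
        """Render citations as doc:page:section:word strings for agents."""
        return [f"{c.doc_id}:p{c.page}:{c.section}:w{c.word_offset}"
                for c in self._index.get(item_id, [])]
```

Because every extracted item carries its full positional path, the agent can surface an exact page and section on a live call instead of a chunk-sized blob.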

Dual-Mode Retrieval: CitiKG implements a novel dual-retrieval strategy combining local (entity neighborhood traversal) and global (cross-document relation paths) modes with adaptive fusion based on query complexity—achieving comprehensive coverage while maintaining sub-2-second latency for real-time customer support.

Tags: CitiKG · Knowledge Graph · Llama 4 Vision · Precision Citations · Neo4j · Dual-Mode Retrieval · <2s Latency

🌳 Tree-of-Thought BFS Guardrail Evaluation Framework

Citi | 2024-2025 | Novel Research & Production System

My flagship research contribution: I invented and deployed a novel guardrail evaluation framework using Tree-of-Thought Breadth-First Search (ToT-BFS) combined with Weighted Factual Scoring. Unlike traditional approaches requiring expensive model pre-training or preference tuning (RLHF/DPO), my method evaluates LLM outputs through a hierarchical decision tree that expands breadth-first, enabling parallel evaluation of safety dimensions while maintaining real-time latency requirements.

Tree-of-Thought BFS Guardrail Architecture
[Diagram: the root node holds the LLM output. BFS Level 1 expands in parallel to Safety (w=0.35), Factuality (w=0.45, priority), and Coherence (w=0.20) nodes. BFS Level 2 performs deep analysis per dimension: toxicity and bias checks under Safety; claim extraction, evidence linking, and RAG-based hallucination verification under Factuality; logic flow and consistency under Coherence. BFS Level 3 verifies individual facts. Weighted aggregation Final = Σ(w_i × score_i) against a threshold yields the pass/fail decision.]

My ToT-BFS architecture expands evaluation breadth-first: Level 1 evaluates safety dimensions in parallel, Level 2 performs deep analysis per dimension, Level 3 verifies individual facts. Weighted scoring aggregates results without requiring model retraining.

Weighted Factual Scoring: My Alternative to Preference Tuning

Traditional RLHF and DPO require expensive preference data collection and model fine-tuning. My Weighted Factual Scoring (WFS) approach instead uses a scoring function that prioritizes factual accuracy over stylistic preferences, evaluated at inference time:

Score_WFS = w_fact · S_factuality + w_safe · S_safety + w_coh · S_coherence

where w_fact = 0.45, w_safe = 0.35, w_coh = 0.20 (learned from production data)

Key advantages over RLHF/DPO: (1) No model retraining required—works with any base LLM; (2) Weights adjustable in real-time based on use case; (3) 10x faster to deploy; (4) Transparent decision-making through interpretable scores.

# Tree-of-Thought BFS Guardrail Evaluator (Nidamanuri, 2024)
class ToTBFSGuardrail:
    """Breadth-first search evaluation with weighted factual scoring."""

    def __init__(self, weights=None):
        # Copy caller-supplied weights (avoids a mutable default argument)
        self.weights = dict(weights or {'factuality': 0.45,
                                        'safety': 0.35,
                                        'coherence': 0.20})
        self.threshold = 0.85
        self.tree = {}  # Stores evaluation tree for interpretability

    def evaluate_bfs(self, llm_output, context):
        """BFS expansion: evaluate all dimensions at each level before going deeper."""
        queue = [(llm_output, 'root', 0)]  # (content, node_id, depth)
        scores = {dim: [] for dim in self.weights}
        while queue:
            content, node_id, depth = queue.pop(0)
            if depth == 0:
                # Level 1: parallel dimension evaluation
                for dim in self.weights:
                    child_score = self._evaluate_dimension(content, dim)
                    scores[dim].append(child_score)
                    queue.append((content, f'{dim}_L1', 1))
            elif depth == 1 and 'factuality' in node_id:
                # Level 2: deep fact extraction for the factuality branch
                facts = self._extract_claims(content)
                for fact in facts:
                    fact_score = self._verify_fact(fact, context)
                    scores['factuality'].append(fact_score)
        # Weighted aggregation (no model retraining needed)
        final_score = sum(self.weights[d] * (sum(s) / len(s))
                          for d, s in scores.items() if s)
        return {
            'approved': final_score >= self.threshold,
            'score': final_score,
            'breakdown': scores,
            'tree': self.tree,  # Full interpretability
        }
Tags: Tree-of-Thought BFS · Weighted Scoring · No Retraining · Real-time · 68% Safety ↑ · Sub-100ms

Experience

2024 — Present

Senior Machine Learning Engineer

Citi, Irving, TX

  • Invented Tree-of-Thought BFS guardrail framework with weighted factual scoring—reducing harmful outputs by 68% without expensive RLHF/DPO model retraining
  • Built CitiKG, a graph-augmented retrieval system with word-level precision citations, enabling agents to query 10K+ documents in <2 seconds during live customer calls
  • Architected hybrid RAG pipeline combining GraphRAG + BM25 + cross-encoder reranking, achieving 92% accuracy across 190+ customer intents
  • Deployed multi-agent orchestration using MCP (Model Context Protocol) with Llama 3.1 70B, SBERT, and Gemini Flash for specialized task routing
  • Implemented vLLM serving infrastructure with PagedAttention and continuous batching, reducing inference costs by 35%
  • Built LLM-as-Judge evaluation pipeline with interpretable scoring—adopted across 3 business units for production safety checks
  • Integrated Llama 4 Vision for multimodal document extraction (tables, images, diagrams) with positional metadata indexing
2024

Machine Learning Engineer

Federal Soft Systems (DoorDash Contract), Remote

  • Built Siamese network classifier with triplet loss for 300+ category product tagging (23% F1 improvement)
  • Engineered LLM-powered knowledge graph pipeline reducing manual annotation time by 90%
  • Developed real-time ML serving handling 1M+ predictions/day on Kubernetes with auto-scaling
2017 — 2022

Senior Data Scientist

eClinical Solutions, Bangalore, India

  • Developed deep learning models (CNNs, RNNs, autoencoders) for cellular imaging and drug screening in clinical trials
  • Built NLP pipelines using BERT for clinical document processing (30% efficiency improvement)
  • Designed scalable analytics pipelines processing petabyte-scale genomic data on Databricks
  • Deployed ML models via MLflow and Kubernetes on AWS for real-time clinical trial monitoring

Education

2022 — 2023

Master of Science in Artificial Intelligence

University of Michigan

Coursework: Deep Learning, Machine Learning, Computer Vision, Natural Language Processing, Intelligent Systems, Applied Statistics

2013 — 2017

Bachelor of Science in Computer Science

Andhra University