Tree-of-Thought Breadth-First Search for Real-Time LLM Guardrail Evaluation
Siva Prasad Nidamanuri. Citi, Technical Report, 2025
Weighted Factual Scoring for LLM Alignment
Siva Prasad Nidamanuri. Citi, Technical Report, 2025
Multi-Agent Orchestration with Model Context Protocol
Siva Prasad Nidamanuri. Citi, Technical Report, 2025
CitiKG: Graph-Augmented Knowledge Retrieval with Precision Citations
Siva Prasad Nidamanuri. Citi, Technical Report, 2024
Exploring the Frontiers of Large Language Models: Architecture, Training, and Security
Siva Prasad Nidamanuri. University of Michigan, 2023
Driver Pre-Accident Behaviour Pattern Recognition Using Deep Learning Ensembles
Siva Prasad Nidamanuri, Sai Tarun Teja Surapaneni, Ruthvik Pangalore Sankar. University of Michigan, EECS 545, 2023 [code]
Optimizing Neural Network Components for Emergency Vehicle Identification
Siva Prasad Nidamanuri. University of Michigan, EECS 504, 2023
The core idea: instead of fine-tuning models for safety, evaluate outputs through a tree structure that expands breadth-first. Each level processes all nodes before going deeper, enabling parallel evaluation across safety dimensions.
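The level-by-level expansion can be sketched in a few lines. The dimension names and the stub evaluator below are illustrative placeholders, not the production checks:

```python
from collections import deque

# Hypothetical evaluation tree: Level 1 = safety dimensions,
# Level 2 = per-dimension checks (names are illustrative).
TREE = {
    "root": ["toxicity", "factuality", "policy"],
    "toxicity": ["slur_check", "threat_check"],
    "factuality": ["claim_check"],
    "policy": ["pii_check"],
}

def evaluate(node, output):
    # Stub evaluator: a real deployment would call a dimension-specific model.
    return 1.0 if "violation" not in output else 0.0

def tot_bfs(output):
    """Expand the evaluation tree breadth-first; all nodes in a level are
    independent of one another, so each level can be scored in parallel."""
    scores = {}
    frontier = deque(TREE["root"])
    while frontier:
        # Snapshot the current level so siblings are processed together.
        for _ in range(len(frontier)):
            node = frontier.popleft()
            scores[node] = evaluate(node, output)
            frontier.extend(TREE.get(node, []))
    return scores

print(tot_bfs("a clean answer"))
```

Because each level completes before the next begins, a violation found at Level 1 can short-circuit the deeper, more expensive checks.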
Traditional RLHF requires training a separate reward model and using PPO for policy optimization, which is computationally expensive and unstable. My work explores Direct Preference Optimization (DPO) and its variants, which eliminate the need for explicit reward modeling.
DPO directly optimizes the policy without a separate reward model, leading to more stable and efficient training.
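The standard DPO loss is simple enough to state directly: it pushes the policy's log-probability margin between the chosen and rejected completions above the reference model's margin, through a sigmoid. A minimal sketch:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss: -log sigmoid(beta * (policy margin - reference margin)).
    Inputs are sequence log-probabilities under the policy and reference."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When policy == reference the margin is 0 and the loss is log 2.
print(dpo_loss(-1.0, -2.0, -1.0, -2.0))
```

As the policy's preference for the chosen completion grows relative to the reference, the margin increases and the loss falls below log 2, which is exactly the gradient signal a reward model plus PPO would otherwise have to provide.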
I developed an extension to Group Relative Policy Optimization (GRPO) that incorporates adaptive λ-weighting for different preference categories. The key insight is that not all preferences are equally important—safety-critical preferences should be weighted higher than stylistic preferences.
where λ_i is the learned per-category weight and α controls the strength of the KL penalty
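The equation this caption refers to did not survive extraction. One plausible shape for a λ-weighted policy objective consistent with the caption (a reconstruction, not the report's exact formula) is:

```latex
\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\,y)\sim\mathcal{D}}
  \Big[\, \lambda_{c(x)}\, \hat{A}(x, y)\, \log \pi_\theta(y \mid x) \,\Big]
  \;+\; \alpha\, D_{\mathrm{KL}}\!\big(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\big)
```

where c(x) denotes the preference category of the example and Â the group-relative advantage, so that safety-critical categories with larger λ contribute proportionally more gradient signal.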
Combined with QLoRA (4-bit quantization + Low-Rank Adaptation), this approach enables fine-tuning 70B parameter models on a single A100 GPU while maintaining alignment quality comparable to full RLHF.
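The single-A100 claim follows from back-of-envelope memory arithmetic: the frozen base weights are stored in 4 bits while only the small LoRA adapters are trained in higher precision. A rough estimate, with illustrative adapter dimensions:

```python
def qlora_memory_gb(n_params_b=70, lora_rank=16, n_layers=80, hidden=8192):
    """Rough QLoRA memory estimate (illustrative numbers, not exact)."""
    # Frozen base model quantized to 4 bits = 0.5 bytes per parameter.
    base = n_params_b * 1e9 * 0.5 / 1e9
    # LoRA adds two rank-r fp16 matrices per adapted projection; assume
    # 4 adapted projections per layer (e.g. the attention projections).
    adapters = n_layers * 4 * 2 * lora_rank * hidden * 2 / 1e9
    return base, adapters

base, adapters = qlora_memory_gb()
print(f"base ~{base:.0f} GB, adapters ~{adapters:.2f} GB")
```

A 70B base at 4 bits is roughly 35 GB, and the trainable adapters (plus their optimizer state) add well under 1 GB, leaving headroom on an 80 GB A100 for activations and paged optimizer buffers.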
Designed and deployed a scalable multi-agent ecosystem for customer service automation. The system uses MCP to standardize tool interfaces and context passing between agents, with a multi-model stack (Llama 3.1 70B, SBERT, Gemini Flash) for specialized tasks.
Built a hybrid retrieval pipeline combining sparse (BM25) and dense (bi-encoder) retrieval with GraphRAG on Neo4j for structured knowledge. The system implements two-stage ranking with cross-encoder reranking for optimal precision.
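One simple way to merge the sparse and dense candidate lists before reranking is reciprocal-rank fusion; the source does not specify the fusion method, so RRF here is an assumption used for illustration:

```python
def rrf_fuse(sparse_ranking, dense_ranking, k=60):
    """Reciprocal-rank fusion of BM25 and dense-retriever rankings.
    Each ranking is a list of doc ids, best first; k dampens rank impact."""
    scores = {}
    for ranking in (sparse_ranking, dense_ranking):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# The fused candidates would then go to the cross-encoder reranker (stage 2).
print(rrf_fuse(["d1", "d2", "d3"], ["d2", "d4", "d1"]))
```

Rank-based fusion avoids calibrating BM25 scores against cosine similarities, which live on incompatible scales.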
High-impact production system: Architected and deployed CitiKG, a novel graph-augmented retrieval architecture for enterprise article management. The system enables customer service agents to query thousands of PDFs, policy documents, and internal articles with sub-2-second response times and word-level citation accuracy—providing exact page numbers, section references, and contextual snippets during live customer calls.
CitiKG's dual-level retrieval paradigm: local entity-centric retrieval captures document-specific context, while global relation traversal discovers cross-document knowledge patterns. The precision citation linker provides word-level source attribution.
Multimodal Document Understanding: Llama 4 Vision extracts structured information from images, tables, and diagrams embedded in PDFs—converting visual content into searchable text with positional metadata (page, bounding box, section hierarchy).
Word-Level Citation Indexing: Unlike traditional RAG systems that cite at chunk level, our system maintains a fine-grained index mapping each extracted entity and relation back to exact source locations (document → page → section → paragraph → word offset), enabling agents to provide precise citations during live customer interactions.
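The index described above amounts to a mapping from each extracted entity to a list of exact source spans. A minimal sketch of such a structure, with hypothetical field and document names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceSpan:
    """One fine-grained source location: document -> page -> section
    -> paragraph -> word offset (field names are illustrative)."""
    document: str
    page: int
    section: str
    paragraph: int
    word_offset: int

# Maps each extracted entity/relation to every place it was sourced from.
citation_index: dict[str, list[SourceSpan]] = {}

def cite(entity: str, span: SourceSpan) -> None:
    citation_index.setdefault(entity, []).append(span)

cite("wire-transfer limit",
     SourceSpan("policy_204.pdf", page=12, section="3.1",
                paragraph=2, word_offset=57))
print(citation_index["wire-transfer limit"][0])
```

Keeping the span immutable and hashable means the same structure can double as a deduplication key when the same fact is extracted from multiple chunks.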
Dual-Mode Retrieval: CitiKG implements a novel dual-retrieval strategy combining local (entity neighborhood traversal) and global (cross-document relation paths) modes with adaptive fusion based on query complexity—achieving comprehensive coverage while maintaining sub-2-second latency for real-time customer support.
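The adaptive fusion step can be sketched as a complexity-weighted blend of the two retrieval modes. The complexity heuristic below is a deliberately crude stand-in (the production system presumably uses a learned signal):

```python
def query_complexity(query: str) -> float:
    """Crude proxy: longer, multi-hop-looking queries score higher.
    A real system would use a learned classifier here."""
    hops = sum(query.lower().count(w)
               for w in ("and", "between", "compare", "across"))
    return min(1.0, len(query.split()) / 20.0 + 0.3 * hops)

def fuse(local_score: float, global_score: float, query: str) -> float:
    """Adaptive fusion: complex queries lean on global relation paths,
    simple lookups lean on the local entity neighborhood."""
    w = query_complexity(query)
    return (1.0 - w) * local_score + w * global_score

print(fuse(1.0, 0.0, "card fee"))  # simple query -> dominated by local score
```

Blending scores rather than hard-switching modes means borderline queries still benefit from both retrieval paths.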
My flagship research contribution: I invented and deployed a novel guardrail evaluation framework using Tree-of-Thought Breadth-First Search (ToT-BFS) combined with Weighted Factual Scoring. Unlike traditional approaches requiring expensive model pre-training or preference tuning (RLHF/DPO), my method evaluates LLM outputs through a hierarchical decision tree that expands breadth-first, enabling parallel evaluation of safety dimensions while maintaining real-time latency requirements.
My ToT-BFS architecture expands evaluation breadth-first: Level 1 evaluates safety dimensions in parallel, Level 2 performs deep analysis per dimension, Level 3 verifies individual facts. Weighted scoring aggregates results without requiring model retraining.
Traditional RLHF and DPO require expensive preference data collection and model fine-tuning. My Weighted Factual Scoring (WFS) approach instead uses a scoring function that prioritizes factual accuracy over stylistic preferences, evaluated at inference time:
WFS(y) = w_fact · s_fact(y) + w_safe · s_safe(y) + w_coh · s_coh(y), where w_fact = 0.45, w_safe = 0.35, w_coh = 0.20 (weights learned from production data)
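A minimal sketch of the weighted aggregate with the stated weights; the sub-scores are assumed to lie in [0, 1] and would come from the per-dimension evaluators:

```python
def wfs(s_fact, s_safe, s_coh, w_fact=0.45, w_safe=0.35, w_coh=0.20):
    """Weighted Factual Scoring: an inference-time aggregate over
    factuality, safety, and coherence sub-scores; no retraining needed."""
    assert abs(w_fact + w_safe + w_coh - 1.0) < 1e-9, "weights must sum to 1"
    return w_fact * s_fact + w_safe * s_safe + w_coh * s_coh

# Factually strong, safe, middling coherence.
print(wfs(0.9, 1.0, 0.6))
```

Because the weights are plain function arguments, they can be re-tuned per use case at request time, which is the real-time adjustability claimed below.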
Key advantages over RLHF/DPO: (1) No model retraining required—works with any base LLM; (2) Weights adjustable in real-time based on use case; (3) 10x faster to deploy; (4) Transparent decision-making through interpretable scores.
Citi, Irving, TX
Federal Soft Systems (DoorDash Contract), Remote
eClinical Solutions, Bangalore, India
University of Michigan
Coursework: Deep Learning, Machine Learning, Computer Vision, Natural Language Processing, Intelligent Systems, Applied Statistics
Andhra University