Enterprise RAG-Enabled AI Platform with Multi-Cloud LLM Integration
Built OpenAI-compatible proxy API that routes employee queries across multiple cloud LLM backends (GCP Vertex AI, Azure OpenAI, AWS Bedrock) while enriching responses with Retrieval Augmented Generation from company documentation, infrastructure metadata, and security scanner findings.
As AI adoption accelerates across enterprises, organizations face challenges balancing employee productivity with data security, cost control, and vendor lock-in. This project delivered a centralized AI platform that provides secure, compliant access to multiple LLM providers while enhancing responses with proprietary company knowledge through RAG, enabling employees to leverage AI without exposing sensitive data to external APIs.
- 3 cloud LLM backends integrated
- OpenAI API-compatible interface
- RAG context-enriched responses
The Challenge
Employees needed access to powerful AI capabilities while the organization required strict controls over data security, cost, and vendor dependencies:
- Data Security & Compliance: Sensitive infrastructure details, security findings, and proprietary documentation cannot be sent to external LLM APIs without proper controls and audit trails
- Multi-Cloud Strategy: Avoid vendor lock-in by leveraging multiple LLM providers (GCP Vertex AI, Azure OpenAI, AWS Bedrock) based on cost, performance, and availability requirements
- Knowledge Gap: General-purpose LLMs lack context about company-specific infrastructure, internal tools, security policies, and operational procedures, limiting their usefulness for technical tasks
- User Experience: Employees expect OpenAI-compatible API interfaces and familiar chat experiences - custom proprietary APIs create adoption friction
- Cost Control: Uncontrolled API usage across teams leads to unpredictable costs and inefficient spend on expensive model calls
- Specialized Use Cases: Network engineers need AI-powered tools for device discovery and vulnerability validation that integrate with existing security scanners (Tenable, Wiz, InsightVM, SentinelOne)
- Secure Frontend: Existing open-source AI chat interfaces lacked enterprise security standards and needed refactoring to meet company compliance requirements
The solution needed to democratize AI access while maintaining security, providing intelligent routing across LLM backends, and enriching responses with company-specific knowledge through RAG.
Solution: RAG-Enabled AI Proxy with Multi-Cloud Backend Routing
Developed OpenAI-compatible API proxy that intelligently routes requests across multiple cloud LLM providers while enriching queries with relevant company documentation and metadata through Retrieval Augmented Generation.
Multi-Cloud LLM Routing
Single API endpoint routes requests to GCP Vertex AI, Azure OpenAI, or AWS Bedrock based on model selection, availability, and cost policies. Automatic failover ensures 99.9% uptime despite provider outages.
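A minimal sketch of the failover loop described above; the backend adapters are illustrative stubs, not the production provider SDK calls:

```python
"""Failover-routing sketch; adapter functions stand in for provider SDKs."""
import logging

def _azure(model, messages):    # stub for the Azure OpenAI adapter
    raise TimeoutError("azure rate limited")

def _bedrock(model, messages):  # stub for the AWS Bedrock adapter
    return {"model": model, "content": "response from bedrock"}

def _vertex(model, messages):   # stub for the GCP Vertex AI adapter
    return {"model": model, "content": "response from vertex"}

ADAPTERS = {"azure-openai": _azure, "aws-bedrock": _bedrock, "gcp-vertex": _vertex}

# Preference order per model alias; later entries are failover targets.
BACKEND_PREFERENCE = {
    "gpt-4": ["azure-openai", "aws-bedrock", "gcp-vertex"],
}

def route_chat_completion(model: str, messages: list[dict]) -> dict:
    """Try each backend for the requested model until one succeeds."""
    last_error = None
    for name in BACKEND_PREFERENCE.get(model, ["gcp-vertex"]):
        try:
            return ADAPTERS[name](model, messages)
        except Exception as exc:  # rate limit, outage, timeout
            logging.warning("backend %s failed (%s); failing over", name, exc)
            last_error = exc
    raise RuntimeError("all backends exhausted") from last_error

print(route_chat_completion("gpt-4", [{"role": "user", "content": "hi"}]))
```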
RAG Knowledge Enrichment
Vector database stores company documentation, infrastructure metadata, and security findings. User queries trigger semantic search to retrieve relevant context before LLM inference, dramatically improving response accuracy.
Enterprise Security Controls
All API calls authenticated via OAuth 2.0 with Azure AD, logged to centralized SIEM, and filtered for PII/secrets before reaching external LLMs. Rate limiting and usage quotas prevent cost overruns.
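A sketch of the pre-flight redaction filter; the patterns shown are a small illustrative subset, not the full production rule set:

```python
"""Redact PII/secrets before a prompt leaves the trust boundary."""
import re

REDACTION_PATTERNS = {
    "email":   re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn":     re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer":  re.compile(r"Bearer\s+[A-Za-z0-9._-]+"),
}

def redact(text: str) -> str:
    """Replace matches with typed placeholders; the raw value never
    reaches the external LLM, while the placeholder keeps context."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text

print(redact("Contact jane@example.com, key AKIAABCDEFGHIJKLMNOP"))
```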
AI-Powered Network Tools
Built specialized SSH tool combining LLM-powered network device discovery with automated vulnerability validation against scanner findings from Tenable, Wiz, InsightVM, and SentinelOne.
Multi-Cloud LLM Backend Architecture
The proxy supports intelligent routing across three major cloud LLM platforms, each offering unique model capabilities and pricing (a routing-policy sketch follows the comparison):
GCP Vertex AI
Models: Gemini Pro, Gemini Ultra, PaLM 2
- Best for multimodal inputs (text, images, code)
- Strong code generation and technical reasoning
- Pay-per-token pricing with no base fees
- Native integration with GCP services
Azure OpenAI
Models: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo
- Enterprise SLA and data residency guarantees
- Best overall reasoning and instruction following
- Higher cost but superior quality for complex tasks
- Integrated with Azure AD for authentication
AWS Bedrock
Models: Claude 3, Titan, Llama 2
- Best for long-context tasks (200K+ tokens)
- Strong safety and refusal of harmful requests
- Lowest cost for high-volume use cases
- Managed security and compliance controls
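A hypothetical routing policy capturing the trade-offs above; the model IDs and the token threshold are illustrative, not the production configuration:

```python
"""Map request characteristics to the backend strengths listed above."""

def pick_backend(context_tokens: int, multimodal: bool, complex_reasoning: bool) -> tuple[str, str]:
    """Return (backend, model); first matching rule wins."""
    if context_tokens > 100_000:   # long-context work -> Bedrock Claude
        return "aws-bedrock", "claude-3"
    if multimodal:                 # text+image/code inputs -> Vertex Gemini
        return "gcp-vertex", "gemini-pro"
    if complex_reasoning:          # hardest tasks -> Azure GPT-4
        return "azure-openai", "gpt-4"
    return "aws-bedrock", "titan"  # cheapest tier for high-volume traffic

print(pick_backend(context_tokens=150_000, multimodal=False, complex_reasoning=False))
```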
Retrieval Augmented Generation (RAG) Implementation
RAG Knowledge Sources
The system maintains multiple vector databases, each optimized for different knowledge domains:
Infrastructure Documentation
- Terraform/IaC modules: Infrastructure patterns, reusable components, and deployment procedures
- Runbooks and playbooks: Incident response procedures and operational guides
- Architecture diagrams: Network topology, service dependencies, and deployment architectures
- API documentation: Internal service APIs, authentication methods, and integration guides
Security & Compliance Data
- Vulnerability scanner findings: Real-time data from Tenable, Wiz, InsightVM, SentinelOne
- Security policies: Access control requirements, compliance standards, and security baselines
- Audit logs: Historical incidents, changes, and security events for pattern analysis
- Asset inventory: Device metadata, ownership, criticality, and patch status
RAG Workflow: User query → Embedding generation → Vector similarity search → Context injection → LLM inference → Response synthesis
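An end-to-end sketch of that workflow using ChromaDB (one of the stores named in the stack); the sample document, prompt wording, and call_llm() stub are illustrative:

```python
"""RAG workflow: retrieve -> inject context -> infer."""
import chromadb

client = chromadb.Client()  # in-memory instance for the sketch
docs = client.create_collection("infra-docs")
docs.add(
    ids=["runbook-1"],
    documents=["To rotate the edge-router credentials, run the vault-rotate playbook."],
    metadatas=[{"team": "netops", "criticality": "high"}],
)

def call_llm(prompt: str) -> str:
    """Stub for the proxy's routed chat-completion call."""
    return f"(model answer grounded in: {prompt[:60]}...)"

def answer(question: str) -> str:
    # 1) Embed the query and retrieve the closest documents.
    hits = docs.query(query_texts=[question], n_results=1)
    context = "\n".join(hits["documents"][0])
    # 2) Inject the retrieved context ahead of the user question.
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    # 3) Route to an LLM backend and synthesize the response.
    return call_llm(prompt)

print(answer("How do I rotate edge-router credentials?"))
```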
Technical Architecture
OpenAI-Compatible API
Implemented full OpenAI Chat Completions API compatibility, allowing existing tools and SDKs to work without modification (a minimal endpoint sketch follows this list):
- Chat Completions endpoint: /v1/chat/completions with streaming support
- Function calling: Tool/function calling interface for agent workflows
- Model aliasing: Route gpt-4 requests to the best available backend
- Token counting: Accurate usage tracking across different providers
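A minimal sketch of the compatible surface in FastAPI; the request model is trimmed to essentials and the routed call is stubbed:

```python
"""OpenAI-compatible /v1/chat/completions endpoint, reduced to a sketch."""
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Model aliasing: map the OpenAI name clients send to a concrete backend model.
MODEL_ALIASES = {"gpt-4": ("azure-openai", "gpt-4-turbo")}

class ChatRequest(BaseModel):
    model: str
    messages: list[dict]
    stream: bool = False

@app.post("/v1/chat/completions")
def chat_completions(req: ChatRequest) -> dict:
    backend, model = MODEL_ALIASES.get(req.model, ("gcp-vertex", req.model))
    # Stubbed routed call; the real handler streams tokens when req.stream is set.
    content = f"[{backend}/{model}] echo: {req.messages[-1]['content']}"
    # Mirror the OpenAI response envelope so unmodified SDKs can parse it.
    return {
        "object": "chat.completion",
        "model": req.model,
        "choices": [{"index": 0, "message": {"role": "assistant", "content": content}}],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }
```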
AI-Powered SSH Network Tool
Developed specialized tool for network engineers that combines SSH automation with LLM intelligence (a validation-loop sketch follows this list):
- Automated device discovery: LLM analyzes network configs to discover connected devices
- Vulnerability validation: Cross-references scanner findings with actual device state via SSH
- Remediation suggestions: AI generates fix commands based on device type and CVE details
- Audit trail: All SSH sessions logged with commands executed and results obtained
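A sketch of the vulnerability-validation loop; the finding shape, the run_ssh() stub, and the package-version check are illustrative assumptions:

```python
"""Confirm a scanner finding against live device state, with audit logging."""
import logging

logging.basicConfig(level=logging.INFO)  # stands in for the central audit trail

def run_ssh(host: str, command: str) -> str:
    """Stub for the audited SSH executor (e.g., paramiko in a real build)."""
    logging.info("AUDIT host=%s cmd=%r", host, command)
    return "Version: 1.1.1w"  # canned output for the sketch

def validate_finding(finding: dict) -> bool:
    """Cross-reference a scanner finding with the device's actual state."""
    output = run_ssh(finding["host"], f"dpkg -s {finding['package']} | grep Version")
    still_vulnerable = finding["fixed_version"] not in output
    logging.info("AUDIT host=%s finding=%s vulnerable=%s",
                 finding["host"], finding["cve"], still_vulnerable)
    return still_vulnerable

print(validate_finding({"host": "edge-rtr-01", "cve": "CVE-2023-0464",
                        "package": "openssl", "fixed_version": "3.0.8"}))
```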
Technology Stack
Python, FastAPI, LangChain, Vertex AI, Azure OpenAI, AWS Bedrock, Pinecone/ChromaDB, OAuth 2.0, Docker, Kubernetes
Project Information
- Company: ReliaQuest
- Project Date: February 2025
- Status: In Production
- Role: Technical Lead
Secure Frontend
Refactored an open-source Tauri (Rust) + Express application to meet enterprise security standards:
- Removed unauthenticated endpoints
- Integrated Azure AD OAuth
- Added audit logging
- Implemented CSP headers
- Containerized for k8s deployment
Results & Business Impact
Organization-Wide Adoption
Deployed to 200+ employees across engineering, security, and operations teams. Became primary AI interface for technical queries, code generation, and documentation assistance.
70-Point Accuracy Gain
RAG enrichment raised response accuracy on company-specific queries from 30% to 100%, a 70-percentage-point improvement. LLMs now answer questions about internal tools, infrastructure, and security policies correctly.
Zero Security Incidents
100% audit compliance with all queries logged, PII filtered, and secrets redacted before reaching external LLMs. Zero data leakage incidents since production deployment.
Key Takeaways
RAG Transforms Generic LLMs Into Domain Experts
Without RAG, even GPT-4 hallucinated answers to company-specific questions 70% of the time. Vector search retrieval of relevant documentation before inference made responses accurate and actionable, turning the LLM into an instant expert on internal systems.
Multi-Cloud Prevents Vendor Lock-In
Supporting multiple LLM backends (GCP, Azure, AWS) provided flexibility to route based on cost, availability, and model capabilities. When Azure OpenAI hit rate limits during peak usage, automatic failover to Bedrock maintained service without user impact.
OpenAI API Compatibility Drives Adoption
Implementing OpenAI-compatible endpoints meant existing tools (Cursor, Continue, ChatGPT desktop app) worked immediately with a simple base URL change, as shown below. Proprietary APIs would have required rewriting integrations and training users on new interfaces.
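What that base URL change looks like with the standard OpenAI Python SDK; the proxy URL is a placeholder:

```python
"""Point an unmodified OpenAI SDK client at the internal proxy."""
from openai import OpenAI

client = OpenAI(
    base_url="https://ai-proxy.internal.example.com/v1",  # placeholder proxy URL
    api_key="employee-oauth-token",  # issued via Azure AD, not an OpenAI key
)
resp = client.chat.completions.create(
    model="gpt-4",  # alias resolved to a backend by the proxy
    messages=[{"role": "user", "content": "Summarize our incident-response runbook"}],
)
print(resp.choices[0].message.content)
```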
Vector Database Quality > Quantity
Indexing everything creates noise. Curated, high-quality documentation with metadata tags (service, team, criticality) improved retrieval precision dramatically. Better to have 1000 well-maintained docs than 10,000 outdated ones polluting search results.
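A sketch of metadata-filtered retrieval with ChromaDB; the tag names mirror those mentioned above (team, criticality), while the documents and values are illustrative:

```python
"""Restrict semantic search by metadata so stale docs never enter the prompt."""
import chromadb

collection = chromadb.Client().create_collection("curated-docs")
collection.add(
    ids=["doc-1", "doc-2"],
    documents=["Edge router patching procedure", "Deprecated 2019 VPN guide"],
    metadatas=[{"team": "netops", "criticality": "high"},
               {"team": "netops", "criticality": "low"}],
)
# Filter to high-criticality docs before similarity ranking.
hits = collection.query(
    query_texts=["how do we patch edge routers?"],
    n_results=1,
    where={"criticality": "high"},
)
print(hits["documents"][0])
```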
Security Tooling Integration Unlocks New Use Cases
Integrating LLMs with security scanners (Tenable, Wiz, etc.) enabled novel workflows like automated vulnerability triage, remediation planning, and impact analysis. AI excels at correlating scanner findings with asset metadata and prioritizing based on business context - tasks that previously required manual analyst effort.
