Architecture Overview
Copilot-LD is an intelligent agent leveraging GitHub Copilot, linked data and retrieval-augmented generation.
System Design
- gRPC Microservices: Single-responsibility services with gRPC communication
- Extensions: Plugin-based adapters for different applications
- Modularity: Framework-agnostic packages for maximum reusability
- Performance: Parallel processing and optimized vector operations
Communication Layer
- gRPC Protocol: All inter-service communication uses gRPC with Protocol Buffers
- REST APIs: Extensions expose REST endpoints for external client integration
- Schema Definition: Protobuf schemas in `/proto` ensure type safety
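As a hedged illustration of what such a schema might look like, here is a minimal protobuf sketch; the actual message and service names under `/proto` may differ.

```protobuf
// Hypothetical sketch of a schema in /proto; field and service
// names are assumptions, not the repository's actual definitions.
syntax = "proto3";

package agent;

message Message {
  string role = 1;
  string content = 2;
}

message AgentRequest {
  string session_id = 1;
  repeated Message messages = 2;
}

message AgentResponse {
  repeated Message choices = 1;
}

service Agent {
  rpc ProcessRequest(AgentRequest) returns (AgentResponse);
}
```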
Service Architecture
- Agent Service: Central orchestrator managing request flow
- Specialized Services: Domain-specific services (history, LLM, vector, text)
- Parallel Processing: Services execute operations concurrently
- Stateless Design: Services maintain no persistent state
Directory Structure
```
./services/    # gRPC services
./extensions/  # Extensions that adapt core system to applications
./packages/    # Reusable, domain-focused logic
./tools/       # Utility scripts for dev and test
./data/        # Definitions, vectors, and chunk data
```
High-Level Architecture
```mermaid
flowchart TD
    A[Clients]
    B[Extensions]
    C[Agent service]
    D[History service]
    E[LLM service]
    G[Vector service]
    H[Text service]
    I[LLM backend]
    J[History cache]
    L[Vector index]
    M[Chunk index]

    %% Clients communicate with Extensions over REST
    A -- REST --> B
    %% Extensions interact with the Agent via gRPC
    B -- gRPC --> C
    %% Agent service interacts with backend services
    C -- gRPC --> D
    C -- gRPC --> E
    C -- gRPC --> G
    C -- gRPC --> H
    %% Interaction with the foundation model
    E -- REST --> I
    %% Services that interact with storage
    D -- Local I/O --> J
    G -- Local I/O --> L
    H -- Local I/O --> M
```
Online Sequence Diagram
The online sequence diagram illustrates the real-time request processing flow when the platform services are actively running. This shows how a client request flows through the REST extensions, gets orchestrated by the Agent service, and triggers parallel operations across specialized backend services to retrieve relevant information through vector similarity search.
```mermaid
sequenceDiagram
    participant Client
    participant Extension as Extensions
    participant Agent as Agent service
    participant History as History service
    participant LLM as LLM service
    participant Vector as Vector service
    participant Text as Text service
    participant Index as Vector index

    Client->>Extension: REST request
    Extension->>Agent: RPC request (ProcessRequest)
    par Parallel
        Agent->>History: RPC request (GetHistory)
        History-->>Agent: RPC response
    and
        Agent->>LLM: RPC request (CreateEmbeddings)
        LLM-->>Agent: RPC response
    end
    Agent->>Vector: RPC request (QueryItems)
    Vector->>Index: I/O request (QueryIndex)
    Index-->>Vector: I/O response
    Vector-->>Agent: RPC response
    Note left of Vector: Orders and reduces<br/>results by similarity
    Agent->>Text: RPC request (GetChunks)
    Text-->>Agent: RPC response
    Agent--)History: RPC request (UpdateHistory)
    Note right of Agent: Fire-and-forget,<br/>no response awaited
    Agent->>Extension: RPC response
    Extension->>Client: REST response
```
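The parallel fan-out at the start of the flow can be sketched with `Promise.all`. This is an illustrative sketch with stubbed service calls, not the Agent service's actual implementation; the real clients are generated gRPC stubs.

```javascript
// Stub for the History service's GetHistory call (assumption: the real
// client is a generated gRPC stub returning prior messages).
async function getHistory(sessionId) {
  return [{ role: "user", content: "previous message" }];
}

// Stub for the LLM service's CreateEmbeddings call.
async function createEmbeddings(text) {
  return [0.1, 0.2, 0.3];
}

// GetHistory and CreateEmbeddings are independent of each other,
// so the Agent can issue both requests concurrently.
async function processRequest(sessionId, text) {
  const [history, embedding] = await Promise.all([
    getHistory(sessionId),
    createEmbeddings(text),
  ]);
  return { history, embedding };
}
```

Running the two calls concurrently means the step costs only as much as the slower of the two round trips, rather than their sum.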
Service Responsibilities
Agent Service
Central orchestrator that coordinates all other services. Processes requests by executing operations in parallel for optimal performance and manages the complete business logic flow.
History Service
Maintains conversation history and context. Provides historical data for request processing and stores interaction records for continuity.
LLM Service
Interfaces with language models for embedding generation and text completion. Handles communication with external AI services.
Vector Service
Performs similarity search operations against a vector index. Returns chunk IDs and similarity scores ordered by relevance.
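A minimal sketch of the ranking the Vector service performs, assuming cosine similarity and an in-memory index; the repository's actual index format and scoring may differ.

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every indexed chunk against the query embedding and return
// the top results, ordered by similarity (highest first).
function queryItems(index, queryEmbedding, limit = 3) {
  return index
    .map(({ id, embedding }) => ({
      id,
      score: cosineSimilarity(queryEmbedding, embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```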
Text Service
Retrieves text content for chunks by their IDs. Provides the actual content corresponding to vector search results.
Offline Sequence Diagram
The platform includes offline tools for knowledge base preparation and vector index creation. These tools process external knowledge sources into searchable vector indices before the services are deployed.
```mermaid
sequenceDiagram
    participant Dev as Developer
    participant Download as tools/download.js
    participant GitHub as GitHub API
    participant Chunk as tools/chunk.js
    participant Index as tools/index.js
    participant LLM as LLM API
    participant Storage as Local Storage

    Dev->>Download: npm run download
    Download->>GitHub: Fetch latest release artifacts
    GitHub-->>Download: Release assets (.tar.gz)
    Download->>Storage: Extract to data/knowledge/
    Note right of Download: HTML files with microdata

    Dev->>Chunk: npm run chunk
    Chunk->>Storage: Read HTML files from data/knowledge/
    Storage-->>Chunk: HTML content with microdata
    loop For each HTML file
        Chunk->>Chunk: Extract microdata items
        Chunk->>Chunk: Generate chunk ID (SHA-256)
        Chunk->>Chunk: Format as JSON, count tokens
        Chunk->>Storage: Store chunk.json in data/chunks/{id}/
    end
    Chunk->>Storage: Persist chunk index

    Dev->>Index: npm run index
    Index->>Storage: Load chunk index
    Storage-->>Index: All chunk metadata
    loop Process chunks in batches
        Index->>Storage: Load chunk text content
        Storage-->>Index: Chunk text
        Index->>LLM: Create embeddings for chunk batch
        LLM-->>Index: Chunk embeddings
        Index->>Storage: Add to vector index
    end
    Index->>Storage: Persist vector index
    Note right of Storage: Ready for runtime vector search
```
Offline Processing Workflow
1. Knowledge Download (tools/download.js)
- Downloads latest release artifacts from GitHub repository
- Extracts compressed archives to the `data/knowledge/` directory
- Provides HTML files with structured microdata for processing
2. Chunk Processing (tools/chunk.js)
- Scans HTML files for microdata items with configurable selectors
- Extracts structured content and generates unique chunk IDs using SHA-256
- Formats content as JSON and calculates token counts
- Stores individual chunks in `data/chunks/{id}/chunk.json`
- Creates searchable chunk index for efficient retrieval
3. Vector Indexing (tools/index.js)
- Batch Processing: Processes chunks in token-optimized batches to minimize API calls
- Embedding Generation: Creates vector embeddings for all chunks via LLM API
- Index Creation: Builds a vector index containing all chunks
- Persistence: Stores the index to disk for runtime access
Key Characteristics
- Offline Execution: All processing occurs before service deployment
- API Optimization: Batched requests minimize LLM API calls and costs
- Unified Index: A single vector index enables search across all content
- Incremental Processing: Skips existing chunks to support iterative updates
- Token Management: Respects API token limits while maximizing batch efficiency
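The token-aware batching described above might look like the following sketch. The `maxTokens` budget and the per-chunk `tokens` field are assumptions; `tools/index.js` may implement this differently.

```javascript
// Group chunks into batches whose summed token counts stay within
// the API's token budget, flushing a batch once it would overflow.
function batchByTokens(chunks, maxTokens) {
  const batches = [];
  let current = [];
  let used = 0;
  for (const chunk of chunks) {
    if (current.length > 0 && used + chunk.tokens > maxTokens) {
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(chunk);
    used += chunk.tokens;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each batch then becomes a single embeddings request, so the number of API calls scales with the total token count rather than the number of chunks.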
Data Flow
- Raw Knowledge → HTML files with microdata
- Structured Chunks → Individual JSON files with metadata
- Vector Embeddings → Numerical representations for similarity search
- Vector Index → Single vector database containing all content ready for runtime queries
This offline pipeline ensures that runtime services can perform fast vector similarity searches without depending on external APIs or requiring real-time embedding generation.
Security Architecture
The security design focuses on network isolation, service authentication, and secure communication channels.
Network Topology
The system implements a defense-in-depth approach with network isolation between external-facing extensions and internal backend services.
Network Architecture
```mermaid
graph TB
    subgraph "Host Network"
        Client[External Clients]
    end
    subgraph "External Network (copilot-ld.external)"
        Web[Web Extension<br/>:3000]
        Copilot[Copilot Extension<br/>:3001]
    end
    subgraph "Internal Network (copilot-ld.internal)"
        Agent[Agent Service<br/>:3000]
        History[History Service<br/>:3000]
        LLM[LLM Service<br/>:3000]
        Vector[Vector Service<br/>:3000]
        Text[Text Service<br/>:3000]
    end
    subgraph "External Services"
        LLMAPI[LLM API<br/>OpenAI/etc]
    end

    %% External connections
    Client -.->|REST/HTTP| Web
    Client -.->|REST/HTTP| Copilot
    %% Extension to Agent connections (via network bridge)
    Web -->|gRPC| Agent
    Copilot -->|gRPC| Agent
    %% Internal service mesh (isolated network)
    Agent -->|gRPC| History
    Agent -->|gRPC| LLM
    Agent -->|gRPC| Vector
    Agent -->|gRPC| Text
    %% External API calls
    LLM -.->|HTTPS| LLMAPI

    style Web fill:#e1f5fe
    style Copilot fill:#e1f5fe
    style Agent fill:#f3e5f5
    style History fill:#fff3e0
    style LLM fill:#fff3e0
    style Vector fill:#fff3e0
    style Text fill:#fff3e0
```
Port Exposure Strategy
- Web Extension (`copilot-ld.web`): Exposes port 3000 to host, bridges external and internal networks
- Copilot Extension (`copilot-ld.copilot`): Exposes port 3001 to host, bridges external and internal networks
- Backend Services: No host port exposure; isolated on the internal network only
Network Isolation Benefits
- Reduced Attack Surface: Backend services are completely isolated on the internal network
- Network Segmentation: Extensions on external network bridge to internal network for controlled access
- Service Mesh Isolation: Internal gRPC communication is fully segmented from external traffic
- Defense in Depth: Dual network topology provides additional security boundaries
Authentication Mechanisms
Authentication Flow
The platform implements HMAC-SHA256 authentication for service-to-service communication using the `HmacAuth` class.
```mermaid
sequenceDiagram
    participant A as Service A
    participant B as Service B
    participant Auth as Authenticator

    A->>Auth: generateToken(serviceId)
    Auth->>Auth: Create payload: serviceId:timestamp
    Auth->>Auth: Sign with HMAC-SHA256
    Auth-->>A: Base64 encoded token
    A->>B: gRPC request + token
    B->>Auth: verifyToken(token)
    Auth->>Auth: Decode and validate signature
    Auth->>Auth: Check token expiration
    Auth-->>B: {isValid, serviceId, error}
    alt Token Valid
        B-->>A: Process request
    else Token Invalid
        B-->>A: Authentication error
    end
```
HMAC Implementation Details
- Algorithm: HMAC-SHA256
- Algorithm: HMAC-SHA256
- Secret: Shared via `SERVICE_AUTH_SECRET` environment variable (minimum 32 characters)
- Token Format: `Base64(serviceId:timestamp:signature)`
- Token Lifetime: 60 seconds (configurable)
- Payload Structure: `serviceId:timestamp`
Token Generation Process
- Create payload combining service ID and current timestamp
- Generate HMAC-SHA256 signature using shared secret
- Encode as Base64: `Base64(serviceId:timestamp:signature)`
Token Verification Process
- Decode Base64 token
- Extract service ID, timestamp, and signature
- Verify timestamp is within token lifetime
- Recreate expected signature using shared secret
- Compare signatures using constant-time comparison
Communication Security
gRPC Internal Communication
- Protocol: gRPC over HTTP/2
- Network: Isolated Docker bridge network
- Authentication: HMAC tokens
- Schema Validation: Protocol Buffer message validation
External API Communication
- Extensions to Clients: REST over HTTP (can be upgraded to HTTPS)
- LLM Service to External APIs: HTTPS with API key authentication
Security Limitations
mTLS Not Implemented
Mutual TLS (mTLS) is not currently implemented between services. Future security enhancements should include:
- Certificate-based service authentication
- Encrypted gRPC communication channels
- Service identity verification via X.509 certificates
Rate-Limiting Not Implemented
Rate-limiting is not currently implemented for externally facing services. Future enhancements should include:
- Request throttling per client IP address to prevent abuse
- Adaptive rate limiting based on service resource utilization
- Token bucket or sliding window algorithms for burst traffic handling
- Configurable rate limits per extension type (web vs API clients)
Threat Model
Protected Against
- External Service Access: Backend services cannot be directly accessed from outside the Docker network
- Service Impersonation: HMAC authentication prevents unauthorized service access (when enabled)
- Token Replay: Time-limited tokens reduce replay attack windows
Current Vulnerabilities
- Network Sniffing: Internal gRPC traffic is unencrypted
- Container Compromise: If one container is compromised, it can access other services on the same network
- Extension Security: Extensions are the primary attack surface and must implement their own input validation
Service Security Responsibilities
Extensions (Web, Copilot)
- Input validation and sanitization
- Rate limiting and DDoS protection
- Session management
- CORS policy enforcement
Agent Service
- Request orchestration security
- Service-to-service authentication enforcement
- Business logic security validation
Backend Services (History, LLM, Vector, Text)
- gRPC message validation
- Resource usage limiting
- Data access controls
- Error handling without information disclosure