Architecture Overview
Copilot-LD is an intelligent agent leveraging GitHub Copilot, linked data and retrieval-augmented generation.
System Design
- gRPC Microservices: Single-responsibility services with gRPC communication
- Extensions: Plugin-based adapters for different applications
- Modularity: Framework-agnostic packages for maximum reusability
- Performance: Parallel processing and optimized vector operations
Communication Layer
- gRPC Protocol: All inter-service communication uses gRPC with Protocol Buffers
- REST APIs: Extensions expose REST endpoints for external client integration
- Schema Definition: Authoritative protobuf schemas in `/proto` (copied verbatim into `/generated/proto` during code generation) ensure type safety. Optional tool schemas in `/tools` extend the core system with additional functionality. All runtime loading now reads exclusively from `/generated/proto` for a single source of operational truth.
Service Architecture
- Agent Service: Central orchestrator managing request flow
- Specialized Services: Domain-specific services (`LLM`, `Memory`, `Vector`)
- Parallelism: Readiness checks run in parallel; the main request flow is orchestrated sequentially per request
- Stateless Design: Services maintain no persistent state
Generated Code Workflow
Both code generation and service startup follow a unified three-step workflow to ensure generated code is available:
1. Resolve Source Path: Use `storageFactory("generated", "local")` to determine the canonical storage location for generated artifacts
2. Ensure Storage Exists: Call `ensureBucket()` to create the storage directory if it doesn't exist
3. Create Package Symlinks: Use `Finder.createPackageSymlinks()` to link package target directories (`packages/libtype/generated/`, `packages/librpc/generated/`) to the source location
This workflow is used by both the code generation binary (`npm run codegen`) and the download utility that services use at startup. When `STORAGE_TYPE=s3`, services automatically download and extract the latest generated code bundle from remote storage before creating symlinks. A minimal sketch of the workflow follows.
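The sketch below illustrates the three steps under stated assumptions: only the call names `storageFactory()`, `ensureBucket()`, and `Finder.createPackageSymlinks()` are documented above; the import paths and the placement of `ensureBucket()` as a method on the returned storage object are assumptions.

```javascript
// Sketch of the unified three-step workflow. Import paths are assumptions;
// only the three call names are documented above.
import { storageFactory } from "@copilot-ld/libstorage"; // assumed package
import { Finder } from "@copilot-ld/libcodegen"; // assumed location

async function prepareGeneratedCode() {
  // 1. Resolve the canonical storage location for generated artifacts
  const storage = storageFactory("generated", "local");

  // 2. Create the storage directory if it doesn't exist
  await storage.ensureBucket();

  // 3. Link package target directories to the source location
  await Finder.createPackageSymlinks();
}
```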
Directory Structure
```
./services/    # gRPC service implementations (custom logic only)
./extensions/  # Extensions that adapt core system to applications
./packages/    # Reusable, domain-focused logic (framework agnostic)
./scripts/     # Utility scripts & code generation
./proto/       # Authoritative protobuf source schemas (never edited in generated/)
./tools/       # Custom tools that extend the core system
./generated/   # ALL generated artifacts (proto copies, types, service bases, clients)
./data/        # Knowledge base, vectors, and resource data
```
High-Level Architecture
```mermaid
flowchart TD
    A[Clients]
    B[Extensions]
    C[Agent service]
    D[Memory service]
    E[LLM service]
    F[Tool service]
    G[Vector service]
    H[GitHub API]
    I[LLM backend]
    J[Memory store]
    L[Content Vector Index]
    N[Descriptor Vector Index]
    O[Resource Storage]

    %% Clients communicate with Extensions over REST
    A -- REST --> B
    %% Extensions interact with the Agent via gRPC
    B -- gRPC --> C
    %% Agent service interacts with backend services
    C -- gRPC --> D
    C -- gRPC --> E
    C -- gRPC --> F
    C -- gRPC --> G
    C -- REST --> H
    %% Interaction with the foundation model
    E -- REST --> I
    %% Services that interact with storage
    D -- Local I/O --> J
    G -- Local I/O --> L
    G -- Local I/O --> N
    C -- Local I/O --> O
```
Online Sequence Diagram
The online sequence diagram reflects the current implementation in the codebase. The Agent performs operations sequentially: it validates the GitHub token, ensures service readiness in parallel (including the Tool service), creates an embedding for the latest user message, queries the content vector index, appends similar identifiers to memory (fire-and-forget), computes a token budget (subtracting `assistant.content.tokens` and applying the configured allocation), assembles a memory window using the query vector and budget, and then enters a tool-calling loop that iteratively requests completions and executes tools until no more tool calls are needed. Only readiness checks run in parallel; the main request flow is sequential.
As of 2025-09-05, the platform consolidates all generated protobuf artifacts into a single `/generated` directory, eliminating ambiguity between source and generated files and simplifying extension with additional tool schemas.
```mermaid
sequenceDiagram
    participant Client
    participant Extension as Extensions
    participant Agent as Agent service
    participant Memory as Memory service
    participant LLM as LLM service
    participant Tool as Tool service
    participant Vector as Vector service
    participant ContentIndex as Content Vector Index
    participant ResourceIndex as Resource Index
    participant GitHub as GitHub API

    Client->>Extension: REST request
    Extension->>Agent: RPC request (ProcessRequest)
    Note right of Agent: Ensure clients ready in parallel (LLM/Vector/Memory/Tool)
    par Parallel Readiness & Validation
        Agent->>Memory: ensureReady()
        Agent->>LLM: ensureReady()
        Agent->>Vector: ensureReady()
        Agent->>Tool: ensureReady()
    and
        Agent->>GitHub: GET /user (validate token)
        GitHub-->>Agent: 200 OK (user)
    end
    Agent->>ResourceIndex: Get/Create conversation
    Agent->>ResourceIndex: Put user message (scoped to conversation)
    Agent->>ResourceIndex: Get assistant by id (from config)
    Note right of Agent: Compute token budget (budget = config.budget.tokens - assistant.content.tokens) and derive allocation
    Agent->>LLM: CreateEmbeddings (latest user message)
    LLM-->>Agent: Embedding
    Agent->>Vector: QueryItems (index: content)
    Vector->>ContentIndex: QueryIndex (similarity search)
    ContentIndex-->>Vector: Matches
    Vector-->>Agent: Identifiers (scored)
    Agent--)Memory: Append (identifiers)
    Note right of Agent: Fire-and-forget append (no await)
    Agent->>Memory: GetWindow (for conversation, with vector/budget/allocation)
    Memory-->>Agent: Window (tools, context, history)
    Agent->>ResourceIndex: Resolve window identifiers (tools/context/history)
    ResourceIndex-->>Agent: Resources (policy-filtered)
    Note over Agent: Tool Calling Loop (max 10 iterations)
    loop Until no tool calls or max iterations
        Agent->>LLM: CreateCompletions (assistant + tasks + tools + context + history)
        LLM-->>Agent: Completion with potential tool calls
        alt Tool calls present
            loop For each tool call
                Agent->>Tool: ExecuteTool (tool call + github_token)
                Tool-->>Agent: Tool result message
                Note right of Agent: Add tool result to messages
            end
            Note right of Agent: Continue loop with updated messages
        else No tool calls
            Note right of Agent: Exit loop - completion ready
        end
    end
    Agent->>ResourceIndex: Put final completion message (scoped to conversation)
    Agent-->>Extension: RPC response (with conversation_id)
    Extension-->>Client: REST response
```
Service Responsibilities
Agent Service
Central orchestrator that coordinates all other services. Requests are processed sequentially as shown in the sequence diagram (readiness checks run in parallel). The service integrates a `ResourceIndex` for direct resource access with policy-based access control.
Message assembly order and budgeting
- Assembly order (passed to completions): `assistant` → `tasks` → `tools` → `context` → `history`
- Budgeting: Effective token budget is computed as `max(0, config.budget.tokens - assistant.content.tokens)`, with optional allocation shaping for `tools`, `history`, and `context` (see the sketch below)
- Window inputs: Memory window selection is driven by the query vector, effective budget, and optional allocation ratios
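A minimal sketch of the budgeting rule above, assuming plain objects shaped like the configuration and assistant resources (the function name is illustrative):

```javascript
// Illustrative budgeting helper; clamps at zero so a large assistant
// prompt can never produce a negative budget.
function effectiveBudget(config, assistant) {
  return Math.max(0, config.budget.tokens - assistant.content.tokens);
}

// Example: a 4000-token budget with a 600-token assistant prompt
// leaves 3400 tokens to allocate across tools, history, and context.
const budget = effectiveBudget(
  { budget: { tokens: 4000 } },
  { content: { tokens: 600 } },
); // 3400
```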
Memory Service
Manages transient conversation context as resource identifiers with JSONL (newline-delimited JSON) storage for efficient appends. Provides memory windows used by the Agent to assemble context.
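A minimal sketch of a JSONL append under these assumptions: the file path and record shape are illustrative; only the append-friendly, newline-delimited format is documented above.

```javascript
import { appendFile } from "node:fs/promises";

// Append identifiers as one JSON record per line; appends never rewrite
// existing lines, which is what makes JSONL efficient for this workload.
async function appendIdentifiers(path, conversationId, identifiers) {
  const lines = identifiers
    .map((id) => JSON.stringify({ conversation: conversationId, id }))
    .join("\n");
  await appendFile(path, lines + "\n", "utf8");
}
```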
LLM Service
Interfaces with language models for embedding generation and text completion. Handles communication with external AI services.
Vector Service
Performs similarity search operations against dual vector indexes (content and descriptor). Returns resource identifiers and similarity scores ordered by relevance, with support for index type selection and token-based filtering. The current Agent runtime path queries the `content` index by default.
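As an illustrative sketch of the scoring step, assuming cosine similarity over an in-memory index (the document only specifies scored identifiers ordered by relevance; the metric and data shapes are assumptions):

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every indexed item and return the best matches first.
function queryIndex(items, queryVector, limit) {
  return items
    .map(({ id, vector }) => ({ id, score: cosineSimilarity(queryVector, vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```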
Tool Service
Acts as a gRPC proxy between tool calls requested by LLMs and actual tool implementations. Supports both mapping to existing services and custom tool implementations that extend the core system through configuration-driven endpoint mapping.
Key Operations:
- `ExecuteTool`: Proxies tool calls to appropriate services based on configuration mapping
- `ListTools`: Returns available tools with OpenAI-compatible schemas for LLM consumption
Architecture:
- Configuration-driven: Tools defined via endpoint mappings in configuration files
- Pure proxy: No business logic, just routing and protocol conversion
- Extensible: Supports mapping to existing services (e.g., `vector.QueryItems`) and custom tools that extend the platform (see the sketch below)
- Schema generation: Automatically generates OpenAI-compatible JSON schemas from protobuf types
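Purely as a hypothetical illustration of configuration-driven endpoint mapping (the actual configuration file format is not shown in this document; the tool and method names below are invented, apart from `vector.QueryItems` and the `hash` tool mentioned in the Docker examples later):

```javascript
// Hypothetical endpoint mapping: each tool name routes to a service method.
const toolEndpoints = {
  // Expose an existing service method as a tool
  query_items: { service: "vector", method: "QueryItems" },
  // Route to a custom tool that extends the platform
  hash: { service: "hash", method: "Hash" },
};
```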
Offline Sequence Diagram
The platform includes offline scripts for knowledge base preparation and vector index creation. These scripts process external knowledge sources into searchable vector indices before the services are deployed.
```mermaid
sequenceDiagram
    participant Dev as Developer
    participant Download as scripts/download.js
    participant GitHub as GitHub API
    participant Resources as scripts/resources.js
    participant Index as scripts/index.js
    participant LLM as LLM API
    participant Storage as Local Storage

    Dev->>Download: npm run download
    Download->>GitHub: Fetch latest release artifacts
    GitHub-->>Download: Release assets (.tar.gz)
    Download->>Storage: Extract to data/knowledge/
    Note right of Download: HTML files with microdata

    Dev->>Resources: node scripts/resources.js
    Resources->>Storage: Read HTML files from data/knowledge/
    Storage-->>Resources: HTML content with microdata
    loop For each HTML file
        Resources->>Resources: Extract microdata items as resources
        Resources->>Resources: Generate resource identifier (URI)
        Resources->>Resources: Apply policy filtering
        Resources->>Storage: Store resource.json with metadata
    end
    Resources->>Storage: Persist resource index

    Dev->>Index: node scripts/index.js
    Index->>Storage: Load resource index
    Storage-->>Index: All resource metadata
    loop Process resources in batches (content & descriptor)
        Index->>Storage: Load resource content
        Storage-->>Index: Resource content/descriptors
        Index->>LLM: Create embeddings for resource batch
        LLM-->>Index: Resource embeddings
        Index->>Storage: Add to dual vector indexes
    end
    Index->>Storage: Persist content & descriptor vector indexes
    Note right of Storage: Ready for runtime vector search with dual indexes
```
Offline Processing Workflow
1. Knowledge Download (`scripts/download.js`)

- Downloads latest release artifacts from GitHub repository
- Extracts compressed archives to `data/knowledge/` directory
- Provides HTML files with structured microdata for processing
2. Resource Processing (`scripts/resources.js`)
- Scans HTML files for microdata items with configurable selectors
- Extracts structured content and generates unique resource identifiers using URI format
- Applies policy-based filtering for access control
- Stores individual resources with metadata in unified resource storage
- Creates searchable resource index with type-based organization
3. Vector Indexing (`scripts/index.js`)

- Dual-Index Architecture: Creates separate content and descriptor vector indexes
- Batch Processing: Processes resources in token-optimized batches to minimize API calls (see the sketch after this list)
- Embedding Generation: Creates vector embeddings for both content and descriptors via LLM API
- Index Creation: Builds dual indexes for flexible search capabilities
- Persistence: Stores indexes to disk for runtime access with index type selection
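A minimal sketch of token-optimized batching, assuming a `countTokens` helper and a per-request token limit (both are assumptions; only the batching goal is documented above):

```javascript
// Group resources into batches that stay under a token limit, so each
// embeddings request packs as much content as the API allows.
function batchByTokens(resources, maxTokens, countTokens) {
  const batches = [];
  let current = [];
  let used = 0;
  for (const resource of resources) {
    const cost = countTokens(resource);
    if (current.length > 0 && used + cost > maxTokens) {
      batches.push(current); // flush the full batch
      current = [];
      used = 0;
    }
    current.push(resource);
    used += cost;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```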
Key Characteristics
- Offline Execution: All processing occurs before service deployment
- API Optimization: Batched requests minimize LLM API calls and costs
- Resource-Based Architecture: Unified resource management with policy control
- Flexible Search: Dual-index system enables content and descriptor-based searches
- Incremental Processing: Skips existing resources to support iterative updates
- Token Management: Respects API token limits while maximizing batch efficiency
Data Flow
- Raw Knowledge → HTML files with microdata
- Structured Resources → Individual JSON files with metadata and URI identifiers
- Vector Embeddings → Numerical representations for similarity search (content & descriptors)
- Dual Vector Indexes → Content and descriptor databases ready for runtime queries with index type selection
This offline pipeline ensures that runtime services can perform fast vector similarity searches without depending on external APIs or requiring real-time embedding generation.
Code Generation
The platform uses a centralized code generation approach, with all generated artifacts consolidated in a single `generated/` directory at the project root. The `@copilot-ld/libcodegen` package provides the codegen utility that discovers protobuf schemas from both core services and tools, generating type-safe JavaScript code and service infrastructure.
Generation Process
- Setup Workflow: Follows the unified three-step workflow (resolve source path, ensure storage exists, create symlinks) to prepare the `generated/` directory
- Schema Discovery: Scans `proto/` for core service definitions and `tools/` for extension schemas that extend the platform
- Centralized Output: Creates all artifacts in `generated/` with organized sub-directories for types, services, and copied proto files
- Type Generation: Produces consolidated JavaScript types and TypeScript declarations accessible via `@copilot-ld/libtype`
- Service Generation: Creates base service classes and typed clients for gRPC communication via `@copilot-ld/librpc`
- Package Integration: Symlinks created during setup make generated code available through framework-agnostic packages
Output Structure
```
./generated/
├── proto/                # Copied .proto files for runtime loading
├── types/
│   └── types.js          # Consolidated protobuf type definitions
└── services/
    ├── exports.js        # Aggregated service and client exports
    └── service-name/
        ├── service.js    # Generated base class
        └── client.js     # Generated typed client
```
Code Generation Commands
Re-run code generation after changing schemas in `proto/` or adding tool definitions in `tools/`:
```bash
# Generate everything (recommended)
npm run codegen

# Generate specific components
npm run codegen:type        # Types only
npm run codegen:service     # Service base classes only
npm run codegen:client      # Client classes only
npm run codegen:definition  # Service definitions only
```
The `@copilot-ld/libcodegen` package uses `protobufjs` for schema processing and Mustache templates for code generation, producing ES modules with full type safety. Services automatically create package symlinks at startup to ensure generated code is accessible through `@copilot-ld/libtype` and `@copilot-ld/librpc` while maintaining clean package separation.
Generated Artifact Structure
```
./generated/
├── bundle.tar.gz         # Compressed archive of all generated artifacts
├── proto/                # Copied original .proto files for runtime access
├── types/
│   └── types.js          # Consolidated protobuf type definitions
└── services/             # Service base classes and clients
    ├── agent/
    │   ├── service.js    # Generated base class
    │   └── client.js     # Generated typed client
    ├── exports.js        # Aggregated service/client exports
    └── definitions/      # Pre-compiled service definitions
        ├── agent.js      # gRPC service definition for Agent
        ├── vector.js     # gRPC service definition for Vector
        └── exports.js    # Aggregated definitions exports

# Package symlinks created automatically at service startup
./packages/libtype/generated/ → symlink to ./generated/
./packages/librpc/generated/  → symlink to ./generated/
```
Code Generation Workflow
The centralized generation approach ensures consistency and eliminates ambiguity between source and generated artifacts:
- Unified Location: All generated code lives in a single `generated/` directory at the project root
- Schema Discovery: Automatically finds protobuf schemas in `proto/` and tool extensions in `tools/`
- Type Safety: Creates unified type definitions accessible via `@copilot-ld/libtype`
- Service Infrastructure: Produces base classes, typed clients, and pre-compiled service definitions through `@copilot-ld/librpc`
- Pre-compiled Definitions: Generates runtime-ready gRPC service definitions that eliminate the need for proto-loader dependencies
- Package Integration: Service startup automatically creates symlinks for clean access through framework-agnostic packages
- Tool Extension: Supports additional schemas that extend core system functionality
Docker Build Architecture
The platform uses a unified Docker build process that simplifies container creation and reduces maintenance overhead.
Unified Dockerfile
A single root-level `Dockerfile` handles building for all components (services, extensions, and tools), using a `COMPONENT_PATH` build argument to specify which component to build.
Build Process
- Multi-stage build: Separates build dependencies from runtime image for smaller production containers
- Code generation: Uses `@copilot-ld/libcodegen` to generate required artifacts during build
- Component copying: Copies only the specified component path to minimize image size
- Production optimization: Final stage includes only production dependencies and generated code
Usage Examples
```bash
# Build Agent service
docker build --build-arg COMPONENT_PATH=services/agent -t agent .

# Build Web extension
docker build --build-arg COMPONENT_PATH=extensions/web -t web .

# Build custom tool
docker build --build-arg COMPONENT_PATH=tools/hash -t hash .
```
Benefits
- Consistency: All components use identical build process
- Maintenance: Single Dockerfile to update and maintain
- Flexibility: Easy to add new components without creating new Dockerfiles
- Optimization: Shared layers between components improve build efficiency
Security Architecture
The security design focuses on service authentication and secure communication channels.
Authentication Mechanisms
For specific configuration details including environment variables, token formats, and timeout values, see the Configuration Guide.
Current State
- Service-to-Service: Cryptographic authentication is implemented and enforced between internal services via the `@copilot-ld/librpc` package. All gRPC communication uses time-limited tokens with secure headers.
- Request-Level: The Agent validates the incoming `github_token` by calling `GET /user` on GitHub's API before processing requests (sketched below).
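A minimal sketch of that request-level check (the function name is illustrative; the `GET /user` call and its 200 OK success case are as shown in the online sequence diagram above):

```javascript
// Validate a GitHub token by calling GET /user; a 200 response means
// the token is valid for the authenticated user.
async function validateGithubToken(githubToken) {
  const res = await fetch("https://api.github.com/user", {
    headers: {
      Authorization: `Bearer ${githubToken}`,
      // GitHub's API rejects requests without a User-Agent header
      "User-Agent": "copilot-ld",
    },
  });
  return res.ok;
}
```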
Service Authentication (Current Implementation)
The platform implements cryptographic authentication for all service-to-service communication:
```mermaid
sequenceDiagram
    participant Service A
    participant Service B
    participant Authenticator

    Service A->>Authenticator: generateToken(serviceId)
    Authenticator->>Authenticator: Create payload: serviceId:timestamp
    Authenticator->>Authenticator: Sign with cryptographic algorithm
    Authenticator-->>Service A: Encoded authentication token
    Service A->>Service B: gRPC request + token
    Service B->>Authenticator: verifyToken(token)
    Authenticator->>Authenticator: Decode and validate signature
    Authenticator->>Authenticator: Check token expiration
    Authenticator-->>Service B: {isValid, serviceId, error}
    alt Token Valid
        Service B-->>Service A: Process request
    else Token Invalid
        Service B-->>Service A: Authentication error
    end
```
Token Generation Process (Implemented)
- Create payload combining service ID and current timestamp
- Generate cryptographic signature using a shared secret
- Encode authentication token with service identity and timestamp
- Attach token to gRPC metadata headers
Token Verification Process (Implemented)
- Extract token from authentication headers in gRPC metadata
- Decode token and extract service ID, timestamp, and cryptographic signature
- Verify timestamp is within configured token lifetime
- Recreate expected signature using shared secret
- Compare signatures using secure comparison
- Reject request with authentication error if validation fails
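A minimal sketch of these two processes, assuming HMAC-SHA256 and base64 token encoding (the hash function and encoding are assumptions; the `serviceId:timestamp` payload, shared secret, expiry check, and secure comparison are as described above, and HMAC is named in the threat model below):

```javascript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sign a serviceId:timestamp payload with the shared secret.
function generateToken(serviceId, secret) {
  const payload = `${serviceId}:${Date.now()}`;
  const signature = createHmac("sha256", secret).update(payload).digest("hex");
  return Buffer.from(`${payload}:${signature}`).toString("base64");
}

// Verify expiry, recreate the expected signature, and compare securely.
function verifyToken(token, secret, lifetimeMs) {
  const [serviceId, timestamp, signature] = Buffer.from(token, "base64")
    .toString("utf8")
    .split(":");
  if (!serviceId || !timestamp || !signature) {
    return { isValid: false, error: "malformed token" };
  }
  if (Date.now() - Number(timestamp) > lifetimeMs) {
    return { isValid: false, error: "token expired" };
  }
  const expected = createHmac("sha256", secret)
    .update(`${serviceId}:${timestamp}`)
    .digest("hex");
  const valid =
    expected.length === signature.length &&
    timingSafeEqual(Buffer.from(expected), Buffer.from(signature));
  return valid
    ? { isValid: true, serviceId }
    : { isValid: false, error: "invalid signature" };
}
```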
Communication Security
For specific protocol details, timeout configurations, and environment variables, see the Configuration Guide.
gRPC Internal Communication
- Protocol: gRPC over secure transport
- Authentication: Cryptographic signatures with time-limited tokens
- Token Lifetime: Short-lived configurable duration
- Secret Management: Shared secret from environment variables
- Schema Validation: Protocol Buffer message validation
External API Communication
- Clients to Extensions: HTTP/HTTPS via load balancer
- LLM Service to External APIs: HTTPS with API key authentication
Security Limitations
mTLS Not Implemented
Mutual TLS (mTLS) is not currently implemented between services. Future security enhancements should include:
- Certificate-based service authentication
- Encrypted gRPC communication channels
- Service identity verification via digital certificates
Rate-Limiting Not Implemented
Rate-limiting is not currently implemented for externally facing services. Future enhancements should include:
- Request throttling per client IP address to prevent abuse
- Adaptive rate limiting based on service resource utilization
- Token bucket or sliding window algorithms for burst traffic handling
- Configurable rate limits per extension type (web vs API clients)
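For reference only (none of this is implemented in the platform today), a token bucket limiter from the list above can be sketched as:

```javascript
// Reference token bucket: capacity bounds bursts, refill rate bounds
// sustained throughput. Not part of the current implementation.
class TokenBucket {
  constructor(capacity, refillPerSecond) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  tryRemove() {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // allow the request
    }
    return false; // throttle the request
  }
}
```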
Threat Model
Protected Against
- External Service Access: Backend services cannot be directly accessed from outside the Docker environment
- Service Impersonation: HMAC authentication prevents unauthorized service access with cryptographic verification
- Token Replay: Short-lived time-limited tokens minimize replay attack windows
- Request Forgery: HMAC signatures prevent message tampering and ensure request authenticity
Current Vulnerabilities
- Shared Secret Exposure: If the authentication secret is compromised, all service authentication is bypassed
- Container Compromise: If one container is compromised, the attacker gains access to the shared authentication secret
- Extension Security: Extensions are the primary attack surface and must implement their own input validation
Service Security Responsibilities
Extensions (Web, Copilot)
- Input validation and sanitization
- Rate limiting and DDoS protection
- Session management
- CORS policy enforcement
Agent Service
- Request orchestration security
- Service-to-service authentication enforcement
- Business logic security validation
Backend Services (Memory, LLM, Vector)
- gRPC message validation
- Resource usage limiting
- Data access controls
- Error handling without information disclosure