Architecture Overview
Copilot-LD is an intelligent agent leveraging GitHub Copilot, linked data and retrieval-augmented generation.
System Design
- gRPC Microservices: Single-responsibility services with gRPC communication
- Extensions: Plugin-based adapters for different applications
- Modularity: Framework-agnostic packages for maximum reusability
- Performance: Parallel processing and optimized vector operations
Communication Layer
- gRPC Protocol: All inter-service communication uses gRPC with Protocol Buffers
- REST APIs: Extensions expose REST endpoints for external client integration
- Schema Definition: Protobuf schemas in `/proto` ensure type safety
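As a hedged illustration of what such a schema might look like, here is a minimal protobuf sketch; the actual message and service names under `/proto` may differ.

```protobuf
// Hypothetical sketch of a schema in /proto; field and service
// names are assumptions, not the repository's actual definitions.
syntax = "proto3";

package agent;

message Message {
  string role = 1;
  string content = 2;
}

message AgentRequest {
  string session_id = 1;
  repeated Message messages = 2;
}

message AgentResponse {
  repeated Message choices = 1;
}

service Agent {
  rpc ProcessRequest(AgentRequest) returns (AgentResponse);
}
```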
Service Architecture
- Agent Service: Central orchestrator managing request flow
- Specialized Services: Domain-specific services (history, LLM, vector, text)
- Parallel Processing: Services execute operations concurrently
- Stateless Design: Services maintain no persistent state
Directory Structure
```
./services/    # gRPC services
./extensions/  # Extensions that adapt core system to applications
./packages/    # Reusable, domain-focused logic
./tools/       # Utility scripts for dev and test
./data/        # Definitions, vectors, and chunk data
```
High-Level Architecture
```mermaid
flowchart TD
    A[Clients]
    B[Extensions]
    C[Agent service]
    D[History service]
    E[LLM service]
    G[Vector service]
    H[Text service]
    I[LLM backend]
    J[History cache]
    L[Vector index]
    M[Chunk index]

    %% Clients communicate with Extensions over REST
    A -- REST --> B
    %% Extensions interact with the Agent via gRPC
    B -- gRPC --> C
    %% Agent service interacts with backend services
    C -- gRPC --> D
    C -- gRPC --> E
    C -- gRPC --> G
    C -- gRPC --> H
    %% Interaction with the foundation model
    E -- REST --> I
    %% Services that interact with storage
    D -- Local I/O --> J
    G -- Local I/O --> L
    H -- Local I/O --> M
```
Online Sequence Diagram
The online sequence diagram illustrates the real-time request processing flow when the platform services are actively running. This shows how a client request flows through the REST extensions, gets orchestrated by the Agent service, and triggers parallel operations across specialized backend services to retrieve relevant information through vector similarity search.
```mermaid
sequenceDiagram
    participant Client
    participant Extension as Extensions
    participant Agent as Agent service
    participant History as History service
    participant LLM as LLM service
    participant Vector as Vector service
    participant Text as Text service
    participant Index as Vector index

    Client->>Extension: REST request
    Extension->>Agent: RPC request (ProcessRequest)
    par Parallel
        Agent->>History: RPC request (GetHistory)
        History-->>Agent: RPC response
    and
        Agent->>LLM: RPC request (CreateEmbeddings)
        LLM-->>Agent: RPC response
    end
    Agent->>Vector: RPC request (QueryItems)
    Vector->>Index: I/O request (QueryIndex)
    Index-->>Vector: I/O response
    Vector-->>Agent: RPC response
    Note left of Vector: Orders and reduces<br/>results by similarity
    Agent->>Text: RPC request (GetChunks)
    Text-->>Agent: RPC response
    Agent--)History: RPC request (UpdateHistory)
    Note right of Agent: Fire-and-forget,<br/>no response awaited
    Agent->>Extension: RPC response
    Extension->>Client: REST response
```
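The parallel fan-out at the start of the flow can be sketched with `Promise.all`. This is an illustrative sketch with stubbed service calls, not the Agent service's actual implementation; the real clients are generated gRPC stubs.

```javascript
// Stub for the History service's GetHistory call (assumption: the real
// client is a generated gRPC stub returning prior messages).
async function getHistory(sessionId) {
  return [{ role: "user", content: "previous message" }];
}

// Stub for the LLM service's CreateEmbeddings call.
async function createEmbeddings(text) {
  return [0.1, 0.2, 0.3];
}

// GetHistory and CreateEmbeddings are independent of each other,
// so the Agent can issue both requests concurrently.
async function processRequest(sessionId, text) {
  const [history, embedding] = await Promise.all([
    getHistory(sessionId),
    createEmbeddings(text),
  ]);
  return { history, embedding };
}
```

Running the two calls concurrently means the step costs only as much as the slower of the two round trips, rather than their sum.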
Service Responsibilities
Agent Service
Central orchestrator that coordinates all other services. Processes requests by executing operations in parallel for optimal performance and manages the complete business logic flow.
History Service
Maintains conversation history and context. Provides historical data for request processing and stores interaction records for continuity.
LLM Service
Interfaces with language models for embedding generation and text completion. Handles communication with external AI services.
Vector Service
Performs similarity search operations against a vector index. Returns chunk IDs and similarity scores ordered by relevance.
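A minimal sketch of the ranking the Vector service performs, assuming cosine similarity and an in-memory index; the repository's actual index format and scoring may differ.

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every indexed chunk against the query embedding and return
// the top results, ordered by similarity (highest first).
function queryItems(index, queryEmbedding, limit = 3) {
  return index
    .map(({ id, embedding }) => ({
      id,
      score: cosineSimilarity(queryEmbedding, embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```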
Text Service
Retrieves text content for chunks by their IDs. Provides the actual content corresponding to vector search results.
Offline Sequence Diagram
The platform includes offline tools for knowledge base preparation and vector index creation. These tools process external knowledge sources into searchable vector indices before the services are deployed.
```mermaid
sequenceDiagram
    participant Dev as Developer
    participant Download as tools/download.js
    participant GitHub as GitHub API
    participant Chunk as tools/chunk.js
    participant Index as tools/index.js
    participant LLM as LLM API
    participant Storage as Local Storage

    Dev->>Download: npm run download
    Download->>GitHub: Fetch latest release artifacts
    GitHub-->>Download: Release assets (.tar.gz)
    Download->>Storage: Extract to data/knowledge/
    Note right of Download: HTML files with microdata

    Dev->>Chunk: npm run chunk
    Chunk->>Storage: Read HTML files from data/knowledge/
    Storage-->>Chunk: HTML content with microdata
    loop For each HTML file
        Chunk->>Chunk: Extract microdata items
        Chunk->>Chunk: Generate chunk ID (SHA-256)
        Chunk->>Chunk: Format as JSON, count tokens
        Chunk->>Storage: Store chunk.json in data/chunks/{id}/
    end
    Chunk->>Storage: Persist chunk index

    Dev->>Index: npm run index
    Index->>Storage: Load chunk index
    Storage-->>Index: All chunk metadata
    loop Process chunks in batches
        Index->>Storage: Load chunk text content
        Storage-->>Index: Chunk text
        Index->>LLM: Create embeddings for chunk batch
        LLM-->>Index: Chunk embeddings
        Index->>Storage: Add to vector index
    end
    Index->>Storage: Persist vector index
    Note right of Storage: Ready for runtime vector search
```
Offline Processing Workflow
1. Knowledge Download (tools/download.js)
- Downloads latest release artifacts from GitHub repository
- Extracts compressed archives to the `data/knowledge/` directory
- Provides HTML files with structured microdata for processing
2. Chunk Processing (tools/chunk.js)
- Scans HTML files for microdata items with configurable selectors
- Extracts structured content and generates unique chunk IDs using SHA-256
- Formats content as JSON and calculates token counts
- Stores individual chunks in `data/chunks/{id}/chunk.json`
- Creates searchable chunk index for efficient retrieval
3. Vector Indexing (tools/index.js)
- Batch Processing: Processes chunks in token-optimized batches to minimize API calls
- Embedding Generation: Creates vector embeddings for all chunks via LLM API
- Index Creation: Builds a vector index containing all chunks
- Persistence: Stores the index to disk for runtime access
Key Characteristics
- Offline Execution: All processing occurs before service deployment
- API Optimization: Batched requests minimize LLM API calls and costs
- Unified Index: A single vector index enables search across all content
- Incremental Processing: Skips existing chunks to support iterative updates
- Token Management: Respects API token limits while maximizing batch efficiency
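The token-aware batching described above might look like the following sketch. The `maxTokens` budget and the per-chunk `tokens` field are assumptions; `tools/index.js` may implement this differently.

```javascript
// Group chunks into batches whose summed token counts stay within
// the API's token budget, flushing a batch once it would overflow.
function batchByTokens(chunks, maxTokens) {
  const batches = [];
  let current = [];
  let used = 0;
  for (const chunk of chunks) {
    if (current.length > 0 && used + chunk.tokens > maxTokens) {
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(chunk);
    used += chunk.tokens;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each batch then becomes a single embeddings request, so the number of API calls scales with the total token count rather than the number of chunks.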
Data Flow
- Raw Knowledge → HTML files with microdata
- Structured Chunks → Individual JSON files with metadata
- Vector Embeddings → Numerical representations for similarity search
- Vector Index → Single vector database containing all content ready for runtime queries
This offline pipeline ensures that runtime services can perform fast vector similarity searches without depending on external APIs or requiring real-time embedding generation.
Security Architecture
The security design focuses on network isolation, service authentication, and secure communication channels.
Network Topology
The system implements a defense-in-depth approach with network isolation between external-facing extensions and internal backend services.
Network Architecture
```mermaid
graph TB
    subgraph "Host Network"
        Client[External Clients]
    end
    subgraph "External Network (copilot-ld.external)"
        Web[Web Extension<br/>:3000]
        Copilot[Copilot Extension<br/>:3001]
    end
    subgraph "Internal Network (copilot-ld.internal)"
        Agent[Agent Service<br/>:3000]
        History[History Service<br/>:3000]
        LLM[LLM Service<br/>:3000]
        Vector[Vector Service<br/>:3000]
        Text[Text Service<br/>:3000]
    end
    subgraph "External Services"
        LLMAPI[LLM API<br/>OpenAI/etc]
    end

    %% External connections
    Client -.->|REST/HTTP| Web
    Client -.->|REST/HTTP| Copilot
    %% Extension to Agent connections (via network bridge)
    Web -->|gRPC| Agent
    Copilot -->|gRPC| Agent
    %% Internal service mesh (isolated network)
    Agent -->|gRPC| History
    Agent -->|gRPC| LLM
    Agent -->|gRPC| Vector
    Agent -->|gRPC| Text
    %% External API calls
    LLM -.->|HTTPS| LLMAPI

    style Web fill:#e1f5fe
    style Copilot fill:#e1f5fe
    style Agent fill:#f3e5f5
    style History fill:#fff3e0
    style LLM fill:#fff3e0
    style Vector fill:#fff3e0
    style Text fill:#fff3e0
```
Port Exposure Strategy
- Web Extension (`copilot-ld.web`): Exposes port 3000 to host, bridges external and internal networks
- Copilot Extension (`copilot-ld.copilot`): Exposes port 3001 to host, bridges external and internal networks
- Backend Services: No host port exposure; isolated on the internal network only
Network Isolation Benefits
- Reduced Attack Surface: Backend services are completely isolated on the internal network
- Network Segmentation: Extensions on external network bridge to internal network for controlled access
- Service Mesh Isolation: Internal gRPC communication is fully segmented from external traffic
- Defense in Depth: Dual network topology provides additional security boundaries
Authentication Mechanisms
Authentication Flow
The platform implements HMAC-SHA256 authentication for service-to-service communication using the `HmacAuth` class.
```mermaid
sequenceDiagram
    participant A as Service A
    participant B as Service B
    participant Auth as Authenticator

    A->>Auth: generateToken(serviceId)
    Auth->>Auth: Create payload: serviceId:timestamp
    Auth->>Auth: Sign with HMAC-SHA256
    Auth-->>A: Base64 encoded token
    A->>B: gRPC request + token
    B->>Auth: verifyToken(token)
    Auth->>Auth: Decode and validate signature
    Auth->>Auth: Check token expiration
    Auth-->>B: {isValid, serviceId, error}
    alt Token Valid
        B-->>A: Process request
    else Token Invalid
        B-->>A: Authentication error
    end
```
HMAC Implementation Details
- Algorithm: HMAC-SHA256
- Algorithm: HMAC-SHA256
- Secret: Shared via `SERVICE_AUTH_SECRET` environment variable (minimum 32 characters)
- Token Format: `Base64(serviceId:timestamp:signature)`
- Token Lifetime: 60 seconds (configurable)
- Payload Structure: `serviceId:timestamp`
Token Generation Process
- Create payload combining service ID and current timestamp
- Generate HMAC-SHA256 signature using shared secret
- Encode as Base64: `Base64(serviceId:timestamp:signature)`
Token Verification Process
- Decode Base64 token
- Extract service ID, timestamp, and signature
- Verify timestamp is within token lifetime
- Recreate expected signature using shared secret
- Compare signatures using constant-time comparison
Communication Security
gRPC Internal Communication
- Protocol: gRPC over HTTP/2
- Network: Isolated Docker bridge network
- Authentication: HMAC tokens
- Schema Validation: Protocol Buffer message validation
External API Communication
- Extensions to Clients: REST over HTTP (can be upgraded to HTTPS)
- LLM Service to External APIs: HTTPS with API key authentication
Security Limitations
mTLS Not Implemented
Mutual TLS (mTLS) is not currently implemented between services. Future security enhancements should include:
- Certificate-based service authentication
- Encrypted gRPC communication channels
- Service identity verification via X.509 certificates
Rate-Limiting Not Implemented
Rate-limiting is not currently implemented for externally facing services. Future enhancements should include:
- Request throttling per client IP address to prevent abuse
- Adaptive rate limiting based on service resource utilization
- Token bucket or sliding window algorithms for burst traffic handling
- Configurable rate limits per extension type (web vs API clients)
Threat Model
Protected Against
- External Service Access: Backend services cannot be directly accessed from outside the Docker network
- Service Impersonation: HMAC authentication prevents unauthorized service access (when enabled)
- Token Replay: Time-limited tokens reduce replay attack windows
Current Vulnerabilities
- Network Sniffing: Internal gRPC traffic is unencrypted
- Container Compromise: If one container is compromised, it can access other services on the same network
- Extension Security: Extensions are the primary attack surface and must implement their own input validation
Service Security Responsibilities
Extensions (Web, Copilot)
- Input validation and sanitization
- Rate limiting and DDoS protection
- Session management
- CORS policy enforcement
Agent Service
- Request orchestration security
- Service-to-service authentication enforcement
- Business logic security validation
Backend Services (History, LLM, Vector, Text)
- gRPC message validation
- Resource usage limiting
- Data access controls
- Error handling without information disclosure