Core Concepts
Understanding the foundational concepts behind Copilot-LD helps you make the most of the platform. This guide explains the "why" behind key architectural decisions and how they work together.
What is Copilot-LD?
Copilot-LD is an intelligent agent that combines GitHub Copilot's language models with linked data and retrieval-augmented generation (RAG) to provide accurate, context-aware assistance. Unlike simple chatbots, it understands semantic relationships in your knowledge base and provides responses grounded in your actual data.
Core Technologies
Linked Data
Linked data provides the semantic structure that makes Copilot-LD uniquely accurate. Instead of treating content as plain text, the system understands relationships and context through HTML microdata with Schema.org vocabularies.
Why Linked Data?
- Semantic Understanding: Preserves meaning and relationships between concepts
- Accurate Chunking: Content boundaries align with semantic units, not arbitrary character limits
- Rich Metadata: Every piece of content includes structured information about its type and purpose
- Standard Vocabularies: Schema.org provides well-defined, interoperable types
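To make this concrete, here is a minimal sketch of a knowledge-base fragment marked up with Schema.org microdata (the item type, properties, and text are illustrative, not taken from an actual Copilot-LD knowledge base):

```html
<!-- One itemscope is one semantic unit, so the processing pipeline
     can use it as a natural content boundary. -->
<article itemscope itemtype="https://schema.org/TechArticle">
  <h1 itemprop="headline">Rotating shared secrets</h1>
  <p itemprop="abstract">How to rotate service secrets without downtime.</p>
  <section itemprop="articleBody">
    Publish the new key first, then retire the old one after all
    services have reloaded their configuration.
  </section>
</article>
```

Because the markup declares what each element is (a TechArticle with a headline, abstract, and body), content can be chunked and labeled along these boundaries instead of split at arbitrary character limits.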
Retrieval-Augmented Generation (RAG)
RAG enhances language model responses by retrieving relevant context from your knowledge base before generating answers. This grounds responses in factual information rather than relying solely on the model's training data.
The RAG Process:
- Query: User asks a question or makes a request
- Retrieve: System finds relevant content using vector similarity search
- Augment: Retrieved content is added to the conversation context
- Generate: Language model produces a response informed by the retrieved context
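In code, the loop is small. The sketch below is a hypothetical illustration in plain JavaScript; embed(), vectorSearch(), and complete() are stand-ins for the actual service calls:

```js
// Hypothetical RAG flow; embed(), vectorSearch(), and complete()
// stand in for the real Vector and LLM service calls.
async function answer(query, history) {
  // Retrieve: find content semantically similar to the query
  const queryVector = await embed(query);
  const hits = await vectorSearch(queryVector, { limit: 5 });

  // Augment: add retrieved content to the conversation context
  const context = hits.map((hit) => ({ role: "system", content: hit.text }));

  // Generate: the model answers grounded in the retrieved context
  return complete([...context, ...history, { role: "user", content: query }]);
}
```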
Why RAG?
- Accuracy: Responses based on your actual knowledge base, not generic training data
- Up-to-date: Information reflects current content without retraining models
- Transparency: Can trace responses back to source documents
- Control: You determine what information is available to the system
Microservices Architecture
Copilot-LD is built as a collection of specialized microservices that communicate via gRPC. Each service has a single, well-defined responsibility.
Why Microservices?
- Modularity: Services can be developed, tested, and deployed independently
- Scalability: Scale individual services based on their specific resource needs
- Maintainability: Smaller, focused codebases are easier to understand and modify
- Technology Independence: Services can use different technologies if needed
- Fault Isolation: Problems in one service don't cascade to others
gRPC Communication
Services communicate using gRPC, a high-performance RPC framework with Protocol Buffers for message serialization.
Why gRPC?
- Type Safety: Protocol Buffers provide strong typing and schema validation
- Performance: Binary serialization is faster and more compact than JSON
- Code Generation: Automatically generate client and server code from schemas
- Cross-Platform: Works across different languages and platforms
- Built-in Features: Authentication, timeouts, and error handling included
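As an illustration, a search service contract might be declared like this (a hypothetical sketch; the actual Copilot-LD schemas differ):

```proto
syntax = "proto3";

package vector;

// Hypothetical contract; actual Copilot-LD definitions differ.
service Vector {
  // Semantic search over indexed content.
  rpc Search (SearchRequest) returns (SearchResponse);
}

message SearchRequest {
  string query = 1;
  int32 limit = 2;
}

message SearchResponse {
  repeated Hit hits = 1;
}

message Hit {
  string resource_id = 1;
  double score = 2;
}
```

From this single file, both the client stub and the server skeleton are generated, so a request with a missing or mistyped field fails at the schema boundary rather than deep inside a handler.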
Distributed Tracing
Copilot-LD implements comprehensive distributed tracing to make the agent's decision-making process observable. Each request creates a trace—a complete record of all service calls, tool executions, and timing information as the agent processes the request.
Why Tracing for Agentic Systems?
Agentic AI systems present unique observability challenges that make tracing essential:
- Non-Deterministic Behavior: Unlike traditional software with fixed code paths, agents make autonomous decisions at runtime. You can't predict which services will be called or in what order. Tracing reveals the actual execution path chosen by the agent for each request.
- Tool Calling Complexity: Agents dynamically select and execute tools based on conversation context. Understanding which tools were called, why, and with what parameters is critical for debugging and optimization. Traces capture the complete tool execution graph.
- Multi-Service Orchestration: A single user request flows through multiple services (Agent → Memory → LLM → Tool → Vector). Traditional logs from individual services don't show the complete picture. Distributed tracing correlates all activity across services using a shared trace ID.
- Performance Analysis: Agentic workflows involve expensive operations: LLM API calls, vector searches, memory retrievals. Traces with precise timing information identify performance bottlenecks and optimization opportunities.
- Trust and Transparency: Users and operators need to understand how the agent reached its conclusions. Traces provide an audit trail showing exactly what information was retrieved, which tools were executed, and how token budgets were allocated.
The Tracing Model:
Each trace consists of spans—individual units of work with start/end times, attributes, and relationships to other spans:
- Trace ID: Unique identifier shared by all spans in a request
- Span ID: Unique identifier for each operation
- Parent Span ID: Links spans into a hierarchical tree showing call relationships
- Span Kind: Classifies operations as SERVER (incoming), CLIENT (outgoing), or INTERNAL
- Attributes: Structured metadata (service name, method, resource IDs, message counts)
- Events: Point-in-time markers (request.sent, response.received) with additional context
- Status: Success or error state with optional error messages
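Put together, a single CLIENT span might look like the following (field names and values are illustrative; the actual attribute keys may differ):

```json
{
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "spanId": "00f067aa0ba902b7",
  "parentSpanId": "53995c3f42cd8ad8",
  "kind": "CLIENT",
  "name": "vector.Vector/Search",
  "startTimeUnixNano": "1700000000000000000",
  "endTimeUnixNano": "1700000000042000000",
  "attributes": { "service": "agent", "method": "Search", "hits": 5 },
  "events": [
    { "name": "request.sent", "timeUnixNano": "1700000000001000000" },
    { "name": "response.received", "timeUnixNano": "1700000000041000000" }
  ],
  "status": { "code": "OK" }
}
```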
Why Distributed Tracing?
- Complete Request Visibility: See every service call, tool execution, and data retrieval in one view
- Performance Debugging: Identify slow operations with nanosecond-precision timing
- Causal Relationships: Understand which calls triggered which subsequent operations
- Production Monitoring: Detect anomalies, errors, and performance degradation in real-time
- Agent Behavior Analysis: Study how the agent makes decisions across many requests
- OpenTelemetry Compatibility: Standard format enables integration with industry-standard tools
System Capabilities
Intelligent Request Processing
The Agent service orchestrates request processing, making autonomous decisions about which tools to call and when. It doesn't follow a rigid workflow but adapts based on the conversation context and available tools.
Contextual Memory
The Memory service maintains conversation history with intelligent budgeting. It allocates token budgets between tools, context, and history to maximize relevance while respecting model limits.
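A budgeting pass might be sketched as follows (the ratios, limit, and countTokens() are hypothetical; the real allocation strategy lives in the Memory service):

```js
// Hypothetical token budgeting sketch; countTokens() stands in
// for a real tokenizer, and the split ratios are illustrative.
function buildWindow(tools, context, history, limit = 8000) {
  const budgets = {
    tools: Math.floor(limit * 0.2),   // tool definitions
    context: Math.floor(limit * 0.5), // retrieved content
    history: Math.floor(limit * 0.3), // prior conversation turns
  };
  const fit = (items, budget) => {
    const kept = [];
    let used = 0;
    for (const item of items) {
      const cost = countTokens(item.content);
      if (used + cost > budget) break; // stop before exceeding the budget
      kept.push(item);
      used += cost;
    }
    return kept;
  };
  return {
    tools: fit(tools, budgets.tools),
    context: fit(context, budgets.context),
    history: fit(history, budgets.history),
  };
}
```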
Semantic Search
The Vector service provides semantic search, finding documents by their actual content using vector embeddings. It generates an embedding from the text query and searches it against indexed document content.
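At its core, the runtime operation is a similarity scan over an in-memory index. A minimal, self-contained sketch:

```js
// Minimal similarity search over an in-memory index. Assumes all
// embeddings are normalized to unit length, so the dot product
// equals cosine similarity.
function search(queryVector, index, limit = 5) {
  return index
    .map(({ id, vector }) => ({
      id,
      score: vector.reduce((sum, v, i) => sum + v * queryVector[i], 0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```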
Policy-Based Access Control
The Graph service enforces policy-based filtering, ensuring users only access resources they're authorized to see. Policies are defined declaratively and applied consistently across all resource access.
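The shape below is a hypothetical illustration of a declarative policy and its enforcement, not the actual Copilot-LD policy format:

```js
// Hypothetical policy shape and enforcement sketch.
const policy = {
  subject: "team:support",
  allow: [{ type: "TechArticle", tag: "public" }],
};

function authorized(resource, policy) {
  return policy.allow.some(
    (rule) => rule.type === resource.type && resource.tags?.includes(rule.tag),
  );
}

// Applied at the data layer: filter results before returning them.
const visible = (resources) => resources.filter((r) => authorized(r, policy));
```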
Extensible Tool System
The Tool service enables the agent to execute external functions. Tools are defined using Protocol Buffers and can be added without modifying core services. The agent autonomously decides when to call tools based on conversation context.
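Because a tool is just another Protocol Buffer definition, adding one is a schema change rather than a modification to core services. A hypothetical tool definition:

```proto
syntax = "proto3";

package tool;

// Hypothetical tool; real tool schemas differ.
service Weather {
  // Invoked by the agent when the conversation calls for it.
  rpc GetForecast (ForecastRequest) returns (ForecastResponse);
}

message ForecastRequest {
  string city = 1;
}

message ForecastResponse {
  string summary = 1;
  double temperature_celsius = 2;
}
```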
Request Flow
Understanding how a request flows through the system helps clarify how the components work together.
Online Processing (Runtime)
- Client Request: User sends a message through an extension (web interface, Teams bot, etc.)
- Agent Orchestration: Agent service receives the request and validates authentication
- Memory Assembly: Agent requests a memory window with conversation history and available tools
- Context Retrieval: Agent resolves resource identifiers to actual content, with policy filtering applied
- Completion Generation: Agent sends assembled context to LLM service for response generation
- Tool Execution: If the LLM decides to call tools, Agent executes them and continues the loop
- Response: Final completion is saved to memory and returned to the client
This flow is sequential per request, but multiple requests can be processed concurrently. The agent makes intelligent decisions at each step rather than following a rigid pipeline.
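The tool-calling loop at the heart of steps 5 through 7 can be sketched as follows (memoryWindow(), complete(), executeTool(), and saveToMemory() are hypothetical stand-ins for the Memory, LLM, and Tool service calls):

```js
// Hypothetical sketch of the agent's completion loop.
async function handleRequest(session, userMessage) {
  const window = await memoryWindow(session, userMessage);
  const messages = [...window.messages];

  for (;;) {
    const reply = await complete(messages, window.tools);
    messages.push(reply);

    if (!reply.toolCalls?.length) {
      await saveToMemory(session, messages); // persist the final turn
      return reply; // return the completion to the client
    }
    // Execute each requested tool, then loop for another completion
    for (const call of reply.toolCalls) {
      const result = await executeTool(call);
      messages.push({ role: "tool", toolCallId: call.id, content: result });
    }
  }
}
```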
Offline Processing (Build Time)
Before the system can answer questions, knowledge must be processed into searchable formats:
- Resource Extraction: HTML files with microdata are scanned and converted to individual resource documents
- Embedding Creation: Content is converted to vector embeddings
- Index Building: Vector database is created for fast similarity search
This offline pipeline ensures runtime queries are fast—no external API calls needed during search, just in-memory vector operations.
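End to end, the pipeline can be sketched as three passes (extractResources(), embedBatch(), and writeIndex() are hypothetical stand-ins for the real processing steps):

```js
// Hypothetical sketch of the offline pipeline.
async function build(htmlFiles) {
  // 1. Resource extraction: one resource per microdata item
  const resources = htmlFiles.flatMap((file) => extractResources(file));

  // 2. Embedding creation: batched to minimize API calls
  const vectors = await embedBatch(resources.map((r) => r.text));

  // 3. Index building: persist for fast in-memory search at runtime
  await writeIndex(resources.map((r, i) => ({ id: r.id, vector: vectors[i] })));
}
```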
Why Separate Online and Offline Processing?
Copilot-LD deliberately separates build-time processing from runtime operations for several important reasons:
Performance
- No API Delays: Runtime searches use pre-computed document embeddings, eliminating embedding API latency from the search path
- In-Memory Operations: Vector similarity is computed locally without network calls
- Predictable Latency: Response times are consistent and fast
Cost Efficiency
- One-Time Embeddings: Generate embeddings once during processing, not on every query
- Batch Processing: Offline pipeline optimizes API calls through batching
- No Per-Query Costs: Vector search has zero API cost
Reliability
- Offline Validation: Catch processing errors before deployment
- Reduced Dependencies: Runtime doesn't depend on external embedding APIs
- Reproducible Builds: Same input always produces same indexes
Architectural Principles
Radical Simplicity
Copilot-LD is built with plain JavaScript and no external dependencies beyond Node.js built-ins. This deliberate choice makes the system:
- Easy to Understand: No framework magic or hidden complexity
- Easy to Deploy: Minimal container size (under 10 MB)
- Easy to Maintain: No dependency updates or compatibility issues
- Easy to Audit: Small codebase with explicit behavior
Business Logic First
Core logic lives in framework-agnostic packages (@copilot-ld/lib*) that can be imported and tested independently. Services are thin adapters that wire packages together with gRPC communication.
Benefits:
- Testability: Business logic can be unit tested without service infrastructure
- Reusability: Same logic can power different interfaces (CLI tools, services, extensions)
- Clarity: Separation between communication (gRPC) and computation (business logic)
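As a hypothetical illustration (the package and function names are invented for the example), the package exports pure logic while the service only handles transport:

```js
// In a framework-agnostic package: pure, unit-testable logic.
export function rankHits(hits, limit) {
  return [...hits].sort((a, b) => b.score - a.score).slice(0, limit);
}

// In a service: a thin gRPC handler that wires the package in.
// (index is the service's in-memory vector index.)
async function search(call) {
  const hits = await index.search(call.request.query);
  return { hits: rankHits(hits, call.request.limit) };
}
```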
Type Safety Without TypeScript
Protocol Buffers provide type safety and schema validation without requiring TypeScript compilation. Generated JavaScript includes JSDoc types for IDE support while remaining simple JavaScript at runtime.
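An illustrative fragment (not actual generated output) shows the pattern:

```js
/**
 * A search hit returned by the Vector service.
 * @typedef {object} Hit
 * @property {string} resourceId
 * @property {number} score
 */

/**
 * Keep the top-scoring hits.
 * @param {Hit[]} hits
 * @param {number} limit
 * @returns {Hit[]}
 */
function topHits(hits, limit) {
  return hits.slice(0, limit);
}
```

The IDE reads the JSDoc annotations for autocompletion and type checking, while the file remains plain JavaScript that runs without a build step.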
Security by Design
Security is built into the architecture from the start:
- Network Isolation: Backend services are not exposed externally
- Authenticated Communication: HMAC authentication for all inter-service calls
- Time-Limited Tokens: Short-lived authentication tokens prevent replay attacks
- Policy Enforcement: Access control applied at the data layer
- Minimal Attack Surface: Small container images with only essential components
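A time-limited HMAC token needs nothing beyond Node.js built-ins. A minimal sketch (the token layout and TTL are illustrative, not Copilot-LD's actual wire format):

```js
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative token format: "<expiry>.<hex HMAC over expiry>".
function issueToken(secret, ttlMs = 30_000) {
  const expiry = String(Date.now() + ttlMs);
  const mac = createHmac("sha256", secret).update(expiry).digest("hex");
  return `${expiry}.${mac}`;
}

function verifyToken(token, secret) {
  const [expiry, mac] = token.split(".");
  if (!expiry || !mac || Date.now() > Number(expiry)) return false;
  const expected = createHmac("sha256", secret).update(expiry).digest("hex");
  if (mac.length !== expected.length) return false;
  // Constant-time comparison avoids timing side channels
  return timingSafeEqual(Buffer.from(mac), Buffer.from(expected));
}
```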
Next Steps
Now that you understand the core concepts, you can:
- Architecture – See how these concepts map to actual system components
- Reference – Deep dive into implementation details