LLxprt Code Integration Implementation Guide
Generated: 2025-08-05 UTC
Purpose: Complete implementation guide for integrating LLxprt Code into Sasha Studio
Repository: https://github.com/acoliver/llxprt-code
Target: Docker-based Node.js application with real-time AI chat streaming
LLxprt Code Overview
LLxprt Code is an open-source multi-provider fork of Google's Gemini CLI, enhanced with:
- Multi-Provider Support: OpenAI, Anthropic, Google, OpenRouter, Fireworks, Local models
- Enhanced Theming: Beautiful, consistent UI themes
- Flexible Configuration: Switch providers, models, and API keys on the fly
- Local Model Support: LM Studio, llama.cpp, any OpenAI-compatible server
- Advanced Settings: Fine-tune model parameters, profiles, ephemeral settings
System Requirements
Prerequisites
- Node.js: Version 24 or later, as required by the project's package.json
- npm: A recent release, used for the global install
- Docker: For containerized deployment
- API Keys: For desired AI providers
Installation Methods
# NPM Global Installation (Recommended)
npm install -g @vybestack/llxprt-code
# Homebrew (macOS/Linux)
brew install llxprt-code
# Verify installation
llxprt --version
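Before wiring LLxprt into the application, it helps to confirm the binary is on the PATH and that at least one provider key is present. The following is a minimal preflight sketch; the file name check-llxprt.js and the exact set of environment variables checked are illustrative assumptions, not part of LLxprt itself.
// check-llxprt.js (illustrative preflight check)
const { execFileSync } = require('child_process');

function checkLLxprt() {
  // Verify the CLI is installed and reachable on the PATH
  const version = execFileSync('llxprt', ['--version'], { encoding: 'utf8' }).trim();
  console.log(`llxprt found: ${version}`);

  // Warn if no provider API key is configured in the environment
  const keys = ['OPENAI_API_KEY', 'ANTHROPIC_API_KEY', 'GOOGLE_API_KEY'];
  const present = keys.filter((name) => process.env[name]);
  if (present.length === 0) {
    console.warn('No provider API keys found in the environment');
  } else {
    console.log(`Provider keys detected: ${present.join(', ')}`);
  }
}

checkLLxprt();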
Docker Integration Strategy
Enhanced Dockerfile for Sasha Studio
FROM node:24-slim AS llxprt-base
# Install comprehensive system dependencies
RUN apt-get update && apt-get install -y \
ca-certificates \
curl \
gnupg \
python3 \
python3-pip \
make \
g++ \
git \
&& rm -rf /var/lib/apt/lists/*
# The node:24-slim base image already ships a non-root "node" user (uid/gid 1000),
# so no additional user needs to be created here
# Configure npm global directory for non-root user
ENV NPM_CONFIG_PREFIX=/home/node/.npm-global
ENV PATH=$PATH:/home/node/.npm-global/bin
# Switch to non-root user
USER node
WORKDIR /home/node
# Install LLxprt Code globally
RUN npm install -g @vybestack/llxprt-code
# Verify installation
RUN llxprt --version
FROM llxprt-base AS sasha-studio
# Copy application files
COPY --chown=node:node package*.json ./
RUN npm ci --omit=dev
# Copy source code
COPY --chown=node:node . .
# Create configuration directory
RUN mkdir -p /home/node/.config/llxprt
# Expose application port
EXPOSE 3000
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost:3000/health || exit 1
CMD ["node", "server.js"]
Docker Compose Configuration
# docker-compose.yml
version: '3.8'
services:
  sasha-studio:
    build: .
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - GOOGLE_API_KEY=${GOOGLE_API_KEY}
    volumes:
      - llxprt-config:/home/node/.config/llxprt
      - user-files:/home/node/user-files:ro
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
volumes:
  llxprt-config:
  user-files:
LLxprt Configuration Management
Provider Configuration System
// config/llxprt-manager.js
const { spawn } = require('child_process');
const fs = require('fs').promises;
const path = require('path');
class LLxprtManager {
constructor() {
this.configDir = process.env.LLXPRT_CONFIG_DIR || '/home/node/.config/llxprt';
this.providers = {
openai: {
models: ['o1', 'gpt-4o', 'gpt-4o-mini'],
apiKeyEnv: 'OPENAI_API_KEY'
},
anthropic: {
models: ['claude-3-5-sonnet-20241022', 'claude-3-5-haiku-20241022'],
apiKeyEnv: 'ANTHROPIC_API_KEY'
},
google: {
models: ['gemini-pro', 'gemini-pro-vision'],
apiKeyEnv: 'GOOGLE_API_KEY'
}
};
this.currentProvider = null;
this.currentModel = null;
this.isInitialized = false;
}
async initialize() {
if (this.isInitialized) return;
// Ensure config directory exists
await fs.mkdir(this.configDir, { recursive: true });
// Set up initial provider configuration
await this.setupProviders();
this.isInitialized = true;
}
async setupProviders() {
for (const [provider, config] of Object.entries(this.providers)) {
const apiKey = process.env[config.apiKeyEnv];
if (apiKey) {
await this.configureProvider(provider, apiKey);
}
}
}
async configureProvider(provider, apiKey) {
return new Promise((resolve, reject) => {
// Name the child process distinctly so it does not shadow the global `process`
const child = spawn('llxprt', [], {
stdio: ['pipe', 'pipe', 'pipe'],
env: { ...process.env, LLXPRT_CONFIG_DIR: this.configDir }
});
let output = '';
child.stdout.on('data', (data) => {
output += data.toString();
});
child.stderr.on('data', (data) => {
console.error(`LLxprt stderr: ${data}`);
});
child.on('close', (code) => {
if (code === 0) {
console.log(`Provider ${provider} configured successfully`);
resolve(output);
} else {
reject(new Error(`Provider configuration failed with code ${code}`));
}
});
// Send configuration commands
child.stdin.write(`/provider ${provider}\n`);
child.stdin.write(`/key ${apiKey}\n`);
child.stdin.write(`/model ${this.providers[provider].models[0]}\n`);
child.stdin.write('/exit\n');
child.stdin.end();
});
}
async switchProvider(provider, model) {
if (!this.providers[provider]) {
throw new Error(`Unsupported provider: ${provider}`);
}
if (!this.providers[provider].models.includes(model)) {
throw new Error(`Unsupported model ${model} for provider ${provider}`);
}
this.currentProvider = provider;
this.currentModel = model;
return new Promise((resolve, reject) => {
const child = spawn('llxprt', [], {
stdio: ['pipe', 'pipe', 'pipe'],
env: { ...process.env, LLXPRT_CONFIG_DIR: this.configDir }
});
child.on('close', (code) => {
if (code === 0) {
resolve(`Switched to ${provider}:${model}`);
} else {
reject(new Error(`Provider switch failed with code ${code}`));
}
});
child.stdin.write(`/provider ${provider}\n`);
child.stdin.write(`/model ${model}\n`);
child.stdin.write('/exit\n');
child.stdin.end();
});
}
getAvailableProviders() {
return Object.entries(this.providers)
.filter(([_, config]) => process.env[config.apiKeyEnv])
.map(([provider, config]) => ({
provider,
models: config.models,
available: true
}));
}
}
module.exports = LLxprtManager;
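As a usage sketch, the manager can be initialized once at application startup and then asked to switch providers on demand. This assumes the provider API keys above are already present in the environment; the provider/model pair used here is illustrative.
// Illustrative usage of LLxprtManager
const LLxprtManager = require('./config/llxprt-manager');

async function main() {
  const manager = new LLxprtManager();
  await manager.initialize();

  // List the providers that have an API key available
  console.log(manager.getAvailableProviders());

  // Switch to a specific provider/model pair known to the manager
  await manager.switchProvider('openai', 'gpt-4o-mini');
  console.log(`Active: ${manager.currentProvider}:${manager.currentModel}`);
}

main().catch(console.error);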
Real-Time Streaming Implementation
WebSocket + LLxprt Integration
// services/streaming-service.js
const { spawn } = require('child_process');
const WebSocket = require('ws');
const LLxprtManager = require('../config/llxprt-manager');
class LLxprtStreamingService {
constructor() {
this.llxprtManager = new LLxprtManager();
this.activeStreams = new Map();
this.metrics = {
totalStreams: 0,
activeConnections: 0,
averageResponseTime: 0
};
}
async initialize() {
await this.llxprtManager.initialize();
}
setupWebSocketServer(server) {
const wss = new WebSocket.Server({
server,
path: '/chat/stream'
});
wss.on('connection', (ws, req) => {
const sessionId = this.generateSessionId();
const clientInfo = {
sessionId,
startTime: Date.now(),
messageCount: 0
};
this.activeStreams.set(sessionId, clientInfo);
this.metrics.activeConnections++;
console.log(`New WebSocket connection: ${sessionId}`);
ws.on('message', async (data) => {
try {
const request = JSON.parse(data);
await this.handleStreamRequest(ws, sessionId, request);
} catch (error) {
console.error('WebSocket message error:', error);
ws.send(JSON.stringify({
type: 'error',
error: error.message,
sessionId
}));
}
});
ws.on('close', () => {
this.cleanup(sessionId);
this.metrics.activeConnections--;
console.log(`WebSocket disconnected: ${sessionId}`);
});
ws.on('error', (error) => {
console.error(`WebSocket error for ${sessionId}:`, error);
this.cleanup(sessionId);
});
// Send connection confirmation
ws.send(JSON.stringify({
type: 'connected',
sessionId,
availableProviders: this.llxprtManager.getAvailableProviders()
}));
});
return wss;
}
async handleStreamRequest(ws, sessionId, request) {
const { message, provider, model, options = {} } = request;
// Validate request
if (!message || !provider || !model) {
throw new Error('Missing required fields: message, provider, model');
}
// Switch provider/model if needed
if (this.llxprtManager.currentProvider !== provider ||
this.llxprtManager.currentModel !== model) {
await this.llxprtManager.switchProvider(provider, model);
}
// Start streaming process
const streamData = await this.startLLxprtStream(ws, sessionId, message, options);
// Update session info
const clientInfo = this.activeStreams.get(sessionId);
if (clientInfo) {
clientInfo.messageCount++;
clientInfo.lastMessage = Date.now();
}
this.metrics.totalStreams++;
}
async startLLxprtStream(ws, sessionId, message, options) {
return new Promise((resolve, reject) => {
const startTime = Date.now();
let totalChunks = 0;
let totalChars = 0;
let firstTokenTime = null;
// Spawn LLxprt process
const llxprtProcess = spawn('llxprt', [], {
stdio: ['pipe', 'pipe', 'pipe'],
env: {
...process.env,
LLXPRT_CONFIG_DIR: this.llxprtManager.configDir
}
});
// Handle stdout (AI response)
llxprtProcess.stdout.on('data', (chunk) => {
const content = chunk.toString();
// Record first token time
if (!firstTokenTime && content.trim()) {
firstTokenTime = Date.now() - startTime;
}
totalChunks++;
totalChars += content.length;
// Send chunk to WebSocket
ws.send(JSON.stringify({
type: 'chunk',
content,
sessionId,
metadata: {
chunkIndex: totalChunks,
timestamp: Date.now()
}
}));
});
// Handle stderr (errors and system messages)
llxprtProcess.stderr.on('data', (data) => {
const error = data.toString();
console.error(`LLxprt stderr: ${error}`);
// Send error to client if it's not just debug info
if (error.toLowerCase().includes('error')) {
ws.send(JSON.stringify({
type: 'error',
error,
sessionId
}));
}
});
// Handle process completion
llxprtProcess.on('close', (code) => {
const duration = Date.now() - startTime;
// Update metrics
// Simple blend of the previous average with the latest duration (not a true mean)
this.metrics.averageResponseTime =
(this.metrics.averageResponseTime + duration) / 2;
const response = {
type: 'complete',
sessionId,
metadata: {
duration,
totalChunks,
totalChars,
firstTokenTime,
exitCode: code,
tokensPerSecond: totalChars / (duration / 1000) // rough character-based throughput, not true tokens
}
};
ws.send(JSON.stringify(response));
if (code === 0) {
resolve(response.metadata);
} else {
reject(new Error(`LLxprt process exited with code ${code}`));
}
});
// Handle process errors
llxprtProcess.on('error', (error) => {
console.error('LLxprt process error:', error);
ws.send(JSON.stringify({
type: 'error',
error: error.message,
sessionId
}));
reject(error);
});
// Send message to LLxprt
llxprtProcess.stdin.write(`${message}\n`);
llxprtProcess.stdin.end();
// Store process reference for cleanup
const clientInfo = this.activeStreams.get(sessionId);
if (clientInfo) {
clientInfo.process = llxprtProcess;
}
});
}
cleanup(sessionId) {
const clientInfo = this.activeStreams.get(sessionId);
if (clientInfo?.process) {
try {
clientInfo.process.kill('SIGTERM');
} catch (error) {
console.error('Error killing LLxprt process:', error);
}
}
this.activeStreams.delete(sessionId);
}
generateSessionId() {
return `llxprt_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;
}
getMetrics() {
return {
...this.metrics,
activeStreams: this.activeStreams.size,
uptime: process.uptime()
};
}
}
module.exports = LLxprtStreamingService;
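On the client side, a consumer only needs to handle the connected, chunk, complete, and error message types emitted above. Below is a minimal browser-side sketch, assuming the page is served from the same origin as the server; the #output element is an assumption for illustration.
// Illustrative browser-side consumer for the /chat/stream endpoint
const ws = new WebSocket(`ws://${window.location.host}/chat/stream`);

ws.onopen = () => {
  ws.send(JSON.stringify({
    message: 'Hello there',
    provider: 'openai',
    model: 'gpt-4o-mini'
  }));
};

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  switch (msg.type) {
    case 'connected':
      console.log('Session started:', msg.sessionId, msg.availableProviders);
      break;
    case 'chunk':
      // Append streamed content as it arrives
      document.getElementById('output').textContent += msg.content;
      break;
    case 'complete':
      console.log('Done in', msg.metadata.duration, 'ms');
      break;
    case 'error':
      console.error('Stream error:', msg.error);
      break;
  }
};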
Provider Management System
Dynamic Provider Configuration
// services/provider-service.js
const { spawn } = require('child_process');
class ProviderService {
constructor(llxprtManager) {
this.llxprtManager = llxprtManager;
this.providerHealth = new Map();
this.costTracking = new Map();
this.usageStats = new Map();
}
async checkProviderHealth(provider) {
try {
const testMessage = "Hello, this is a health check.";
const startTime = Date.now();
await this.llxprtManager.switchProvider(provider,
this.llxprtManager.providers[provider].models[0]);
// Quick test with minimal token usage
const result = await this.sendTestMessage(testMessage);
const responseTime = Date.now() - startTime;
this.providerHealth.set(provider, {
status: 'healthy',
responseTime,
lastCheck: Date.now(),
error: null
});
return true;
} catch (error) {
this.providerHealth.set(provider, {
status: 'unhealthy',
responseTime: null,
lastCheck: Date.now(),
error: error.message
});
return false;
}
}
async sendTestMessage(message) {
return new Promise((resolve, reject) => {
const child = spawn('llxprt', [], {
stdio: ['pipe', 'pipe', 'pipe'],
env: {
...process.env,
LLXPRT_CONFIG_DIR: this.llxprtManager.configDir
}
});
let output = '';
const timeout = setTimeout(() => {
child.kill('SIGTERM');
reject(new Error('Health check timeout'));
}, 10000);
child.stdout.on('data', (data) => {
output += data.toString();
});
child.on('close', (code) => {
clearTimeout(timeout);
if (code === 0 && output.trim()) {
resolve(output);
} else {
reject(new Error(`Health check failed with code ${code}`));
}
});
child.stdin.write(`${message}\n`);
child.stdin.end();
});
}
async getOptimalProvider(preferredProvider = null) {
const availableProviders = this.llxprtManager.getAvailableProviders();
// If preferred provider is specified and healthy, use it
if (preferredProvider) {
const isHealthy = await this.checkProviderHealth(preferredProvider);
if (isHealthy) {
return preferredProvider;
}
}
// Find the best available provider based on health and performance
const healthyProviders = [];
for (const { provider } of availableProviders) {
const isHealthy = await this.checkProviderHealth(provider);
if (isHealthy) {
const health = this.providerHealth.get(provider);
healthyProviders.push({
provider,
responseTime: health.responseTime,
priority: this.getProviderPriority(provider)
});
}
}
if (healthyProviders.length === 0) {
throw new Error('No healthy providers available');
}
// Sort by priority, then by response time
healthyProviders.sort((a, b) => {
if (a.priority !== b.priority) {
return a.priority - b.priority; // Lower priority number = higher priority
}
return a.responseTime - b.responseTime;
});
return healthyProviders[0].provider;
}
getProviderPriority(provider) {
const priorities = {
'openai': 1,
'anthropic': 2,
'google': 3
};
return priorities[provider] || 99;
}
trackUsage(provider, model, inputTokens, outputTokens, cost) {
const key = `${provider}:${model}`;
const current = this.usageStats.get(key) || {
requests: 0,
inputTokens: 0,
outputTokens: 0,
totalCost: 0,
averageResponseTime: 0
};
current.requests++;
current.inputTokens += inputTokens;
current.outputTokens += outputTokens;
current.totalCost += cost;
this.usageStats.set(key, current);
}
getUsageStats() {
const stats = {};
for (const [key, data] of this.usageStats.entries()) {
stats[key] = {
...data,
averageCostPerRequest: data.totalCost / data.requests,
totalTokens: data.inputTokens + data.outputTokens
};
}
return stats;
}
getProviderHealth() {
const health = {};
for (const [provider, data] of this.providerHealth.entries()) {
health[provider] = data;
}
return health;
}
}
module.exports = ProviderService;
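A short sketch of how the service could be used to fall back to a healthy provider before starting a stream; the preferred-provider value passed in is illustrative, and the sketch simply reuses the first model the manager knows for the chosen provider.
// Illustrative provider fallback using ProviderService
const LLxprtManager = require('./config/llxprt-manager');
const ProviderService = require('./services/provider-service');

async function pickProvider(preferred) {
  const manager = new LLxprtManager();
  await manager.initialize();

  const providerService = new ProviderService(manager);
  // Falls back to the healthiest available provider when the preferred one fails its check
  const provider = await providerService.getOptimalProvider(preferred);
  const model = manager.providers[provider].models[0];

  await manager.switchProvider(provider, model);
  return { provider, model };
}

pickProvider('anthropic').then(console.log).catch(console.error);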
Express.js API Integration
Complete Server Implementation
// server.js
const express = require('express');
const http = require('http');
const cors = require('cors');
const helmet = require('helmet');
const rateLimit = require('express-rate-limit');
const LLxprtStreamingService = require('./services/streaming-service');
const ProviderService = require('./services/provider-service');
class SashaStudioServer {
constructor() {
this.app = express();
this.server = http.createServer(this.app);
this.streamingService = new LLxprtStreamingService();
this.providerService = new ProviderService(this.streamingService.llxprtManager);
this.setupMiddleware();
this.setupRoutes();
}
setupMiddleware() {
// Security middleware
this.app.use(helmet({
contentSecurityPolicy: {
directives: {
defaultSrc: ["'self'"],
styleSrc: ["'self'", "'unsafe-inline'"],
scriptSrc: ["'self'"],
imgSrc: ["'self'", "data:", "https:"],
connectSrc: ["'self'", "ws:", "wss:"]
}
}
}));
// CORS configuration
this.app.use(cors({
origin: process.env.CORS_ORIGIN || 'http://localhost:3000',
credentials: true
}));
// Rate limiting
const limiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100, // Limit each IP to 100 requests per windowMs
message: 'Too many requests from this IP, please try again later.'
});
this.app.use('/api/', limiter);
// Body parsing
this.app.use(express.json({ limit: '10mb' }));
this.app.use(express.urlencoded({ extended: true, limit: '10mb' }));
// Request logging
this.app.use((req, res, next) => {
console.log(`${req.method} ${req.path} - ${req.ip}`);
next();
});
}
setupRoutes() {
// Health check
this.app.get('/health', (req, res) => {
res.json({
status: 'healthy',
timestamp: new Date().toISOString(),
uptime: process.uptime(),
version: process.env.npm_package_version || '1.0.0'
});
});
// Provider information
this.app.get('/api/providers', async (req, res) => {
try {
const providers = this.streamingService.llxprtManager.getAvailableProviders();
const health = this.providerService.getProviderHealth();
res.json({
providers: providers.map(p => ({
...p,
health: health[p.provider] || { status: 'unknown' }
}))
});
} catch (error) {
res.status(500).json({ error: error.message });
}
});
// Provider health check
this.app.post('/api/providers/:provider/health', async (req, res) => {
try {
const { provider } = req.params;
const isHealthy = await this.providerService.checkProviderHealth(provider);
const health = this.providerService.getProviderHealth()[provider];
res.json({
provider,
healthy: isHealthy,
...health
});
} catch (error) {
res.status(500).json({ error: error.message });
}
});
// Usage statistics
this.app.get('/api/stats', (req, res) => {
try {
const streamingMetrics = this.streamingService.getMetrics();
const usageStats = this.providerService.getUsageStats();
const providerHealth = this.providerService.getProviderHealth();
res.json({
streaming: streamingMetrics,
usage: usageStats,
health: providerHealth,
timestamp: new Date().toISOString()
});
} catch (error) {
res.status(500).json({ error: error.message });
}
});
// Switch provider/model
this.app.post('/api/configure', async (req, res) => {
try {
const { provider, model } = req.body;
if (!provider || !model) {
return res.status(400).json({
error: 'Provider and model are required'
});
}
await this.streamingService.llxprtManager.switchProvider(provider, model);
res.json({
success: true,
currentProvider: provider,
currentModel: model,
timestamp: new Date().toISOString()
});
} catch (error) {
res.status(500).json({ error: error.message });
}
});
// Error handling middleware
this.app.use((error, req, res, next) => {
console.error('Unhandled error:', error);
res.status(500).json({
error: 'Internal server error',
timestamp: new Date().toISOString()
});
});
// 404 handler
this.app.use((req, res) => {
res.status(404).json({
error: 'Not found',
path: req.path,
timestamp: new Date().toISOString()
});
});
}
async start() {
try {
// Initialize services
await this.streamingService.initialize();
console.log('LLxprt streaming service initialized');
// Setup WebSocket server
this.streamingService.setupWebSocketServer(this.server);
console.log('WebSocket server configured');
// Start HTTP server
const port = process.env.PORT || 3000;
this.server.listen(port, () => {
console.log(`Sasha Studio server running on port ${port}`);
console.log(`WebSocket endpoint: ws://localhost:${port}/chat/stream`);
console.log(`Health check: http://localhost:${port}/health`);
});
} catch (error) {
console.error('Failed to start server:', error);
process.exit(1);
}
}
async shutdown() {
console.log('Shutting down server...');
// Cleanup active streams
for (const [sessionId] of this.streamingService.activeStreams) {
this.streamingService.cleanup(sessionId);
}
// Close server
this.server.close(() => {
console.log('Server shutdown complete');
process.exit(0);
});
}
}
// Handle graceful shutdown
const server = new SashaStudioServer();
process.on('SIGTERM', () => server.shutdown());
process.on('SIGINT', () => server.shutdown());
// Start server
server.start();
module.exports = SashaStudioServer;
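For quick manual verification once the server is up, the REST endpoints can be exercised from Node or any HTTP client. The snippet below is a smoke-test sketch that assumes the server is listening on localhost:3000 and that the global fetch available in Node 18+ is used.
// Illustrative REST checks against a locally running Sasha Studio instance
async function smokeTest() {
  const base = 'http://localhost:3000';

  // Health and provider listing
  console.log(await (await fetch(`${base}/health`)).json());
  console.log(await (await fetch(`${base}/api/providers`)).json());

  // Switch the active provider/model
  const res = await fetch(`${base}/api/configure`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ provider: 'openai', model: 'gpt-4o-mini' })
  });
  console.log(await res.json());
}

smokeTest().catch(console.error);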
Testing Strategy
Integration Tests
// tests/llxprt-integration.test.js
const WebSocket = require('ws');
const { spawn } = require('child_process');
const LLxprtStreamingService = require('../services/streaming-service');
describe('LLxprt Integration Tests', () => {
let streamingService;
let server;
beforeAll(async () => {
streamingService = new LLxprtStreamingService();
await streamingService.initialize();
// Start test server
server = require('http').createServer();
streamingService.setupWebSocketServer(server);
server.listen(0); // Random port
});
afterAll(async () => {
server.close();
});
test('should initialize LLxprt successfully', async () => {
expect(streamingService.llxprtManager.isInitialized).toBe(true);
});
test('should establish WebSocket connection', (done) => {
const ws = new WebSocket(`ws://localhost:${server.address().port}/chat/stream`);
ws.on('open', () => {
expect(ws.readyState).toBe(WebSocket.OPEN);
ws.close();
done();
});
});
test('should stream AI response', (done) => {
const ws = new WebSocket(`ws://localhost:${server.address().port}/chat/stream`);
let receivedChunks = 0;
ws.on('open', () => {
ws.send(JSON.stringify({
message: 'Say hello',
provider: 'openai',
model: 'gpt-4o-mini'
}));
});
ws.on('message', (data) => {
const response = JSON.parse(data);
if (response.type === 'chunk') {
receivedChunks++;
expect(response.content).toBeDefined();
expect(response.sessionId).toBeDefined();
}
if (response.type === 'complete') {
expect(receivedChunks).toBeGreaterThan(0);
expect(response.metadata.duration).toBeGreaterThan(0);
ws.close();
done();
}
});
}, 30000);
test('should handle provider switching', async () => {
const providers = streamingService.llxprtManager.getAvailableProviders();
expect(providers.length).toBeGreaterThan(0);
const provider = providers[0];
await streamingService.llxprtManager.switchProvider(
provider.provider,
provider.models[0]
);
expect(streamingService.llxprtManager.currentProvider).toBe(provider.provider);
});
test('should handle errors gracefully', (done) => {
const ws = new WebSocket(`ws://localhost:${server.address().port}/chat/stream`);
ws.on('open', () => {
ws.send(JSON.stringify({
message: 'Test message',
provider: 'invalid-provider',
model: 'invalid-model'
}));
});
ws.on('message', (data) => {
const response = JSON.parse(data);
if (response.type === 'error') {
expect(response.error).toBeDefined();
ws.close();
done();
}
});
});
});
Load Testing
// tests/load-test.js
const WebSocket = require('ws');
async function loadTest() {
const concurrent = 10;
const messagesPerConnection = 3;
const connections = [];
console.log(`Starting load test with ${concurrent} concurrent connections`);
for (let i = 0; i < concurrent; i++) {
const connectionPromise = new Promise((resolve) => {
const ws = new WebSocket('ws://localhost:3000/chat/stream');
const startTime = Date.now();
let messagesCompleted = 0;
let totalResponseTime = 0;
ws.on('open', () => {
console.log(`Connection ${i} opened`);
sendMessage();
});
ws.on('message', (data) => {
const response = JSON.parse(data);
if (response.type === 'complete') {
messagesCompleted++;
totalResponseTime += response.metadata.duration;
if (messagesCompleted < messagesPerConnection) {
setTimeout(sendMessage, 1000);
} else {
const totalTime = Date.now() - startTime;
resolve({
connection: i,
totalTime,
messagesCompleted,
averageResponseTime: totalResponseTime / messagesCompleted
});
ws.close();
}
}
});
function sendMessage() {
ws.send(JSON.stringify({
message: `Load test message ${messagesCompleted + 1}`,
provider: 'openai',
model: 'gpt-4o-mini'
}));
}
});
connections.push(connectionPromise);
}
const results = await Promise.all(connections);
console.log('\nLoad Test Results:');
console.log(`Total connections: ${concurrent}`);
console.log(`Messages per connection: ${messagesPerConnection}`);
console.log(`Total messages: ${concurrent * messagesPerConnection}`);
const avgResponseTime = results.reduce((sum, r) => sum + r.averageResponseTime, 0) / results.length;
console.log(`Average response time: ${avgResponseTime.toFixed(2)}ms`);
const maxResponseTime = Math.max(...results.map(r => r.averageResponseTime));
console.log(`Max response time: ${maxResponseTime.toFixed(2)}ms`);
const minResponseTime = Math.min(...results.map(r => r.averageResponseTime));
console.log(`Min response time: ${minResponseTime.toFixed(2)}ms`);
}
if (require.main === module) {
loadTest().catch(console.error);
}
module.exports = loadTest;
Performance Optimization
Connection Pooling and Resource Management
// utils/resource-manager.js
const { spawn } = require('child_process');
class ResourceManager {
constructor() {
this.processPool = [];
this.maxPoolSize = 5;
this.activeProcesses = new Set();
this.processTimeout = 30000; // 30 seconds
}
async getProcess() {
// Try to get from pool first
if (this.processPool.length > 0) {
const process = this.processPool.pop();
this.activeProcesses.add(process);
return process;
}
// Create new process if under limit
if (this.activeProcesses.size < this.maxPoolSize) {
const process = this.createProcess();
this.activeProcesses.add(process);
return process;
}
// Wait for available process
return new Promise((resolve) => {
const checkInterval = setInterval(() => {
if (this.processPool.length > 0) {
clearInterval(checkInterval);
const process = this.processPool.pop();
this.activeProcesses.add(process);
resolve(process);
}
}, 100);
});
}
createProcess() {
// Name the child distinctly so the spawn options can still read the global `process.env`
const child = spawn('llxprt', [], {
stdio: ['pipe', 'pipe', 'pipe'],
env: { ...process.env }
});
// Set timeout for idle processes
child.idleTimeout = setTimeout(() => {
this.destroyProcess(child);
}, this.processTimeout);
child.on('error', () => {
this.destroyProcess(child);
});
return child;
}
releaseProcess(process) {
this.activeProcesses.delete(process);
// Return to pool if healthy
if (process.killed === false && this.processPool.length < this.maxPoolSize) {
this.processPool.push(process);
// Reset idle timeout
clearTimeout(process.idleTimeout);
process.idleTimeout = setTimeout(() => {
this.destroyProcess(process);
}, this.processTimeout);
} else {
this.destroyProcess(process);
}
}
destroyProcess(process) {
try {
clearTimeout(process.idleTimeout);
this.activeProcesses.delete(process);
this.processPool = this.processPool.filter(p => p !== process);
if (!process.killed) {
process.kill('SIGTERM');
}
} catch (error) {
console.error('Error destroying process:', error);
}
}
cleanup() {
// Destroy all processes
for (const process of this.activeProcesses) {
this.destroyProcess(process);
}
for (const process of this.processPool) {
this.destroyProcess(process);
}
this.processPool = [];
this.activeProcesses.clear();
}
getStats() {
return {
poolSize: this.processPool.length,
activeProcesses: this.activeProcesses.size,
maxPoolSize: this.maxPoolSize
};
}
}
module.exports = ResourceManager;
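The streaming service above spawns a fresh LLxprt process per message. A sketch of the intended acquire/release cycle is shown below; note that because each exchange closes stdin, the process exits afterwards, so this sketch destroys it rather than returning it to the pool. Benefiting from reuse would require keeping the processes interactive, which is beyond this example.
// Illustrative acquire/use/destroy cycle around a single LLxprt exchange
const ResourceManager = require('./utils/resource-manager');

const pool = new ResourceManager();

async function runPrompt(prompt) {
  const child = await pool.getProcess();
  let output = '';

  return new Promise((resolve, reject) => {
    child.stdout.on('data', (chunk) => { output += chunk.toString(); });
    child.on('close', (code) => {
      // A one-shot exchange exits the process, so destroy rather than recycle it
      pool.destroyProcess(child);
      if (code === 0) {
        resolve(output);
      } else {
        reject(new Error(`llxprt exited with code ${code}`));
      }
    });
    child.stdin.write(`${prompt}\n`);
    child.stdin.end();
  });
}

runPrompt('Say hello').then(console.log).catch(console.error);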
Monitoring and Observability
Metrics Collection
// utils/metrics.js
class MetricsCollector {
constructor() {
this.metrics = {
requests: {
total: 0,
successful: 0,
failed: 0,
byProvider: new Map()
},
performance: {
averageResponseTime: 0,
responseTimeHistory: [],
firstTokenTimes: [],
tokensPerSecond: []
},
resources: {
memoryUsage: [],
activeConnections: 0,
processCount: 0
},
errors: {
total: 0,
byType: new Map(),
recent: []
}
};
this.startCollection();
}
startCollection() {
// Collect system metrics every 30 seconds
setInterval(() => {
const usage = process.memoryUsage();
this.metrics.resources.memoryUsage.push({
timestamp: Date.now(),
heapUsed: usage.heapUsed / 1024 / 1024,
heapTotal: usage.heapTotal / 1024 / 1024,
external: usage.external / 1024 / 1024,
rss: usage.rss / 1024 / 1024
});
// Keep only last 100 measurements (50 minutes)
if (this.metrics.resources.memoryUsage.length > 100) {
this.metrics.resources.memoryUsage.shift();
}
}, 30000);
}
recordRequest(provider, model, success, responseTime, firstTokenTime, tokensPerSecond) {
this.metrics.requests.total++;
if (success) {
this.metrics.requests.successful++;
} else {
this.metrics.requests.failed++;
}
// Track by provider
const providerKey = `${provider}:${model}`;
const providerStats = this.metrics.requests.byProvider.get(providerKey) || {
total: 0,
successful: 0,
failed: 0
};
providerStats.total++;
if (success) {
providerStats.successful++;
} else {
providerStats.failed++;
}
this.metrics.requests.byProvider.set(providerKey, providerStats);
// Record performance metrics
if (success && responseTime) {
this.metrics.performance.responseTimeHistory.push(responseTime);
if (firstTokenTime) {
this.metrics.performance.firstTokenTimes.push(firstTokenTime);
}
if (tokensPerSecond) {
this.metrics.performance.tokensPerSecond.push(tokensPerSecond);
}
// Keep only last 1000 measurements
if (this.metrics.performance.responseTimeHistory.length > 1000) {
this.metrics.performance.responseTimeHistory.shift();
}
// Update average
const sum = this.metrics.performance.responseTimeHistory.reduce((a, b) => a + b, 0);
this.metrics.performance.averageResponseTime = sum / this.metrics.performance.responseTimeHistory.length;
}
}
recordError(error, type = 'unknown', context = {}) {
this.metrics.errors.total++;
const errorCount = this.metrics.errors.byType.get(type) || 0;
this.metrics.errors.byType.set(type, errorCount + 1);
this.metrics.errors.recent.push({
timestamp: Date.now(),
error: error.message || error,
type,
context
});
// Keep only last 100 errors
if (this.metrics.errors.recent.length > 100) {
this.metrics.errors.recent.shift();
}
}
updateResourceMetrics(activeConnections, processCount) {
this.metrics.resources.activeConnections = activeConnections;
this.metrics.resources.processCount = processCount;
}
getMetrics() {
return {
...this.metrics,
timestamp: new Date().toISOString(),
uptime: process.uptime()
};
}
getHealthScore() {
const successRate = this.metrics.requests.total > 0
? this.metrics.requests.successful / this.metrics.requests.total
: 1;
const avgResponseTime = this.metrics.performance.averageResponseTime || 0;
const responseTimeScore = Math.max(0, 1 - (avgResponseTime / 10000)); // Penalty after 10s
const memoryUsage = this.metrics.resources.memoryUsage;
const latestMemory = memoryUsage[memoryUsage.length - 1];
const memoryScore = latestMemory
? Math.max(0, 1 - (latestMemory.heapUsed / 1000)) // Penalty after 1GB
: 1;
const healthScore = (successRate * 0.5) + (responseTimeScore * 0.3) + (memoryScore * 0.2);
return {
overall: Math.round(healthScore * 100),
components: {
successRate: Math.round(successRate * 100),
responseTime: Math.round(responseTimeScore * 100),
memory: Math.round(memoryScore * 100)
}
};
}
generateReport() {
const metrics = this.getMetrics();
const health = this.getHealthScore();
return {
summary: {
totalRequests: metrics.requests.total,
successRate: `${((metrics.requests.successful / metrics.requests.total) * 100).toFixed(2)}%`,
averageResponseTime: `${metrics.performance.averageResponseTime.toFixed(0)}ms`,
healthScore: `${health.overall}%`,
uptime: `${(metrics.uptime / 3600).toFixed(2)} hours`
},
performance: {
averageResponseTime: metrics.performance.averageResponseTime,
averageFirstTokenTime: metrics.performance.firstTokenTimes.length > 0
? metrics.performance.firstTokenTimes.reduce((a, b) => a + b, 0) / metrics.performance.firstTokenTimes.length
: 0,
averageTokensPerSecond: metrics.performance.tokensPerSecond.length > 0
? metrics.performance.tokensPerSecond.reduce((a, b) => a + b, 0) / metrics.performance.tokensPerSecond.length
: 0
},
resources: {
currentMemoryUsage: metrics.resources.memoryUsage[metrics.resources.memoryUsage.length - 1],
activeConnections: metrics.resources.activeConnections,
processCount: metrics.resources.processCount
},
providers: Object.fromEntries(metrics.requests.byProvider),
errors: {
total: metrics.errors.total,
byType: Object.fromEntries(metrics.errors.byType),
recent: metrics.errors.recent.slice(-10)
},
health
};
}
}
module.exports = MetricsCollector;
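A sketch of how the collector could be wired into the Express server above. This is a self-contained example rather than part of the SashaStudioServer class; the /api/metrics route and the onStreamComplete helper are illustrative names, and the metadata fields match what the streaming service emits on completion.
// Illustrative wiring of MetricsCollector into an Express app
const express = require('express');
const MetricsCollector = require('./utils/metrics');

const metrics = new MetricsCollector();
const app = express();

// Expose the aggregated report
app.get('/api/metrics', (req, res) => {
  res.json(metrics.generateReport());
});

// Example of recording a completed stream, using the metadata emitted by the streaming service
function onStreamComplete(provider, model, meta) {
  metrics.recordRequest(
    provider,
    model,
    meta.exitCode === 0,   // success flag
    meta.duration,         // total response time (ms)
    meta.firstTokenTime,   // time to first output (ms)
    meta.tokensPerSecond   // rough character-based throughput
  );
}

module.exports = { metrics, onStreamComplete };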
Deployment and Production Considerations
Production Docker Configuration
# Production Dockerfile
FROM node:24-slim AS base
# Install system dependencies including Python for potential native modules
RUN apt-get update && apt-get install -y \
ca-certificates \
curl \
gnupg \
python3 \
python3-pip \
make \
g++ \
git \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean
# Create a dedicated non-root user (uid/gid 1001, since 1000 is already taken by the base image's "node" user)
RUN groupadd --gid 1001 sasha \
&& useradd --uid 1001 --gid sasha --shell /bin/bash --create-home sasha
# Configure npm for non-root user
USER sasha
ENV NPM_CONFIG_PREFIX=/home/sasha/.npm-global
ENV PATH=$PATH:/home/sasha/.npm-global/bin
WORKDIR /home/sasha/app
# Install LLxprt Code
RUN npm install -g @vybestack/llxprt-code@latest
# Verify installation
RUN llxprt --version
FROM base AS production
# Copy package files
COPY --chown=sasha:sasha package*.json ./
# Install production dependencies
RUN npm ci --omit=dev --no-audit --no-fund
# Copy source code
COPY --chown=sasha:sasha . .
# Create necessary directories
RUN mkdir -p /home/sasha/.config/llxprt \
&& mkdir -p /home/sasha/logs \
&& mkdir -p /home/sasha/tmp
# Set environment variables
ENV NODE_ENV=production
ENV LOG_LEVEL=info
ENV MAX_CONCURRENT_STREAMS=10
ENV STREAM_TIMEOUT=300000
# Expose port
EXPOSE 3000
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
CMD curl -f http://localhost:3000/health || exit 1
# Start application
CMD ["node", "server.js"]
Kubernetes Deployment
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sasha-studio
  labels:
    app: sasha-studio
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sasha-studio
  template:
    metadata:
      labels:
        app: sasha-studio
    spec:
      containers:
        - name: sasha-studio
          image: sasha-studio:latest
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: "production"
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: ai-api-keys
                  key: openai
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: ai-api-keys
                  key: anthropic
            - name: GOOGLE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: ai-api-keys
                  key: google
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          volumeMounts:
            - name: llxprt-config
              mountPath: /home/sasha/.config/llxprt
            - name: logs
              mountPath: /home/sasha/logs
      volumes:
        - name: llxprt-config
          configMap:
            name: llxprt-config
        - name: logs
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: sasha-studio-service
spec:
  selector:
    app: sasha-studio
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: LoadBalancer
Summary and Next Steps
Implementation Roadmap
Week 1: Core Setup
- Install and configure LLxprt Code in Docker
- Implement basic streaming service
- Test multi-provider configuration
Week 2: Integration
- Build WebSocket streaming system
- Implement provider management
- Add error handling and resilience
Week 3: Production Ready
- Add monitoring and metrics
- Implement resource management
- Security hardening and testing
Week 4: Optimization
- Performance tuning
- Load balancing
- Production deployment
Key Benefits of LLxprt Integration
- Multi-Provider Support: OpenAI, Anthropic, Google, OpenRouter, Fireworks, and local models behind a unified interface
- Streaming Native: Built-in real-time streaming capabilities
- Local Model Support: Ollama, LM Studio, llama.cpp integration
- Docker Ready: Existing containerization support
- Node.js Compatible: NPM package with CLI interface
- Active Development: Maintained by Vybestack team
- Configuration Management: Flexible provider and model switching
Success Metrics
- Performance: < 500ms first token, < 50ms subsequent tokens
- Reliability: 99.9% uptime, graceful fallback between providers
- Scalability: 100+ concurrent streams, < 1GB memory usage
- Security: Zero vulnerabilities, encrypted API key management
This integration guide covers the pieces needed to bring LLxprt Code into Sasha Studio with production-grade quality and performance.
With LLxprt Code as its foundation, Sasha Studio gains flexible, reliable, multi-provider AI integration.