Model Selector Implementation Plan
Generated: 2025-01-08 UTC
Purpose: Comprehensive implementation plan for the Sasha Studio model selector feature
Scope: Backend architecture, WebSocket protocol, UI components, and model discovery
Executive Summary
This document outlines the implementation plan for adding a model selector to the Sasha Studio chat interface. The selector will let users choose among cloud-hosted models (Claude, GPT, Gemini) and local models (TinyLlama), with real-time availability status and cost information.
Architecture Overview
System Components
Model Metadata Schema
Core Model Structure
{
  id: "claude-3-opus",
  provider: "anthropic",
  name: "Claude 3 Opus",
  description: "Most capable model for complex tasks",
  category: "premium",
  capabilities: ["reasoning", "coding", "analysis", "creative"],
  pricing: {
    inputCost: 15.00,    // per million tokens
    outputCost: 75.00,   // per million tokens
    currency: "USD"
  },
  limits: {
    contextWindow: 200000,
    maxOutput: 4096,
    rateLimit: 100       // requests per minute
  },
  performance: {
    speed: "medium",       // fast/medium/slow
    quality: "excellent",  // good/very-good/excellent
    latency: 500           // ms to first token
  },
  availability: {
    status: "available",   // available/degraded/unavailable
    lastChecked: "2025-01-08T10:00:00Z",
    region: "us-east-1"
  },
  tags: ["general", "coding", "research"],
  icon: "ph-brain",
  recommended: true,
  isLocal: false,
  requiresAuth: true
}
Local Model Structure (TinyLlama)
{
  id: "tinyllama-1.1b",
  provider: "ollama",
  name: "TinyLlama 1.1B",
  description: "Ultra-fast local model for simple queries",
  category: "local",
  capabilities: ["basic-chat", "simple-questions"],
  pricing: {
    inputCost: 0,
    outputCost: 0,
    currency: "USD",
    note: "Free - runs locally"
  },
  limits: {
    contextWindow: 2048,
    maxOutput: 512
  },
  performance: {
    speed: "very-fast",
    quality: "good",
    latency: 50
  },
  availability: {
    status: "available",
    isInstalled: true,
    modelSize: "637MB",
    quantization: "q4_k_m"
  },
  tags: ["local", "privacy", "offline"],
  icon: "ph-house",
  isLocal: true,
  requiresAuth: false,
  isFallback: true
}
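Both records above share the same required top-level fields. The sketch below validates a record against that shape; the function name and field list are inferred from the two examples, not taken from existing code:

```javascript
// Checks that a model record carries every top-level field the two
// example records above have in common. Names assumed for illustration.
function validateModelMetadata(model) {
  const required = [
    'id', 'provider', 'name', 'description', 'category',
    'capabilities', 'pricing', 'limits', 'performance',
    'availability', 'tags', 'icon', 'isLocal', 'requiresAuth'
  ];
  const missing = required.filter(key => !(key in model));
  if (missing.length > 0) {
    throw new Error(
      `Model "${model.id ?? '?'}" is missing fields: ${missing.join(', ')}`
    );
  }
  return true;
}
```

Running this at registry load time would surface malformed configuration files before any model is offered to clients.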
WebSocket Protocol Extensions
Model Discovery Messages
// Client -> Server: Request available models
{
  type: "get_models",
  filters: {
    category: "all",           // all/premium/standard/local
    capabilities: [],          // optional capability filter
    includeUnavailable: false
  }
}

// Server -> Client: Return model list
{
  type: "models_list",
  models: [...],               // Array of model objects
  defaultModel: "claude-3-haiku",
  currentModel: "gpt-4-turbo",
  timestamp: "2025-01-08T10:00:00Z"
}
Model Selection Messages
// Client -> Server: Switch model
{
  type: "switch_model",
  modelId: "claude-3-opus",
  applyToConversation: false   // or true to re-run with new model
}

// Server -> Client: Confirm switch
{
  type: "model_switched",
  previousModel: "gpt-4-turbo",
  currentModel: "claude-3-opus",
  costEstimate: {
    perMessage: 0.05,
    per1000Tokens: 0.75
  }
}
Real-time Status Updates
// Server -> Client: Model status change
{
  type: "model_status_update",
  modelId: "gpt-4",
  status: "degraded",
  message: "Experiencing high latency",
  alternativeModels: ["claude-3-opus", "gpt-4-turbo"]
}

// Server -> Client: Cost alert
{
  type: "cost_alert",
  currentSpend: 12.50,
  projectedMonthly: 375.00,
  suggestedModel: "claude-3-haiku",
  savingsEstimate: "60%"
}
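On the client, the four server-to-client message types above can be funneled through a single dispatcher. The sketch below returns a plain action object rather than touching the DOM directly; the function and action names are illustrative, not taken from the existing frontend:

```javascript
// Maps raw WebSocket payloads to UI actions. The `type` values match
// the protocol above; the action names are an illustrative convention.
function routeModelMessage(rawData) {
  const msg = JSON.parse(rawData);
  switch (msg.type) {
    case 'models_list':
      return { action: 'render-grid', models: msg.models, current: msg.currentModel };
    case 'model_switched':
      return { action: 'update-current', model: msg.currentModel, cost: msg.costEstimate };
    case 'model_status_update':
      return { action: 'update-status', modelId: msg.modelId, status: msg.status };
    case 'cost_alert':
      return { action: 'show-alert', suggested: msg.suggestedModel };
    default:
      return { action: 'ignore', type: msg.type };
  }
}
```

Returning action objects keeps the protocol handling testable without a live socket or a rendered page.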
UI Component Structure
Model Selector Component Hierarchy
ModelSelector/
├── ModelSelectorButton.jsx    # Trigger button with current model
├── ModelSelectorModal.jsx     # Main modal container
├── ModelGrid.jsx              # Grid of model cards
│   └── ModelCard.jsx          # Individual model display
├── ModelComparison.jsx        # Side-by-side comparison
├── ModelFilters.jsx           # Category and capability filters
├── CostCalculator.jsx         # Interactive cost estimator
└── ModelStatusIndicator.jsx   # Real-time availability
Key UI Features
Quick Switch Bar
- Recently used models
- Favorite models
- One-click switching
Model Cards
- Visual status indicators
- Key capabilities
- Pricing at a glance
- Performance metrics
Comparison Mode
- Side-by-side feature comparison
- Cost comparison
- Performance benchmarks
Smart Recommendations
- Task-based suggestions
- Cost optimization tips
- Quality vs. speed trade-offs
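To make the model-card features concrete, here is one way a card could be rendered from the metadata schema. The markup, class names, and the function itself are assumptions for illustration, not taken from the design mockup:

```javascript
// Renders a model card as an HTML string from a model metadata record.
// Class names and structure are hypothetical.
function renderModelCard(model) {
  const price = model.isLocal
    ? 'Free (local)'
    : `$${model.pricing.inputCost}/M in, $${model.pricing.outputCost}/M out`;
  return `
    <div class="model-card" data-model-id="${model.id}">
      <i class="${model.icon}"></i>
      <h3>${model.name}</h3>
      <span class="status status-${model.availability.status}">${model.availability.status}</span>
      <p>${model.description}</p>
      <span class="price">${price}</span>
    </div>`;
}
```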
Backend Implementation
1. Model Registry Service
// services/model-registry.js
const { ProviderHealthMonitor } = require('./provider-health-monitor');

class ModelRegistry {
  constructor() {
    this.models = new Map();
    this.providers = new Map();
    this.healthMonitor = new ProviderHealthMonitor();
  }

  async initialize() {
    // Load model definitions
    await this.loadCloudModels();
    await this.loadLocalModels();

    // Start health monitoring
    this.healthMonitor.startMonitoring();
  }

  async getAvailableModels(filters = {}) {
    const models = Array.from(this.models.values());

    return models.filter(model => {
      // Apply category filter ("all" matches every category)
      if (filters.category && filters.category !== 'all' &&
          model.category !== filters.category) {
        return false;
      }

      if (filters.capabilities?.length) {
        const hasCapabilities = filters.capabilities.every(
          cap => model.capabilities.includes(cap)
        );
        if (!hasCapabilities) return false;
      }

      if (!filters.includeUnavailable &&
          model.availability.status === 'unavailable') {
        return false;
      }

      return true;
    });
  }

  async switchModel(sessionId, modelId) {
    const model = this.models.get(modelId);
    if (!model) throw new Error('Model not found');

    // Validate availability
    if (model.availability.status === 'unavailable') {
      throw new Error('Model currently unavailable');
    }

    // Update session
    await this.updateSessionModel(sessionId, modelId);

    // Return confirmation
    return {
      success: true,
      model: model,
      costEstimate: this.calculateCostEstimate(model)
    };
  }
}
2. Provider Health Monitor
// services/provider-health-monitor.js
class ProviderHealthMonitor {
  constructor() {
    this.checkInterval = 30000; // 30 seconds
    this.providers = new Map();
  }

  async checkProviderHealth(provider) {
    try {
      const startTime = Date.now();

      // Provider-specific health check
      const isHealthy = await provider.healthCheck();
      const latency = Date.now() - startTime;

      return {
        status: isHealthy ? 'available' : 'unavailable',
        latency: latency,
        lastChecked: new Date().toISOString()
      };
    } catch (error) {
      return {
        status: 'unavailable',
        error: error.message,
        lastChecked: new Date().toISOString()
      };
    }
  }

  startMonitoring() {
    setInterval(() => {
      this.providers.forEach(async (provider, name) => {
        const health = await this.checkProviderHealth(provider);
        this.updateProviderStatus(name, health);
      });
    }, this.checkInterval);
  }
}
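One caveat with the interval-based loop in startMonitoring: if a slow provider check runs longer than checkInterval, cycles can overlap. A common guard is to run one full pass to completion before scheduling the next. The helper below is a sketch (runHealthCycle is an assumed name, not part of the current service):

```javascript
// Runs one complete health-check pass over every registered provider
// and resolves only when all checks have settled. startMonitoring could
// then re-arm with setTimeout(..., monitor.checkInterval) after each
// pass instead of using a fixed setInterval, so cycles never overlap.
async function runHealthCycle(monitor) {
  const entries = Array.from(monitor.providers.entries());
  await Promise.allSettled(entries.map(async ([name, provider]) => {
    const health = await monitor.checkProviderHealth(provider);
    monitor.updateProviderStatus(name, health);
  }));
}
```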
3. Cost Calculator Service
// services/cost-calculator.js
class CostCalculator {
  calculateMessageCost(model, message) {
    const inputTokens = this.estimateTokens(message);
    const outputTokens = this.estimateOutputTokens(model);

    const inputCost = (inputTokens / 1000000) * model.pricing.inputCost;
    const outputCost = (outputTokens / 1000000) * model.pricing.outputCost;

    return {
      inputTokens,
      outputTokens,
      totalTokens: inputTokens + outputTokens,
      cost: inputCost + outputCost,
      breakdown: {
        input: inputCost,
        output: outputCost
      }
    };
  }

  projectMonthlyCost(model, messagesPerDay) {
    const avgCostPerMessage = this.getAverageMessageCost(model);
    return avgCostPerMessage * messagesPerDay * 30;
  }

  suggestOptimization(currentUsage) {
    const suggestions = [];

    if (currentUsage.complexQueryRatio < 0.3) {
      suggestions.push({
        model: 'claude-3-haiku',
        savings: '70%',
        reason: 'Most queries are simple'
      });
    }

    if (currentUsage.privateDataRatio > 0.5) {
      suggestions.push({
        model: 'tinyllama-1.1b',
        savings: '100%',
        reason: 'High privacy requirement - use local model'
      });
    }

    return suggestions;
  }
}
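As a worked example of the per-message formula in calculateMessageCost, take the Claude 3 Opus pricing from the metadata schema ($15/$75 per million tokens); the token counts below are made up for illustration:

```javascript
// Per-message cost using the schema's per-million-token pricing.
// Token counts are illustrative.
const pricing = { inputCost: 15.00, outputCost: 75.00 };
const inputTokens = 1000;
const outputTokens = 500;

const inputCost = (inputTokens / 1000000) * pricing.inputCost;    // 0.015
const outputCost = (outputTokens / 1000000) * pricing.outputCost; // 0.0375
const total = inputCost + outputCost;

console.log(total.toFixed(4)); // "0.0525"
```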
Implementation Phases
Phase 1: Backend Foundation (Week 1)
Objectives:
- Set up model registry service
- Implement provider health monitoring
- Create WebSocket message handlers
- Add model metadata storage
Deliverables:
- services/model-registry.js
- services/provider-health-monitor.js
- WebSocket protocol extensions
- Model configuration files
Success Criteria:
- Models can be discovered via WebSocket
- Health status updates every 30 seconds
- Model switching works in backend
Phase 2: Core UI Components (Week 2)
Objectives:
- Create model selector button
- Build modal with model grid
- Implement model cards
- Add real-time status indicators
Deliverables:
- Model selector React components
- CSS styling matching mockup
- State management integration
- Basic model switching UI
Success Criteria:
- UI matches design mockup
- Models display with correct metadata
- Click to switch model works
Phase 3: Advanced Features (Week 3)
Objectives:
- Add comparison mode
- Implement cost calculator
- Create smart recommendations
- Add filtering and search
Deliverables:
- Comparison table component
- Cost calculator with projections
- Recommendation engine
- Advanced filtering UI
Success Criteria:
- Users can compare models
- Cost estimates are accurate
- Recommendations are relevant
Phase 4: Integration & Polish (Week 4)
Objectives:
- Integrate with existing chat interface
- Add keyboard shortcuts
- Implement preferences saving
- Performance optimization
Deliverables:
- Full integration with chat
- User preferences persistence
- Keyboard navigation
- Performance improvements
Success Criteria:
- Seamless integration
- <100ms UI response time
- Preferences persist across sessions
Testing Strategy
Unit Tests
describe('ModelRegistry', () => {
  test('should return available models', async () => {
    const models = await registry.getAvailableModels();
    expect(models).toContainEqual(
      expect.objectContaining({
        id: 'tinyllama-1.1b',
        isLocal: true
      })
    );
  });

  test('should filter by category', async () => {
    const localModels = await registry.getAvailableModels({
      category: 'local'
    });
    expect(localModels.every(m => m.isLocal)).toBe(true);
  });
});
Integration Tests
describe('Model Switching', () => {
  test('should switch from cloud to local model', async () => {
    const ws = new WebSocket('ws://localhost:3002/chat');
    ws.send(JSON.stringify({
      type: 'switch_model',
      modelId: 'tinyllama-1.1b'
    }));

    const response = await waitForMessage(ws, 'model_switched');
    expect(response.currentModel).toBe('tinyllama-1.1b');
  });
});
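The waitForMessage helper used above is not defined in this plan; one possible implementation, with an assumed default timeout, is:

```javascript
// Resolves with the parsed payload of the first message whose `type`
// matches, or rejects after a timeout. Helper name matches the test
// above; the timeout default is an assumption.
function waitForMessage(ws, expectedType, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      ws.removeEventListener('message', onMessage);
      reject(new Error(`Timed out waiting for "${expectedType}"`));
    }, timeoutMs);

    function onMessage(event) {
      const msg = JSON.parse(event.data);
      if (msg.type === expectedType) {
        clearTimeout(timer);
        ws.removeEventListener('message', onMessage);
        resolve(msg);
      }
    }

    ws.addEventListener('message', onMessage);
  });
}
```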
E2E Tests
describe('Model Selector UI', () => {
  test('should open selector and switch model', async () => {
    await page.click('[data-testid="model-selector-button"]');
    await page.waitForSelector('[data-testid="model-grid"]');
    await page.click('[data-testid="model-card-tinyllama"]');

    const currentModel = await page.textContent(
      '[data-testid="current-model-name"]'
    );
    expect(currentModel).toBe('TinyLlama 1.1B');
  });
});
Success Metrics
Technical Metrics
- Model switch time: <500ms
- Health check latency: <100ms
- UI render time: <50ms
- WebSocket message round-trip: <200ms
User Metrics
- Model selection usage: >60% of users
- Average models tried: 3+ per user
- Cost reduction achieved: >50%
- User satisfaction: >4.5/5
Business Metrics
- Support ticket reduction: 30%
- User retention improvement: 20%
- Cost per user reduction: 40%
Security Considerations
API Key Protection
- Never expose API keys to frontend
- Use secure key rotation
Model Access Control
- Enforce model permissions per user tier
- Audit model usage
Local Model Security
- Sandbox local model execution
- Validate model files
Rate Limiting
- Implement per-model rate limits
- Prevent abuse of expensive models
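Per-model limits can key on the rateLimit field already present in the metadata schema. The sliding-window sketch below is hypothetical (class and method names assumed, not part of the existing services):

```javascript
// Enforces each model's `rateLimit` (requests per minute, from the
// metadata schema) as a sliding one-minute window per (user, model).
class ModelRateLimiter {
  constructor() {
    this.requests = new Map(); // "userId:modelId" -> request timestamps (ms)
  }

  // Returns true if the request is allowed, false if over the limit.
  // `now` is injectable for testing; the 60 req/min default is assumed.
  allow(userId, model, now = Date.now()) {
    const limit = model.limits?.rateLimit ?? 60;
    const key = `${userId}:${model.id}`;
    const windowStart = now - 60000;
    const recent = (this.requests.get(key) || []).filter(t => t > windowStart);
    if (recent.length >= limit) {
      this.requests.set(key, recent);
      return false;
    }
    recent.push(now);
    this.requests.set(key, recent);
    return true;
  }
}
```

Keying the window per user and per model means abuse of an expensive model by one user never throttles other users or cheaper models.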
Documentation Requirements
User Documentation
- Model selector user guide
- Model comparison guide
- Cost optimization tips
Developer Documentation
- API reference
- WebSocket protocol docs
- Component documentation
Admin Documentation
- Model configuration guide
- Monitoring setup
- Troubleshooting guide
Next Steps
Implementation Status
Phase 1: Backend Foundation (Completed - 2025-08-06)
- Model Registry Service (services/model-registry.js)
- Provider Health Monitor (services/provider-health-monitor.js)
- Cost Calculator Service (services/cost-calculator.js)
- WebSocket message handlers in server.js
- Support for 7 cloud models and 3 local models
- Real-time health monitoring
- Cost tracking and optimization suggestions
Phase 2: UI Components (Completed - 2025-08-06)
- Modal HTML structure in index-enhanced.html
- CSS styling for modal and model cards
- JavaScript modal management functions
- WebSocket integration for model operations
- Model filtering and search
- Cost display and alerts
- Responsive design and keyboard shortcuts (Cmd/Ctrl+K)
Phase 2.5: API Key Detection Integration (Completed - 2025-08-06)
- Provider availability detection from environment variables
- Model registry integration with provider availability
- Dynamic model availability based on API keys
- Frontend UI updates for unavailable models
- Visual indicators (grayed out, lock icon) for models without API keys
- Tooltips explaining API key requirements
- Prevention of unavailable model selection
Implementation Details:
- Modified model-registry.js to accept and use provider availability information
- Updated server.js to pass provider availability from AI service to model registry
- Enhanced frontend to visually distinguish unavailable models
- Added clear messaging about API key configuration requirements
Phase 2.6: OpenRouter Integration & Model Fixes (Completed - 2025-08-06)
- Distinguish between direct API keys and OpenRouter proxy access
- Add OpenRouter-specific models to the registry (4 models)
- Implement connection type badges (Direct vs OpenRouter)
- Fix model ID mapping for OpenRouter API compatibility (modelId field)
- Remove uninstalled Ollama models from registry (kept only TinyLlama)
- Fix Ollama model detection to only show actually installed models
Implementation Details:
- Added OpenRouter-specific models with proper modelId mapping (e.g., 'openai/gpt-4o')
- Updated provider health monitor to check direct API keys only
- Fixed connection type badges to accurately show access method
- Removed Llama 2 7B and Mistral 7B as they weren't installed
- Corrected Ollama availability detection logic to prevent false positives
Phase 3: Advanced Features (Pending)
- Model comparison mode
- Advanced cost analytics
- Smart recommendations engine
- Usage-based suggestions
Phase 4: Integration & Polish (Pending)
- Full integration testing
- Performance optimization
- Documentation updates
- User guide creation
Dependencies
- Existing WebSocket infrastructure
- LLxprt bridge for cloud models
- Ollama service for local models
- React frontend framework
Appendix: Model Configurations
Default Model Set
models:
  # Premium Models
  - id: claude-3-opus
    tier: premium
    default: false
  - id: gpt-4-turbo
    tier: premium
    default: false

  # Standard Models
  - id: claude-3-haiku
    tier: standard
    default: true
  - id: gpt-3.5-turbo
    tier: standard
    default: false

  # Local Models
  - id: tinyllama-1.1b
    tier: local
    default: false
    fallback: true
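Resolving the active model from this config could work as sketched below; resolveModel and the availability map are illustrative assumptions, not existing code:

```javascript
// Picks the entry marked `default: true`, falling back to the entry
// marked `fallback: true` (TinyLlama above) when the default is
// reported unavailable. `availability` maps model id -> status string.
function resolveModel(configModels, availability) {
  const def = configModels.find(m => m.default);
  if (def && availability[def.id] !== 'unavailable') return def.id;
  const fb = configModels.find(m => m.fallback);
  return fb ? fb.id : null;
}
```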
Model Categories
categories:
  premium:
    name: "Premium Models"
    description: "Most capable models for complex tasks"
    icon: "ph-crown"
  standard:
    name: "Standard Models"
    description: "Balanced performance and cost"
    icon: "ph-star"
  local:
    name: "Local Models"
    description: "Privacy-focused, zero-cost models"
    icon: "ph-house"
  specialized:
    name: "Specialized Models"
    description: "Task-specific optimized models"
    icon: "ph-wrench"
Document Status: In Progress
Implementation Ready: Yes
Estimated Timeline: 4 weeks
Priority: High
Phase 1 Status: Complete (Backend Services)
Phase 2 Status: Complete (UI Implementation)
Phase 3 Status: Pending (Advanced Features)
Last Updated: 2025-08-06