Model Selector Implementation Plan
Generated: 2025-01-08 UTC
Purpose: Comprehensive implementation plan for the Sasha Studio model selector feature
Scope: Backend architecture, WebSocket protocol, UI components, and model discovery
Executive Summary
This document outlines the implementation plan for adding a model selector to the Sasha Studio chat interface. The selector will let users choose among cloud-hosted models (Claude, GPT, Gemini) and local models (TinyLlama), with real-time availability status and cost information.
Architecture Overview
System Components
Model Metadata Schema
Core Model Structure
{
  id: "claude-3-opus",
  provider: "anthropic",
  name: "Claude 3 Opus",
  description: "Most capable model for complex tasks",
  category: "premium",
  capabilities: ["reasoning", "coding", "analysis", "creative"],
  pricing: {
    inputCost: 15.00,    // per million tokens
    outputCost: 75.00,   // per million tokens
    currency: "USD"
  },
  limits: {
    contextWindow: 200000,
    maxOutput: 4096,
    rateLimit: 100       // requests per minute
  },
  performance: {
    speed: "medium",       // fast/medium/slow
    quality: "excellent",  // good/very-good/excellent
    latency: 500           // ms to first token
  },
  availability: {
    status: "available",   // available/degraded/unavailable
    lastChecked: "2025-01-08T10:00:00Z",
    region: "us-east-1"
  },
  tags: ["general", "coding", "research"],
  icon: "ph-brain",
  recommended: true,
  isLocal: false,
  requiresAuth: true
}
Local Model Structure (TinyLlama)
{
  id: "tinyllama-1.1b",
  provider: "ollama",
  name: "TinyLlama 1.1B",
  description: "Ultra-fast local model for simple queries",
  category: "local",
  capabilities: ["basic-chat", "simple-questions"],
  pricing: {
    inputCost: 0,
    outputCost: 0,
    currency: "USD",
    note: "Free - runs locally"
  },
  limits: {
    contextWindow: 2048,
    maxOutput: 512
  },
  performance: {
    speed: "very-fast",
    quality: "good",
    latency: 50
  },
  availability: {
    status: "available",
    isInstalled: true,
    modelSize: "637MB",
    quantization: "q4_k_m"
  },
  tags: ["local", "privacy", "offline"],
  icon: "ph-house",
  isLocal: true,
  requiresAuth: false,
  isFallback: true
}
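Both records above share the same required top-level fields. The sketch below validates a record against that shape; the function name and field list are inferred from the two examples, not taken from existing code:

```javascript
// Checks that a model record carries every top-level field the two
// example records above have in common. Names assumed for illustration.
function validateModelMetadata(model) {
  const required = [
    'id', 'provider', 'name', 'description', 'category',
    'capabilities', 'pricing', 'limits', 'performance',
    'availability', 'tags', 'icon', 'isLocal', 'requiresAuth'
  ];
  const missing = required.filter(key => !(key in model));
  if (missing.length > 0) {
    throw new Error(
      `Model "${model.id ?? '?'}" is missing fields: ${missing.join(', ')}`
    );
  }
  return true;
}
```

Running this at registry load time would surface malformed configuration files before any model is offered to clients.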
WebSocket Protocol Extensions
Model Discovery Messages
// Client -> Server: Request available models
{
  type: "get_models",
  filters: {
    category: "all",           // all/premium/standard/local
    capabilities: [],          // optional capability filter
    includeUnavailable: false
  }
}

// Server -> Client: Return model list
{
  type: "models_list",
  models: [...],               // Array of model objects
  defaultModel: "claude-3-haiku",
  currentModel: "gpt-4-turbo",
  timestamp: "2025-01-08T10:00:00Z"
}
Model Selection Messages
// Client -> Server: Switch model
{
  type: "switch_model",
  modelId: "claude-3-opus",
  applyToConversation: false   // or true to re-run with new model
}

// Server -> Client: Confirm switch
{
  type: "model_switched",
  previousModel: "gpt-4-turbo",
  currentModel: "claude-3-opus",
  costEstimate: {
    perMessage: 0.05,
    per1000Tokens: 0.75
  }
}
Real-time Status Updates
// Server -> Client: Model status change
{
  type: "model_status_update",
  modelId: "gpt-4",
  status: "degraded",
  message: "Experiencing high latency",
  alternativeModels: ["claude-3-opus", "gpt-4-turbo"]
}

// Server -> Client: Cost alert
{
  type: "cost_alert",
  currentSpend: 12.50,
  projectedMonthly: 375.00,
  suggestedModel: "claude-3-haiku",
  savingsEstimate: "60%"
}
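On the client, the four server-to-client message types above can be funneled through a single dispatcher. The sketch below returns a plain action object rather than touching the DOM directly; the function and action names are illustrative, not taken from the existing frontend:

```javascript
// Maps raw WebSocket payloads to UI actions. The `type` values match
// the protocol above; the action names are an illustrative convention.
function routeModelMessage(rawData) {
  const msg = JSON.parse(rawData);
  switch (msg.type) {
    case 'models_list':
      return { action: 'render-grid', models: msg.models, current: msg.currentModel };
    case 'model_switched':
      return { action: 'update-current', model: msg.currentModel, cost: msg.costEstimate };
    case 'model_status_update':
      return { action: 'update-status', modelId: msg.modelId, status: msg.status };
    case 'cost_alert':
      return { action: 'show-alert', suggested: msg.suggestedModel };
    default:
      return { action: 'ignore', type: msg.type };
  }
}
```

Returning action objects keeps the protocol handling testable without a live socket or a rendered page.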
UI Component Structure
Model Selector Component Hierarchy
ModelSelector/
├── ModelSelectorButton.jsx    # Trigger button with current model
├── ModelSelectorModal.jsx     # Main modal container
├── ModelGrid.jsx              # Grid of model cards
│   └── ModelCard.jsx          # Individual model display
├── ModelComparison.jsx        # Side-by-side comparison
├── ModelFilters.jsx           # Category and capability filters
├── CostCalculator.jsx         # Interactive cost estimator
└── ModelStatusIndicator.jsx   # Real-time availability
Key UI Features
Quick Switch Bar
- Recently used models
- Favorite models
- One-click switching
Model Cards
- Visual status indicators
- Key capabilities
- Pricing at a glance
- Performance metrics
Comparison Mode
- Side-by-side feature comparison
- Cost comparison
- Performance benchmarks
Smart Recommendations
- Task-based suggestions
- Cost optimization tips
- Quality vs. speed trade-offs
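To make the model-card features concrete, here is one way a card could be rendered from the metadata schema. The markup, class names, and the function itself are assumptions for illustration, not taken from the design mockup:

```javascript
// Renders a model card as an HTML string from a model metadata record.
// Class names and structure are hypothetical.
function renderModelCard(model) {
  const price = model.isLocal
    ? 'Free (local)'
    : `$${model.pricing.inputCost}/M in, $${model.pricing.outputCost}/M out`;
  return `
    <div class="model-card" data-model-id="${model.id}">
      <i class="${model.icon}"></i>
      <h3>${model.name}</h3>
      <span class="status status-${model.availability.status}">${model.availability.status}</span>
      <p>${model.description}</p>
      <span class="price">${price}</span>
    </div>`;
}
```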
Backend Implementation
1. Model Registry Service
// services/model-registry.js
const { ProviderHealthMonitor } = require('./provider-health-monitor');

class ModelRegistry {
  constructor() {
    this.models = new Map();
    this.providers = new Map();
    this.healthMonitor = new ProviderHealthMonitor();
  }

  async initialize() {
    // Load model definitions
    await this.loadCloudModels();
    await this.loadLocalModels();

    // Start health monitoring
    this.healthMonitor.startMonitoring();
  }

  async getAvailableModels(filters = {}) {
    const models = Array.from(this.models.values());

    return models.filter(model => {
      // Apply category filter ("all" matches every category)
      if (filters.category && filters.category !== 'all' &&
          model.category !== filters.category) {
        return false;
      }

      if (filters.capabilities?.length) {
        const hasCapabilities = filters.capabilities.every(
          cap => model.capabilities.includes(cap)
        );
        if (!hasCapabilities) return false;
      }

      if (!filters.includeUnavailable &&
          model.availability.status === 'unavailable') {
        return false;
      }

      return true;
    });
  }

  async switchModel(sessionId, modelId) {
    const model = this.models.get(modelId);
    if (!model) throw new Error('Model not found');

    // Validate availability
    if (model.availability.status === 'unavailable') {
      throw new Error('Model currently unavailable');
    }

    // Update session
    await this.updateSessionModel(sessionId, modelId);

    // Return confirmation
    return {
      success: true,
      model: model,
      costEstimate: this.calculateCostEstimate(model)
    };
  }
}
2. Provider Health Monitor
// services/provider-health-monitor.js
class ProviderHealthMonitor {
  constructor() {
    this.checkInterval = 30000; // 30 seconds
    this.providers = new Map();
  }

  async checkProviderHealth(provider) {
    try {
      const startTime = Date.now();

      // Provider-specific health check
      const isHealthy = await provider.healthCheck();
      const latency = Date.now() - startTime;

      return {
        status: isHealthy ? 'available' : 'unavailable',
        latency: latency,
        lastChecked: new Date().toISOString()
      };
    } catch (error) {
      return {
        status: 'unavailable',
        error: error.message,
        lastChecked: new Date().toISOString()
      };
    }
  }

  startMonitoring() {
    setInterval(() => {
      this.providers.forEach(async (provider, name) => {
        const health = await this.checkProviderHealth(provider);
        this.updateProviderStatus(name, health);
      });
    }, this.checkInterval);
  }
}
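One caveat with the interval-based loop in startMonitoring: if a slow provider check runs longer than checkInterval, cycles can overlap. A common guard is to run one full pass to completion before scheduling the next. The helper below is a sketch (runHealthCycle is an assumed name, not part of the current service):

```javascript
// Runs one complete health-check pass over every registered provider
// and resolves only when all checks have settled. startMonitoring could
// then re-arm with setTimeout(..., monitor.checkInterval) after each
// pass instead of using a fixed setInterval, so cycles never overlap.
async function runHealthCycle(monitor) {
  const entries = Array.from(monitor.providers.entries());
  await Promise.allSettled(entries.map(async ([name, provider]) => {
    const health = await monitor.checkProviderHealth(provider);
    monitor.updateProviderStatus(name, health);
  }));
}
```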
3. Cost Calculator Service
// services/cost-calculator.js
class CostCalculator {
  calculateMessageCost(model, message) {
    const inputTokens = this.estimateTokens(message);
    const outputTokens = this.estimateOutputTokens(model);

    const inputCost = (inputTokens / 1000000) * model.pricing.inputCost;
    const outputCost = (outputTokens / 1000000) * model.pricing.outputCost;

    return {
      inputTokens,
      outputTokens,
      totalTokens: inputTokens + outputTokens,
      cost: inputCost + outputCost,
      breakdown: {
        input: inputCost,
        output: outputCost
      }
    };
  }

  projectMonthlyCost(model, messagesPerDay) {
    const avgCostPerMessage = this.getAverageMessageCost(model);
    return avgCostPerMessage * messagesPerDay * 30;
  }

  suggestOptimization(currentUsage) {
    const suggestions = [];

    if (currentUsage.complexQueryRatio < 0.3) {
      suggestions.push({
        model: 'claude-3-haiku',
        savings: '70%',
        reason: 'Most queries are simple'
      });
    }

    if (currentUsage.privateDataRatio > 0.5) {
      suggestions.push({
        model: 'tinyllama-1.1b',
        savings: '100%',
        reason: 'High privacy requirement - use local model'
      });
    }

    return suggestions;
  }
}
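As a worked example of the per-message formula in calculateMessageCost, take the Claude 3 Opus pricing from the metadata schema ($15/$75 per million tokens); the token counts below are made up for illustration:

```javascript
// Per-message cost using the schema's per-million-token pricing.
// Token counts are illustrative.
const pricing = { inputCost: 15.00, outputCost: 75.00 };
const inputTokens = 1000;
const outputTokens = 500;

const inputCost = (inputTokens / 1000000) * pricing.inputCost;    // 0.015
const outputCost = (outputTokens / 1000000) * pricing.outputCost; // 0.0375
const total = inputCost + outputCost;

console.log(total.toFixed(4)); // "0.0525"
```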
Implementation Phases
Phase 1: Backend Foundation (Week 1)
Objectives:
- Set up model registry service
- Implement provider health monitoring
- Create WebSocket message handlers
- Add model metadata storage
Deliverables:
- services/model-registry.js
- services/provider-health-monitor.js
- WebSocket protocol extensions
- Model configuration files
Success Criteria:
- Models can be discovered via WebSocket
- Health status updates every 30 seconds
- Model switching works in backend
Phase 2: Core UI Components (Week 2)
Objectives:
- Create model selector button
- Build modal with model grid
- Implement model cards
- Add real-time status indicators
Deliverables:
- Model selector React components
- CSS styling matching mockup
- State management integration
- Basic model switching UI
Success Criteria:
- UI matches design mockup
- Models display with correct metadata
- Click to switch model works
Phase 3: Advanced Features (Week 3)
Objectives:
- Add comparison mode
- Implement cost calculator
- Create smart recommendations
- Add filtering and search
Deliverables:
- Comparison table component
- Cost calculator with projections
- Recommendation engine
- Advanced filtering UI
Success Criteria:
- Users can compare models
- Cost estimates are accurate
- Recommendations are relevant
Phase 4: Integration & Polish (Week 4)
Objectives:
- Integrate with existing chat interface
- Add keyboard shortcuts
- Implement preferences saving
- Performance optimization
Deliverables:
- Full integration with chat
- User preferences persistence
- Keyboard navigation
- Performance improvements
Success Criteria:
- Seamless integration
- <100ms UI response time
- Preferences persist across sessions
Testing Strategy
Unit Tests
describe('ModelRegistry', () => {
  test('should return available models', async () => {
    const models = await registry.getAvailableModels();
    expect(models).toContainEqual(
      expect.objectContaining({
        id: 'tinyllama-1.1b',
        isLocal: true
      })
    );
  });

  test('should filter by category', async () => {
    const localModels = await registry.getAvailableModels({
      category: 'local'
    });
    expect(localModels.every(m => m.isLocal)).toBe(true);
  });
});
Integration Tests
describe('Model Switching', () => {
  test('should switch from cloud to local model', async () => {
    const ws = new WebSocket('ws://localhost:3002/chat');
    ws.send(JSON.stringify({
      type: 'switch_model',
      modelId: 'tinyllama-1.1b'
    }));

    const response = await waitForMessage(ws, 'model_switched');
    expect(response.currentModel).toBe('tinyllama-1.1b');
  });
});
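The waitForMessage helper used above is not defined in this plan; one possible implementation, with an assumed default timeout, is:

```javascript
// Resolves with the parsed payload of the first message whose `type`
// matches, or rejects after a timeout. Helper name matches the test
// above; the timeout default is an assumption.
function waitForMessage(ws, expectedType, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      ws.removeEventListener('message', onMessage);
      reject(new Error(`Timed out waiting for "${expectedType}"`));
    }, timeoutMs);

    function onMessage(event) {
      const msg = JSON.parse(event.data);
      if (msg.type === expectedType) {
        clearTimeout(timer);
        ws.removeEventListener('message', onMessage);
        resolve(msg);
      }
    }

    ws.addEventListener('message', onMessage);
  });
}
```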
E2E Tests
describe('Model Selector UI', () => {
  test('should open selector and switch model', async () => {
    await page.click('[data-testid="model-selector-button"]');
    await page.waitForSelector('[data-testid="model-grid"]');
    await page.click('[data-testid="model-card-tinyllama"]');

    const currentModel = await page.textContent(
      '[data-testid="current-model-name"]'
    );
    expect(currentModel).toBe('TinyLlama 1.1B');
  });
});
Success Metrics
Technical Metrics
- Model switch time: <500ms
- Health check latency: <100ms
- UI render time: <50ms
- WebSocket message round-trip: <200ms
User Metrics
- Model selection usage: >60% of users
- Average models tried: 3+ per user
- Cost reduction achieved: >50%
- User satisfaction: >4.5/5
Business Metrics
- Support ticket reduction: 30%
- User retention improvement: 20%
- Cost per user reduction: 40%
Security Considerations
API Key Protection
- Never expose API keys to frontend
- Use secure key rotation
Model Access Control
- Enforce model permissions per user tier
- Audit model usage
Local Model Security
- Sandbox local model execution
- Validate model files
Rate Limiting
- Implement per-model rate limits
- Prevent abuse of expensive models
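Per-model limits can key on the rateLimit field already present in the metadata schema. The sliding-window sketch below is hypothetical (class and method names assumed, not part of the existing services):

```javascript
// Enforces each model's `rateLimit` (requests per minute, from the
// metadata schema) as a sliding one-minute window per (user, model).
class ModelRateLimiter {
  constructor() {
    this.requests = new Map(); // "userId:modelId" -> request timestamps (ms)
  }

  // Returns true if the request is allowed, false if over the limit.
  // `now` is injectable for testing; the 60 req/min default is assumed.
  allow(userId, model, now = Date.now()) {
    const limit = model.limits?.rateLimit ?? 60;
    const key = `${userId}:${model.id}`;
    const windowStart = now - 60000;
    const recent = (this.requests.get(key) || []).filter(t => t > windowStart);
    if (recent.length >= limit) {
      this.requests.set(key, recent);
      return false;
    }
    recent.push(now);
    this.requests.set(key, recent);
    return true;
  }
}
```

Keying the window per user and per model means abuse of an expensive model by one user never throttles other users or cheaper models.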
Documentation Requirements
User Documentation
- Model selector user guide
- Model comparison guide
- Cost optimization tips
Developer Documentation
- API reference
- WebSocket protocol docs
- Component documentation
Admin Documentation
- Model configuration guide
- Monitoring setup
- Troubleshooting guide
Next Steps
Implementation Status
Phase 1: Backend Foundation (Completed - 2025-08-06)
- Model Registry Service (services/model-registry.js)
- Provider Health Monitor (services/provider-health-monitor.js)
- Cost Calculator Service (services/cost-calculator.js)
- WebSocket message handlers in server.js
- Support for 7 cloud models and 3 local models
- Real-time health monitoring
- Cost tracking and optimization suggestions
Phase 2: UI Components (Completed - 2025-08-06)
- Modal HTML structure in index-enhanced.html
- CSS styling for modal and model cards
- JavaScript modal management functions
- WebSocket integration for model operations
- Model filtering and search
- Cost display and alerts
- Responsive design and keyboard shortcuts (Cmd/Ctrl+K)
Phase 2.5: API Key Detection Integration (Completed - 2025-08-06)
- Provider availability detection from environment variables
- Model registry integration with provider availability
- Dynamic model availability based on API keys
- Frontend UI updates for unavailable models
- Visual indicators (grayed out, lock icon) for models without API keys
- Tooltips explaining API key requirements
- Prevention of unavailable model selection
Implementation Details:
- Modified model-registry.js to accept and use provider availability information
- Updated server.js to pass provider availability from AI service to model registry
- Enhanced frontend to visually distinguish unavailable models
- Added clear messaging about API key configuration requirements
Phase 2.6: OpenRouter Integration & Model Fixes (Completed - 2025-08-06)
- Distinguish between direct API keys and OpenRouter proxy access
- Add OpenRouter-specific models to the registry (4 models)
- Implement connection type badges (Direct vs OpenRouter)
- Fix model ID mapping for OpenRouter API compatibility (modelId field)
- Remove uninstalled Ollama models from registry (kept only TinyLlama)
- Fix Ollama model detection to only show actually installed models
Implementation Details:
- Added OpenRouter-specific models with proper modelId mapping (e.g., 'openai/gpt-4o')
- Updated provider health monitor to check direct API keys only
- Fixed connection type badges to accurately show access method
- Removed Llama 2 7B and Mistral 7B as they weren't installed
- Corrected Ollama availability detection logic to prevent false positives
Phase 3: Advanced Features (Pending)
- Model comparison mode
- Advanced cost analytics
- Smart recommendations engine
- Usage-based suggestions
Phase 4: Integration & Polish (Pending)
- Full integration testing
- Performance optimization
- Documentation updates
- User guide creation
Dependencies
- Existing WebSocket infrastructure
- LLxprt bridge for cloud models
- Ollama service for local models
- React frontend framework
Appendix: Model Configurations
Default Model Set
models:
  # Premium Models
  - id: claude-3-opus
    tier: premium
    default: false
  - id: gpt-4-turbo
    tier: premium
    default: false

  # Standard Models
  - id: claude-3-haiku
    tier: standard
    default: true
  - id: gpt-3.5-turbo
    tier: standard
    default: false

  # Local Models
  - id: tinyllama-1.1b
    tier: local
    default: false
    fallback: true
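Resolving the active model from this config could work as sketched below; resolveModel and the availability map are illustrative assumptions, not existing code:

```javascript
// Picks the entry marked `default: true`, falling back to the entry
// marked `fallback: true` (TinyLlama above) when the default is
// reported unavailable. `availability` maps model id -> status string.
function resolveModel(configModels, availability) {
  const def = configModels.find(m => m.default);
  if (def && availability[def.id] !== 'unavailable') return def.id;
  const fb = configModels.find(m => m.fallback);
  return fb ? fb.id : null;
}
```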
Model Categories
categories:
  premium:
    name: "Premium Models"
    description: "Most capable models for complex tasks"
    icon: "ph-crown"
  standard:
    name: "Standard Models"
    description: "Balanced performance and cost"
    icon: "ph-star"
  local:
    name: "Local Models"
    description: "Privacy-focused, zero-cost models"
    icon: "ph-house"
  specialized:
    name: "Specialized Models"
    description: "Task-specific optimized models"
    icon: "ph-wrench"
Document Status: In Progress
Implementation Ready: Yes
Estimated Timeline: 4 weeks
Priority: High
Phase 1 Status: Complete (Backend Services)
Phase 2 Status: Complete (UI Implementation)
Phase 3 Status: Pending (Advanced Features)
Last Updated: 2025-08-06