Last updated: Aug 12, 2025, 11:07 AM UTC

Model Selector Implementation Plan

Generated: 2025-01-08 UTC
Purpose: Comprehensive implementation plan for the Sasha Studio model selector feature
Scope: Backend architecture, WebSocket protocol, UI components, and model discovery


Executive Summary

This document outlines the implementation plan for adding a comprehensive model selector to the Sasha Studio chat interface. The selector will let users choose among AI models, both cloud-hosted (Claude, GPT, Gemini) and local (TinyLlama), with real-time availability status and cost information.


Architecture Overview

System Components

graph TB
  subgraph "Frontend Layer"
    A[Model Selector UI]
    B[WebSocket Client]
    C[State Manager]
  end
  subgraph "Backend Services"
    D[Model Registry Service]
    E[Provider Health Monitor]
    F[Cost Calculator]
    G[WebSocket Handler]
  end
  subgraph "AI Providers"
    H[Cloud Providers]
    I[Local Models]
    J[Provider APIs]
  end
  A --> B
  B --> G
  G --> D
  D --> E
  E --> J
  D --> F
  D --> I
  style A fill:#e3f2fd
  style D fill:#f3e5f5
  style I fill:#e8f5e9

Model Metadata Schema

Core Model Structure

{
  id: "claude-3-opus",
  provider: "anthropic",
  name: "Claude 3 Opus",
  description: "Most capable model for complex tasks",
  category: "premium",
  capabilities: ["reasoning", "coding", "analysis", "creative"],
  
  pricing: {
    inputCost: 15.00,  // per million tokens
    outputCost: 75.00, // per million tokens
    currency: "USD"
  },
  
  limits: {
    contextWindow: 200000,
    maxOutput: 4096,
    rateLimit: 100 // requests per minute
  },
  
  performance: {
    speed: "medium",      // fast/medium/slow
    quality: "excellent", // good/very-good/excellent
    latency: 500         // ms to first token
  },
  
  availability: {
    status: "available",  // available/degraded/unavailable
    lastChecked: "2025-01-08T10:00:00Z",
    region: "us-east-1"
  },
  
  tags: ["general", "coding", "research"],
  icon: "ph-brain",
  recommended: true,
  isLocal: false,
  requiresAuth: true
}
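Before entries are registered, the schema above can be enforced with a small validation helper. The following is a sketch: the required-field set and the specific checks are assumptions, mirroring the structure shown above.

```javascript
// Minimal validator for the model metadata schema.
// Field names mirror the schema above; the required set is an assumption.
const REQUIRED_FIELDS = ['id', 'provider', 'name', 'category', 'pricing', 'limits'];

function validateModel(model) {
  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    if (model[field] === undefined) errors.push(`missing field: ${field}`);
  }
  if (model.pricing) {
    if (typeof model.pricing.inputCost !== 'number' || model.pricing.inputCost < 0) {
      errors.push('pricing.inputCost must be a non-negative number');
    }
    if (typeof model.pricing.outputCost !== 'number' || model.pricing.outputCost < 0) {
      errors.push('pricing.outputCost must be a non-negative number');
    }
  }
  if (model.limits && typeof model.limits.contextWindow !== 'number') {
    errors.push('limits.contextWindow must be a number');
  }
  return { valid: errors.length === 0, errors };
}
```

The registry can call this in `loadCloudModels`/`loadLocalModels` and skip (or log) entries that fail, so a malformed config file cannot surface a broken card in the UI.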

Local Model Structure (TinyLlama)

{
  id: "tinyllama-1.1b",
  provider: "ollama",
  name: "TinyLlama 1.1B",
  description: "Ultra-fast local model for simple queries",
  category: "local",
  
  capabilities: ["basic-chat", "simple-questions"],
  
  pricing: {
    inputCost: 0,
    outputCost: 0,
    currency: "USD",
    note: "Free - runs locally"
  },
  
  limits: {
    contextWindow: 2048,
    maxOutput: 512
  },
  
  performance: {
    speed: "very-fast",
    quality: "good",
    latency: 50
  },
  
  availability: {
    status: "available",
    isInstalled: true,
    modelSize: "637MB",
    quantization: "q4_k_m"
  },
  
  tags: ["local", "privacy", "offline"],
  icon: "ph-house",
  isLocal: true,
  requiresAuth: false,
  isFallback: true
}

WebSocket Protocol Extensions

Model Discovery Messages

// Client -> Server: Request available models
{
  type: "get_models",
  filters: {
    category: "all", // all/premium/standard/local
    capabilities: [], // optional capability filter
    includeUnavailable: false
  }
}

// Server -> Client: Return model list
{
  type: "models_list",
  models: [...], // Array of model objects
  defaultModel: "claude-3-haiku",
  currentModel: "gpt-4-turbo",
  timestamp: "2025-01-08T10:00:00Z"
}
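The client side of this exchange can be kept as two pure functions, one building the request and one folding the response into client state. This is a sketch; the message shapes follow the protocol above, but the client state shape is an assumption.

```javascript
// Build a get_models request matching the protocol above.
function buildGetModels(filters = {}) {
  return JSON.stringify({
    type: 'get_models',
    filters: {
      category: filters.category ?? 'all',
      capabilities: filters.capabilities ?? [],
      includeUnavailable: filters.includeUnavailable ?? false
    }
  });
}

// Fold a models_list response into client state (state shape assumed).
function applyModelsList(state, msg) {
  if (msg.type !== 'models_list') return state;
  return {
    ...state,
    models: msg.models,
    currentModel: msg.currentModel ?? msg.defaultModel
  };
}
```

Usage would look like `ws.send(buildGetModels({ category: 'local' }))`, with `applyModelsList` called from the WebSocket `message` handler; keeping both pure makes them unit-testable without a socket.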

Model Selection Messages

// Client -> Server: Switch model
{
  type: "switch_model",
  modelId: "claude-3-opus",
  applyToConversation: false // or true to re-run with new model
}

// Server -> Client: Confirm switch
{
  type: "model_switched",
  previousModel: "gpt-4-turbo",
  currentModel: "claude-3-opus",
  costEstimate: {
    perMessage: 0.05,
    per1000Tokens: 0.75
  }
}

Real-time Status Updates

// Server -> Client: Model status change
{
  type: "model_status_update",
  modelId: "gpt-4",
  status: "degraded",
  message: "Experiencing high latency",
  alternativeModels: ["claude-3-opus", "gpt-4-turbo"]
}

// Server -> Client: Cost alert
{
  type: "cost_alert",
  currentSpend: 12.50,
  projectedMonthly: 375.00,
  suggestedModel: "claude-3-haiku",
  savingsEstimate: "60%"
}
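When a `model_status_update` arrives, the client needs a policy for choosing a replacement. A sketch of one such policy follows; the preference order (server-suggested alternatives first, then the local `isFallback` model) is an assumption, not part of the protocol.

```javascript
// Pick a replacement model when a status update reports degradation.
// Returns a model id, or null if the current model is still usable.
function pickReplacement(update, registry) {
  if (update.type !== 'model_status_update' || update.status === 'available') {
    return null; // nothing to do
  }
  // Prefer server-suggested alternatives that are actually available.
  for (const id of update.alternativeModels ?? []) {
    const m = registry.find(m => m.id === id);
    if (m && m.availability.status === 'available') return m.id;
  }
  // Otherwise fall back to a local model flagged isFallback (e.g. TinyLlama).
  const fallback = registry.find(
    m => m.isFallback && m.availability.status === 'available'
  );
  return fallback ? fallback.id : null;
}
```

For a `degraded` status the UI might only surface the suggestion; for `unavailable` it could send `switch_model` automatically.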

UI Component Structure

Model Selector Component Hierarchy

ModelSelector/
├── ModelSelectorButton.jsx       # Trigger button with current model
├── ModelSelectorModal.jsx         # Main modal container
├── ModelGrid.jsx                  # Grid of model cards
│   └── ModelCard.jsx             # Individual model display
├── ModelComparison.jsx           # Side-by-side comparison
├── ModelFilters.jsx              # Category and capability filters
├── CostCalculator.jsx            # Interactive cost estimator
└── ModelStatusIndicator.jsx      # Real-time availability

Key UI Features

  1. Quick Switch Bar

    • Recently used models
    • Favorite models
    • One-click switching
  2. Model Cards

    • Visual status indicators
    • Key capabilities
    • Pricing at a glance
    • Performance metrics
  3. Comparison Mode

    • Side-by-side feature comparison
    • Cost comparison
    • Performance benchmarks
  4. Smart Recommendations

    • Task-based suggestions
    • Cost optimization tips
    • Quality vs. speed trade-offs
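The "recently used" list behind the Quick Switch Bar reduces to a small most-recently-used helper. The sketch below assumes a capacity of five entries; the cap and the persistence mechanism (e.g. localStorage) are assumptions.

```javascript
// Most-recently-used list for the Quick Switch Bar.
// Moves modelId to the front, de-duplicates, and caps the length.
function pushRecent(recent, modelId, max = 5) {
  const next = [modelId, ...recent.filter(id => id !== modelId)];
  return next.slice(0, max);
}
```

Calling `pushRecent` on every successful `model_switched` message keeps the bar ordered by recency without any extra bookkeeping.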

Backend Implementation

1. Model Registry Service

// services/model-registry.js
class ModelRegistry {
  constructor() {
    this.models = new Map();
    this.providers = new Map();
    this.healthMonitor = new ProviderHealthMonitor();
  }
  
  async initialize() {
    // Load model definitions
    await this.loadCloudModels();
    await this.loadLocalModels();
    
    // Start health monitoring
    this.healthMonitor.startMonitoring();
  }
  
  async getAvailableModels(filters = {}) {
    const models = Array.from(this.models.values());
    
    return models.filter(model => {
      // Apply filters
      if (filters.category && model.category !== filters.category) {
        return false;
      }
      
      if (filters.capabilities?.length) {
        const hasCapabilities = filters.capabilities.every(
          cap => model.capabilities.includes(cap)
        );
        if (!hasCapabilities) return false;
      }
      
      if (!filters.includeUnavailable && 
          model.availability.status === 'unavailable') {
        return false;
      }
      
      return true;
    });
  }
  
  async switchModel(sessionId, modelId) {
    const model = this.models.get(modelId);
    if (!model) throw new Error('Model not found');
    
    // Validate availability
    if (model.availability.status === 'unavailable') {
      throw new Error('Model currently unavailable');
    }
    
    // Update session
    await this.updateSessionModel(sessionId, modelId);
    
    // Return confirmation
    return {
      success: true,
      model: model,
      costEstimate: this.calculateCostEstimate(model)
    };
  }
}
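`loadLocalModels` can populate the registry from Ollama's installed-model listing. The sketch below assumes Ollama's `GET /api/tags` endpoint and its `{ models: [{ name, size }] }` response shape; the metadata defaults filled in for each entry are assumptions.

```javascript
// Map Ollama's /api/tags response into registry entries.
// Only actually-installed models are returned, which matches the
// Phase 2.6 requirement of not listing uninstalled Ollama models.
function mapOllamaTags(tags) {
  return (tags.models ?? []).map(m => ({
    id: m.name,
    provider: 'ollama',
    name: m.name,
    category: 'local',
    pricing: { inputCost: 0, outputCost: 0, currency: 'USD' },
    availability: {
      status: 'available',
      isInstalled: true,
      modelSize: m.size ? `${Math.round(m.size / 1e6)}MB` : undefined
    },
    isLocal: true,
    requiresAuth: false
  }));
}

// Inside loadLocalModels(), something like (endpoint assumed):
//   const res = await fetch('http://localhost:11434/api/tags');
//   mapOllamaTags(await res.json()).forEach(m => this.models.set(m.id, m));
```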

2. Provider Health Monitor

// services/provider-health-monitor.js
class ProviderHealthMonitor {
  constructor() {
    this.checkInterval = 30000; // 30 seconds
    this.providers = new Map();
  }
  
  async checkProviderHealth(provider) {
    try {
      const startTime = Date.now();
      
      // Provider-specific health check
      const isHealthy = await provider.healthCheck();
      const latency = Date.now() - startTime;
      
      return {
        status: isHealthy ? 'available' : 'unavailable',
        latency: latency,
        lastChecked: new Date().toISOString()
      };
    } catch (error) {
      return {
        status: 'unavailable',
        error: error.message,
        lastChecked: new Date().toISOString()
      };
    }
  }
  
  startMonitoring() {
    setInterval(() => {
      this.providers.forEach(async (provider, name) => {
        const health = await this.checkProviderHealth(provider);
        this.updateProviderStatus(name, health);
      });
    }, this.checkInterval);
  }
}

3. Cost Calculator Service

// services/cost-calculator.js
class CostCalculator {
  calculateMessageCost(model, message) {
    const inputTokens = this.estimateTokens(message);
    const outputTokens = this.estimateOutputTokens(model);
    
    const inputCost = (inputTokens / 1000000) * model.pricing.inputCost;
    const outputCost = (outputTokens / 1000000) * model.pricing.outputCost;
    
    return {
      inputTokens,
      outputTokens,
      totalTokens: inputTokens + outputTokens,
      cost: inputCost + outputCost,
      breakdown: {
        input: inputCost,
        output: outputCost
      }
    };
  }
  
  projectMonthlyCost(model, messagesPerDay) {
    const avgCostPerMessage = this.getAverageMessageCost(model);
    return avgCostPerMessage * messagesPerDay * 30;
  }
  
  suggestOptimization(currentUsage) {
    const suggestions = [];
    
    if (currentUsage.complexQueryRatio < 0.3) {
      suggestions.push({
        model: 'claude-3-haiku',
        savings: '70%',
        reason: 'Most queries are simple'
      });
    }
    
    if (currentUsage.privateDataRatio > 0.5) {
      suggestions.push({
        model: 'tinyllama-1.1b',
        savings: '100%',
        reason: 'High privacy requirement - use local model'
      });
    }
    
    return suggestions;
  }
}
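The `estimateTokens` helper referenced above can be sketched with the common "roughly four characters per token" heuristic. The ratio and the default expected-output length are assumptions; accurate counts require the provider's own tokenizer.

```javascript
// Rough token estimate: ~4 characters per token (heuristic, assumption).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Per-message cost using the per-million-token pricing from the schema.
function estimateMessageCost(model, message, expectedOutputTokens = 500) {
  const inputTokens = estimateTokens(message);
  return (
    (inputTokens / 1e6) * model.pricing.inputCost +
    (expectedOutputTokens / 1e6) * model.pricing.outputCost
  );
}
```

Because these numbers feed cost alerts rather than billing, a conservative heuristic is acceptable here; the actual spend should still be reconciled against provider usage reports.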

Implementation Phases

Phase 1: Backend Foundation (Week 1)

Objectives:

  • Set up model registry service
  • Implement provider health monitoring
  • Create WebSocket message handlers
  • Add model metadata storage

Deliverables:

  1. services/model-registry.js
  2. services/provider-health-monitor.js
  3. WebSocket protocol extensions
  4. Model configuration files

Success Criteria:

  • Models can be discovered via WebSocket
  • Health status updates every 30 seconds
  • Model switching works in backend

Phase 2: Core UI Components (Week 2)

Objectives:

  • Create model selector button
  • Build modal with model grid
  • Implement model cards
  • Add real-time status indicators

Deliverables:

  1. Model selector React components
  2. CSS styling matching mockup
  3. State management integration
  4. Basic model switching UI

Success Criteria:

  • UI matches design mockup
  • Models display with correct metadata
  • Click to switch model works

Phase 3: Advanced Features (Week 3)

Objectives:

  • Add comparison mode
  • Implement cost calculator
  • Create smart recommendations
  • Add filtering and search

Deliverables:

  1. Comparison table component
  2. Cost calculator with projections
  3. Recommendation engine
  4. Advanced filtering UI

Success Criteria:

  • Users can compare models
  • Cost estimates are accurate
  • Recommendations are relevant

Phase 4: Integration & Polish (Week 4)

Objectives:

  • Integrate with existing chat interface
  • Add keyboard shortcuts
  • Implement preferences saving
  • Performance optimization

Deliverables:

  1. Full integration with chat
  2. User preferences persistence
  3. Keyboard navigation
  4. Performance improvements

Success Criteria:

  • Seamless integration
  • <100ms UI response time
  • Preferences persist across sessions

Testing Strategy

Unit Tests

describe('ModelRegistry', () => {
  test('should return available models', async () => {
    const models = await registry.getAvailableModels();
    expect(models).toContainEqual(
      expect.objectContaining({
        id: 'tinyllama-1.1b',
        isLocal: true
      })
    );
  });
  
  test('should filter by category', async () => {
    const localModels = await registry.getAvailableModels({
      category: 'local'
    });
    expect(localModels.every(m => m.isLocal)).toBe(true);
  });
});

Integration Tests

describe('Model Switching', () => {
  test('should switch from cloud to local model', async () => {
    const ws = new WebSocket('ws://localhost:3002/chat');
    
    ws.send(JSON.stringify({
      type: 'switch_model',
      modelId: 'tinyllama-1.1b'
    }));
    
    const response = await waitForMessage(ws, 'model_switched');
    expect(response.currentModel).toBe('tinyllama-1.1b');
  });
});

E2E Tests

describe('Model Selector UI', () => {
  test('should open selector and switch model', async () => {
    await page.click('[data-testid="model-selector-button"]');
    await page.waitForSelector('[data-testid="model-grid"]');
    
    await page.click('[data-testid="model-card-tinyllama"]');
    
    const currentModel = await page.textContent(
      '[data-testid="current-model-name"]'
    );
    expect(currentModel).toBe('TinyLlama 1.1B');
  });
});

Success Metrics

Technical Metrics

  • Model switch time: <500ms
  • Health check latency: <100ms
  • UI render time: <50ms
  • WebSocket message round-trip: <200ms

User Metrics

  • Model selection usage: >60% of users
  • Average models tried: 3+ per user
  • Cost reduction achieved: >50%
  • User satisfaction: >4.5/5

Business Metrics

  • Support ticket reduction: 30%
  • User retention improvement: 20%
  • Cost per user reduction: 40%

Security Considerations

  1. API Key Protection

    • Never expose API keys to frontend
    • Use secure key rotation
  2. Model Access Control

    • Enforce model permissions per user tier
    • Audit model usage
  3. Local Model Security

    • Sandbox local model execution
    • Validate model files
  4. Rate Limiting

    • Implement per-model rate limits
    • Prevent abuse of expensive models
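The per-model rate limits above can be driven by each model's `limits.rateLimit` field with a token bucket. This is a sketch; the bucket granularity (refill spread evenly over the minute) and the injectable clock are implementation assumptions.

```javascript
// Token-bucket rate limiter, one instance per model, sized from
// model.limits.rateLimit (requests per minute). The clock parameter
// is injectable for testing.
class ModelRateLimiter {
  constructor(requestsPerMinute, now = Date.now) {
    this.capacity = requestsPerMinute;
    this.tokens = requestsPerMinute;
    this.refillPerMs = requestsPerMinute / 60000;
    this.now = now;
    this.last = now();
  }

  // Returns true if the request may proceed, false if it should be rejected.
  allow() {
    const t = this.now();
    this.tokens = Math.min(this.capacity, this.tokens + (t - this.last) * this.refillPerMs);
    this.last = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Keying limiters by `(userId, modelId)` rather than just `modelId` would additionally prevent a single user from exhausting an expensive model's quota.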

Documentation Requirements

  1. User Documentation

    • Model selector user guide
    • Model comparison guide
    • Cost optimization tips
  2. Developer Documentation

    • API reference
    • WebSocket protocol docs
    • Component documentation
  3. Admin Documentation

    • Model configuration guide
    • Monitoring setup
    • Troubleshooting guide

Next Steps

Implementation Status

Phase 1: Backend Foundation (Completed - 2025-08-06)

  • Model Registry Service (services/model-registry.js)
  • Provider Health Monitor (services/provider-health-monitor.js)
  • Cost Calculator Service (services/cost-calculator.js)
  • WebSocket message handlers in server.js
  • Support for 7 cloud models and 3 local models
  • Real-time health monitoring
  • Cost tracking and optimization suggestions

Phase 2: UI Components (Completed - 2025-08-06)

  • Modal HTML structure in index-enhanced.html
  • CSS styling for modal and model cards
  • JavaScript modal management functions
  • WebSocket integration for model operations
  • Model filtering and search
  • Cost display and alerts
  • Responsive design and keyboard shortcuts (Cmd/Ctrl+K)

Phase 2.5: API Key Detection Integration (Completed - 2025-08-06)

  • Provider availability detection from environment variables
  • Model registry integration with provider availability
  • Dynamic model availability based on API keys
  • Frontend UI updates for unavailable models
  • Visual indicators (grayed out, lock icon) for models without API keys
  • Tooltips explaining API key requirements
  • Prevention of unavailable model selection

Implementation Details:

  • Modified model-registry.js to accept and use provider availability information
  • Updated server.js to pass provider availability from AI service to model registry
  • Enhanced frontend to visually distinguish unavailable models
  • Added clear messaging about API key configuration requirements

Phase 2.6: OpenRouter Integration & Model Fixes (Completed - 2025-08-06)

  • Distinguish between direct API keys and OpenRouter proxy access
  • Add OpenRouter-specific models to the registry (4 models)
  • Implement connection type badges (Direct vs OpenRouter)
  • Fix model ID mapping for OpenRouter API compatibility (modelId field)
  • Remove uninstalled Ollama models from registry (kept only TinyLlama)
  • Fix Ollama model detection to only show actually installed models

Implementation Details:

  • Added OpenRouter-specific models with proper modelId mapping (e.g., 'openai/gpt-4o')
  • Updated provider health monitor to check direct API keys only
  • Fixed connection type badges to accurately show access method
  • Removed Llama 2 7B and Mistral 7B as they weren't installed
  • Corrected Ollama availability detection logic to prevent false positives

Phase 3: Advanced Features (Pending)

  • Model comparison mode
  • Advanced cost analytics
  • Smart recommendations engine
  • Usage-based suggestions

Phase 4: Integration & Polish (Pending)

  • Full integration testing
  • Performance optimization
  • Documentation updates
  • User guide creation

Dependencies

  • Existing WebSocket infrastructure
  • LLxprt bridge for cloud models
  • Ollama service for local models
  • React frontend framework

Appendix: Model Configurations

Default Model Set

models:
  # Premium Models
  - id: claude-3-opus
    tier: premium
    default: false
    
  - id: gpt-4-turbo
    tier: premium
    default: false
    
  # Standard Models  
  - id: claude-3-haiku
    tier: standard
    default: true
    
  - id: gpt-3.5-turbo
    tier: standard
    default: false
    
  # Local Models
  - id: tinyllama-1.1b
    tier: local
    default: false
    fallback: true
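Once the YAML above is parsed (e.g. with a YAML library) into an array of entries, resolving the default and fallback model is a small lookup. This sketch assumes that parsing step and that at most one entry carries each flag.

```javascript
// Resolve default and fallback ids from the parsed default-model-set config.
function resolveDefaults(models) {
  const def = models.find(m => m.default);
  const fallback = models.find(m => m.fallback);
  return {
    defaultModel: def ? def.id : null,
    fallbackModel: fallback ? fallback.id : null
  };
}
```

The registry can run this once at startup and reject configs where `defaultModel` resolves to null, so a misconfigured file fails fast instead of leaving new sessions without a model.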

Model Categories

categories:
  premium:
    name: "Premium Models"
    description: "Most capable models for complex tasks"
    icon: "ph-crown"
    
  standard:
    name: "Standard Models"
    description: "Balanced performance and cost"
    icon: "ph-star"
    
  local:
    name: "Local Models"
    description: "Privacy-focused, zero-cost models"
    icon: "ph-house"
    
  specialized:
    name: "Specialized Models"
    description: "Task-specific optimized models"
    icon: "ph-wrench"

Document Status: In Progress
Implementation Ready: Yes
Estimated Timeline: 4 weeks
Priority: High
Phase 1 Status: Complete (Backend Services)
Phase 2 Status: Complete (UI Implementation)
Phase 3 Status: Pending (Advanced Features)
Last Updated: 2025-08-06