Last updated: Aug 12, 2025, 01:09 PM UTC

Lessons Learnt

A collection of important lessons from developing and deploying Sasha Studio.

2025-08-09: Production Deployment and Container Health Checks

1. Container Health Checks Must Be First Priority

Issue: Sliplane deployment failed because the health check endpoints were hanging
Root Cause: Health endpoints were registered AFTER middleware (CORS, body parsers) in the Express routing chain
Problem: Middleware blocked or delayed responses, causing health checks to time out
Solution:

  • Move health endpoints to the very beginning of the middleware stack
  • Add a simple /health endpoint that returns 200 OK immediately
  • Keep the detailed /api/health for monitoring, but register it before auth middleware as well

Lesson:

  • Container orchestrators need instant health responses (< 1s)
  • Health endpoints should never depend on authentication, database, or complex middleware
  • Place health routes BEFORE any app.use() statements
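
The ordering described above can be sketched as follows. This is a minimal illustration, not the project's actual server code: configureApp and the middlewares list are hypothetical names, and app is assumed to be an Express application (or any object with compatible get() and use() methods).

```javascript
// Sketch: register health routes before any other middleware.
// `app` is assumed to be an Express app; `middlewares` is a hypothetical list
// standing in for CORS, body parsers, auth, and similar handlers.
function configureApp(app, middlewares = []) {
  // 1. Liveness probe: no auth, no database, no body parsing.
  app.get('/health', (req, res) => {
    res.status(200).send('OK');
  });

  // 2. Detailed endpoint for monitoring, still ahead of auth middleware.
  app.get('/api/health', (req, res) => {
    res.status(200).json({ status: 'ok', uptime: process.uptime() });
  });

  // 3. Only now attach the rest of the middleware stack.
  for (const middleware of middlewares) {
    app.use(middleware);
  }
  return app;
}
```

Because Express runs handlers in registration order, a request to /health is answered before it reaches anything registered in step 3.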

2. Database Reset in Production Requires Proper Timing

Issue: RESET_DATABASE environment variable not working in Sliplane deployment
Root Cause: Database reset logic executed BEFORE tables were created
Problem: DELETE statements failed because tables didn't exist yet
Solution:

  • Move RESET_DATABASE logic AFTER init.sql execution
  • Add comprehensive logging to verify reset success
  • Delete from ALL related tables, not just users table
  • Verify the final state (0 users, 0 profiles)

Lesson:

  • Database initialization order is critical
  • Always verify reset operations with logging and final state checks
  • Consider foreign key constraints when deleting data
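
A sketch of the corrected ordering, under stated assumptions: db is a hypothetical synchronous client exposing exec(sql) and count(table), the table names and their child-before-parent order are guesses based on the "0 users, 0 profiles" check, and initSql stands for the contents of init.sql.

```javascript
// Sketch: create the schema first, then optionally reset, then verify.
// `db` is a hypothetical client with exec(sql) and count(table) methods;
// the default table list is an assumption (children before parents).
function initDatabase(db, initSql, { reset = false, tables = ['profiles', 'users'] } = {}) {
  // 1. Create tables first so the DELETE statements below have targets.
  db.exec(initSql);

  if (!reset) return { reset: false };

  // 2. Delete child tables before parent tables to respect foreign keys.
  for (const table of tables) {
    db.exec(`DELETE FROM ${table}`);
    console.log(`[reset] cleared ${table}`);
  }

  // 3. Verify the final state instead of trusting the deletes.
  const remaining = {};
  for (const table of tables) {
    remaining[table] = db.count(table);
  }
  console.log(`[reset] final state: ${JSON.stringify(remaining)}`);
  if (Object.values(remaining).some((n) => n !== 0)) {
    throw new Error('Database reset left rows behind');
  }
  return { reset: true, remaining };
}
```

A caller might pass something like { reset: process.env.RESET_DATABASE === 'true' }; the exact value the variable takes is an assumption here.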

3. Container Image Caching Can Hide Deployments

Issue: Webhooks triggering successfully but old code still running
Root Cause: Container platform using cached "latest" tag instead of pulling new image
Problem: Multiple deployments appeared successful but no changes were visible
Solution:

  • Use timestamp-based tags (20250809-192853) instead of "latest"
  • Update deployment scripts to use specific tags
  • Manually update the image version in the platform dashboard when needed

Lesson:

  • The "latest" tag is often cached aggressively by container platforms
  • Always use specific version tags for production deployments
  • Verify actual container image being used, not just webhook success
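
One way to generate tags in the 20250809-192853 format is sketched below. The function names are hypothetical, UTC is an assumption (the notes do not say which timezone the original tags used), and the image name in the usage note is a placeholder.

```javascript
// Sketch: build a tag such as 20250809-192853 from a Date (UTC assumed).
function makeImageTag(date = new Date()) {
  const pad = (n) => String(n).padStart(2, '0');
  const day = `${date.getUTCFullYear()}${pad(date.getUTCMonth() + 1)}${pad(date.getUTCDate())}`;
  const time = `${pad(date.getUTCHours())}${pad(date.getUTCMinutes())}${pad(date.getUTCSeconds())}`;
  return `${day}-${time}`;
}

// Combine a (hypothetical) image name with a fresh tag.
function imageReference(imageName, date = new Date()) {
  return `${imageName}:${makeImageTag(date)}`;
}
```

For example, imageReference('example/app', new Date(Date.UTC(2025, 7, 9, 19, 28, 53))) yields example/app:20250809-192853, a reference the platform cannot confuse with a previously cached image.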

4. Production Debugging Requires Container Platform Expertise

Issue: It was hard to diagnose why deployments were failing
Breakthrough: Reading container platform logs showed exact error messages
Solution:

  • Check platform-specific logs (Sliplane logs tab)
  • Look for health check failure messages
  • Monitor both build logs and runtime logs
  • Use platform webhooks for automated deployments

Lesson:

  • Each container platform (Sliplane, Heroku, Cloud Run) has unique behavior
  • Platform logs are more reliable than webhook responses
  • Understand platform-specific health check requirements

2025-08-09: Docker Deployment and CLI Integration

1. Always Verify CLI Flags Exist

Issue: Added a --project workspace flag to the Claude CLI invocation for Docker deployments
Problem: The flag doesn't exist in the Claude CLI, causing an "unknown option '--project'" error
Lesson:

  • Always check CLI documentation/help before adding new flags
  • Test in the target environment immediately after making changes
  • Don't assume flags from other tools or older versions exist
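
A small guard along these lines can catch a missing flag before it reaches a deployment. It is only a sketch: the helper names are hypothetical, and helpText is assumed to be whatever the tool prints for its own --help, captured by the caller.

```javascript
// Sketch: check that a flag is mentioned in a tool's --help output
// before wiring it into a command line.
function helpMentionsFlag(helpText, flag) {
  // Split on whitespace and common help-text punctuation so that
  // "--project" does not match a longer flag such as "--project-dir".
  const tokens = helpText.split(/[\s,=\[\]|<>]+/);
  return tokens.includes(flag);
}

// Return the subset of flags that the help text never mentions.
function unsupportedFlags(helpText, flags) {
  return flags.filter((flag) => !helpMentionsFlag(helpText, flag));
}
```

Running unsupportedFlags against the captured help text in a startup script or CI step, and failing when the result is non-empty, turns a runtime "unknown option" error into an early, explicit one.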

2. Git Commits Can Break Working Code

Issue: A Sliplane deployment commit introduced breaking changes to the previously working Docker setup
Problem: Multiple changes across files created cascading issues
Lesson:

  • Review commits carefully, especially when they touch multiple files
  • Test in all environments (Docker, local) after deployment-related changes
  • Be cautious of "standardization" commits that change core functionality

3. Complex Workarounds Often Create More Problems

Issue: Added Nginx reverse proxy to work around Docker Desktop vpnkit issues
Problem:

  • Broke WebSocket connections
  • Added complexity without solving the core issue
  • Made debugging harder with logs split across services

Lesson:

  • Fix root causes, not symptoms
  • If a platform has fundamental issues (vpnkit), use a different platform
  • Simple solutions are usually better than complex workarounds

4. Docker Desktop for Mac Has Networking Limitations

Issue: File uploads failing with timeouts in Docker Desktop
Root Cause: vpnkit networking layer has bugs with HTTP keep-alive connections
Lesson:

  • Be aware of platform-specific limitations
  • Docker Desktop != Linux Docker
  • Consider alternative container runtimes (Podman) or Linux VMs for development

5. Authentication Flows Need Clear Testing

Issue: User asked if they should be redirected to login when not authenticated
Reality: The feature was already implemented and working
Lesson:

  • Document authentication flows clearly
  • Provide testing instructions (e.g., "delete localStorage auth-token")
  • Make authentication state visible in the UI

2025-08-08: State Management and React

1. React State Updates Can Be Tricky

Issue: User messages disappearing in chat interface
Root Cause: State being overwritten instead of merged
Lesson: Merge new items into existing state instead of replacing it, especially for arrays; base each update on the previous state rather than on a stale copy
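
The difference between overwriting and merging can be sketched with a pure updater. appendMessage and the state names in the comments are hypothetical; the actual chat component may be structured differently.

```javascript
// Sketch: pure updater that appends to the previous array instead of replacing it.
function appendMessage(previousMessages, message) {
  return [...previousMessages, message];
}

// In a component, assuming a hypothetical `messages` state created with useState:
//   setMessages((prev) => appendMessage(prev, assistantMessage)); // keeps the user message
//   setMessages([assistantMessage]);                              // overwrites: user message is lost
```

Passing a function to the state setter means React supplies the latest state, so two updates queued close together both land in the final array.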

2. Markdown Rendering Requires Version-Specific Handling

Issue: Inline code showing as full code blocks
Root Cause: react-markdown v10 changed how it passes parameters
Lesson: Test thoroughly when upgrading markdown libraries

General Development Principles

1. Test in Target Environment Early

Don't assume local development matches production/Docker behavior

2. Document Experiments

When trying workarounds (like Nginx proxy), document why and what was learned

3. Check Your Assumptions

  • CLI flags might not exist
  • Libraries might have changed behavior
  • Platform differences matter (Mac vs Linux)

4. Simple > Complex

Avoid adding layers (proxies, middleware) to fix issues - address root causes

5. Version Control is Your Friend

Use git blame/log to understand how problematic code got introduced

Docker-Specific Lessons

1. Container Environments Need Special Handling

  • Paths are different (/app/workspaces vs local paths)
  • Permissions matter more
  • Network behavior differs from host

2. Environment Variables Drive Behavior

  • RUNNING_IN_DOCKER changes code paths
  • Test with the same env vars as production
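
A sketch of how an environment flag might select a code path, plus a check for variables that production sets but the current environment lacks. RUNNING_IN_DOCKER and /app/workspaces come from these notes; the function names, the 'true' value, and the ./workspaces fallback are assumptions.

```javascript
// Sketch: pick the workspace root from the environment.
// The accepted value 'true' and the local fallback path are assumptions.
function resolveWorkspaceRoot(env = process.env, localRoot = './workspaces') {
  return env.RUNNING_IN_DOCKER === 'true' ? '/app/workspaces' : localRoot;
}

// List required variables that are unset or empty in the given environment.
function missingEnvVars(requiredNames, env = process.env) {
  return requiredNames.filter((name) => env[name] === undefined || env[name] === '');
}
```

Calling missingEnvVars at startup with the list of production variable names and logging the result makes it obvious when a local or test run is exercising a different code path than production.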

3. Volume Mounts Can Cause Issues

  • Permission problems between host and container
  • Path mapping confusion
  • Sync issues with file watchers

Debugging Strategies That Worked

  1. Check git history first - Understanding when/why code was added helps fix it
  2. Read the actual CLI help - Don't assume flags exist
  3. Test incrementally - Remove one thing at a time (like Nginx)
  4. Check logs at every layer - Container logs, app logs, proxy logs
  5. Verify environment variables - They control critical code paths