May 31, 2026 · MCP-first · 7 min read Self-hosting open models without a GPU farm When local inference makes sense, and how quantization and right-sizing let capable open-weight models run on modest hardware, with the tradeoffs spelled out. #Models #Self-hosting