Running SOTA LLMs Locally: Expert Guide
- High hardware costs: $40K to $50K for a basic setup
- Model quality issues: quantization and REAP techniques can reduce output quality
- Security concerns: isolation systems and potential backdoors
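To make the quantization point concrete, here is a toy sketch of why fewer bits per weight means more rounding error. It uses plain symmetric uniform quantization on a made-up list of weights. Real quantizers used for local models typically work group-wise and are more sophisticated, so treat the function names and sample values as illustrative assumptions, not as any tool's actual method.

```python
def quantize_dequantize(weights, bits=4):
    """Round-trip weights through symmetric uniform quantization (toy version)."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # one scale for the whole list
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [q * scale for q in quantized]


def mean_abs_error(original, restored):
    """Average absolute difference between original and restored weights."""
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


# Hypothetical sample weights, chosen only for illustration.
weights = [0.82, -0.41, 0.05, -0.93, 0.27, 0.6, -0.12, 0.33]

err4 = mean_abs_error(weights, quantize_dequantize(weights, bits=4))
err8 = mean_abs_error(weights, quantize_dequantize(weights, bits=8))

print(f"4-bit mean abs error: {err4:.4f}")
print(f"8-bit mean abs error: {err8:.4f}")
```

The 4-bit round trip loses noticeably more precision than the 8-bit one on this sample. How much that matters for actual output quality depends on the model and the quantization scheme.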
The Buzz Score
The Internet’s Verdict: 70% Hyped, 30% Skeptical
Expert Opinions
Running LLMs locally can cost more and deliver lower quality than expected. As one expert notes:
I play with local LLMs a lot. I’ve spent more on hardware than I should. I’m friends with a local group of people who have spent a lot more than I have. The warning I would have for everyone is to temper your expectations and read the fine print carefully.
Another expert suggests a cheaper hardware route:
A great way to go is 2x RTX 3090s for a total of 48GB VRAM total. You can then run Qwen3.6-27B, which is an awesome model. Just want to note that for $3k you can get an M5 macbook pro with 48gb of shared memory, and it will not be a giant box.
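The 48GB figure is easier to judge with some back-of-the-envelope arithmetic. The sketch below estimates the memory needed for the weights alone of a 27-billion-parameter model at a few bit widths. It assumes every weight uses exactly the stated number of bits, counts 1 GB as 10^9 bytes, and ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The function name is hypothetical.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


VRAM_BUDGET_GB = 48  # 2x RTX 3090, as in the quote above

for bits in (16, 8, 4):
    needed = weight_memory_gb(27, bits)
    verdict = "fits" if needed < VRAM_BUDGET_GB else "does not fit"
    print(f"{bits}-bit weights: {needed:.1f} GB ({verdict} in {VRAM_BUDGET_GB} GB, weights only)")
```

Under these assumptions, 16-bit weights for a 27B model come to about 54 GB, which exceeds 48 GB, while 8-bit (about 27 GB) and 4-bit (about 13.5 GB) leave headroom for context.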
Conclusion
Running SOTA LLMs locally can be a complex and expensive endeavor. While some experts swear by the benefits, others warn about the potential drawbacks. As one expert notes:
For qwen3.6-27b you can also run the q4 variant with full ~250K context on one 3090. It’s fast enough to not be frustrating so the speed gains with 2x 3090s wouldn’t be worth it to me.
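One way to sanity-check a long-context claim like this is to work out how much memory is left per token once the weights are loaded. The sketch below assumes a 24 GB card, roughly 13.5 GB of weights (27 billion parameters at exactly 4 bits each; real q4 formats usually use somewhat more), and 1 GB as 10^9 bytes. It ignores activations and runtime overhead. The second helper gives the per-token KV cache size for a standard transformer with full attention in every layer; whether that formula applies to this particular model is an assumption, and the architecture values must come from the model's own config.

```python
def per_token_budget_bytes(vram_gb, weights_gb, context_tokens):
    """Bytes of memory left per context token after loading the weights."""
    free_bytes = (vram_gb - weights_gb) * 1e9
    return free_bytes / context_tokens


def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_element):
    """Per-token KV cache size for a standard transformer.

    The factor of 2 covers one key and one value vector per layer.
    Assumes every layer uses full attention with a conventional KV cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_element


budget = per_token_budget_bytes(vram_gb=24, weights_gb=13.5, context_tokens=250_000)
print(f"Budget per token: {budget / 1000:.1f} KB")
```

Comparing that budget against `kv_bytes_per_token(...)`, filled in with the layer count, KV head count, head dimension, and cache precision from the model's published config, gives a rough sense of how much the cache has to be compressed for a given context length to fit.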