Cline and LM Studio: the local coding stack with Qwen3 Coder 30B
"What you need" only includes software requirements.
So about 700 bucks for a 3090 on eBay
With a 3090 I guess you'd have to reduce context or go for a slightly more aggressive quantization level.
Summarizing llama-arch.cpp, which is roughly 40k tokens, I get ~50 tok/sec generation speed and ~14 seconds to first token.
For short prompts I get more like ~90 tok/sec and <1 sec to first token.
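For anyone wanting to reproduce that kind of measurement, here is a rough sketch of timing a single request against a local OpenAI-compatible server. It assumes LM Studio's default endpoint at http://localhost:1234 and a model identifier of "qwen/qwen3-coder-30b" (both are assumptions, check what your server actually reports), uses reqwest with the blocking and json features plus serde_json, and only measures total wall-clock time, since a true time-to-first-token number needs a streaming request:

    // Rough sketch: time one non-streaming chat completion against a local
    // OpenAI-compatible server. Endpoint and model name below are assumptions.
    use std::time::{Duration, Instant};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Long timeout: local generation on big prompts can take a while.
        let client = reqwest::blocking::Client::builder()
            .timeout(Duration::from_secs(600))
            .build()?;

        let body = serde_json::json!({
            "model": "qwen/qwen3-coder-30b",
            "messages": [{ "role": "user", "content": "Summarize this file: ..." }],
            "max_tokens": 256
        });

        let start = Instant::now();
        let resp: serde_json::Value = client
            .post("http://localhost:1234/v1/chat/completions")
            .json(&body)
            .send()?
            .json()?;
        let elapsed = start.elapsed().as_secs_f64();

        // Completion token count as reported by the server's usage block.
        let tokens = resp["usage"]["completion_tokens"].as_u64().unwrap_or(0);
        println!("{tokens} tokens in {elapsed:.1}s (~{:.0} tok/s)", tokens as f64 / elapsed);
        Ok(())
    }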
I didn't do anything fancy and found it did much better than my experience with the codex cli, and similar in quality to Claude Code when I used Sonnet or Opus.
Honestly the CLI stuff was the hardest part, but I chose not to use something like crossterm.
(As an aside, my "ideal" language mix would be a pairing of Rust with Python, though the PyO3 interface could be improved.)
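On the Rust + Python pairing, a minimal PyO3 sketch for the curious; the module name "speedups" and the function are made up, and the #[pymodule] signature assumes the PyO3 0.21+ Bound API:

    // Expose one Rust function to Python; typically built with maturin.
    use pyo3::prelude::*;

    /// Iterative Fibonacci, callable from Python as speedups.fib(n).
    #[pyfunction]
    fn fib(n: u64) -> u64 {
        (0..n).fold((0u64, 1u64), |(a, b), _| (b, a + b)).0
    }

    /// The Python module definition.
    #[pymodule]
    fn speedups(m: &Bound<'_, PyModule>) -> PyResult<()> {
        m.add_function(wrap_pyfunction!(fib, m)?)?;
        Ok(())
    }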
Would also love to learn more about your Rust agent + Qwen3!
In Python there are hidden sharp edges, and depending on which dependencies you use, you can hit deadlocks in production without ever knowing you were at risk.
Rust has traits to protect against this. Async in Rust is great.
I'd do something like:
    let (tx, rx) = std::sync::mpsc::channel();
    thread::spawn(move || {
        // blocking request
        let response = reqwest::blocking::get(url).unwrap();
        tx.send(response.text().unwrap());
    });
Or
    let (tx, mut rx) = tokio::sync::mpsc::channel(100);
    tokio::spawn(async move {
        let response = client.get(url).send().await;
        tx.send(response).await;
    });
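To make the "traits protect against this" point concrete: tokio::spawn only accepts Send futures, so holding a std::sync::MutexGuard across an .await point is rejected at compile time, while the async-aware tokio::sync::Mutex works. A small runnable sketch of my own (assuming tokio with the full feature set), not code from the comment above:

    // Sketch: the guard from tokio::sync::Mutex is Send, so it may be held
    // across an await inside a spawned task; swap in std::sync::Mutex and
    // the compiler rejects the tokio::spawn call.
    use std::sync::Arc;
    use std::time::Duration;
    use tokio::sync::Mutex;

    #[tokio::main]
    async fn main() {
        let counter = Arc::new(Mutex::new(0u32));

        let handles: Vec<_> = (0..4)
            .map(|_| {
                let counter = Arc::clone(&counter);
                tokio::spawn(async move {
                    let mut guard = counter.lock().await;
                    tokio::time::sleep(Duration::from_millis(10)).await; // guard held across await
                    *guard += 1;
                })
            })
            .collect();

        for h in handles {
            h.await.unwrap();
        }

        println!("count = {}", *counter.lock().await);
    }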
I've heard of deadlocks when using aiohttp or maybe httpx (e.g. due to hidden async-related globals), but I have never myself managed to get any system based on asyncio + concurrent.futures + urllib (i.e. stdlib-only) to deadlock, including with some mix of asyncio and threading locks.
If you have 32 GB of memory you are not using, it is worth running for small tasks. Otherwise, I would stick with a cloud-hosted model.
Raises the question of long-term support, etc.
edit: are you the author? You seem to post a lot from that blog and the blog author's other accounts.
Keep in mind that closed, proprietary models:
1) Use your data internally for training, analytics, and more - because "the data is the moat"
2) Are out of your control - one day something might work, another day it might fail because of a model update, a new "internal" system prompt, or a new guardrail that simply blocks your task
4) Are built on the "biggest intellectual property theft" of this century, so they should be open and free ;-)