Cline and LM Studio: the Local Coding Stack with Qwen3 Coder 30B
Posted 4 months ago · Active 4 months ago
cline.bot · Tech · story
calm · mixed
Debate: 60/100
Key topics
AI Coding Assistants
Local Models
Open-Source Software
The post discusses using Cline and LM Studio with Qwen3 Coder 30B for local coding tasks, with commenters sharing their experiences and concerns about the model's performance, hardware requirements, and security vulnerabilities.
Snapshot generated from the HN discussion
Discussion Activity
Moderate engagement
First comment: 7h
Peak period: 9 comments (6-12h window)
Avg / period: 4
Comment distribution: 20 data points (based on 20 loaded comments)
Key moments
- Story posted: Aug 31, 2025 at 10:50 AM EDT (4 months ago)
- First comment: Aug 31, 2025 at 5:25 PM EDT (7h after posting)
- Peak activity: 9 comments in the 6-12h window (hottest window of the conversation)
- Latest activity: Sep 3, 2025 at 1:13 AM EDT (4 months ago)
ID: 45083582 · Type: story · Last synced: 11/20/2025, 2:52:47 PM
Want the full context?
Jump to the original sources
Read the primary article or dive into the live Hacker News thread when you're ready.
"What you need" only includes software requirements.
So about 700 bucks for a 3090 on eBay
With a 3090 I guess you'd have to reduce context or go for a slightly more aggressive quantization level.
Summarizing llama-arch.cpp, which is roughly 40k tokens, I get ~50 tok/sec generation speed and ~14 seconds to first token.
For short prompts I get more like ~90 tok/sec and <1 sec to first token.
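None of this is from the thread, but for orientation: LM Studio serves an OpenAI-compatible HTTP API locally, so rough numbers like these can be reproduced with a short script. A minimal Rust sketch, assuming the usual localhost:1234 endpoint and a hypothetical qwen3-coder-30b model identifier (use whatever name LM Studio actually lists):

use std::time::Instant;

// Cargo deps assumed: reqwest (features "blocking", "json") and serde_json.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // LM Studio's local OpenAI-compatible server; port 1234 is its usual default.
    let url = "http://localhost:1234/v1/chat/completions";
    let body = serde_json::json!({
        "model": "qwen3-coder-30b", // hypothetical identifier
        "messages": [{ "role": "user", "content": "Write a short Rust hello world." }],
        "max_tokens": 512
    });

    let start = Instant::now();
    let response: serde_json::Value = reqwest::blocking::Client::new()
        .post(url)
        .json(&body)
        .send()?
        .json()?;
    let elapsed = start.elapsed().as_secs_f64();

    // Rough tokens/sec from the usage block, if the server reports one.
    let tokens = response["usage"]["completion_tokens"].as_f64().unwrap_or(0.0);
    println!("{tokens} tokens in {elapsed:.1}s (~{:.1} tok/sec)", tokens / elapsed);
    Ok(())
}

Note that this ratio includes prompt-processing time, so it will understate pure generation speed compared to figures like the ~50 tok/sec quoted above.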
I didn't do anything fancy and found it to do much better than my experience with the Codex CLI, and similar in quality to Claude Code when I used Sonnet or Opus.
Honestly, the CLI stuff was the hardest part, but I chose not to use something like crossterm.
(As an aside, my "ideal" language mix would be a pairing of Rust with Python, though the PyO3 interface could be improved.)
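On the "no crossterm" route mentioned above, purely as an illustration and not the commenter's code: a bare-bones read-eval-print loop needs nothing beyond the Rust standard library.

use std::io::{self, BufRead, Write};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout();
    loop {
        // Plain prompt; no raw mode or terminal library involved.
        write!(stdout, "> ")?;
        stdout.flush()?;

        let mut line = String::new();
        if stdin.lock().read_line(&mut line)? == 0 {
            break; // EOF (Ctrl-D)
        }
        let input = line.trim();
        if input == "exit" {
            break;
        }
        // A real agent would hand `input` to the model-calling code here.
        writeln!(stdout, "echo: {input}")?;
    }
    Ok(())
}

A crate like crossterm only becomes necessary once you want raw-mode input, colors, or cursor control.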
Would also love to learn more about your Rust agent + Qwen3!
In Python there are hidden sharp edges, and depending on what dependencies you use, you can get into deadlocks in production without ever knowing you were in danger.
Rust has traits to protect against this. Async in Rust is great.
I'd do something like:
let (tx, rx) = std::sync::mpsc::channel();
std::thread::spawn(move || {
    // Blocking request on a dedicated thread; send the body back over the channel.
    let response = reqwest::blocking::get(url).unwrap();
    tx.send(response.text().unwrap()).unwrap();
});
Or
let (tx, mut rx) = tokio::sync::mpsc::channel(100);
tokio::spawn(async move {
    // Non-blocking request inside an async task; forward the result over the channel.
    let response = client.get(url).send().await;
    tx.send(response).await.unwrap();
});
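As a side note on the "traits protect against this" claim, a sketch rather than the commenter's own point: tokio::spawn requires the spawned future to be Send, so holding a std::sync::MutexGuard across an .await is rejected at compile time; the guard has to be dropped first.

use std::sync::{Arc, Mutex};
use std::time::Duration;

// tokio with the "full" feature set assumed.
async fn bump(counter: Arc<Mutex<u64>>) {
    {
        // Lock in a narrow scope so the guard is dropped before the await below.
        let mut n = counter.lock().unwrap();
        *n += 1;
    } // Holding the guard past this point would make the future !Send,
      // and the tokio::spawn call below would refuse to compile.
    tokio::time::sleep(Duration::from_millis(10)).await;
}

#[tokio::main]
async fn main() {
    let counter = Arc::new(Mutex::new(0));
    tokio::spawn(bump(counter.clone())).await.unwrap();
    println!("count = {}", counter.lock().unwrap());
}

That Send bound is what surfaces, at compile time, the kind of hidden hazard the comment describes hitting at runtime in Python.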
I've heard of deadlocks when using aiohttp or maybe httpx (e.g. due to hidden async-related globals), but have never myself managed to get any system based on asyncio + concurrent.futures + urllib (i.e. stdlib-only) to deadlock, including with some mix of asyncio and threading locks.
If you have 32 GB of memory you are not using, it is worth running for small tasks. Otherwise, I would stick with a cloud-hosted model.
Raises the question of long-term support, etc.
edit: are you the author? You seem to post a lot from that blog and the blog author's other accounts.
Keep in mind that closed, proprietary models:
1) Use your data internally for training, analytics, and more, because "the data is the moat"
2) Are out of your control: one day something might work, another day it might fail because of a model update, a new "internal" system prompt, or a new guardrail that simply blocks your task
3) Are built on the "biggest intellectual property theft" of this century, so they should be open and free ;-)