Run Interactive Commands in Gemini CLI
Key topics
Google's Gemini CLI now supports running interactive commands, but the community is divided on its usefulness and reliability, with some users praising its potential while others criticize its performance and limitations.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 6d after posting
- Peak period: 57 comments in the 156-168h window
- Avg / period: 19.3 comments
Based on 77 loaded comments
Key moments
1. Story posted: Oct 16, 2025 at 10:31 AM EDT (3 months ago)
2. First comment: Oct 22, 2025 at 10:31 PM EDT (6d after posting)
3. Peak activity: 57 comments in the 156-168h window, the hottest stretch of the conversation
4. Latest activity: Oct 24, 2025 at 9:44 AM EDT (3 months ago)
I've had to convince it to do things it should just be able to do but thinks it can't for some reason. Like reading a file outside the project directory: it can do it fine, but it refuses unless you convince it that, no, it actually can.
It has also inserted literal "\n" instead of newlines on a number of occasions.
I'd argue these behaviors are much more important than being able to use interactive commands.
Speaking of system instructions, Gemini always forgets them or doesn't follow them. And it still puts code comments nearly everywhere, which drives me nuts.
Codex is much better at following system instructions, but the CLI is... very bad.
Claude is better at this.
I’ve noticed the latter with several image generation refusals I could eventually talk the model out of (usually by mentioning fair use in a copyright/trademark context).
It's starting to feel like LLMs are more a representation of the culture of the company training them than a fair representation of the world at large.
Yup, I've tried to use Gemini so many times, but its inability to strictly follow system prompts makes it so hard to get useful stuff out of it that doesn't need to be cleaned up afterwards. Code comments are next to impossible to get rid of; they must have trained it only on code that has comments, because the model really likes to add them everywhere.
Every agent+model combination has issues right now, I'm personally swapping between them depending on the task.
Gemini is great for stuff you need fast and don't care about the quality, as you can just throw it away.
Claude Code + Sonnet is great in many ways and follows prompts way better, but it has a tendency to go off on tangents and really get lost in the woods. It requires handholding: you basically have to interrupt it as soon as you see something weird to steer it in the right direction, and complex stuff has to be aggressively split into smaller, validated sub-tasks manually. It also tends to stop mid-task and ask, "Well, we've done half now, do you want me to continue with the other half?"
Codex + GPT-5 is the best at following prompts and produces the highest-quality code, but it's way slower than the others and still struggles with seemingly arbitrary stuff, yet it can solve complex tasks by itself without any hand-holding. It can get stuck on something obvious, but at least it won't run off on its own, and it'll complete everything as well as it can, even if it takes 30 minutes.
Qwen Coder seems outright unusable; I haven't been able to use it for anything at all.
Tried AMP for a while as well, nice UI and model seems good, but too expensive (and I say this as someone who currently gives $200/month to OpenAI).
At the same time, even the big versions of Qwen3 Coder (480B) regularly mess up file paths and use the wrong path separators, leading to files like srccomponentsMyComponent.vue being created instead of src/components/MyComponent.vue.
> And it still puts code comments nearly everywhere, it drives me nuts.
I’ve had various models sometimes insert comments like “// removed Foo”, where it makes no sense to mark the absence of code that was never there in the first place.
At the same time, sometimes the LLMs love to eat my comments when doing changes and leave behind only the code.
How silly (and annoying). It’s good to be able to try out multiple models with the exact same prompts though, maybe I should create my own custom mode for RooCode with all of the important stuff I want baked in.
Of course my experience is anecdotal, but we hardly have any decent benchmarks to compare these models. I suspect most benchmarks have leaked into training sets, rendering them useless anyway.
Not to mention that not all models/inference stacks work the same way, so you can't really replicate the same experience. For example, the new Harmony format means you can now inject messages while GPT-OSS is running inference, but obviously Claude Code doesn't support that, because their models don't.
This is a garbage state of affairs though
It's like saying official car repair shops should repair any type of car, not just their brand. That's just not how the real world works.
Unfortunately, nearly all the foundation model companies are just wasting their efforts on the clients, which are kind of ass, instead of focusing on the model.
Google would be much better off if they ditched their dogshit CLI and let us use the generous quota when logging in from any client.
Gemini 3.0 is likely to be released soon, and it will likely improve the agentic coding experience.
GPT-5 insisted on using bash commands to edit a file, despite the dedicated tool for doing this. The problem was that the bash tool it used wrapped output at 80 chars, splitting some strings across lines, which broke the code at the syntax level. It was never able to recover; I was not impressed with GPT-5.
If not, the model is just shooting in the dark and guessing.
Terminal serializer code: https://github.com/google-gemini/gemini-cli/blob/main/packag...
Uses @xterm/headless npm package.
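A minimal sketch of how that layering plausibly fits together (this is not Gemini CLI's actual code; it assumes the published npm APIs of node-pty, @xterm/headless, and @xterm/addon-serialize, with an arbitrary 80x24 viewport):

```typescript
import * as pty from "node-pty";
import { Terminal } from "@xterm/headless";
import { SerializeAddon } from "@xterm/addon-serialize";

// The headless terminal plays the "mini viewport": it interprets the raw
// pty byte stream (cursor movement, colors, clears) into a screen buffer.
const cols = 80, rows = 24;
const term = new Terminal({ cols, rows, allowProposedApi: true });
const serializer = new SerializeAddon();
term.loadAddon(serializer);

// Spawn the interactive process on a pseudo-terminal and feed its output
// into the headless terminal instead of straight to stdout.
const shell = pty.spawn("bash", ["-i"], { name: "xterm-256color", cols, rows });
shell.onData((data) => term.write(data));

// Serialize the current screen (including escape sequences) so it can be
// re-rendered inside a viewport or handed to the model as context.
function snapshot(): string {
  return serializer.serialize();
}

shell.write("top\r"); // drive an interactive command
setTimeout(() => console.log(snapshot()), 1000);
```

The useful property here is that the serialized snapshot is a stable rendering of what a human would see, rather than an append-only stream of control codes.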
It's a choice some teams make, presumably because _they_ see value in it (or at least think they will). The team I'm on has particular practices which I'm sure would not work on other teams, and might cause you to look at them with the same incredulity, but they work for us.
For what it's worth, the prefixes you use as examples do arise from a convention with an actual spec:
https://www.conventionalcommits.org/en/v1.0.0/
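The spec's basic shape, for reference:

```
<type>[optional scope]: <description>

[optional body]

[optional footer(s)]
```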
The main reason this exists is because Angular was doing it to generate their changelogs from it. Which makes sense, but outside of that context it doesn't feel fully baked.
I usually see junior devs make such commits, but at the same time they leave the actual commit message body completely empty and don't even include a reference to the ticket they're working on.
But I'm also not a fan of this being an enforced convention just because somebody higher up decided it brings some value, and now it's the 101st convention a new dev has to follow, which actually reduces productivity.
LLM wrote this article it seems.
For me, Gemini CLI is not as good as Claude Code; it sometimes writes more code than necessary, which makes it hard to maintain. But I hope it gets there with the Gemini 3.0 release. It's open source, so I can imagine it improving faster with community contributions.
On the other hand, now that I’ve read this, I can see how having some hooks between the code agent CLIs and ghostty/etc could be extremely powerful.
I always imagined they'd have an easier time if they could start a vim instance and send search/movement/insert commands instead, not having to keep track of line numbers and do offset calculations, but instead visually inspecting that the right thing happened.
I haven't tried this new feature yet, but that was the first thing that came to mind when seeing it, it might be easier for LLMs to do edits this way.
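As a rough illustration of that idea, here's a sketch that drives an edit through vim's non-interactive ex mode instead of computing line/column offsets by hand (the vimEdit helper is hypothetical; vim's -e -s flags and -c commands are real):

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical helper: apply an edit by sending ex commands to vim,
// letting vim do the line/offset bookkeeping. "-e -s" is silent ex mode.
function vimEdit(file: string, exCommands: string[]): void {
  const args = ["-e", "-s", ...exCommands.flatMap((c) => ["-c", c]), file];
  // The final "wq" below writes and quits, so vim never waits for input.
  execFileSync("vim", args, { stdio: "inherit" });
}

// Example: rename a symbol everywhere in a file and save.
vimEdit("src/components/MyComponent.vue", ["%s/oldName/newName/g", "wq"]);
```

An agent doing this could then re-read the file (or ask vim to print the changed region) to confirm the edit landed, which is the commenter's point.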
Of all my issues with Gemini CLI (and there are many), this addresses none of them. This is a fascinating product management prioritization decision. It makes me wonder if the people who build Gemini CLI actually use Gemini CLI for real work. Because I would think that if they did, they would surely have prioritized other things.
My personal biggest issue with Gemini CLI, which is a deal breaker if I have a say in the tooling I'm using, is that if you hit a per-minute rate limit (meaning it will be resolved in a few seconds) your session is forcefully and permanently switched over to using Flash and there is nothing you can do other than manually quit and restart to get back to using Pro 2.5. The status footer line will even continue to lie to you about what model you are using. I would genuinely like to understand the use cases for which this is desirable behavior. But even IF those use cases do exist, what is the harm or difficulty in giving an option to override this behavior? These models are not interchangeable. GitHub issues have been opened for months, some even with PRs attached, with no action from Google.
For comparison, Claude Code handles this situation with a simple exponential back off until the request succeeds. That's what I want, ESPECIALLY in a CLI agent that may be running headlessly in a pipeline.
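For reference, the retry behavior being described is plain exponential backoff; a minimal sketch (the callModel shape, 429 check, and jitter constants are illustrative, not Claude Code's actual implementation):

```typescript
// Retry a request with exponential backoff on per-minute rate limits,
// instead of silently downgrading to a different model.
async function withBackoff<T>(request: () => Promise<T>, maxRetries = 6): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err: any) {
      // Assumed error shape: only retry rate limits (HTTP 429).
      if (err?.status !== 429 || attempt >= maxRetries) throw err;
      const delayMs =
        Math.min(60_000, 1_000 * 2 ** attempt) + Math.random() * 250; // jitter
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: const reply = await withBackoff(() => callModel(prompt));
```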
Your complaint is likely a product design decision rather than an engineering capacity prioritization. As you've noted, the fix is pretty trivial; I imagine some designer or product person is intentionally holding this back for one reason or another.
In Claude Code, when you edit a file independent from the agent, it automatically notices and says something like "I see you've made a change. Let me take a look."
I wish Gemini CLI would've taken a similar approach, since it seems to fit better with a CLI and its associated Unix philosophy.
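A rough sketch of how a CLI could notice out-of-band edits, using Node's built-in fs.watch (recursive watching needs macOS/Windows or Node 20+ on Linux; debouncing and ignore rules omitted):

```typescript
import { watch } from "node:fs";

// Track files edited outside the agent so the next model turn re-reads
// them instead of trusting stale context ("I see you've made a change").
const dirty = new Set<string>();

watch(".", { recursive: true }, (_event, filename) => {
  if (filename) dirty.add(filename.toString());
});

// Called before each model turn: returns and clears the changed set.
function consumeDirtyFiles(): string[] {
  const changed = [...dirty];
  dirty.clear();
  return changed;
}
```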
That’s it.
It’s not a “product”, it’s a keeping-up-with-the-Joneses checklist item.
There isn't a drive to actually cater to users; it's a selfish endeavor that sometimes aligns with what users want. So the game is to pack in features you can leverage for jumping ship or springboarding internally.
It's the absolute worst aspect of Google, and I think it's something worth dumping Sundar over, in order to get a leader who will unify goals and get people who want to make great products, not great window dressing for themselves.
> feat(shell): enable interactive commands with virtual terminal
1 in 3 times I used it in the past 2 months, it failed for really odd reasons: sometimes the Node app just quit with an exception, sometimes Gemini got stuck, blamed itself, and gave up. The same tasks I threw at Claude Code and Codex, they nailed without a blink...
I guess for Google this will be a treasure trove of real developer interactions to train on.
I might try this once Gemini 3 comes out. Until then, if you're running tmux or zellij, this seems like a worse user experience since you're in a subwindow and have less screen real estate to work with.
It looks like they've added a layer on top of node-pty to allow serializing/streaming of the contents to the terminal within the mini-terminal viewports they're allocating for the terminal rendering. I wonder if they're releasing that portion as open source?
Maybe I'm not getting it right, but it seems there are two competing paradigms here, though with LLMs coding for LLMs, who cares. </rant>
What the heck is going on in Google-land?
but https://geminicli.com/docs/tools/shell/#enabling-interactive... > To enable interactive commands, you need to set the tools.shell.enableInteractiveShell setting to true.
Seems contradictory. I can't get it to work in either case.
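For what it's worth, assuming the dotted name in the docs maps onto nested keys in Gemini CLI's settings file (typically ~/.gemini/settings.json for user-level settings), the option would look like:

```json
{
  "tools": {
    "shell": {
      "enableInteractiveShell": true
    }
  }
}
```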
It's seemingly very hard to understand how it should be configured at all if you don't have a personal Google account. Rather than just using your credentials to log in and start, you have to find forum posts from people who have reverse engineered that you need to set a Google Cloud environment variable, even if you are operating without a "Code Assist License" on a Google Business account.
No matter what I do on my paid subscription through Google Business, with a Google Cloud project configured in the environment (which I had to explicitly set up just to test the CLI, even though I have access to the models through my subscription and AI Studio), I always get error 429 after one to five messages. The limits I actually get seem to be just a fraction of what Google claims for Gemini, with no clearly stated reason as to why, not in the cloud console and not in the tool itself, beyond the HTTP error message.
These are not big prompts or anything of that nature. It's simple things like reviewing a readme file or double-checking a single file for errors. It's been like this from the very beginning.
Even now, just to verify it (I haven't used Gemini for over a week), I asked it to review 3 files in a git diff, each between 50-100 lines long. After checking the first file it's already on 429, on a PAID subscription, and it even states "99%" context left. So my paid subscription lets me use less than 1% of the context window, and I get locked out for an unknown amount of time.
Contrast this with both Codex and Claude Code, where you just log in and go; it's really a night and day difference. The user experience of the paid version of Gemini CLI is just utterly terrible.
In a world where you have 100 options, trust is of utmost importance. The CLI’s integration with node‑pty and the ability to stream pseudo‑tty output into mini‑terminal viewports is clever, and I’d love to see that layer documented or open‑sourced so other tools can build on it. I see this feature as something you’d use for short‑lived tasks like running a quick script, checking a log, or doing a one‑off database query. For longer editing sessions I’d still use a real terminal multiplexer and editor. If Google can fix the reliability issues and make the API for interactive sessions open, that would be hella good for everyone!
And really, ctrl-f? Do these devs not use the terminal at all?
To be honest, at this point having Claude Code monitor the output of a `tmux pipe-pane` is probably going to be superior.
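For what it's worth, a minimal sketch of that approach (pipe-pane and -o are real tmux flags; the target pane and log path are arbitrary):

```sh
# Mirror everything the pane prints into a log the agent can tail.
# -o toggles piping on/off for the target pane.
tmux pipe-pane -o -t mysession:0.0 'cat >> ~/pane.log'
```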