Love C, Hate C: Web Framework Memory Problems
Posted3 months agoActive3 months ago
alew.isTechstoryHigh profile
heatednegative
Debate
80/100
C ProgrammingWeb FrameworkMemory Safety
Key topics
C Programming
Web Framework
Memory Safety
The post critiques a C web framework for its memory management issues, sparking a discussion on the challenges of writing safe and reliable C code, particularly when using AI-assisted coding tools.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussionFirst comment
2h
Peak period
67
0-12h
Avg / period
20
Comment distribution160 data points
Loading chart...
Based on 160 loaded comments
Key moments
- 01Story posted
Oct 9, 2025 at 11:39 PM EDT
3 months ago
Step 01 - 02First comment
Oct 10, 2025 at 1:44 AM EDT
2h after posting
Step 02 - 03Peak activity
67 comments in 0-12h
Hottest window of the conversation
Step 03 - 04Latest activity
Oct 17, 2025 at 1:53 AM EDT
3 months ago
Step 04
Generating AI Summary...
Analyzing up to 500 comments to identify key contributors and discussion patterns
ID: 45535183Type: storyLast synced: 11/20/2025, 7:40:50 PM
Want the full context?
Jump to the original sources
Read the primary article or dive into the live Hacker News thread when you're ready.
As a learning exercise it is useful, but it should never see production use. What is interesting is that the apparent cleanliness of the code (it reads very well) is obscuring the fact that the quality is actually quite low.
If anything I think the conclusion should be that AI+novice does not create anything that is useable without expert review and that that probably adds up to a net negative other than that the novice will (hopefully) learn something. It would be great if someone could put in the time to do a full review of the code, I have just read through it casually and already picked up a couple of problems, I'm pretty sure that if you did a thorough job of it there would be many more.
I think this is a general feature and one of the greatest advantages of C. It's simple, and it reads well. Modern C++ and Rust are just horrible to look at.
I don't remember any other language's proponents actively attacking the users of other programming language.
I just saw someone on Hacker News saying that Rust was a bad language because of its users
I have noticed my fair share of Rust Derangement Syndrome in C++ spaces that seems completely outsized from the series of microaggressions that they eventually point out when asked "Why?"
I second this; for a few years it was impossible to have any sort of discussion on various programming places when the topic was C: the conversation would get quickly derailed with accusations of "dinosaur", etc.
Things have gone quiet recently (last three years, though) and there have been much fewer derailments.
What seems to have changed in recent years is the buy-in from corporations that seemingly see value in its promises of safety. This seems to be paired with a general pulling back of corporate support from the C++ world as well as a general recession of fresh faces, a change that at least from the sidelines seems to be mostly down to a series of standards committee own-goals.
I like the safety promise of Rust. But the complicated interop story with C and C++ hurt it a lot. I mean, in a typical codebase, what proportion of bugs will be memory-safety related vs other reasons? Ideally, we could just wrap the safety-critical bits in a memory-safe wrapper and continue to use C and C++ for everything else.
It's very strange to witness. Annoying advocacy of languages is nothing new. C++ was at one point one of those languages, then it was Java, then Python, then Node.js. I feel like if anything, Rust was a victim of a period of increased polarization on social media, which blew what might have been previously seen as simple microaggressions completely out of proportion.
These days Go/Zig/Nim/C#/Java/Python/JS and other languages are fast enough for most use cases.
And Rust learning curve doesn't help either. C++ was basically C with OOP on steroids. Rust is very different.
I say that because I wouldn't group Rust opposition with any of those languages you cited. It's different for mostly different reasons and magnitudes.
- At the time, with a few minor differences, C++ was Typescript for C, thus very easy to adopt into existing projects
- Being born on the same birthplace as C and UNIX, meant all C compiler vendors saw as added value to have it as part of their offering, and it was natural that every UNIX SDK also had C++ support available alongside C.
- Apple, Metrowerks, IBM, Borland and Microsoft helped to push C++ adoption, by making it the official way to use application frameworks. MacApp (originally in Object Pascal), PowerPlant, CSet++, Turbo Vision/OWL/VCL, and MFC respectively.
This kept C++ as the language to go for performance in enterprise computing, while Delphi and VB got the "easy" development role, until Java and .NET took over all those frameworks.
Rust doesn't have this kind of industry wide push, even in OSes where it is being embraced like Windows and Android, note that it isn't being pushed as yet another way to write userspace applications, rather low level OS services.
This seems apropos in a world where C++ has been bleeding userspace buy-in for longer than I've been professionally programming.
I started learning Rust a few months ago in an attempt to teach an old dog new tricks, and while it's quite pleasant as far as it went, I can think of several classes of programs that I would be reluctant to use the language for. But I wouldn't dream of using C++ for those types of programs either.
There are rumors floating around that Microsoft is rolling their own rustc-codegen-gcc paired with their C2 codegen backend. Don't know what to make of those rumors, but it helped to reassure me to feel like the time I invested thus far hasn't been wasted.
"From Blue Screens to Orange Crabs: Microsoft's Rusty Revolution"
https://www.youtube.com/watch?v=uDtMuS7BExE
That is very different to my memories of the past decade+ of working on Go.
Almost every single language decision they eventually caved on that I can think of (internal packages, vendoring, error wrapping, versioning, generics) was preceded by months if not years of arguing that it wasn't necessary, often followed by an implementation attempt that seems to be ever so slightly off just out of spite.
Let's don't forget that the original Go 1.0 expected every project's main branch to maintain backward compatibility forever or else downstreams would break, and without hacks (which eventually became vendoring) you could not build anything without an internet connection.
To be clear, I use Go (and C... and Rust) and I do like it on the whole (despite and for its flaws) but I don't think the Go authors are that different to the Rust authors. There are (unfortunately) more fanatics in the Rust community but I think there's also a degree to which some people see anything Rust-related as being an attack on other projects regardless of whether the Rust authors intended it to be that way.
And this goes for almost all programming languages. Each and every one of them has warts and issues with syntax and expressiveness. That holds true even for the most advanced languages in the field, Haskell, Erlang, Lisp and more so for languages that were originally designed for 'readability'. Programming is by its very nature more akin to solving a puzzle than to describing something. The puzzle is to how to get the machine to do something, to do it correctly, to do it safely and to do it efficiently, and all of those while satisfying the constraint of how much time you are prepared (or allowed) to spend on it. Picking the 'right' language will always be a compromise on some of these, there is no programming language that is perfect (or even just 'the best' or 'suitable') for all tasks, and there are no programming languages that are better than any other for any subset of all tasks until 'tasks' is a very low number.
For example, the problem with Vec<Vec<T>> for a 2D array is not that one is not used to it, but that the syntax is just badly designed. Not that C would not have problematic syntax, but I still think it is fairly good in comparison.
With that `char msg[static 1]` you're telling the compiler that `msg` can't possibly be NULL, which means it will optimize away any NULL check you put in the function. But it will still happily call it with a pointer that could be NULL, with no warnings whatsoever.
The end result is that with an "unsafe" `char *msg`, you can at least handle the case of `msg` being NULL. With the "safe" `char msg[static 1]` there's nothing you can do -- if you receive NULL, you're screwed, no way of guarding against it.
For a demonstration, see[1]. Both gcc and clang are passed `-Wall -Wextra`. Note that the NULL check is removed in the "safe" version (check the assembly). See also the gcc warning about the "useless" NULL check ("'nonnull' argument 'p' compared to NULL"), and worse, the lack of warnings in clang. And finally, note that neither gcc or clang warn about the call to the "safe" function with a pointer that could be NULL.
[1] https://godbolt.org/z/qz6cYPY73
Yup, and I don't even need to check your godbolt link - I've had this happen to me once. It's the implicit casting that makes it a problem. You cannot even typedef it away as a new type (the casting still happens).
The real solution is to create and use opaque types. In this case, wrapping the `char[1]` in a struct would almost certainly generate compilation errors if any caller passed the wrong thing in the `char[1]` field.
It is as if just pointing this out already antagonizes people.
Ignoring what happened since 1958 (JOVIAL being a first attempt), and thus all its failings are excused because it was discovering the world.
And yet, you can't go a day without someone declaring that now is the time to do it right, this time it will be different. And then they proceed to do one thing after another for which the outcome is already known, just not to them. I think the best way to teach would be to start off with a fairly detailed history of what had gone before, just to give people a map and some basic awareness of the degree to which things have already been done, rather than to find new and interesting ways to shoot themselves in the foot (again).
With ignorance comes arrogance of an individualist intellectual, thinking their unique revolutionary contribution will wow the public and move the field forward. Except inevitably they're not only reinventing the wheel but the entire automobile, without knowing basic principles and the work of predecessors. It has a lot in common with modern art.
> we're teaching this whole discipline wrong
I sometimes think languages after C, like C++ and Java, were misguided in some ways. Sure they provided business value, brought new ideas, and the software worked - but their popularity came at a cost of leaving countless great thoughts behind in history, and resulted in a poverty of software culture, education and imagination.
There are optimistic signs of people returning to the roots, re-learning the lessons and re-discovering ideas. I think many are coming to realize the need for a reformation of sorts.
[1] https://github.com/lifthrasiir/wah/blob/main/wah.h
One good defense is to reduce your scope continuously. The smaller you make your scope the smaller the chances of something escaping your attention. Stay away from globals and global data structures. Make it impossible to inspect the contents of a box without going through a well defined interface. Use assertions liberally. Avoid fault propagation, abort immediately when something is out of the expected range.
But the lack of a good string library is by itself responsible for a very large number of production issues, as is the lack of foresight regarding de-referencing pointers that are no longer valid. Lack of guardrails seems to translate in 'do what you want' not necessarily 'build guard rails at the right level for you', most projects simply don't bother with guardrails at all.
Rust tries to address a lot of these issues, but it does so by tossing out a lot of the good stuff as well and introducing a whole pile of new issues and concepts that I'm not sure are an improvement over what was there before. This creates a take-it-or-leave it situation, and a barrier to entry. I would have loved to see that guard rails concept extended to the tooling in the form of compile time flags resulting in either compile time flagging of risky practices (there is some of this now, but I still think it is too little) and runtime errors for clear violations.
The temptation to 'start over' is always there, I think C with all of its warts and shortcomings is not the best language for a new programmer to start with if they want to do low level work. At the same time, I would - still, maybe that will change - hesitate to advocate for rust, it is a massive learning curve compared to the kind of appeal that C has for a novice. I'd probably recommend Go or Java over both C and rust if you're into imperative code and want to do low level work. For functional programming I'd recommend Erlang (if only because of the very long term view of the people that build it) or Clojure, though the latter seems to be on its retour.
In another comment recently I opined that C projects, initiated in 2025, are likely to be much more secure than the same project written in Python/PHP (etc).
This is because the only people choosing C in 2025 are those who have been using it already for decades, have internalised the handful of footguns via actual experience and have a set of strategies for minimising those footguns, all shaped with decades of experience working around that tiny handful of footguns.[1]
Sadly, this project has rendered my opinion wrong - it's a project initiated in 2025, in C, that was obviously done by an LLM, and thus is filled with footguns and over-engineering.
============
[1] I also have a set of strategies for dealing with the footguns; I would gues if we sat down together and compared notes our strategies would have more in common than they would differ.
I do think that LLM C code if made with great testing tooling in concert has great promise.
How are you doing your fuzzing? You need either valgrind (or compiler sanitiser flags) in the loop for a decent level of confidence.
I have an issue with high strung opinions like this. I wrote plenty of crappy delphi code while learning the language that saw production use and made a living from it.
Sure, it wasn't the best experience for users, it took years to iron out all the bugs and there was plenty of frustration during the support phase (mostly null pointer exceptions and db locks in gui).
But nobody would be better off now if that code never saw production use. A lot of business was built around it.
I guess the question at spotlight is: At what point would your custom server's buffer overflow when reading a header matter and would that bug even exist at that point?
Could a determined hacker get to your server without even knowing what weird software you cooked up and how to exploit your binary?
We have a lot of success stories born from bad code. I mean look at Micro$oft.
Look at all the big players like discord leaking user credentials. Why would you still call out the little fish?
Maybe I should create a form for all these ahah.
Yes.
That may well be because this isn't your field?
There are lots of ways the server could leak information about its internal state, and exploits have absolutely been implemented in the past based only on what was visible remotely.
Once upon a time, you could put up a relatively vulnerable server, and unless you got a ton of traffic, there weren't too many things that would attack it. Nowadays, pretty much anything Internet facing will get a constant stream of probes. Putting up a server requires a stricter mindset than it used to.
I suppose I was just surprised to find this code promoted in my feed when it's not up to snuff. And I'm not hating, I do in fact love the project idea.
Trust me I love C. Probably over 90% of my lifetime code has been written in C. But python newbies don't get their web frameworks stack smashed. That's kind of nice.
Hah! True :-)
The thing is, smashed stacks are difficult to exploit deterministically or automatically. Even heartbleed, as widespread as it was, was not a guaranteed RCE.
OTOH, an exploit in a language like Python is almost certainly going to be easier to exploit deterministically. Log4j, for example, was a guaranteed exploit and the skill level required was basically "Create a Java object".
This is because of the ease with which even very junior programmers can create something that appears to run and work and not crash.
That’s like driving without a seatbelt - it’s not safe, but it would only matter on that very rare chance you have a crash. I would rather just wear a seatbelt!
This is also the reason why AI will not replace any actual jobs with merit.
Books still exist, be they in print or electronic form.
(interactive labs + quizzes) > Learning from books
Good online documentation > 5yr old tome on bookshelf
chat/search with ai > CTRL+F in a PDF manual
EDIT:
An exemplar to consider is how the Actor Model[0] can be used to define a FaaS[1]-based system. Without being aware of this paper, it is unrealistic to expect someone to be able to formulate LLM prompts incorporating concepts identified by same.
Side note: the Actor Model[0] paper is far older than a "5yr old tome" and is very much applicable to this day.
0 - https://dspace.mit.edu/bitstream/handle/1721.1/41962/AI_WP_1...
1 - https://en.wikipedia.org/wiki/Function_as_a_service
Hypertext is better than printed book format, but if you’re just starting with something you need a guide that provides a coherent overview. Also most online documentation are just bad.
Why ctrl+f? You can still have a table of contents and an index with pdf. And the pdf formats support link. And I’d prefer filtering/querying over generation because the latter is always tainted by my prompt. If I type `man unknown_function`, I will get an error, not a generated manual page.
No they are not, as examples lack explanation of the concepts underlying a programming language's definition.
> ... and we now have a machine to produce infinite examples tailored specifically to any situation
This is like saying, "to learn X language, just read a bunch of source in GitHub repositories that use it."
What books written by authoritative people provide, such as language designers or recognized luminaries, is conveyance of key linguistic concepts and an explanation of "the why" they are important. This is the sole purvey of people.
There are good reasons for this choice in C (and C++) due to broken integer promotion and casting rules.
See: "Subscripts and sizes should be signed" (Bjarne Stroustrup) https://open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0...
As a nice bonus, it means that ubsan traps on overflow (unsigned overflows just wrap).
The reason you should make length signed is that you can use the sanitizer to find or mitigate overflow as you correctly observe, while unsigned wraparound leads to bugs which are basically impossible to find. But this has nothing to do with integer promotion and wraparound bugs can also create bugs in - say - Rust.
I don’t mean to be disrespectful, but this cavalier attitude towards it reads like vaccine skepticism to me. It is not serious.
Programming can be inconsequential, but it can also be national security. I know which engineers I would trust with the latter, and they aren’t the kind who believe that discipline is “enough”.
That includes being honest about the actual costs of software when you don’t YOLO the details. Zero UB is table stakes now - it didn’t use to be, but we don’t live in that world anymore.
It’s totally fine to use C or whatever language for it, but you are absolutely kidding yourself if you think the cost is less than at least an order of magnitude higher than the equivalent code written in Rust, C#, or any other language that helps you avoid these bugs. Rust even lets you get there at zero performance cost, so we’re down to petty squabbles about syntax or culture - not serious.
I agree. For me that means: software engineering should start taking the same attitude to writing software that structural engineers bring to the table when they talk about bridges, buildings and other structures that will have people's lives depending on them. I'm not sure how we're going to make rings out of bits but we need to realize - continuously - that the price of failure is often paid in blood, or in the best case with financial loss and usually not by us. And in turn we should be enabled to impose that same ethic on management, because more often than not that's the root cause of the problem.
> That includes being honest about the actual costs of software when you don’t YOLO the details.
Does that include development cost?
Maintenance costs?
Or just secondary costs?
Why the focus on costs?
> Zero UB is table stakes now - it didn’t use to be, but we don’t live in that world anymore.
This is because 'Rust and C# exist'? Or is it because Java, Erlang, Visual Basic, Lisp etc exist?
> It’s totally fine to use C or whatever language for it, but you are absolutely kidding yourself if you think the cost is less than at least an order of magnitude higher than the equivalent code written in Rust, C#, or any other language that helps you avoid these bugs.
We were talking about responsibility first, and that goes well beyond just measuring 'cost'. The mistake in bringing cost into it is that cost is a business concept that is used to justify picking a particular technology over another. And just like security is an expense that doesn't show anything on the balance sheet if it works besides that it cost money the same goes for picking a programming language eco-system.
So I think focusing on cost is a mistake. That just allows the bookkeeper to make the call and that call will often be the wrong one.
> Rust even lets you get there at zero performance cost, so we’re down to petty squabbles about syntax or culture - not serious.
The debate goes a lot further than that. You have millions of people that are writing software every day that are not familiar with Rust. To get them to pick a managed language over what they are used to is going to take a lot of convincing.
It starts of with ethics, and I don't think it should start off with picking a favorite language. You educate, show by example and you deliver at or below the same cost that those other eco-systems do and then you slowly eat the world because your projects are delivered on-time, with provably lower real world defects and hopefully at a lower cost.
And then I really couldn't care what language was picked, in the rust world that translates into 'anything but C' because that is perceived to be the enemy somehow, which is strange because there are many alternatives to rust that are perfectly suitable, have much higher mind share already.
C is - even today - at 10x the popularity that rust is, it will take a massive amount of resources to switch those people over, and likely it will take more than one generation. In the meantime all of the C code in the world will have to be maintained, which means there is massive job security for people learning C. For people learning rust to the exclusion of learning C that situation is far worse. This needs to be solved.
These are not 'petty squabbles' about syntax or culture. They are the harsh reality of the software development world at large, which has seen massive projects deployed at scale developed with those really bad languages full of undefined behavior (well, that's at least one thing that Assembly Language has going for it, as long as the CPU does what it says in the book undefined behavior doesn't exist). People are going to point at that and say 'good enough'. And they see all those memory overflows, CVEs etc as a given, and they realize that in spite of all of those the main vector for security issues is people, and configuration mistakes not so much the software itself.
This is not ideal, obviously, but C, like any bad habit, is very hard to dislodge if your main argument is 'you should drop this tool because mine is better'. Then you need to show that your tool is better, so much better that it negates the cost to switch. And that's a very tall order, for any programming language, much more so for one that is struggling for adoption in the first place.
> This is because 'Rust and C# exist'? Or is it because Java, Erlang, Visual Basic, Lisp etc exist?
Things have changed for three important reasons: (1) C/C++ compilers have evolved, and UB is significantly more catastrophic than it was in the 90s and early 00s. (2) As societies digitize, the stakes are higher than even - leaking personal data has huge legal and moral consequences, and system outages can have business-killing financial consequences. (3) There are actual, viable alternatives - GC is no longer a requirement for memory safety.
> To get them to pick a managed language over what they are used to is going to take a lot of convincing.
Perhaps you didn't mean to say so, but Rust is not a managed language (that's a .NET term referring to C#, F#, etc.).
Me and other Rust users are obviously trying to convince even more people to use the language, and that's because we are having a great time over here. It's a very pleasant language with a pleasant community and a high level of technical expertise, and it allows me to get significantly closer to living up to my own ideals. I'm not making a moral argument here, trying to say that you or anyone is a bad person for not using Rust, but I am making a moral argument saying that denying the huge cost and risk associated with developing software in C and C++ is bullshit.
> And then I really couldn't care what language was picked, in the rust world that translates into 'anything but C' because that is perceived to be the enemy somehow, which is strange because there are many alternatives to rust that are perfectly suitable, have much higher mind share already.
The point here is that, until Rust came along, you had the choice between wildly risky (but fast) C and C++ code, or completely safe (but slow) garbage collected languages with heavy runtimes and significant deployment challenges.
C is certainly not "the enemy" - I never said that, and I wouldn't. But that old world is gone. The excuse of picking risky, problem-riddled languages that we know are associated with extreme costs for reasons of performance no longer has any technical merit. There can be other reasons, but this isn't it.
> C is - even today - at 10x the popularity that rust is, it will take a massive amount of resources to switch those people over [...]
It's insane to me that anyone would limit themselves to a single language. Every competent programmer I know knows at least a handful. Why are we worried about this? I'm a decent C programmer, and a very good C++ programmer - better at both because I'm also fairly good at Rust.
> And they see all those memory overflows, CVEs etc as a given, and they realize that in spite of all of those the main vector for security issues is people, and configuration mistakes not so much the software itself.
"Pobody's nerfect." I'm sorry, I really dislike this attitude. We can't let the fact that security is hard, or that perfection is unattainable, be an excuse to deliver more crap.
> This is not ideal, obviously, but C, like any bad habit, is very hard to dislodge if your main argument is 'you should drop this tool because mine is better'
Again, that's not my argument. My argument is that you should be honest about what the actual costs, or alternatively the actual quality.
The answer to that though is probably more something like Zig than something like Rust.
> Cut-resistant gloves are an essential piece of safety equipment in any kitchen.
https://www.restaurantware.com/blogs/smallwares/how-to-choos...
Where are C's gloves?
Of course, if you consistently treat unsigned wraparound as a bug in your code, you can also use a sanitizer to screen for it. But in general I find it more practical to use signed integers for everything except for modular arithmetic where I use unsigned (and where wraparound is then expected and not a bug)
The issues really arise when you mix signed/unsigned arithmetic and end up promoting everything to signed unexpectedly. That's usually "okay", as long as you're not doing arithmetic on anything smaller than an int.
As an aside, if you like C enough to have opinions on promotion rules then you might enjoy the programming language Zig. It's around the same level as C, but with much nicer ergonomics, and overflow traps by default in Debug/ReleaseSafe optimization modes. If you want explicit two's complement overflow it has +%, *% and -% variants of the usual arithmetic operations, as well as saturating +|, *|, -| variants that clamp to [minInt(T), maxInt(T)].
EDIT to the aside: it's also true if you hate C enough to have opinions on promotion rules.
The "promoting unexpectedly" is something I do not think happens if you know C well. At least, I can't remember ever having a bug because of this. In most cases the promotion prevents you from having a bug, because you do not get unexpected overflow or wraparound because your type is too small.
Mixing signed and unsigned is problematic, but I see issues mostly in code from people who think they need to use unsigned when they shouldn't because they heard signed integers are dangerous. Recently I saw somebody "upgrading" a C code basis to C++ and also changing all loop variables to size_t. This caused a bug which he blamed on working on the "legacy C code" he is working on, although the original code was just fine. In general, there are compiler warnings that should catch issues with sign for conversions.
I had the same experience about 10 years back when a colleague "upgrade" code from using size_t to `int`; on that platform (ATMEGA or XMEGA, not too sure now) `int` was too small, overflowed and bad stuff happened in the field.
The only takeaway is "don't needlessly change the size and sign of existing integer variables".
What?
unsigned sizes are way easier to check, you just need one invariant:
if(x < capacity) // good to go
Always works, regardless how x is calculated and you never have to worry about undefined behavior when computing x. And the same invariant is used for forward and backward loops - some people bring up i >= 0 as a problem with unsigned, but that's because you should use i < n for backward loops as well, The One True Invariant.
We can argue til we're blue in the face that people should just not make any mistakes, but history is against us - People will always make mistakes.
That's why surgeons are supposed to follow checklists and count their sponges in and out
Actually, unchecked math on an integer is going to be bad regardless of whether it's signed or unsigned. The difference is that with signed integers, your sanity check is simple and always the same and requires no thought for edge cases: `if(index < 0 || index > max)`. Plus ubsan, as mentioned above.
My policy is: Always use signed, unless you have a specific reason to use unsigned (such as memory addresses).
Wait, what? How is that easier than `if (index > max)`?
Or if index is counting down, a calculated index could silently wrap around and cause the same issue.
And if both are calculated and wrap around, you'll have fun debugging spooky action at a distance!
If both are signed, that won't happen. You probably do have a bug if max or index is calculated to a negative value, but it's likely not an exploitable one.
And what about escapes?
That kills any non-allocation dreams. Moment you have "Hi \uxxxx isn't the UTF nice?" you will probably have to allocate. If source is read-only you have to allocate. If source is mutable you have to waste CPU to rewrite the string.
Depends on what you are doing with it. If you aren't displaying it (and typically you are not in a server application), you don't need to unescape it.
And usually if you want maximum performance, buffered read is the way to go, which means you need a write slab allocation.
Where did that allocation happen? You can write into the buffer you're reading from, because the replacement data is shorter than the original data.
Even if we pretend that the read buffer is not allocating (plausible), you will have to allocate for the write source for the general case (think GiB or TiB of XML or JSON).
The "somewhere you have to write to" is the same buffer you are reading from.
Writing to it would be pointless because clears obliterate anything written; or inefficient because you are somehow offsetting clears, which would sabotage the buffered reading performance gains.
Same goes for other characters such as \n, \0, \t, \r, etc. All half in native byte representation.
The voice of experience appears. Upvoted.
It is conceivable to deal with escaping in-place, and thus remain zero-alloc. It's hideous to think about, but I'll bet someone has done it. Dreams are powerful things.
However, for many applications, it will be better to use a binary format (or in some cases, a different text format) rather than JSON or XML.
(For the PostScript binary format, there is no escaping, and the structure does not need to be parsed and converted ahead of time; items in an array are consecutive and fixed size, and data it references (strings and other arrays) is given by an offset, so you can avoid most of the parsing. However, note that key/value lists in PostScript binary format is nonstandard (even though PostScript does have that type, it does not have a standard representation in the binary object format), and that PostScript has a better string type than JavaScript but a worse numeric type than JavaScript.)
The parser is then able to go through the JSON, and initialize the struct directly, as if you had reflection in the language. It'll validate the types as well. All this without having to allocate a node type, perform copies, or things like that.
This approach has its limitations, but it's pretty efficient -- and safe!
Someone wrote a nice blog post about (and even a video) it a while back: https://blog.golioth.io/how-to-parse-json-data-in-zephyr/
The opposite is true, too -- you can use the same descriptor to serialize a struct back to JSON.
I've been maintaining it outside Zephyr for a while, although with different constraints (I'm not using it for an embedded system where memory is golden): https://github.com/lpereira/lwan/blob/master/src/samples/tec...
If the source JSON/XML is in a writeable buffer, with some helper functions you can do it. I've done it for a few small-memory systems.
Ergonomically, it's pretty much the same as parsing the JSON into some AST first, and then working on the AST. And it can be much faster than dumb parsers that use malloc for individual AST elements.
You can even do JSON path queries on top of this, without allocations.
Eg. https://xff.cz/git/megatools/tree/lib/sjson.c
Not sure why many people seem fixated on the idea that using a programming language must follow a particular approach. You can do minimal alloc Java, you can simulate OOP-like in C, etc.
Unconventional, but why do we need to restrict certain optimizations (space/time perf, "readability", conciseness, etc) to only a particular language?
In Java, you don't care because the GC cleans after you and you don't usually care about millisecond-grade performance.
GP didn't say "zero-alloc", but "minimal alloc"
> Why should "nice" javaesque make little sense in C?
There's little to no indirection in idiomatic C compared with idiomatic Java.
Of course, in both languages you can write unidiomatically, but that is a great way to ensure that bugs get in and never get out.
It sounds weird. If I write Python code with minimal side effects like in Haskell, wouldn't it at least reduce the possibility of side-effect bugs even though it wasn't "Pythonic"?
AFAIK, nothing in the language standard mentions anything about "idiomatic" or "this is the only correct way to use X". The definition of "idiomatic X" is not as clear-cut and well-defined as you might think.
I agree there's a risk with an unidiomatic approach. Irresponsibly applying "cool new things" is a good way to destroy "readability" while gaining almost nothing.
Anyway, my point is that there's no single definition of "good" that covers everything, and "idiomatic" is just whatever convention a particular community is used to.
There's nothing wrong with applying an "unidiomatic" mindset like awareness of stack/heap alloc, CPU cache lines, SIMD, static/dynamic dispatch, etc in languages like Java, Python, or whatever.
There's nothing wrong either with borrowing ideas like (Haskell) functor, hierarchical namespaces, visibility modifiers, borrow checking, dynamic dispatch, etc in C.
Whether it's "good" or not is left as an exercise for the reader.
Because when you stray from idioms you're going off down unfamiliar paths. All languages have better support for specific idioms. Trying to pound a square peg into a round hole can work, but is unlikely to work well.
> You're basically saying an unidiomatic approach is doomed to introduce bugs and will never reduce them.
Well, yes. Who's going to reduce them? Where are you planning to find people who are used to code written in an unusual manner?
By definition alone, code is written for humans to read. If you're writing it in a way that's difficult for humans to read, then of course the bug level can only go up and not down.
> It sounds weird. If I write Python code with minimal side effects like in Haskell, wouldn't it at least reduce the possibility of side-effect bugs even though it wasn't "Pythonic"?
"Pythonic" does not mean the same thing as "Idiomatic code in Python".
I started writing sort of a style guide to C a while ago, which attempts to transfer ideas like this one more by example:
https://github.com/codr7/hacktical-c
But we use different approaches for different languages because those languages are designed for that approach. You can do OOP in C and you can do manual memory management in C#. Most people don't because it's unnecessarily difficult to use languages in a way they aren't designed for. Plus when you re-invent a wheel like "classes" you will inevitably introduce a bug you wouldn't have if you'd used a language with proper support for that construct. You can use a hammer to pull out a screw, but you'd do a much better job if you used a screwdriver instead.
Programming languages are not all created equal and are absolutely not interchangeable. A language is much, much more than the text and grammar. The entire reason we have different languages is because we needed different ways to express certain classes of problems and constructs that go way beyond textual representation.
For example, in a strictly typed OOP language like C#, classes are hideously complex under the hood. Miles and miles of code to handle vtables, inheritance, polymorphism, virtual, abstract functions and fields. To implement this in C would require effort far beyond what any single programmer can produce in a reasonable time. Similarly, I'm sure one could force JavaScript to use a very strict typing and generics system like C#, but again the effort would be enormous and guaranteed to have many bugs.
We use different languages in different ways because they're different and work differently. You're asking why everyone twists their screwdrivers into screws instead of using the back end to pound a nail. Different tools, different uses.
Very importantly, because Java is tracking the memory.
In java, you could create an item, send it into a queue to be processed concurrently, but then also deal with that item where you created it. That creates a huge problem in C because the question becomes "who frees that item"?
In java, you don't care. The freeing is done automatically when nobody references the item.
In C, it's a big headache. The concurrent consumer can't free the memory because the producer might not be done with it. And the producer can't free the memory because the consumer might not have ran yet. In idiomatic java, you just have to make sure your queue is safe to use concurrently. The right thing to do in C would be to restructure things to ensure the item isn't used before it's handed off to the queue or that you send a copy of the item into the queue so the question of "who frees this" is straight forward. You can do both approaches in java, but why would you? If the item is immutable there's no harm in simply sharing the reference with 100 things and moving forward.
In C++ and Rust, you'd likely wrap that item in some sort of atomic reference counted structure.
I've upvoted you, but I'm not so sure I agree though.
Sure, each allocation imposes a new obligation to track that allocation, but on the downside, passing around already-allocated blocks imposes a new burden for each call to ensure that the callees have the correct permissions (modify it, reallocate it, free it, etc).
If you're doing any sort of concurrency this can be hard to track - sometimes it's easier to simply allocate a new block and give it to the callee, and then the caller can forget all about it (callee then has the obligation to free it).
Short-run programs are even easier. You just never deallocate and then exit(0).
56 more comments available on Hacker News