TCP, the workhorse of the internet
Mood
thoughtful
Sentiment
positive
Category
tech
Key topics
TCP
networking
internet infrastructure
The article provides an in-depth look at the internals of TCP, a fundamental protocol of the internet. The discussion revolves around the technical aspects and inner workings of TCP.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 1h
Peak period: 91 comments (Day 1)
Avg / period: 46
Based on 92 loaded comments
Key moments
- Story posted: 11/15/2025, 6:37:50 AM (4d ago)
- First comment: 11/15/2025, 7:58:08 AM (1h after posting)
- Peak activity: 91 comments in Day 1 (hottest window of the conversation)
- Latest activity: 11/17/2025, 3:47:38 PM (1d ago)
QUIC is still "their own protocol", just implemented as another protocol nested inside a UDP envelope, the same way that HTTP is another protocol typically nested inside a TCP connection. It makes some sense that they'd piggyback on UDP, since (1) it doesn't require an additional IP protocol header code to be assigned by IANA, (2) QUIC definitely wants to coexist with other services on any given node, and (3) it allows whatever middleware analyses that exist for UDP to apply naturally to QUIC applications.
(Regarding (3) specifically, I imagine NAT in particular requires cooperation from residential gateways, including awareness of both the IP and the TCP/UDP port. Allowing a well-known outer UDP header to surface port information, instead of re-implementing ports somewhere in the QUIC header, means all existing NAT implementations should work unchanged for QUIC.)
Most firewalls will drop unknown IP protocols. Many will drop a lot of TCP; some drop almost all UDP. This is why so much stuff runs over tcp ports 80 and 443; it's almost always open. QUIC/HTTP/3 encourages opening of udp/443, so it's a good port to run unrelated things over too.
Also, SCTP had similar goals to QUIC and never got much deployment or support in OSes, NATs, firewalls, and so on. It's a clear win to just use UDP and get something that will just work on a large portion of networks.
Additionally, firewalls are also designed to filter out any weird packets: if a packet doesn't look like something you asked to receive, it's dropped. They usually do this by tracking open ports, just like NAT, so many firewalls also don't trust custom protocols.
Some people here will argue that it actually really is, and that everybody experiencing issues is just on a really weird connection or using broken hardware, but those weird connections and bad hardware make up the overwhelming majority of Internet connections these days.
If middleware decides to do packet inspection, it had better make sure that any behavioral differences (relative to not doing any inspection) are strictly an optimization and do not impact the correctness of the link.
Also, although I'm not a network operator by any stretch, my understanding is that TCP congestion control is primarily a function of the endpoints of the TCP link, not the IP routers along the way. As Wikipedia explains [0]:
> Per the end-to-end principle, congestion control is largely a function of internet hosts, not the network itself.
Classic congestion control is done on the sender alone. The router's job is simply to drop packets when the queue is too large.
Maybe the router supports ECN, so if there's a queue building toward the next hop, it will look for protocol-specific ECN headers to manipulate.
Some network elements do more than the usual routing work. A traffic shaper might have per-user queues with outbound bandwidth limits. A network accelerator may effectively re-terminate TCP in hopes of increasing achievable bandwidth.
Often, the router has an aggregated connection to the next hop, so it'll use a hash on the addresses in the packet to choose which of the underlying links to use. That hash could be based on many things, but it's not uncommon to use TCP or UDP port numbers if available. The same hash can also be used to choose between equally scored next hops, which is why you often see several different paths during a traceroute. Using port numbers helps balance connections from IP A to IP B over multiple links. If you use an unknown protocol, even if it is multiplexed into ports or something similar (like TCP and UDP), the different streams will likely always hash onto the same link, you won't be able to exceed the bandwidth of a single link, and a damaged or congested link will affect all or none of your connections.
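As a rough illustration of that flow-hashing idea (not any vendor's actual implementation), here is a minimal sketch in Python: the field names, the use of hashlib, and the example addresses are all purely illustrative.

```python
# Illustrative 5-tuple flow hashing for ECMP / link aggregation.
# Real routers use vendor-specific hardware hashes; this is only a sketch.
import hashlib

def pick_link(src_ip, dst_ip, proto, src_port, dst_port, n_links):
    """Hash the flow identifiers so one flow always maps to one link."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links

# Two TCP flows between the same hosts can land on different links
# because the port numbers differ...
print(pick_link("192.0.2.1", "198.51.100.7", 6, 49152, 443, 4))
print(pick_link("192.0.2.1", "198.51.100.7", 6, 49153, 443, 4))
# ...but an unknown protocol with no ports hashes on addresses alone,
# so every "stream" between these hosts shares the same link.
print(pick_link("192.0.2.1", "198.51.100.7", 253, 0, 0, 4))
```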
Core routers don't inspect that field, but NAT/ISP boxes can. I believe that with two suitably dedicated Linux servers it is entirely possible to send and receive a single custom IP packet between them, even using 253 or 254 (reserved for experimentation and testing by RFC 3692) as the protocol number.
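For the curious, here is a minimal sketch of that experiment on Linux (run as root, and it assumes nothing in the path drops protocol 253); the payload and addresses are placeholders.

```python
# Minimal raw-IP experiment with protocol number 253 (RFC 3692, experimental).
# Linux only, run as root.
# Usage: python3 proto253.py send <dest-ip>   (on one host)
#        python3 proto253.py recv             (on the other host)
import socket
import sys

PROTO = 253

if sys.argv[1] == "send":
    s = socket.socket(socket.AF_INET, socket.SOCK_RAW, PROTO)
    # The kernel builds the IP header for us; the port in the address
    # tuple is ignored for raw sockets.
    s.sendto(b"hello, custom protocol", (sys.argv[2], 0))
else:
    r = socket.socket(socket.AF_INET, socket.SOCK_RAW, PROTO)
    packet, addr = r.recvfrom(65535)
    ihl = (packet[0] & 0x0F) * 4          # IP header length in bytes
    print("from", addr[0], "payload:", packet[ihl:])
```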
I do hope we'll have stopped using IPv4 by then... But well, a decade after address exhaustion we are still on it, so who knows?
It uses them a little differently -- in IPv4, there is one protocol per packet, while in IPv6, "protocols" can be chained in a mechanism called extension headers -- but this actually makes the problem of number exhaustion more acute.
There is sometimes drama with it, though. A while back, the OpenBSD guys created CARP as a fully open source router failover protocol, but couldn't get an official IP protocol number and ended up using the same one as VRRP. There's also a lot of historical animosity that some companies got numbers for proprietary protocols (e.g. Cisco got one for its then-proprietary EIGRP).
To save a skim (though it's an interesting list!), protocol codes 253 and 254 are suitable "for experimentation and testing".
(It's absolutely worth reading some of those old April Fools' RFCs, by the way [0]. I'm a big fan of RFC 7168, which introduced HTTP response code 418 "I'm a teapot".)
[0]: https://en.wikipedia.org/wiki/April_Fools%27_Day_Request_for...
You left out ICMP, my favourite! (And a lot more important in IPv6 than in v4.)
Another pretty well known protocol that is neither TCP nor UDP is IPsec. (Which is really two new IP protocols.) People really did design proper IP protocols still in the 90s.
> Can I just make up a packet and send it to a host across the Internet?
You should be able to. But if you are on a corporate network with a really strict firewalling router that only forwards traffic it likes, then likely not. There are also really crappy home routers which give similar problems from the other end of enterpriseness.
NAT also destroyed much of the end-to-end principle. If you don't have a real IP address and rely on a NAT router to forward your data, it needs to be in a protocol the router recognizes.
Anyway, for the past two decades people have grown tired of that and just pile hacks on top of TCP or UDP instead. That's sad. Or who am I kidding? Really it's on top of HTTP. HTTP will likely live on long past anything IP.
Not necessarily. Many protocols can survive being NATed if they don't carry IP/port-related information inside their payload. FTP is a famous counterexample: it uses a control channel (TCP port 21) which contains commands to open data channels (TCP port 20), and those commands specify IP:port pairs, so, depending on the protocol, a NAT router has to rewrite them and/or open ports dynamically and/or create NAT entries on the fly. A lot of other stuff has no need for that and will happily go through without any rewriting.
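To make the FTP example concrete, the active-mode PORT command encodes the client's IP and port as six decimal numbers, and that is exactly what a NAT ALG has to find and rewrite in the control channel. A hypothetical parser (the sample command and values are made up):

```python
# Parse an FTP active-mode PORT command: "PORT h1,h2,h3,h4,p1,p2".
# This is the IP:port a NAT ALG must spot inside the control channel,
# rewrite to the public address, and pre-open a mapping for.
def parse_port_command(line: str):
    numbers = [int(n) for n in line.split(None, 1)[1].split(",")]
    h1, h2, h3, h4, p1, p2 = numbers
    ip = f"{h1}.{h2}.{h3}.{h4}"
    port = p1 * 256 + p2
    return ip, port

print(parse_port_command("PORT 10,0,0,5,195,80"))  # ('10.0.0.5', 50000)
```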
The end-to-end principle at the IP layer (i.e. having the IP forwarding layer be agnostic to the transport layer protocols above it) is still violated.
TCP and UDP have port numbers that the NAT software can extract and keep state tables for, so we can send the return traffic to its intended destination.
For unknown IP protocols that is not possible. At best the NAT can act like a network diode, which is one way of violating the end-to-end principle.
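A toy sketch of why that is: a NAPT device keys its state on transport-layer ports, which simply don't exist for an unknown IP protocol. Everything below is hypothetical illustration, not real NAT code.

```python
# Toy NAPT mapping: outbound flows are remembered by protocol and port,
# so return traffic can be matched back to the right inside host.
nat_table = {}          # (proto, public_port) -> (inside_ip, inside_port)
next_public_port = 40000

def translate_outbound(proto, inside_ip, inside_port):
    global next_public_port
    if proto not in ("tcp", "udp"):
        # No port field to key on: nothing can go in the table, so
        # return packets can never be matched to an inside host.
        raise ValueError("cannot track unknown IP protocol")
    key = (proto, next_public_port)
    nat_table[key] = (inside_ip, inside_port)
    next_public_port += 1
    return key

print(translate_outbound("tcp", "192.168.1.10", 51515))
```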
Even ICMP has a hard time traversing NATs and firewalls these days, for largely bad reasons. Try pinging anything in AWS, for example...
If any host is firewalling out ICMP then it won't be pingable but that does not depend on the hosting provider. AWS is no better or worse than any other in that regard, IME.
A part of the problem with UDP is the lack of good platforms and tooling. Examples as well. I’m trying to help with that, but it’s an uphill battle for sure.
There are many routers that don't care at all about what's going through them. But there aren't any firewalls that don't route anymore (not even at the endpoints).
A bunch of multicast stuff (IGMP, PIM)
A few routing protocols (OSPF, but notably not BGP which just uses TCP, and (usually) not MPLS which just goes over the wire - it sits at the same layer as IP and not above it)
A few VPN/encapsulation solutions like GRE, IP-in-IP, L2TP and probably others I can't remember
As usual, Wikipedia has got you covered, much better than my own recollection: https://en.wikipedia.org/wiki/List_of_IP_protocol_numbers
Behind a NA(P)T, you can obviously only use those protocols that the translator knows how to remap ports for.
They absolutely don't. Routers are layer 3 devices; TCP & UDP are layer 4. The only impact is that the ECMP flow hashes will have less entropy, but that's purely an optimization thing.
Note TCP, UDP and ICMP are nowhere near all the protocols you'll commonly see on the internet — at minimum, SCTP, GRE, L2TP and ESP are reasonably widespread (even a tiny fraction of traffic is still a giant number considering internet scales).
You can send whatever protocol number with whatever contents your heart desires. Whether the other end will do anything useful with it is another question.
Idealized routers are, yes.
Actual IP paths these days usually involve at least one NAT, and these will absolutely throw away anything other than TCP, UDP, and if you're lucky ICMP.
And note the GP talked about "intermediate routers". That's the ones in a telco service site or datacenter by my book.
*The protocol.
You might have more luck with an IPv6 packet.
As soon as you start thinking about having multiple services on a host, you end up with the idea of having a service ID or "port".
UDP or UDP Lite gives you exactly that at the cost of 8 bytes, so there's no real value in not just putting everything on top of UDP.
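Those 8 bytes are just four 16-bit fields; a quick sketch of packing them (the port numbers and payload length are placeholders):

```python
# The entire UDP header: source port, destination port, length, checksum,
# each 16 bits -- 8 bytes total.
import struct

def udp_header(src_port, dst_port, payload_len, checksum=0):
    # checksum=0 means "no checksum" in IPv4; real stacks compute it
    # over a pseudo-header plus the payload.
    return struct.pack("!HHHH", src_port, dst_port, 8 + payload_len, checksum)

hdr = udp_header(49152, 443, payload_len=100)
print(len(hdr), hdr.hex())   # 8 bytes
```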
The three drawbacks of the original TCP algorithm were the window size (the maximum value is just too small for today's speeds), poor handling of missing packets (addressed by extensions such as selective ACK), and the fact that it only manages one stream at a time, while some applications want multiple streams that don't block each other. You could use multiple TCP connections, but that adds its own overhead, so SCTP and QUIC were designed to address those issues.
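To put a number on the window-size limit: without the window-scaling extension the advertised window tops out at 65,535 bytes, which caps throughput at roughly window/RTT. A quick back-of-the-envelope calculation:

```python
# Throughput ceiling imposed by the original 16-bit TCP window field.
MAX_WINDOW = 65_535          # bytes, without the window scale option
for rtt_ms in (10, 50, 200):
    rtt = rtt_ms / 1000
    mbit_per_s = MAX_WINDOW * 8 / rtt / 1e6
    print(f"RTT {rtt_ms:>3} ms -> at most {mbit_per_s:6.1f} Mbit/s per connection")
# e.g. at 50 ms RTT the ceiling is about 10.5 Mbit/s, far below modern link speeds
```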
The congestion control algorithm is not part of the on-the-wire protocol, it's just some code on each side of the connection that decides when to (re)send packets to make the best use of the available bandwidth. Anything that implements a reliable stream on top of datagrams needs to implement such an algorithm. The original ones (Reno, Vegas, etc) were very simple but already did a good job, although back then network equipment didn't have large buffers. A lot of research is going into making better algorithms that handle large buffers, large roundtrip times, varying bandwidth needs and also being fair when multiple connections share the same bandwidth.
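A caricature of the classic Reno-style additive-increase/multiplicative-decrease loop, just to show that congestion control lives entirely in sender-side code. This is a sketch, not a real implementation; real stacks also track fast retransmit/recovery, pacing, and much more.

```python
# Sender-side AIMD caricature (Reno-flavoured), not a real implementation.
def aimd(events, mss=1460):
    cwnd = 10 * mss               # congestion window in bytes
    ssthresh = 64 * 1024
    for ev in events:
        if ev == "ack":
            if cwnd < ssthresh:
                cwnd += mss                    # slow start: exponential growth
            else:
                cwnd += mss * mss // cwnd      # congestion avoidance: ~1 MSS per RTT
        elif ev == "loss":
            ssthresh = max(cwnd // 2, 2 * mss) # multiplicative decrease
            cwnd = ssthresh
        yield cwnd

for w in aimd(["ack"] * 5 + ["loss"] + ["ack"] * 3):
    print(w)
```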
I'll add that at the time TCP was written, the telephone people far outnumbered everyone else in the packet switching vs circuit switching debate. TCP gives you a virtual circuit over a packet-switched network as a pair of reliable-enough independent byte streams over IP. This idea, that the endpoints could implement reliability through retransmission, came from an earlier French network, Cyclades, and ended up being a core principle of IP networks.
> poor handling of missing packets
so it was poor at the exact thing it was designed for?
When I started at university the FTP speed from the US during daytime was 500 bytes per second! You don't have many unacknowledged packets in such a connection.
Back then even a 1 megabits/sec connection was super high speed and very expensive.
I'll take flak for saying it, but I feel web developers are partially at fault for laziness on this one. I've often seen them trigger a swath of connections (e.g. for uncoordinated async events), when carefully managed multiplexing over one or a handful will do just fine.
E.g., in prehistoric times I wrote a JavaScript library that let you queue up several downloads over one stream, with control over prioritization and cancelability.
It was used in a GreaseMonkey script on a popular dating website, to fetch thumbnails and other details of all your matches in the background. Hovering over a match would bring up all their photos, and if some hadn't been retrieved yet they'd immediately move to the top of the queue. I intentionally wanted to limit the number of connections, to avoid oversaturating the server or the user's bandwidth. Idle time was used to prefetch all matches on the page (IIRC in a sensible order responsive to your scroll location). If you picked a large enough pagination, then stepped away to top up your coffee, by the time you got back you could browse through all of your recent matches instantly, without waiting for any server roundtrip lag.
It was pretty slick. I realize these days modern stacks give you multiplexing for free, but to put in context this was created in the era before even JQuery was well-known.
Funny story, I shared it with one of my matches and she found it super useful but was a bit surprised that, in a way, I was helping my competition. Turned out OK... we're still together nearly two decades later and now she generously jokes I invented Tinder before it was a thing.
Other applications work just fine with a single TCP connection
If I am using TCP for DNS, for example, and I am retrieving data from a single host such as a DNS cache, I can send multiple queries over a single TCP connection and receive multiple responses over that same connection, out of order. No blocking.^1 If the cache (application) supports it, this is much faster than receiving answers sequentially, and it's more efficient and polite than opening multiple TCP connections.
1. I do this every day outside the browser with DNS over TLS (DoT) using something like streamtcp from NLNet Labs. I'm not sure that QUIC is faster, server support for QUIC is much more limited, but QUIC may have other advantages
I also do it with DNS over HTTPS (DoH), outside the browser, using HTTP/1.1 pipelining, but there I receive answers sequentially. I'm still not convinced that HTTP/2 is faster for this particular use case, i.e., downloading data from a single host using multiple HTTP requests (compared to something like integrating online advertising into websites, for example)
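A rough sketch of that pipelining pattern using dnspython over plain DNS-over-TCP (the resolver address and query names are placeholders; DoT would do the same thing over a TLS-wrapped socket on port 853):

```python
# Pipeline several DNS queries over one TCP connection. DNS over TCP frames
# each message with a 2-byte length prefix (RFC 7766).
import socket
import struct
import dns.message  # dnspython

RESOLVER = "9.9.9.9"  # placeholder resolver; any DNS server speaking TCP works
NAMES = ["example.com", "example.net", "example.org"]

def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf

with socket.create_connection((RESOLVER, 53)) as s:
    # Send every query up front...
    for name in NAMES:
        wire = dns.message.make_query(name, "A").to_wire()
        s.sendall(struct.pack("!H", len(wire)) + wire)
    # ...then collect the responses, which the server may return in any order.
    for _ in NAMES:
        (length,) = struct.unpack("!H", recv_exact(s, 2))
        reply = dns.message.from_wire(recv_exact(s, length))
        print(reply.question[0].name, [str(rrset) for rrset in reply.answer])
```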
You're missing the point. You have one TCP connection, and the server sends you response1 and then response2. Now if response1 gets lost or delayed due to network conditions, you must wait for response1 to be retransmitted before you can read response2. That is blocking, no way around it. It has nothing to do with advertising(?), and the other protocols mentioned don't have this drawback.
• Full-duplex connections are probably a good idea, but certainly are not the only way, or the most obvious way, to create a reliable stream of data on top of an unreliable datagram layer. TCP's predecessor NCP was half-duplex.
• TCP itself also supports a half-duplex mode—even if one end sends FIN, the other end can keep transmitting as long as it wants. This was probably also a good idea, but it's certainly not the only obvious choice.
• Sequence numbers on messages or on bytes?
• Wouldn't it be useful to expose message boundaries to applications, the way 9P, SCTP, and some SNA protocols do?
• If you expose message boundaries to applications, maybe you'd also want to include a message type field? Protocol-level message-type fields have been found to be very useful in Ethernet and IP, and in a sense the port-number field in UDP is also a message-type field.
• Do you really need urgent data?
• Do servers need different port numbers? TCPMUX is a straightforward way of giving your servers port names, like in CHAOSNET, instead of port numbers. It only creates extra overhead at connection-opening time, assuming you have the moral equivalent of file descriptor passing on your OS. The only limitation is that you have to use different client ports for multiple simultaneous connections to the same server host. But in TCP everyone uses different client ports for different connections anyway. TCPMUX itself incurs an extra round-trip time delay for connection establishment, because the requested server name can't be transmitted until the client's ACK packet, but if you incorporated it into TCP, you'd put the server name in the SYN packet. If you eliminate the server port number in every TCP header, you can expand the client port number to 24 or even 32 bits.
• Alternatively, maybe network addresses should be assigned to server processes, as in Appletalk (or IP-based virtual hosting before HTTP/1.1's Host: header, or, for TLS, before SNI became widespread), rather than assigning network addresses to hosts and requiring port numbers or TCPMUX to distinguish multiple servers on the same host?
• Probably SACK was actually a good idea and should have always been the default? SACK gets a lot easier if you ack message numbers instead of byte numbers.
• Why is acknowledgement reneging allowed in TCP? That was a terrible idea.
• It turns out that measuring round-trip time is really important for retransmission, and TCP has no way of measuring RTT on retransmitted packets, which can make it hard to correct a ridiculously low RTT estimate and results in excessive retransmission. (A sketch of the standard RTO estimator follows this list.)
• Do you really need a PUSH bit? C'mon.
• A modest amount of overhead in the form of erasure-coding bits would permit recovery from modest amounts of packet loss without incurring retransmission timeouts, which is especially useful if your TCP-layer protocol requires a modest amount of packet loss for congestion control, as TCP does.
• Also you could use a "congestion experienced" bit instead of packet loss to detect congestion in the usual case. (TCP did eventually acquire CWR and ECE, but not for many years.)
• The fact that you can't resume a TCP connection from a different IP address, the way you can with a Mosh connection, is a serious flaw that seriously impedes nodes from moving around the network.
• TCP's hardcoded timeout of 5 minutes is also a major flaw. Wouldn't it be better if the application could set that to 1 hour, 90 minutes, 12 hours, or a week, to handle intermittent connectivity, such as with communication satellites? Similarly for very-long-latency datagrams, such as those relayed by single LEO satellites. Together this and the previous flaw have resulted in TCP largely being replaced for its original session-management purpose with new ad-hoc protocols such as HTTP magic cookies, protocols which use TCP, if at all, merely as a reliable datagram protocol.
• Initial sequence numbers turn out not to be a very good defense against IP spoofing, because that wasn't their original purpose. Their original purpose was preventing the erroneous reception of leftover TCP segments from a previous incarnation of the connection that have been bouncing around routers ever since; this purpose would be better served by using a different client port number for each new connection. The ISN namespace is far too small for current LFNs anyway, so we had to patch over the hole in TCP with timestamps and PAWS.
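Since one of the bullets above touches on RTT measurement: the standard retransmission-timeout estimator (RFC 6298) is tiny, and because only samples from non-retransmitted segments are used (Karn's algorithm), a badly low estimate can be slow to correct. A minimal sketch, with made-up sample values:

```python
# RFC 6298-style RTO estimation: alpha = 1/8, beta = 1/4, K = 4.
# Karn's rule: only RTT samples from non-retransmitted segments are fed in,
# which is why a wildly low estimate can be hard to climb out of.
class RtoEstimator:
    def __init__(self):
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0          # initial RTO, seconds

    def sample(self, r):
        """Feed one RTT measurement (seconds) from a non-retransmitted segment."""
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - r)
            self.srtt = 0.875 * self.srtt + 0.125 * r
        self.rto = max(1.0, self.srtt + 4 * self.rttvar)  # 1 s floor per the RFC

est = RtoEstimator()
for r in (0.120, 0.135, 0.500):   # RTT samples in seconds (illustrative)
    est.sample(r)
    print(f"srtt={est.srtt:.3f}s rttvar={est.rttvar:.3f}s rto={est.rto:.3f}s")
```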
- TCP should have been a reliability layer above UDP, not beside it (made P2P harder than it should be, mainly burdening teleconferencing and video games)
- Window size field bytes should have been arbitrary length
- Checksum size field bytes should have been arbitrary length and the algorithm should have been optionally customizable
- Ports should have been unique binary strings of arbitrary length instead of numbers, and not limited in count (as mentioned)
- Streams should have been encrypted by default, with clear transmission as the special case (symmetric key encryption was invented before TCP)
- IP should have connected to an arbitrary peer ID, not a MAC address, for resumable sessions if network changes (maybe only securable with encryption)
- Encrypted streams should not have been on a special port for HTTPS (not TCP's fault)
- IP address field bytes should have been arbitrary length (not TCP's fault)
- File descriptors could have been universal instead of using network sockets, unix sockets, files, pipes and bind/listen/accept/select (not TCP's fault)
- Streams don't actually make sense in the first place, we needed state transfer with arbitrary datagram size and partial sends/ranges (not TCP's fault)
Linking this to my "why your tunnel won't work" checklist: https://news.ycombinator.com/item?id=44713493
I want to add that the author of the article wrote one of the cleanest and most concise summaries of the TCP protocol that I've ever read.
The Linux kernel supports it, but at least when I tried this, those modules were disabled on most distros.
Personally, I found the tone of the article quite genuine, and the video at the end made a compelling case for it. Well, I figure you commented having actually read it.
Edit: I can't downvote but if I could it probably would have been better than this comment!
Well ... he seems very motivated. I am more skeptical.
For instance, Google via chrome controls a lot of the internet, even more so via its search engine, AI, youtube and so forth.
Even aside from this people's habits changed. In the 1990s everyone and their Grandma had a website. Nowadays ... it is a bit different. We suddenly have horrible blogging sites such as medium.com, pestering people with popups. Of course we also had popups in the 1990s, but the diversity was simply higher. Everything today is much more streamlined it seems. And top-down controlled. Look at Twitter, owned by a greedy and selfish billionaire. And the US president? Super-selfish too. We lost something here in the last some 25 years.
they would have looked at you and asked straight out what you hoped to gain by making these things distinguished, because it certainly complicates things.
If the net were designed today it would be some complicated monstrosity where every packet was reminiscent of X.509 in terms of arcane complexity. It might even have JSON in it. It would be incredibly high overhead and we’d see tons of articles about how someone made it fast by leveraging CPU vector instructions or a GPU to parse it.
This is called Eroom’s law, or Moore’s law backwards, and it is very real. Bigger machines let programmers and designers loose to indulge their desire to make things complicated.
Many networking folks including myself consider IPv6 router advertisements and SLAAC to be inferior, in practice, to DHCPv6, and that it would be better if we’d just left IP assignment out of the spec like it was in V4. Right now we have this mess where a lot of nets prefer or require DHCPv6 but some vendors, like apparently Android, refuse to support it.
The rules about how V6 addresses are chopped up and assigned are wasteful and dumb. The entire V4 space could have been mapped onto /32 and an encapsulation protocol made to allow V4 to carry V6, providing a seamless upgrade path that does not require full upgrade of the whole core, but that would have been too logical. Every machine should get like a /96 so it can use 32 bits of space to address apps, VMs, containers, etc. As it stands we waste 64 bits of the space to make SLAAC possible, as near as I can tell. The SLAAC tail must have wagged the dog in that people thought this feature was cool enough to waste 8 bytes per packet.
The V6 header allows extension bits that are never used and blocked by most firewalls. There’s really no point in them existing since middle boxes effectively freeze the base protocol in stone.
Those are some of the big ones.
Basically all they should have done was make IPs 64 or 128 bits and left everything else alone. But I think there was a committee.
As it stands we have what we have and we should just treat V6 as IP128 and ignore the rest. I’m still in favor of the upgrade. V4 is too small, full stop. If we don’t enlarge the addresses we will completely lose end to end connectivity as a supported feature of the network.
You can just SLAAC some more addresses for whatever you want. Although hopefully you don't use more than the ~ARP~ NDP table size on your router; then things get nasty. This should be trivial for VMs, and could be made possible for containers and apps.
> The V6 header allows extension bits that are never used and blocked by most firewalls. [...] Basically all they should have done was make IPs 64 or 128 bits and left everything else alone.
This feels contradictory... IPv4 also had header options that were mostly unused and disallowed. V6 changed the header extension mechanism, but offers the same opportunities to try things that might work on one network but probably won't work everywhere.
> The Stream Control Transmission Protocol (SCTP) is a computer networking communications protocol in the transport layer of the Internet protocol suite. Originally intended for Signaling System 7 (SS7) message transport in telecommunication, the protocol provides the message-oriented feature of the User Datagram Protocol (UDP) while ensuring reliable, in-sequence transport of messages with congestion control like the Transmission Control Protocol (TCP). Unlike UDP and TCP, the protocol supports multihoming and redundant paths to increase resilience and reliability.
[…]
> SCTP may be characterized as message-oriented, meaning it transports a sequence of messages (each being a group of bytes), rather than transporting an unbroken stream of bytes as in TCP. As in UDP, in SCTP a sender sends a message in one operation, and that exact message is passed to the receiving application process in one operation. In contrast, TCP is a stream-oriented protocol, transporting streams of bytes reliably and in order. However TCP does not allow the receiver to know how many times the sender application called on the TCP transport passing it groups of bytes to be sent out. At the sender, TCP simply appends more bytes to a queue of bytes waiting to go out over the network, rather than having to keep a queue of individual separate outbound messages which must be preserved as such.
> The term multi-streaming refers to the capability of SCTP to transmit several independent streams of chunks in parallel, for example transmitting web page images simultaneously with the web page text. In essence, it involves bundling several connections into a single SCTP association, operating on messages (or chunks) rather than bytes.
* https://en.wikipedia.org/wiki/Stream_Control_Transmission_Pr...
UDP has its place as well, and if we have more simple and effective solutions like WireGuard’s handshake and encryption on top of it we’d be better off as an industry.
Otherwise please use the original title, unless it is misleading or linkbait; don't editorialize.
https://news.ycombinator.com/newsguidelines.html
60 more comments available on Hacker News