1.Overview of connect
Overview of connect
On the client side, after creating a socket, we connect to a remote backend with the connect() system call.
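For reference, here is a minimal sketch of the userspace side: socket() followed by connect(), with no bind(). It assumes an IPv4 peer on loopback. The helper names open_loopback_listener, connect_ipv4 and local_port are hypothetical. The listener exists only so the example has something to connect to. local_port uses getsockname() to reveal the ephemeral port the kernel picked during connect().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: listen on 127.0.0.1 with a kernel-assigned port,
 * so the example has a peer. Returns the listening fd or -1. */
int open_loopback_listener(uint16_t *port_out) {
    assert(port_out != NULL);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                         /* let the kernel choose */
    socklen_t len = sizeof(addr);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Typical client: socket() then connect(); no bind(), so the kernel
 * selects the local port. Returns the connected fd or -1. */
int connect_ipv4(const char *ip, uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    struct sockaddr_in srv;
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &srv.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Local port the kernel assigned to a connected socket (0 on error). */
uint16_t local_port(int fd) {
    struct sockaddr_in local;
    socklen_t len = sizeof(local);
    if (getsockname(fd, (struct sockaddr *)&local, &len) < 0) return 0;
    return ntohs(local.sin_port);
}
```

In real client code, the address passed to connect_ipv4 would be the backend's, and the listener helper would not exist.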
There are many technical details behind this single call. We have discussed how a socket object is created, but in userspace we only hold the file descriptor fd as an integer.
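In the Linux kernel, that integer stands for a combination of objects (file, socket, sock, and others), each with its own set of ops function pointers attached. struct socket, for example, records the socket state and points to its struct file, to the protocol-level struct sock (sk), and to its protocol operations table (ops).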

2.The Syscall Entry
The Syscall Entry
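When the client calls connect, the kernel executes the connect system call defined in net/socket.c. It uses sockfd_lookup_light to resolve the integer fd into the kernel socket object, copies the destination address from userspace into kernel memory, and then hands off to sock->ops->connect.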
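To make the fd → file → socket → sock chain concrete, here is a toy model. It is not kernel code. All toy_ types and toy_sockfd_lookup are hypothetical, and the real structures carry many more fields. The lookup mirrors the idea behind sockfd_lookup_light: index the process's fd table, check that the file is a socket, and return the socket stored in the file's private data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel objects. */
struct toy_file;
struct toy_sock;                        /* protocol-level state, like struct sock */
struct toy_socket;

struct toy_proto_ops {                  /* like struct proto_ops (sock->ops) */
    int (*connect)(struct toy_socket *sock);
};

struct toy_socket {                     /* like struct socket */
    int state;                          /* e.g. SS_UNCONNECTED */
    struct toy_file *file;              /* back-pointer to the file */
    struct toy_sock *sk;                /* protocol-level sock */
    const struct toy_proto_ops *ops;    /* family-specific operations */
};

struct toy_file {                       /* like struct file */
    void *private_data;                 /* for sockets: the toy_socket */
    int is_socket;                      /* the real kernel compares f_op with socket_file_ops */
};

#define TOY_MAX_FDS 16
static struct toy_file *toy_fd_table[TOY_MAX_FDS];   /* per-process fd table */

/* Resolve an integer fd to its socket, in the spirit of sockfd_lookup_light.
 * Returns NULL for a bad fd or for a file that is not a socket. */
struct toy_socket *toy_sockfd_lookup(int fd) {
    if (fd < 0 || fd >= TOY_MAX_FDS || toy_fd_table[fd] == NULL)
        return NULL;                    /* the kernel would report -EBADF */
    struct toy_file *f = toy_fd_table[fd];
    if (!f->is_socket)
        return NULL;                    /* the kernel would report -ENOTSOCK */
    return (struct toy_socket *)f->private_data;
}
```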
3.From inet_stream_connect to tcp_v4_connect
From inet_stream_connect to tcp_v4_connect
Since this is an AF_INET socket of type SOCK_STREAM, sock->ops points to inet_stream_ops, whose connect handler is inet_stream_connect.
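A freshly created socket is in state SS_UNCONNECTED, so the switch on sock->state in inet_stream_connect takes the SS_UNCONNECTED branch, which calls sk->sk_prot->connect. For TCP over IPv4, sk_prot is tcp_prot, and its connect handler is tcp_v4_connect.
tcp_v4_connect immediately sets the sock state (sk->sk_state) to TCP_SYN_SENT, before a local port has even been chosen. It then calls inet_hash_connect to select an available local port, and finally tcp_connect to build and transmit the SYN packet.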
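The dispatch chain can be sketched as a toy model with function-pointer tables. This is a simplified illustration, not the kernel source. All toy_ names are hypothetical, and port selection and SYN transmission are reduced to stubs. It shows two things: the two levels of indirection (sock->ops->connect, then sk->sk_prot->connect), and the fact that TCP_SYN_SENT is set before the port is chosen.

```c
#include <assert.h>
#include <errno.h>

enum toy_socket_state { TOY_SS_UNCONNECTED, TOY_SS_CONNECTING, TOY_SS_CONNECTED };
enum toy_tcp_state { TOY_TCP_CLOSE, TOY_TCP_SYN_SENT, TOY_TCP_ESTABLISHED };

struct toy_sock;
struct toy_proto {                       /* like struct proto (sk->sk_prot) */
    int (*connect)(struct toy_sock *sk);
};
struct toy_sock {                        /* like struct sock */
    enum toy_tcp_state sk_state;
    const struct toy_proto *sk_prot;
    int syn_sent;                        /* counts SYNs "transmitted" */
};

struct toy_socket;
struct toy_proto_ops {                   /* like struct proto_ops (sock->ops) */
    int (*connect)(struct toy_socket *sock);
};
struct toy_socket {                      /* like struct socket */
    enum toy_socket_state state;
    struct toy_sock *sk;
    const struct toy_proto_ops *ops;
};

/* Stubs standing in for inet_hash_connect and tcp_connect. */
static int toy_inet_hash_connect_stub(struct toy_sock *sk) { (void)sk; return 0; }
static int toy_tcp_connect_stub(struct toy_sock *sk) { sk->syn_sent++; return 0; }

/* Plays the role of tcp_v4_connect: SYN_SENT first, then port, then SYN. */
static int toy_tcp_v4_connect(struct toy_sock *sk) {
    sk->sk_state = TOY_TCP_SYN_SENT;
    int err = toy_inet_hash_connect_stub(sk);
    if (err) {
        sk->sk_state = TOY_TCP_CLOSE;    /* undo on failure */
        return err;
    }
    return toy_tcp_connect_stub(sk);
}

/* Plays the role of inet_stream_connect: switch on the socket state. */
static int toy_inet_stream_connect(struct toy_socket *sock) {
    switch (sock->state) {
    case TOY_SS_UNCONNECTED: {
        int err = sock->sk->sk_prot->connect(sock->sk);
        if (err) return err;
        sock->state = TOY_SS_CONNECTING; /* handshake now in flight */
        return 0;
    }
    case TOY_SS_CONNECTING:
        return -EALREADY;
    case TOY_SS_CONNECTED:
        return -EISCONN;
    default:
        return -EINVAL;
    }
}

const struct toy_proto toy_tcp_prot = { toy_tcp_v4_connect };
const struct toy_proto_ops toy_inet_stream_ops = { toy_inet_stream_connect };
```

The real code also handles non-blocking sockets, waiting for the handshake, and more error cases. The model keeps only the path described above.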
4.Ephemeral Port Selection: inet_hash_connect
Ephemeral Port Selection: inet_hash_connect
inet_sk_port_offset(sk) computes a pseudo-random starting offset by hashing the connection's addressing information (source IP, destination IP, destination port, and network namespace) together with a per-boot random secret. This means the port search begins at a different position on every connect() call, providing port randomization as a security measure.
__inet_check_established is passed as the collision-check callback. It verifies that the candidate port does not produce a duplicate 4-tuple in the established connections table.
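Here is a toy illustration of a keyed starting offset. It substitutes the splitmix64 finalizer as a simple mixer. The kernel instead uses a keyed cryptographic hash, so toy_mix64, toy_port_offset and toy_first_candidate are all hypothetical. The point is the shape of the computation: identical inputs and secret always give the same offset, while an observer without the secret cannot predict it. The offset is then folded into the port range.

```c
#include <assert.h>
#include <stdint.h>

/* splitmix64 finalizer: a fast, non-cryptographic 64-bit mixer.
 * Stand-in only; NOT what the kernel uses. */
static uint64_t toy_mix64(uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Hypothetical analogue of inet_sk_port_offset(): deterministic for given
 * inputs and secret, unpredictable without the secret. */
uint32_t toy_port_offset(uint64_t boot_secret, uint32_t saddr, uint32_t daddr,
                         uint16_t dport, uint32_t netns_id) {
    uint64_t h = toy_mix64(boot_secret ^ saddr);
    h = toy_mix64(h ^ daddr);
    h = toy_mix64(h ^ (((uint64_t)dport << 32) | netns_id));
    return (uint32_t)h;
}

/* Fold an offset into [low, high] to get the first port to try. */
uint16_t toy_first_candidate(uint32_t offset, uint16_t low, uint16_t high) {
    assert(low <= high);
    uint32_t remaining = (uint32_t)high - low + 1;
    return (uint16_t)(low + offset % remaining);
}
```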
5.Inside __inet_hash_connect: Scanning the Port Range
Inside __inet_hash_connect: Scanning the Port Range
If snum (the socket's local port, inet_sk(sk)->inet_num) is zero, the socket has not been bind()-ed manually, so the kernel must pick a port from the ephemeral range. inet_get_local_port_range reads net.ipv4.ip_local_port_range, which defaults to 32768–60999 on most Linux systems.
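To check the range on a given machine, we can read it from procfs. The sketch below assumes the usual format of /proc/sys/net/ipv4/ip_local_port_range, which is two whitespace-separated integers. The function names are hypothetical. The size it computes is an upper bound on the ports available for automatic selection, because reserved ports are not subtracted.

```c
#include <assert.h>
#include <stdio.h>

/* Parse "low high" (any whitespace between). Returns 0 on success. */
int parse_port_range(const char *text, int *low, int *high) {
    if (sscanf(text, "%d %d", low, high) != 2) return -1;
    if (*low < 1 || *high > 65535 || *low > *high) return -1;
    return 0;
}

/* Number of ports in the inclusive range [low, high]. */
int port_range_size(int low, int high) {
    assert(low <= high);
    return high - low + 1;
}

/* Read the live setting (Linux only). Returns 0 on success. */
int read_port_range(int *low, int *high) {
    char buf[64];
    FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");
    if (!f) return -1;
    char *ok = fgets(buf, sizeof(buf), f);
    fclose(f);
    return ok ? parse_port_range(buf, low, high) : -1;
}
```

With the default 32768–60999, port_range_size reports 28232 ports.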
Remark. If we call bind() before connect() on the client side, inet_sk(sk)->inet_num already holds the bound port number, so snum is non-zero and the kernel skips the entire port-search loop; it simply uses whatever port we specified.
This is almost never desirable in client code: it prevents us from opening more than one simultaneous connection to the same remote endpoint, and the chosen port can collide with ports already in use. Pinning a well-known port with bind() is a server-side pattern (bind() followed by listen()). In client code, we should leave it out and let the kernel handle ephemeral port selection.
Starting from offset, the loop tries the ports in the range in circular order until it finds an available one. Each candidate port goes through the following checks.
inet_is_reserved_local_port skips any port listed in net.ipv4.ip_local_reserved_ports. We can add application ports there to prevent the kernel from accidentally picking them as ephemeral ports.
hinfo->bhash is a hash map recording all ports currently bound to a socket. A port absent from bhash is unconditionally available: inet_bind_bucket_create registers it, and the loop exits via goto ok.
If a port is already in bhash, the loop does not give up on it immediately; it delegates to check_established to determine whether the port can be safely reused (see the next section).
If no port is available after exhausting the entire range, -EADDRNOTAVAIL is returned, and userspace sees connect() fail with errno set to EADDRNOTAVAIL ("Cannot assign requested address").
When we encounter this error in production, the first thing to check is whether net.ipv4.ip_local_port_range is wide enough for our connection volume.
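The search loop described above can be modelled in a few lines. This is a simplified sketch, not __inet_hash_connect itself. Two boolean arrays stand in for ip_local_reserved_ports and bhash, a callback stands in for check_established, and all toy_ names are hypothetical. It covers the non-zero snum shortcut, the circular walk from the offset, the two checks, and the -EADDRNOTAVAIL result when the range is exhausted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_port_state {
    bool reserved[65536];   /* net.ipv4.ip_local_reserved_ports */
    bool bound[65536];      /* port already present in bhash */
};

/* Stand-in for check_established: true if using `port` would not
 * duplicate an existing 4-tuple. */
typedef bool (*toy_check_established_fn)(uint16_t port, void *ctx);

/* Returns the chosen port, or -EADDRNOTAVAIL if the range is exhausted. */
int toy_hash_connect(struct toy_port_state *st, uint16_t snum,
                     uint16_t low, uint16_t high, uint32_t offset,
                     toy_check_established_fn check, void *ctx) {
    if (snum != 0)                       /* bind() already chose a port */
        return snum;
    assert(low <= high);
    uint32_t remaining = (uint32_t)high - low + 1;
    uint32_t start = offset % remaining;
    for (uint32_t i = 0; i < remaining; i++) {
        uint16_t port = (uint16_t)(low + (start + i) % remaining);  /* circular */
        if (st->reserved[port])
            continue;                    /* inet_is_reserved_local_port */
        if (!st->bound[port]) {
            st->bound[port] = true;      /* inet_bind_bucket_create, goto ok */
            return port;
        }
        if (check && check(port, ctx))   /* in bhash: is the 4-tuple still unique? */
            return port;
    }
    return -EADDRNOTAVAIL;
}
```

For example, with the range 100–109 and offset 3 on an empty table, the first call returns 103 and the next returns 104. Once every port is bound and no reuse callback is supplied, the result is -EADDRNOTAVAIL.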
6.Port Reuse and the 4-Tuple Uniqueness Rule
Port Reuse and the 4-Tuple Uniqueness Rule
When a port is already in bhash, check_established (which resolves to __inet_check_established) is called.
inet_ehash_bucket returns the relevant bucket of the established hash table (ehash), which holds the non-listening sockets, including those in ESTABLISHED or SYN_SENT state. For each entry in that bucket, INET_MATCH compares the entry's full 4-tuple with the candidate connection's.
The 4-tuple checked is (local IP, local port, remote IP, remote port). A port is reusable as long as no existing connection shares the exact same 4-tuple. This is why the same local port can legitimately back multiple simultaneous connections: each goes to a different remote IP or remote port, so every 4-tuple stays unique.
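The uniqueness rule itself fits in a few lines. In this simplified sketch, a linear scan over an array stands in for the hash-bucket walk. The toy_ names are hypothetical. Refinements the real check makes, such as network namespace matching and TIME_WAIT reuse, are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-tuple, mirroring the fields the match compares. */
struct toy_tuple {
    uint32_t laddr, raddr;
    uint16_t lport, rport;
};

static bool toy_tuple_equal(const struct toy_tuple *a, const struct toy_tuple *b) {
    return a->laddr == b->laddr && a->lport == b->lport &&
           a->raddr == b->raddr && a->rport == b->rport;
}

/* Returns true if `candidate` does not duplicate any existing connection,
 * i.e. its local port may be reused for it. */
bool toy_check_established(const struct toy_tuple *existing, size_t n,
                           const struct toy_tuple *candidate) {
    for (size_t i = 0; i < n; i++)
        if (toy_tuple_equal(&existing[i], candidate))
            return false;
    return true;
}
```

With local port 40000 already connected to remote port 80 on two servers, a third connection from port 40000 to port 443 on one of those servers passes the check. A repeat of an existing 4-tuple does not.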
7.Why One Machine Can Have Far More Than 65535 Connections
Why One Machine Can Have Far More Than 65535 Connections
A common misconception is that port numbers cap us at 65535 connections. In reality, TCP uniqueness is governed by the full 4-tuple, not by the local port alone.
If we connect to N distinct remote endpoints, each local port can be reused once per endpoint, so it can back up to N connections. With a port range of 28232 ports (the default 32768–60999) and connections spread across many remote IPs and ports, one machine can sustain hundreds of thousands or even millions of simultaneous outgoing connections; the limit is memory and CPU, not the port number space.
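As a worked example, counting only what 4-tuple uniqueness allows for a single local IP, let $|P|$ be the number of usable ephemeral ports and $R$ the number of distinct remote endpoints (remote IP, remote port). The upper bound on simultaneous outgoing connections is then

$$
N_{\max} \;\le\; |P| \times R, \qquad |P| = 60999 - 32768 + 1 = 28232 .
$$

With $R = 100$ remote endpoints, this gives $28232 \times 100 = 2{,}823{,}200$. Each additional local IP multiplies the bound again. Reserved ports, memory and CPU all push the practical number below this ceiling.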
8.Building and Sending the SYN: tcp_connect
Building and Sending the SYN: tcp_connect
Once inet_hash_connect returns a valid port, tcp_v4_connect calls tcp_connect.
tcp_connect performs four things in sequence:
- Allocates an skb and initialises it as a SYN segment.
- Enqueues it onto sk_write_queue.
- Calls tcp_transmit_skb to pass it down the network stack and out the NIC.
- Arms the retransmit timer so the SYN is resent if no SYN-ACK arrives in time.
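The four steps can be mirrored in a toy model. It is a simplified sketch: toy_skb and toy_tcp_sock are hypothetical stand-ins for sk_buff and the TCP socket, and transmission is a counter. It shows the ordering: build the SYN, queue it, transmit it, then arm the timer with the 1-second initial timeout. The queued SYN stays on the write queue so it can be retransmitted, and freeing it is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_TCPHDR_SYN 0x02          /* SYN bit in the TCP flags byte */
#define TOY_TIMEOUT_INIT_MS 1000     /* mirrors TCP_TIMEOUT_INIT = 1 s */

struct toy_skb {                     /* hypothetical stand-in for sk_buff */
    uint32_t seq;
    uint8_t flags;
    struct toy_skb *next;
};

struct toy_tcp_sock {
    uint32_t write_seq;              /* next sequence number to use */
    struct toy_skb *write_queue;     /* like sk_write_queue (singly linked) */
    unsigned transmitted;            /* segments handed to the "NIC" */
    unsigned rto_ms;                 /* armed retransmit timeout, 0 = unarmed */
};

/* Stand-in for tcp_transmit_skb: just counts the segment. */
static int toy_transmit_skb(struct toy_tcp_sock *tp, struct toy_skb *skb) {
    (void)skb;
    tp->transmitted++;
    return 0;
}

/* The four steps of tcp_connect, heavily simplified. Returns 0 or -1. */
int toy_tcp_connect(struct toy_tcp_sock *tp) {
    struct toy_skb *skb = calloc(1, sizeof(*skb));   /* 1. allocate */
    if (!skb) return -1;
    skb->seq = tp->write_seq++;                      /* a SYN consumes one sequence number */
    skb->flags = TOY_TCPHDR_SYN;                     /*    and is marked as SYN */
    skb->next = tp->write_queue;                     /* 2. enqueue (queue is empty here) */
    tp->write_queue = skb;
    if (toy_transmit_skb(tp, skb) != 0) return -1;   /* 3. transmit */
    tp->rto_ms = TOY_TIMEOUT_INIT_MS;                /* 4. arm retransmit timer */
    return 0;
}
```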
9.The Retransmit Timer
The Retransmit Timer
The initial retransmit timeout TCP_TIMEOUT_INIT is defined as 1 second (older kernel versions used 3 seconds). If the SYN is lost and no SYN-ACK is received within this window, the kernel doubles the timeout (exponential backoff) and retransmits, up to the limit set by net.ipv4.tcp_syn_retries.
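The backoff arithmetic can be sketched as follows, assuming pure doubling from the initial timeout. This ignores the TCP_RTO_MAX cap and timer slack, and the function names are hypothetical. With a 1-second initial timeout and 6 retries, the SYN is resent after 1, 3, 7, 15, 31 and 63 seconds. After one final 64-second wait, connect() gives up roughly 127 seconds after the first SYN.

```c
#include <assert.h>

/* Timeout (ms) in force after the n-th transmission (n = 0 is the
 * original SYN), doubling from init_ms. No TCP_RTO_MAX clamping. */
unsigned long syn_rto_ms(unsigned long init_ms, unsigned n) {
    return init_ms << n;
}

/* Approximate time (ms) from the first SYN until the attempt is abandoned:
 * the original SYN plus `retries` retransmissions, each followed by a
 * full timeout. */
unsigned long syn_give_up_ms(unsigned long init_ms, unsigned retries) {
    unsigned long total = 0;
    for (unsigned n = 0; n <= retries; n++)
        total += syn_rto_ms(init_ms, n);
    return total;
}
```

syn_give_up_ms(1000, 6) yields 127000 ms. The same arithmetic with the older 3-second initial value and 5 retries yields 189000 ms.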
10.Summary on connect
Summary on connect
- What happens locally when connect is called. The kernel immediately promotes the socket state to TCP_SYN_SENT, selects an available ephemeral port via inet_hash_connect, builds a SYN segment, transmits it with tcp_transmit_skb, and arms a retransmit timer, all before any network round-trip completes.
- Port exhaustion and EADDRNOTAVAIL. inet_hash_connect scans net.ipv4.ip_local_port_range (default 32768–60999) starting from a randomised offset. If no port in the range is usable, it returns -EADDRNOTAVAIL ("Cannot assign requested address"). The first tuning knob to reach for in production is widening ip_local_port_range through sysctl.
- Reserving ports with ip_local_reserved_ports. If certain port numbers must not be consumed as ephemeral ports (for example, because an application listens on them intermittently), add them to net.ipv4.ip_local_reserved_ports. The kernel's inet_is_reserved_local_port check will skip them during the search loop, preventing accidental conflicts.
- One machine can sustain far more than 65535 connections. TCP uniqueness is enforced on the full 4-tuple (local IP, local port, remote IP, remote port), not on the local port alone. The same local port can back many simultaneous connections as long as each goes to a distinct remote endpoint. With enough remote diversity, a single machine can maintain hundreds of thousands, or even millions, of concurrent outgoing connections; the real limits are memory and CPU, not the 16-bit port number space.
- Port-search cost grows as the range fills up. Because the search starts at a pseudo-random offset, a lightly loaded system finds a free port in one or two iterations. As ip_local_port_range approaches saturation, the loop must cycle through progressively more occupied entries before landing on a usable port, driving up CPU cost per connect call. Keeping the range comfortably larger than peak concurrent connections avoids this degradation.
- Do not call bind before connect on the client side. Once bind assigns a port, inet_sk(sk)->inet_num is non-zero and the kernel skips the entire port-search loop, locking the socket to that single port. This prevents more than one simultaneous connection to the same remote endpoint from that port and risks colliding with ports already in use. Pinning a well-known port with bind is a server-side pattern; in client code, omit it and let the kernel manage ephemeral port selection.








