History log of /freebsd-current/sys/netinet/tcp_pcap.c
Revision Date Author Comments
# f7d5900a 28-Dec-2023 John Baldwin <jhb@FreeBSD.org>

sys: Style fix for M_EXT | M_EXTPG

Add a space around the | operator in places testing for either M_EXT
or M_EXTPG.

Reviewed by: imp, glebius
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D43216


# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# e68b3792 07-Dec-2022 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: embed inpcb into tcpcb

For the TCP protocol inpcb storage specify allocation size that would
provide space to most of the data a TCP connection needs, embedding
into struct tcpcb several structures, that previously were allocated
separately.

The most import one is the inpcb itself. With embedding we can provide
strong guarantee that with a valid TCP inpcb the tcpcb is always valid
and vice versa. Also we reduce number of allocs/frees per connection.
The embedded inpcb is placed in the beginning of the struct tcpcb,
since in_pcballoc() requires that. However, later we may want to move
it around for cache line efficiency, and this can be done with a little
effort. The new intotcpcb() macro is ready for such move.

The congestion algorithm data, the TCP timers and osd(9) data are
also embedded into tcpcb, and temprorary struct tcpcb_mem goes away.
There was no extra allocation here, but we went through extra pointer
every time we accessed this data.

One interesting side effect is that now TCP data is allocated from
SMR-protected zone. Potentially this allows the TCP stacks or other
TCP related modules to utilize that for their own synchronization.

Large part of the change was done with sed script:

s/tp->ccv->/tp->t_ccv./g
s/tp->ccv/\&tp->t_ccv/g
s/tp->cc_algo/tp->t_cc/g
s/tp->t_timers->tt_/tp->tt_/g
s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g

Dependency side effect is that code that needs to know struct tcpcb
should also know struct inpcb, that added several <netinet/in_pcb.h>.

Differential revision: https://reviews.freebsd.org/D37127


# bf6c6162 15-Jun-2022 Michael Tuexen <tuexen@FreeBSD.org>

tcp: fix TCPPCAP for kernels enabling VNET

Reviewed by: rscheff
MFC after: 3 days
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D35503


# 61664ee7 02-May-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Step 4.2: start divorce of M_EXT and M_EXTPG

They have more differencies than similarities. For now there is lots
of code that would check for M_EXT only and work correctly on M_EXTPG
buffers, so still carry M_EXT bit together with M_EXTPG. However,
prepare some code for explicit check for M_EXTPG.

Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24598


# 6edfd179 02-May-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Step 4.1: mechanically rename M_NOMAP to M_EXTPG

Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24598


# 82334850 28-Jun-2019 John Baldwin <jhb@FreeBSD.org>

Add an external mbuf buffer type that holds multiple unmapped pages.

Unmapped mbufs allow sendfile to carry multiple pages of data in a
single mbuf, without mapping those pages. It is a requirement for
Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web
serving workloads when used by sendfile, due to effectively
compressing socket buffers by an order of magnitude, and hence
reducing cache misses.

For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer
now points to a struct mbuf_ext_pgs structure instead of a data
buffer. This structure contains an array of physical addresses (this
reduces cache misses compared to an earlier version that stored an
array of vm_page_t pointers). It also stores additional fields needed
for in-kernel TLS such as the TLS header and trailer data that are
currently unused. To more easily detect these mbufs, the M_NOMAP flag
is set in m_flags in addition to M_EXT.

Various functions like m_copydata() have been updated to safely access
packet contents (using uiomove_fromphys()), to make things like BPF
safe.

NIC drivers advertise support for unmapped mbufs on transmit via a new
IFCAP_NOMAP capability. This capability can be toggled via the new
'nomap' and '-nomap' ifconfig(8) commands. For NIC drivers that only
transmit packet contents via DMA and use bus_dma, adding the
capability to if_capabilities and if_capenable should be all that is
required.

If a NIC does not support unmapped mbufs, they are converted to a
chain of mapped mbufs (using sf_bufs to provide the mapping) in
ip_output or ip6_output. If an unmapped mbuf requires software
checksums, it is also converted to a chain of mapped mbufs before
computing the checksum.

Submitted by: gallatin (earlier version)
Reviewed by: gallatin, hselasky, rrs
Discussed with: ae, kp (firewalls)
Relnotes: yes
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20616


# 24b9bb56 06-Jul-2016 Jonathan T. Looney <jtl@FreeBSD.org>

The TCPPCAP debugging feature caches recently-used mbufs for use in
debugging TCP connections. This commit provides a mechanism to free those
mbufs when the system is under memory pressure.

Because this will result in lost debugging information, the behavior is
controllable by a sysctl. The default setting is to free the mbufs.

Reviewed by: gnn
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D6931
Input from: novice_techie.com


# b4b12e52 10-Feb-2016 Gleb Smirnoff <glebius@FreeBSD.org>

Garbage collect unused arguments of m_init().


# 962d02b0 14-Oct-2015 Bjoern A. Zeeb <bz@FreeBSD.org>

Hopefully also unbreak VIMAGE kernels replacing the &V_... with
&VNET_NAME(...).
Everything else is just a whitespace wrapping change.


# f87ec781 14-Oct-2015 Bjoern A. Zeeb <bz@FreeBSD.org>

Properly define functions withut argument and wrap for { for style purposes
as followed in the rest of the file. This will hopefully make gcc more happy.


# 86a996e6 13-Oct-2015 Hiren Panchasara <hiren@FreeBSD.org>

There are times when it would be really nice to have a record of the last few
packets and/or state transitions from each TCP socket. That would help with
narrowing down certain problems we see in the field that are hard to reproduce
without understanding the history of how we got into a certain state. This
change provides just that.

It saves copies of the last N packets in a list in the tcpcb. When the tcpcb is
destroyed, the list is freed. I thought this was likely to be more
performance-friendly than saving copies of the tcpcb. Plus, with the packets,
you should be able to reverse-engineer what happened to the tcpcb.

To enable the feature, you will need to compile a kernel with the TCPPCAP
option. Even then, the feature defaults to being deactivated. You can activate
it by setting a positive value for the number of captured packets. You can do
that on either a global basis or on a per-socket basis (via a setsockopt call).

There is no way to get the packets out of the kernel other than using kmem or
getting a coredump. I thought that would help some of the legal/privacy concerns
regarding such a feature. However, it should be possible to add a future effort
to export them in PCAP format.

I tested this at low scale, and found that there were no mbuf leaks and the peak
mbuf usage appeared to be unchanged with and without the feature.

The main performance concern I can envision is the number of mbufs that would be
used on systems with a large number of sockets. If you save five packets per
direction per socket and have 3,000 sockets, that will consume at least 30,000
mbufs just to keep these packets. I tried to reduce the concerns associated with
this by limiting the number of clusters (not mbufs) that could be used for this
feature. Again, in my testing, that appears to work correctly.

Differential Revision: D3100
Submitted by: Jonathan Looney <jlooney at juniper dot net>
Reviewed by: gnn, hiren