I run lots of Linux-based software routers on my home network to route IPv4 and IPv6. Periodically, they freak out with IPv6-related errors that seem to indicate a problem, but with no corresponding forwarding impact. Here are two of them:
(trill:18:56:EDT)% dmesg|tail
[40878917.324479] ICMPv6: Received fragmented ndisc packet. Carefully consider disabling suppress_frag_ndisc.
[40878920.039800] ICMPv6: Received fragmented ndisc packet. Carefully consider disabling suppress_frag_ndisc.
[40878920.875706] ICMPv6: Received fragmented ndisc packet. Carefully consider disabling suppress_frag_ndisc.
[40878921.920218] ICMPv6: Received fragmented ndisc packet. Carefully consider disabling suppress_frag_ndisc.
[40878924.426656] ICMPv6: Received fragmented ndisc packet. Carefully consider disabling suppress_frag_ndisc.
[40878925.471213] ICMPv6: Received fragmented ndisc packet. Carefully consider disabling suppress_frag_ndisc.
[40878926.515702] ICMPv6: Received fragmented ndisc packet. Carefully consider disabling suppress_frag_ndisc.
[40878929.022420] ICMPv6: Received fragmented ndisc packet. Carefully consider disabling suppress_frag_ndisc.
[40878930.066760] ICMPv6: Received fragmented ndisc packet. Carefully consider disabling suppress_frag_ndisc.
[40878931.111439] ICMPv6: Received fragmented ndisc packet. Carefully consider disabling suppress_frag_ndisc.
I started getting this message every few seconds, in bursts every few hours, for a week or so on a router that had 400+ days of uptime. The sysctl documentation for suppress_frag_ndisc says:
suppress_frag_ndisc - INTEGER
Control RFC 6980 (Security Implications of IPv6 Fragmentation
with IPv6 Neighbor Discovery) behavior:
1 - (default) discard fragmented neighbor discovery packets
0 - allow fragmented neighbor discovery packets
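To confirm which behavior each interface actually has, here is a minimal sketch that reads the knob for every interface. It assumes the usual procfs layout, /proc/sys/net/ipv6/conf/<iface>/suppress_frag_ndisc, which is what `sysctl net.ipv6.conf.<iface>.suppress_frag_ndisc` reports (the `all` and `default` entries show up too). The function name is my own.

```python
from __future__ import annotations

from pathlib import Path


def suppress_frag_ndisc_settings(root: str = "/proc/sys/net/ipv6/conf") -> dict[str, int]:
    """Map each interface directory under `root` to its suppress_frag_ndisc value.

    Directories without the knob (e.g. on kernels that lack it) are skipped.
    """
    settings = {}
    for knob in sorted(Path(root).glob("*/suppress_frag_ndisc")):
        # The file holds a single integer followed by a newline.
        settings[knob.parent.name] = int(knob.read_text())
    return settings


if __name__ == "__main__":
    for iface, value in suppress_frag_ndisc_settings().items():
        # Per the sysctl documentation: 1 = discard, 0 = allow.
        action = "discard" if value else "allow"
        print(f"{iface}: {value} ({action} fragmented ND packets)")
```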
I really shouldn’t have any fragmented ND packets on my network. Everything on this router uses a 1500-byte MTU, but since it’s the first hop for my general-purpose Wi-Fi network, maybe there is a misbehaving device? I would love to debug further, but the printk does not include the MAC address or link-local source. So my best option is to tcpdump all ND traffic until it happens again, maybe?
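If I do go the tcpdump route, the interesting packets are first fragments of ICMPv6 ND messages (types 133 to 137), since only the first fragment carries the ICMPv6 type. Below is a hedged sketch of a classifier for one captured Ethernet frame that pulls out exactly what the printk omits: the source MAC and source IPv6 address. It assumes untagged Ethernet frames, only skips Hop-by-Hop, Routing, and Destination Options headers before the Fragment header, and does no reassembly. The names `classify_frame` and `FragmentedND` are hypothetical.

```python
from __future__ import annotations

import ipaddress
import struct
from typing import NamedTuple, Optional

ETHERTYPE_IPV6 = 0x86DD
NH_HOPOPTS, NH_ROUTING, NH_FRAGMENT, NH_ICMPV6, NH_DSTOPTS = 0, 43, 44, 58, 60
# ICMPv6 types used by Neighbor Discovery (RFC 4861).
ND_TYPES = {133: "RS", 134: "RA", 135: "NS", 136: "NA", 137: "Redirect"}


class FragmentedND(NamedTuple):
    src_mac: str
    src_ip: str
    nd_type: str
    frag_id: int


def classify_frame(frame: bytes) -> Optional[FragmentedND]:
    """Return details if `frame` is an untagged Ethernet frame carrying the
    first fragment of an IPv6 ND message, otherwise None."""
    if len(frame) < 14 + 40:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV6:
        return None  # VLAN-tagged or non-IPv6 frames are not handled here
    ip = frame[14:]
    next_header = ip[6]
    offset = 40  # end of the fixed IPv6 header
    # Skip extension headers that may legally precede the Fragment header.
    while next_header in (NH_HOPOPTS, NH_ROUTING, NH_DSTOPTS):
        if len(ip) < offset + 8:
            return None
        next_header = ip[offset]
        offset += (ip[offset + 1] + 1) * 8
    if next_header != NH_FRAGMENT or len(ip) < offset + 8:
        return None  # not fragmented
    inner, _reserved, offset_flags, frag_id = struct.unpack_from("!BBHI", ip, offset)
    if offset_flags >> 3 != 0:
        return None  # not the first fragment, so no ICMPv6 type to inspect
    offset += 8
    if inner != NH_ICMPV6 or len(ip) < offset + 1:
        return None
    nd_type = ND_TYPES.get(ip[offset])
    if nd_type is None:
        return None
    src_mac = frame[6:12].hex(":")
    src_ip = str(ipaddress.IPv6Address(ip[8:24]))
    return FragmentedND(src_mac, src_ip, nd_type, frag_id)
```

To run it over a capture written with `tcpdump -w`, one option is scapy's `rdpcap`, passing `bytes(pkt)` for each packet; any pcap reader that yields raw Ethernet frames would do. A capture filter such as `ip6[6] == 44` (IPv6 packets whose first next header is Fragment) keeps the capture small, though it misses fragments that follow other extension headers.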
And now there’s this one:
(starfire:18:53:EDT)% dmesg|tail
[26897673.366452] neighbour: ndisc_cache: neighbor table overflow!
[26897673.366461] neighbour: ndisc_cache: neighbor table overflow!
[26897673.366475] neighbour: ndisc_cache: neighbor table overflow!
[26897673.366828] neighbour: ndisc_cache: neighbor table overflow!
[26897673.366839] neighbour: ndisc_cache: neighbor table overflow!
[26897673.366850] neighbour: ndisc_cache: neighbor table overflow!
[26897674.390436] neighbour: ndisc_cache: neighbor table overflow!
[26897674.390448] neighbour: ndisc_cache: neighbor table overflow!
[26897674.390460] neighbour: ndisc_cache: neighbor table overflow!
[26897674.390831] neighbour: ndisc_cache: neighbor table overflow!
I am getting this on a /core/ router that is the first hop for a few segments and also carries transit traffic. I’m pretty sure nothing is overflowing:
(starfire:19:02:EDT)% ip -6 nei|wc -l
43
(starfire:19:03:EDT)% ip -4 nei|wc -l
41
This also happens after a few hundred days of uptime. I would love to debug it, but the log message doesn’t tell me what the limit is or what my current usage is. It also doesn’t tell me which entry was being added and (presumably?) dropped. So I can’t really debug this at all. There seems to be no forwarding impact, so I guess I’ll ignore it; searching the web turns up reports that it either hard-locks the CPU (not for me) or is a bug.
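The kernel does keep some of the numbers the message leaves out. As far as I can tell from the neighbour code, the overflow message is printed on the same path that increments the `table_fulls` counter in /proc/net/stat/ndisc_cache, and the limit involved is net.ipv6.neigh.default.gc_thresh3 (1024 by default, with gc_thresh1 and gc_thresh2 at 128 and 512). Here is a minimal sketch that puts the kernel’s own entry count next to those limits. It assumes the stat file’s usual layout: a header line of column names, then one line of hexadecimal counters per CPU, where only `entries` is table-wide. The kernel’s `entries` value may not match what `ip -6 nei` lists, which is the point of checking it directly. The function names are my own.

```python
from __future__ import annotations

from pathlib import Path

STAT_FILE = Path("/proc/net/stat/ndisc_cache")
THRESH_DIR = Path("/proc/sys/net/ipv6/neigh/default")


def parse_neigh_stat(text: str) -> dict[str, int]:
    """Parse the text of /proc/net/stat/ndisc_cache (assumed layout above).

    "entries" is repeated on every line and taken as-is; every other
    column is a per-CPU counter and is summed across lines.
    """
    lines = text.strip().splitlines()
    names = lines[0].split()
    totals = dict.fromkeys(names, 0)
    for row in lines[1:]:
        for name, value in zip(names, row.split()):
            if name == "entries":
                totals[name] = int(value, 16)
            else:
                totals[name] += int(value, 16)
    return totals


def read_gc_thresholds(directory: Path = THRESH_DIR) -> dict[str, int]:
    """Read gc_thresh1..3 for the IPv6 neighbor table."""
    return {
        name: int((Path(directory) / name).read_text())
        for name in ("gc_thresh1", "gc_thresh2", "gc_thresh3")
    }


if __name__ == "__main__":
    if STAT_FILE.exists() and THRESH_DIR.exists():
        stats = parse_neigh_stat(STAT_FILE.read_text())
        limits = read_gc_thresholds()
        print(f"entries:     {stats['entries']} (gc_thresh3 = {limits['gc_thresh3']})")
        # table_fulls may be absent on older kernels, hence .get().
        print(f"table_fulls: {stats.get('table_fulls', 'not reported')}")
```

Logging this periodically would at least show whether the entry count is actually near gc_thresh3 when the message fires.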
It would be really nice if printk messages provided a little more information here.