Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add neighbor state #3191

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

eugercek
Copy link

@eugercek eugercek commented Nov 23, 2024

Implements part of #535

Some things we can talk:

  • We can omit state="stale" like labels and just give the state value to Gauge (IMO current impl is simpler) although I saw below in CONTRIBUTING.md I didn't see many magic numbers in the metrics and labels thus added human readable names for them.

For example a proc file contains the magic number 42 as some identifier, the Node Exporter should expose it as it is and not keep a mapping in code to make this human readable.

  • For code quality I can split Update function to make it more readable (again IMO it's simple enough to do it in one function)

This pr adds below metrics

# HELP node_arp_states ARP states by device
# TYPE node_arp_states gauge
node_arp_states{device="eth0",state="stale"} 1
node_arp_states{device="eth0",state="failed"} 2
# ...

Old comments

I changed the design in second commit due to not exploding cardinality but keep the comments if maintainers wants that design.

Iff you want to have metrics like below, can continue to read:

node_arp_states{device="eth0",state="stale", ip="1.1.1.1"} 1
node_arp_states{device="eth0",state="stale", ip="1.1.1.2"} 1
node_arp_states{device="eth0",state="failed", ip="1.1.1.3"} 1

Some things we can talk:

  • Can remove neighborState struct and just use string slice, but this is easier to follow IMO
  • This change may create many metrics, to make a estimation Default values of arp cache size are: min 128 and max 1024 therefore we may except up to 1024*count(net_int) as max. But in some "fine tuning"s I saw people generally increase this although it's limited by the size of you subnet, one can have many machines in subnet. Maybe I should make a counter for each state?

@SuperQ it's ready for your review, when you have time 🙏🏻

Signed-off-by: Emin Umut Gercek <[email protected]>
collector/arp_linux.go Outdated Show resolved Hide resolved
Signed-off-by: Emin Umut Gercek <[email protected]>
@eugercek
Copy link
Author

Also fixed the merge conflict.


for _, n := range neighbors {
// Skip entries which have state NUD_NOARP to conform to output of /proc/net/arp.
if n.State&unix.NUD_NOARP == 0 {
entries[n.Interface.Name]++
if n.State&unix.NUD_NOARP != unix.NUD_NOARP {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I can say?

Suggested change
if n.State&unix.NUD_NOARP != unix.NUD_NOARP {
if n.State != unix.NUD_NOARP {

Copy link
Contributor

@dswarbrick dswarbrick Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, n.State is a bit mask. The bitwise AND is necessary to isolate the NUD_NOARP flag that we wish to test for. Other bits may be set in n.State, so you cannot just do a simple comparison as you suggest.
cf. man 7 rtnetlink:

              ndm_state is a bit mask of the following states:
              NUD_INCOMPLETE   a currently resolving cache entry
              NUD_REACHABLE    a confirmed working cache entry
              NUD_STALE        an expired cache entry
              NUD_DELAY        an entry waiting for a timer
              NUD_PROBE        a cache entry that is currently reprobed
              NUD_FAILED       an invalid cache entry
              NUD_NOARP        a device with no destination cache
              NUD_PERMANENT    a static entry

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explanation, makes sense. Then current implementation looks ok. I changed n.State&unix.NUD_NOARP == 0 to have early return, can revert that if you wish.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants