I recently ran into a frustrating issue while creating security groups in NSX: the IP section of a group listed more IPs for a VM than the single IP actually assigned to it.
For the purposes of this post, let’s say the 240 IP is the one the VM shouldn’t have, or is the bad IP.
I could not figure out why NSX had this second IP for the VM. I did a release/renew on the VM, which changed the good IP, but the bad one remained.
I disconnected the NIC from the VM itself. This removed the good IP, but the bad one remained.
I checked DNS, DHCP, and everything else I could imagine. Finally, I just deleted the NIC itself from the VM, and both IPs disappeared. So the bad IP was somehow tied directly to the NIC.
At this point I did some research into how NSX discovers and “registers” IPs. NSX has a default IP discovery policy.
One of the first things that jumped out at me was “Trust on First Use”. What does that mean?
By default, the discovery methods ARP snooping and ND snooping operate in a mode called trust on first use (TOFU). In TOFU mode, when an address is discovered and added to the realized bindings list, that binding remains in the realized list forever. TOFU applies to the first ‘n’ unique bindings discovered using ARP/ND snooping, where ‘n’ is the binding limit that you can configure. You can disable TOFU for ARP/ND snooping. The methods will then operate in trust on every use (TOEU) mode. In TOEU mode, when an address is discovered, it is added to the realized bindings list and when it is deleted or expired, it is removed from the realized bindings list. DHCP snooping and VM Tools always operate in TOEU mode.
Source: https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.0/administration/GUID-29B42B44-4616-4436-8565-12912E8949DF.html
The important part is that in TOFU mode a binding “remains in the realized list forever.”
So TOFU will remember the first IP a VM interface has FOREVER. This is why I couldn’t get rid of the “bad” IP until I deleted the NIC.
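To make the difference concrete, here is a small Python toy model of how I understand the two modes from the documentation quoted above. It is not NSX code: the class, the method names, and both IP addresses are made up for illustration, and applying the binding limit in TOEU mode as well is a simplification on my part.

```python
class PortBindings:
    """Toy model of the realized bindings for one VM interface (not NSX code)."""

    def __init__(self, tofu_enabled=True, arp_binding_limit=1):
        self.tofu_enabled = tofu_enabled
        self.arp_binding_limit = arp_binding_limit
        self.arp_bindings = []   # learned via ARP/ND snooping
        self.dhcp_bindings = []  # learned via DHCP snooping / VM Tools (always TOEU)

    def arp_seen(self, ip):
        # Only the first 'n' unique addresses are accepted (n = binding limit).
        if ip not in self.arp_bindings and len(self.arp_bindings) < self.arp_binding_limit:
            self.arp_bindings.append(ip)

    def arp_expired(self, ip):
        # TOFU: the binding stays forever. TOEU: it is removed on expiry.
        if not self.tofu_enabled and ip in self.arp_bindings:
            self.arp_bindings.remove(ip)

    def dhcp_lease(self, ip):
        # DHCP snooping simply follows the current lease.
        self.dhcp_bindings = [ip]

    def realized(self):
        # What a group would show: the union of all discovered bindings.
        return set(self.arp_bindings) | set(self.dhcp_bindings)


BAD_IP = "192.0.2.240"   # hypothetical address inherited from the clone source
GOOD_IP = "192.0.2.57"   # hypothetical address handed out by DHCP


def replay_clone_boot(port):
    """Replay a possible sequence: stale IP seen briefly, then a fresh lease."""
    port.arp_seen(BAD_IP)      # stale address shows up in ARP traffic
    port.arp_expired(BAD_IP)   # the stale address goes away
    port.dhcp_lease(GOOD_IP)   # DHCP hands out the real address
    return port.realized()


tofu_result = replay_clone_boot(PortBindings(tofu_enabled=True))
toeu_result = replay_clone_boot(PortBindings(tofu_enabled=False))
print("TOFU:", sorted(tofu_result))
print("TOEU:", sorted(toeu_result))
```

In this model the TOFU port ends up holding both addresses, while the TOEU port ends up with only the DHCP-assigned one, which matches what I saw in my groups.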
In my specific case, the VM had been cloned from another one. When it first powered on, it must have held the bad IP for a brief moment before the DHCP client realized the lease had expired and grabbed a new IP.
NSX remembered though.
In the end I created a new IP Discovery Policy with TOFU disabled and applied it, and my NSX groups were fixed immediately.
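I made the change in the UI, but for anyone who would rather script it, here is a rough Python sketch of what the request body might look like. Treat everything in it as an assumption: the profile ID and display name are made up, and the endpoint path and field names (resource_type, tofu_enabled) are how I understand the NSX-T Policy API, so check them against the API reference for your version before sending anything. The sketch only builds and prints the body; it does not contact an NSX Manager.

```python
import json

# Hypothetical profile ID; pick whatever fits your naming scheme.
PROFILE_ID = "ip-discovery-no-tofu"

# Assumed Policy API path for IP discovery profiles (verify for your version).
POLICY_PATH = f"/policy/api/v1/infra/ip-discovery-profiles/{PROFILE_ID}"


def build_profile_body(display_name, tofu_enabled):
    """Build a minimal request body; field names are assumptions, not verified."""
    return {
        "resource_type": "IPDiscoveryProfile",
        "display_name": display_name,
        "tofu_enabled": tofu_enabled,  # False switches ARP/ND snooping to TOEU
    }


body = build_profile_body("IP Discovery without TOFU", tofu_enabled=False)

# Print what would be sent, for review before using a real HTTP client.
print("PATCH", POLICY_PATH)
print(json.dumps(body, indent=2))
```

Creating the profile is only half of it. Just like in the UI, the profile still has to be applied before it has any effect.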