Usually the problem lies in the TCP/IP implementation. Each network interface (virtual or physical, on a switch, server, or printer) has an incoming and an outgoing queue (buffer) of a limited size. Packets go into this buffer first and are then processed one by one by the TCP/IP layers.
When a lot of traffic arrives in a very short period of time, typically when you start a big file transfer, this buffer fills up and newly arriving packets are dropped. For an established TCP connection this is no problem, because dropped packets are never acknowledged and are therefore resent. But new connections and stateless protocols (such as an ICMP ping or an SNMP request over UDP) often cannot recover in time, and the request times out.
In monitoring we usually consider such a device down because it is not responding.
So if you have irregularly flapping devices, or more specifically intermittent packet loss, your problem might be dropped packets, and you should consider enlarging those buffers.
This is a known issue with VMware, by the way:
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2039495
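To make the buffer behaviour described above concrete, here is a minimal Python sketch of a fixed-size queue that is filled by arriving packets and drained at a constant rate. It is a toy model, not how any particular network stack or driver is implemented; the function name simulate_burst, the tick-based timing, and all numbers are assumptions chosen for illustration.

```python
def simulate_burst(queue_size, arrivals_per_tick, drain_per_tick):
    """Toy model of a fixed-size receive queue.

    queue_size        -- how many packets the buffer can hold
    arrivals_per_tick -- list with the number of packets arriving in each tick
    drain_per_tick    -- how many packets the stack processes per tick
    Returns a tuple (processed, dropped).
    """
    queued = processed = dropped = 0
    for arrivals in arrivals_per_tick:
        # Accept only as many packets as there is free space; drop the rest.
        accepted = min(arrivals, queue_size - queued)
        dropped += arrivals - accepted
        queued += accepted
        # The stack then works through part of the queue.
        drained = min(drain_per_tick, queued)
        queued -= drained
        processed += drained
    return processed, dropped


# Steady traffic with one burst in the third tick (made-up numbers).
traffic = [5, 5, 40, 5, 5]
print(simulate_burst(8, traffic, 5))   # -> (25, 32)
print(simulate_burst(64, traffic, 5))  # -> (25, 0)
```

With the small queue, the burst in the third tick overflows the buffer and 32 packets are lost, while the larger queue absorbs the same burst without loss. A bigger buffer helps with short bursts like this one; it cannot help if traffic stays above the processing rate for long.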
If you want to prove that this is the problem, you just need to read the dropped-packet counters. Here are some tips on how to get them.
For Windows
From Zenoss
winrm -r http://hostname:5985 -u service-account@example.net -a kerberos --dcip 10.10.10.10 -f "select * from Win32_PerfRawData_Tcpip_NetworkInterface" -d
From Windows PowerShell
Get-WmiObject -computername hostname -Query "select Name,PacketsOutboundDiscarded,PacketsReceivedDiscarded from Win32_PerfRawData_Tcpip_NetworkInterface"
Output example:
__GENUS : 2
__CLASS : Win32_PerfRawData_Tcpip_NetworkInterface
__SUPERCLASS :
__DYNASTY :
__RELPATH : Win32_PerfRawData_Tcpip_NetworkInterface.Name="vmxnet3 Ethernet Adapter _2"
__PROPERTY_COUNT : 3
__DERIVATION : {}
__SERVER :
__NAMESPACE :
__PATH :
Name : vmxnet3 Ethernet Adapter _2
PacketsOutboundDiscarded : 0
PacketsReceivedDiscarded : 50
PacketsReceivedDiscarded ( https://msdn.microsoft.com/en-us/library/aa394340(v=vs.85).aspx )
Data type: uint32
Access type: Read-only
Qualifiers: DisplayName ("Packets Received Discarded") ,
CounterType (65536) ,
DefaultScale (0) ,
PerfDetail(200)
Number of inbound packets that were chosen to be discarded even though no errors had been detected to prevent delivery to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space.
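If you collect this output from several hosts, a small script can pick out the interfaces that report discards. The following Python sketch assumes the "Name : value" list layout shown in the output example above, with records separated by blank lines; the function names parse_records and interfaces_with_discards are made up for this example.

```python
def parse_records(text):
    """Split 'Key : Value' list output into one dict per record.

    Records are assumed to be separated by blank lines.
    """
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def interfaces_with_discards(records):
    """Return {interface name: (received discarded, outbound discarded)}
    for every record where at least one of the counters is non-zero."""
    result = {}
    for rec in records:
        name = rec.get("Name")
        if name is None:
            continue
        rx = int(rec.get("PacketsReceivedDiscarded", 0))
        tx = int(rec.get("PacketsOutboundDiscarded", 0))
        if rx or tx:
            result[name] = (rx, tx)
    return result
```

Fed with the example output above, interfaces_with_discards(parse_records(output)) would report the vmxnet3 adapter with 50 received and 0 outbound discards.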
For SNMP Devices
Use snmpwalk or snmpget on the OIDs below:
ifInDiscards .1.3.6.1.2.1.2.2.1.13
ifOutDiscards .1.3.6.1.2.1.2.2.1.19
The number of inbound/outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space.
root@raspberrypi:/home/pi# snmpwalk -v 2c -c public 10.0.0.40 .1.3.6.1.2.1.2.2.1.2
iso.3.6.1.2.1.2.2.1.2.1 = STRING: "eth0"
iso.3.6.1.2.1.2.2.1.2.2 = STRING: "LOOPBACK"
root@raspberrypi:/home/pi# snmpwalk -v 2c -c public 10.0.0.40 .1.3.6.1.2.1.2.2.1.13
iso.3.6.1.2.1.2.2.1.13.1 = Counter32: 0
iso.3.6.1.2.1.2.2.1.13.2 = Counter32: 0
root@raspberrypi:/home/pi# snmpwalk -v 2c -c public 10.0.0.40 .1.3.6.1.2.1.2.2.1.19
iso.3.6.1.2.1.2.2.1.19.1 = Counter32: 0
iso.3.6.1.2.1.2.2.1.19.2 = Counter32: 0
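A single reading only shows the total since the counter started counting, so for monitoring it is more useful to compare two walks taken some time apart. The Python sketch below parses snmpwalk output in the format shown above and computes the per-interface increase. It assumes the values are Counter32, which wrap around at 2^32; the helper names are hypothetical.

```python
def parse_walk(text):
    """Map ifIndex -> value for lines like
    'iso.3.6.1.2.1.2.2.1.13.1 = Counter32: 0'."""
    table = {}
    for line in text.splitlines():
        oid, sep, rest = line.partition(" = ")
        if not sep:
            continue
        _type, sep, value = rest.partition(": ")
        if not sep:
            continue  # e.g. an error message instead of a typed value
        index = int(oid.rsplit(".", 1)[-1])  # last OID component is the ifIndex
        table[index] = value.strip().strip('"')
    return table


def counter32_delta(old, new):
    """Increase between two Counter32 readings, allowing for one wrap."""
    return (new - old) % 2**32


def discard_deltas(names_walk, before_walk, after_walk):
    """Return {interface name: discards added between the two walks}."""
    names = parse_walk(names_walk)
    before = parse_walk(before_walk)
    after = parse_walk(after_walk)
    return {
        names.get(index, str(index)): counter32_delta(int(before[index]), int(value))
        for index, value in after.items()
        if index in before
    }
```

Here names_walk is the output of the ifDescr walk (.1.3.6.1.2.1.2.2.1.2), and before_walk and after_walk are two walks of the same discard OID. A delta that keeps growing between polls points to ongoing drops rather than an old, one-off event.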
For Linux-based devices
On Linux-based devices you can use the ifconfig command and check the dropped counters:
root@raspberrypi:/var/www# ifconfig
eth0 Link encap:Ethernet HWaddr b8:27:eb:1d:89:fe
inet addr:10.0.0.37 Bcast:10.0.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:19462418 errors:0 dropped:43030 overruns:0 frame:0
TX packets:12891339 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4260766110 (3.9 GiB) TX bytes:2835848497 (2.6 GiB)
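The same check can be scripted. This Python sketch extracts the dropped counters from ifconfig output in the older net-tools format shown above ("dropped:43030"); newer ifconfig versions print a different layout, so treat the regular expressions as an assumption tied to this format. The function name parse_dropped is made up for illustration.

```python
import re

# An interface block starts with a line such as "eth0 Link encap:Ethernet ..."
HEADER_RE = re.compile(r"^(\S+)\s+Link encap:")
# Counter lines look like "RX packets:19462418 errors:0 dropped:43030 ..."
COUNTER_RE = re.compile(r"(RX|TX) packets:(\d+) errors:(\d+) dropped:(\d+)")


def parse_dropped(ifconfig_text):
    """Return {interface: {"RX": {...}, "TX": {...}}} with packet and
    dropped counts taken from old-style ifconfig output."""
    stats = {}
    current = None
    for line in ifconfig_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            stats[current] = {}
            continue
        counters = COUNTER_RE.search(line)
        if counters and current is not None:
            direction, packets, _errors, dropped = counters.groups()
            stats[current][direction] = {
                "packets": int(packets),
                "dropped": int(dropped),
            }
    return stats
```

For the output above, parse_dropped(text)["eth0"]["RX"]["dropped"] gives 43030, while the TX side shows no drops, which matches the pattern of an overflowing receive buffer.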