Hi,

I suspect that there is a problem with my Atlas probe: sometimes it is just gone / offline for about 10 minutes (more or less), although there is clearly no issue on the network. Other checks and monitoring work fine at the same time and show about 100% uptime.

I am not sure how to investigate. Is there any way to check the uptime of the probe itself, and not just its connection time? Maybe the device is hanging or rebooting?

Internet Address  Controller     Connected (UTC)      Connected for  Disconnected (UTC)   Disconnected for
xxx.xxx.xxx.xx    ctr-nue19, DE  2016-01-06 17:03:27  5d 22h 40m     2016-01-12 15:44:09  0h 5m
xxx.xxx.xxx.xx    ctr-nue19, DE  2016-01-01 00:29:44  5d 16h 16m     2016-01-06 16:46:33  0h 16m
xxx.xxx.xxx.xx    ctr-nue19, DE  2015-12-31 23:47:35  0h 38m         2016-01-01 00:25:40  0h 4m
xxx.xxx.xxx.xx    ctr-nue19, DE  2015-12-31 22:59:18  0h 41m         2015-12-31 23:40:32  0h 7m
xxx.xxx.xxx.xx    ctr-nue19, DE  2015-12-31 22:35:12  0h 20m         2015-12-31 22:56:08  0h 3m

Best Regards
Max M.
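One way to cross-check a connection history like the one above against other monitoring is to compute the exact gaps between sessions. The following is a minimal sketch, not an official RIPE Atlas tool: the `SESSIONS` list is copied by hand from the table, and the function name `disconnect_gaps` is made up for illustration. It only measures connection gaps as seen by the controller and cannot tell whether the device itself was down.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# (connected_utc, disconnected_utc) pairs copied by hand from the
# connection history table in this message.
SESSIONS = [
    ("2016-01-06 17:03:27", "2016-01-12 15:44:09"),
    ("2016-01-01 00:29:44", "2016-01-06 16:46:33"),
    ("2015-12-31 23:47:35", "2016-01-01 00:25:40"),
    ("2015-12-31 22:59:18", "2015-12-31 23:40:32"),
    ("2015-12-31 22:35:12", "2015-12-31 22:56:08"),
]


def disconnect_gaps(sessions):
    """Return (disconnected_at, reconnected_at, gap) between consecutive sessions."""
    parsed = sorted(
        (datetime.strptime(conn, FMT), datetime.strptime(disc, FMT))
        for conn, disc in sessions
    )
    gaps = []
    # Pair each session's disconnect time with the next session's connect time.
    for (_, disc), (next_conn, _) in zip(parsed, parsed[1:]):
        gaps.append((disc, next_conn, next_conn - disc))
    return gaps


if __name__ == "__main__":
    gaps = disconnect_gaps(SESSIONS)
    for disc, conn, gap in gaps:
        print(f"{disc} -> {conn}  gap {gap}")
    total = sum((gap for _, _, gap in gaps), timedelta())
    print("total disconnected:", total)
```

The gap timestamps can then be compared with the logs of independent monitoring for the same minutes. The most recent disconnect in the table has no following session in this data, so it is not included in the result.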
Even worse:

Internet Address  Controller     Connected (UTC)      Connected for  Disconnected (UTC)   Disconnected for
1xx.xxx.xxx.xx    ctr-ams07, NL  2016-01-31 08:48:48  1d 1h 38m      Still Connected
1xx.xxx.xxx.xx    ctr-nue19, DE  2016-01-29 05:07:31  2d 1h 24m      2016-01-31 06:32:02  2h 16m

Could it be a problem with the RIPE "controllers"? E.g. if they are offline, my probe also shows downtime. How fast do probes fail over to another controller? (Why was it offline for 2 hours until it switched to another one?)

Best Regards
Max M.
Hi Max,

ctr-nue19 was rebooted around 11 o'clock UTC because it was not responding.

The logic is as follows: if a controller doesn't send heartbeats for more than 2 hours, it is excluded from the list of available controllers. The algorithm that assigns a probe to a controller tries to assign the probe to the controller it was connected to last time. So in your case the probe needed to migrate to another controller (ctr-ams07) after 2 hours of trying to connect to its last controller.

It is not necessarily downtime. It is stated in the report as disconnection time. During that time your probe still performs measurements, but it doesn't send results to the controller.

We are introducing probe uptime metrics, but they are still in the making: not exposed to users yet and not included in the billing.

WBR
/vty
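The controller assignment logic described in the reply can be sketched roughly as follows. This is an illustration of the stated rules only, not RIPE Atlas's actual implementation: the `Controller` class, the function names, and the fallback of picking the first available controller are all assumptions. Only the 2-hour heartbeat threshold and the preference for the last-used controller come from the thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# From the thread: a controller silent for more than 2 hours is excluded.
HEARTBEAT_TIMEOUT = timedelta(hours=2)


@dataclass
class Controller:  # hypothetical type, for illustration only
    name: str
    last_heartbeat: datetime


def available_controllers(controllers: List[Controller], now: datetime) -> List[Controller]:
    """Keep controllers whose last heartbeat is at most 2 hours old."""
    return [c for c in controllers if now - c.last_heartbeat <= HEARTBEAT_TIMEOUT]


def assign_controller(
    controllers: List[Controller], last_name: str, now: datetime
) -> Optional[Controller]:
    """Prefer the controller the probe used last time, if it is still listed."""
    available = available_controllers(controllers, now)
    for c in available:
        if c.name == last_name:
            return c
    # Assumption: the real fallback policy is not described in the thread;
    # this sketch simply takes the first remaining controller.
    return available[0] if available else None
```

Under these rules, a probe whose last controller stopped responding keeps being pointed at it until the 2-hour threshold passes, which is consistent with the "2h 16m" disconnection in the report before the move to ctr-ams07.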
participants (2)
- Max Mühlbronner
- Viktor Naumov