Re: [atlas] [ripe.net #1133019] Re: Your RIPE Atlas Probe (ID: 16784) is not connected to our network
It's surprising that you can't manage the probe remotely when it isn't "connected" (even though you can reach it directly at L3). Technically, the probe runs on Linux, so you could have an SSH daemon with restricted IP access running there (a listening daemon has minimal performance overhead). With the current "behavior" you simply lose any way to diagnose the problem, and the dependence on an established tunnel is another layer that can fail... As stated before, a simple reboot is just a *workaround* for any issue, not a solution (but it seems that's your basic recommendation). I think you should work on improving probe diagnostics in such cases. New hardware versions of the probes seem to be more problematic than older probes, as I can also see on the Atlas mailing list. I remember that I previously had to unplug and replug the USB flash drive as one step with a probe (also your recommended "solution"). The probe should also run some kind of watchdog that automatically reboots it in case of a similar problem; probably it doesn't run one now... With regards, Daniel On 31.7.2014 17:31, Philip Homburg wrote:
Hi Daniel,
On Tue Jul 29 19:29:20 2014, danny@danysek.cz wrote:
What checks did you perform on your side before sending such an email? Don't you have any kind of "emergency" probe management, like SSH shell access directly to the device? What do you see in *your* logs about the affected probe before it was disconnected?
The mail gets sent when the probe has not been connected for some time. Nothing more.
There are no open ports on the probes. The probe can only be controlled when it is connected.
There is some evidence in the logs that there was already something wrong with the probe when it disconnected. Most likely there is something wrong with the filesystem on the USB flash drive, but that is hard to say.
Philip
On Fri, 01 Aug 2014 10:33:30 +0200 Daniel Suchy <danny@danysek.cz> wrote:
It's surprising that you can't manage the probe remotely when it isn't "connected" (even though you can reach it directly at L3). Technically, the probe runs on Linux, so you could have an SSH daemon with restricted IP access
So are you suggesting that everyone configure a port 22 port forward from their router/CPE to the probe? Or place the probe on its own dedicated public IP address? Nope, neither is going to happen; it's not feasible in perhaps 90+% of cases. -- With respect, Roman
Hi Daniel, On 2014/08/01 10:33 , Daniel Suchy wrote:
It's surprising that you can't manage the probe remotely when it isn't "connected" (even though you can reach it directly at L3). Technically, the probe runs on Linux, so you could have an SSH daemon with restricted IP access running there (a listening daemon has minimal performance overhead). With the current "behavior" you simply lose any way to diagnose the problem, and the dependence on an established tunnel is another layer that can fail...
Always having an ssh port open is a security risk. And it doesn't do anything for probes behind NAT.
As stated before, a simple reboot is just a *workaround* for any issue, not a solution (but it seems that's your basic recommendation).
Yes. Because the behavior of your probe is very rare. You can't just code for every special situation.
I think you should work on improving probe diagnostics in such cases. New hardware versions of the probes seem to be more problematic than older probes, as I can also see on the Atlas mailing list. I remember that I previously had to unplug and replug the USB flash drive as one step with a probe (also your recommended "solution").
Getting the probe to reflash its USB stick is a cost-effective solution from our point of view. It usually solves problems without requiring any effort on our side.
The probe should also run some kind of watchdog that automatically reboots it in case of a similar problem; probably it doesn't run one now...
The version 1 and 2 probes have a watchdog. That was an endless source of trouble. Note that the probes are run in such a way that it takes a minimal amount of time to manage a network of thousands of probes. If we spot any patterns, we try to accommodate them. However, some probes just behave in unexpected ways. We can't really do much about that. Philip
Hi, On Fri, Aug 01, 2014 at 10:33:30AM +0200, Daniel Suchy wrote:
It's surprising that you can't manage the probe remotely when it isn't "connected" (even though you can reach it directly at L3). Technically, the probe runs on Linux, so you could have an SSH daemon with restricted IP access running there (a listening daemon has minimal performance overhead). With the current "behavior" you simply lose any way to diagnose the problem, and the dependence on an established tunnel is another layer that can fail...
This is intentional and welcome. The probes are not devices that you talk to, secure against intrusions, etc.; they are satellites that are part of a measurement system. (The way the probes signal problems by issuing special DNS requests could be documented, though :) ). Gert Doering -- NetMaster -- have you enabled IPv6 on something today...? SpaceNet AG Vorstand: Sebastian v. Bomhard Joseph-Dollinger-Bogen 14 Aufsichtsratsvors.: A. Grundner-Culemann D-80807 Muenchen HRB: 136055 (AG Muenchen) Tel: +49 (0)89/32356-444 USt-IdNr.: DE813185279
(The way the probes signal problems by issuing special DNS requests could be documented, though :) ).
One can argue if we can/should do better, but in the meantime, the "network information" tab of the probe status page has the following: "SOS History (Showing only the last 25) This probe probably last rebooted on 2014-xxx The probe sends a message using DNS queries every time it tries to reconnect to the system. Below you can find a list of the most recent messages. The "power-up time" column shows the approximate "powered-up time" of the probe at the time of sending the message. " Cheers, Robert
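The SOS messages Robert describes rely on a general technique: a probe that cannot bring up its tunnel can often still resolve names, so a lookup for a specially constructed name carries a small report to whoever runs the authoritative servers for that zone. Below is a minimal Python sketch of that idea. Everything in it is hypothetical: the zone `sos.example.net`, the label layout, and the helper names are assumptions for illustration, not the actual RIPE Atlas message format.

```python
import re
import socket

# HYPOTHETICAL zone whose authoritative servers would log every query they see.
SOS_ZONE = "sos.example.net"
# One DNS label: 1-63 chars, letters/digits/hyphens, no leading/trailing hyphen.
_LABEL_OK = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")


def sos_query_name(probe_id: int, uptime_s: int, status: str) -> str:
    """Pack a tiny status report into the labels of a DNS query name.

    Hypothetical layout: <status>.up<uptime>.p<probe_id>.<SOS_ZONE>
    Including the uptime makes each name unique, so a caching resolver
    usually cannot answer a repeat from cache and the query travels on
    to the zone's authoritative servers.
    """
    if probe_id < 0 or uptime_s < 0:
        raise ValueError("probe_id and uptime_s must be non-negative")
    labels = [status.lower(), f"up{uptime_s}", f"p{probe_id}"]
    for label in labels:
        if not _LABEL_OK.match(label):
            raise ValueError(f"not a valid DNS label: {label!r}")
    name = ".".join(labels + [SOS_ZONE])
    if len(name) > 253:
        raise ValueError("query name too long")
    return name


def send_sos(probe_id: int, uptime_s: int, status: str) -> None:
    """Fire-and-forget: the lookup itself is the message; the answer is ignored."""
    name = sos_query_name(probe_id, uptime_s, status)
    try:
        socket.getaddrinfo(name, None)
    except OSError:
        # NXDOMAIN (socket.gaierror) is the expected outcome; it is not a failure.
        pass
```

For example, `sos_query_name(16784, 3600, "usb-fail")` yields `usb-fail.up3600.p16784.sos.example.net`. The appeal of this design is that it needs no open port and no working tunnel, only the local resolver path that most networks already allow.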
Probes could request some HTTPS URL asking for commands (say, every 5 minutes), and if a probe needs to be rebooted, the appropriate command can be returned to that probe. This does not need an open SSH port or port forwarding. It does not affect security but accomplishes the goal of rebooting probes remotely (either individually or in groups) or sending them other commands. Regards. /Alex On Fri, Aug 1, 2014 at 1:10 PM, Robert Kisteleki <robert@ripe.net> wrote:
(The way the probes signal problems by issuing special DNS requests could be documented, though :) ).
One can argue if we can/should do better, but in the meantime, the "network information" tab of the probe status page has the following:
"SOS History (Showing only the last 25)
This probe probably last rebooted on 2014-xxx
The probe sends a message using DNS queries every time it tries to reconnect to the system. Below you can find a list of the most recent messages. The "power-up time" column shows the approximate "powered-up time" of the probe at the time of sending the message. "
Cheers, Robert
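Alex's pull-based suggestion can be sketched in Python. This is a minimal sketch under stated assumptions, not how RIPE Atlas actually works: the controller URL, the JSON reply format (`{"command": "reboot"}`), and the action names are all hypothetical. Because the probe only makes outbound HTTPS requests, it needs no open port and works behind NAT; the standard-library client verifies the server certificate by default in current Python 3 versions.

```python
import json
import time
import urllib.request
from collections import deque
from typing import Callable, Deque, Dict, List, Optional

# HYPOTHETICAL controller endpoint; not a real RIPE Atlas URL.
COMMAND_URL = "https://controller.example.net/probes/{probe_id}/commands"
POLL_INTERVAL_S = 300  # "say every 5 minutes"


def fetch_command(probe_id: int, timeout: float = 10.0) -> Optional[dict]:
    """Ask the controller for a pending command over an outbound HTTPS request.

    Assumes the server replies with an empty body when nothing is pending,
    or with a JSON object such as {"command": "reboot"}.
    """
    url = COMMAND_URL.format(probe_id=probe_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read()
    return json.loads(body) if body else None


def handle_command(cmd: Optional[dict], actions: Dict[str, Callable[[], None]]) -> str:
    """Run the whitelisted local action named by the command; ignore anything else."""
    if not isinstance(cmd, dict):
        return "idle"
    name = cmd.get("command")
    if not isinstance(name, str):
        return "ignored"
    action = actions.get(name)
    if action is None:
        return f"ignored:{name}"
    action()
    return f"ran:{name}"


def poll_loop(
    probe_id: int,
    actions: Dict[str, Callable[[], None]],
    fetch: Callable[[int], Optional[dict]] = fetch_command,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> List[str]:
    """Poll forever (or max_polls times) and return the most recent outcomes."""
    outcomes: Deque[str] = deque(maxlen=100)  # bounded, so an endless loop cannot grow memory
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            cmd = fetch(probe_id)
        except (OSError, ValueError):
            # Controller unreachable, or reply was not valid JSON: skip this round.
            cmd = None
        outcomes.append(handle_command(cmd, actions))
        polls += 1
        sleep(POLL_INTERVAL_S)
    return list(outcomes)
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without a network, e.g. `poll_loop(16784, {"reboot": do_reboot}, fetch=fake_fetch, sleep=lambda s: None, max_polls=3)`, where `do_reboot` and `fake_fetch` are hypothetical placeholders. Unknown commands are ignored rather than executed, which limits a buggy or compromised controller to the whitelisted actions. Note that, as Philip says of the current design, such a probe can still only be controlled while it can reach the controller.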
Hi, On Fri, Aug 01, 2014 at 11:10:33AM +0200, Robert Kisteleki wrote:
(The way the probes signal problems by issuing special DNS requests could be documented, though :) ).
One can argue if we can/should do better, but in the meantime, the "network information" tab of the probe status page has the following:
"SOS History (Showing only the last 25)
This probe probably last rebooted on 2014-xxx
Oh, cool. Seems I need to check that stuff more often :-) - thanks for adding it. Gert Doering
participants (6)
- Alex Saroyan
- Daniel Suchy
- Gert Doering
- Philip Homburg
- Robert Kisteleki
- Roman Mamedov