Hi Daniel,

On 2014/08/01 10:33, Daniel Suchy wrote:
It's surprising that you can't manage the probe remotely when it isn't "connected" (even when you can reach it directly at L3). Technically, the probe runs on Linux, so you could have an SSH daemon with restricted IP access running there (a listening daemon has minimal performance overhead). With the current "behavior" you simply lose any way to diagnose the problem, and depending on an established tunnel adds another layer that can fail...
Always having an ssh port open is a security risk. And it doesn't do anything for probes behind NAT.
As stated before, simply rebooting is just a *workaround* for any issue, not a solution (but it seems that's your basic recommendation).
Yes, because the behavior of your probe is very rare. You can't code for every special situation.
I think you should work on improving probe diagnostics in such cases. New hardware versions of the probes seem to be more problematic than older ones, as I can also see on the Atlas mailing list. I remember that I previously had to unplug and re-plug the USB flash drive as one step with a probe (also your recommended "solution").
Getting the probe to reflash its USB stick is a cost-effective solution from our point of view. It usually solves problems without requiring any effort on our side.
Also, some kind of watchdog should run on the probe and reboot it automatically in case of a similar problem; probably none is running now...
The version 1 and 2 probes have a watchdog. That was an endless source of trouble. Note that the probes are run in such a way that managing a network of thousands of probes takes a minimal amount of time. If we spot any patterns, we try to accommodate them. However, some probes just behave in unexpected ways, and we can't really do much about that.

Philip