Hi Denis,

Apologies if this email comes off as rude or harsh; that isn't my intention, but I am not quite sure how else to phrase it.

So when I say things like this, it's not because of cost or anything like that.
I say it because I don't think validating the CSV would be of much benefit.

Abuse contacts are validated and required (except for some legacy resources, IIRC) because they are important for reporting abuse.

Having a geofeed service is not a requirement, and I would also expect the data consumer to almost always be software dedicated to this purpose.
If so, then that software can easily validate the CSV data itself.
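As a rough illustration of how simple consumer-side validation can be, here is a minimal Python sketch that checks RFC 8805-style geofeed CSV text. It assumes the RFC 8805 column order (ip_prefix, alpha2code, region, city, postal_code), only checks the prefix and country code columns, and treats an empty country field as acceptable; the function name `validate_geofeed` is hypothetical.

```python
import csv
import io
import ipaddress


def validate_geofeed(text: str) -> list:
    """Return a list of problems found in RFC 8805-style geofeed CSV text.

    Assumed column order: ip_prefix, alpha2code, region, city, postal_code.
    Only the prefix and the country code are checked in this sketch.
    """
    problems = []
    for lineno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        # Skip blank lines and '#' comment lines.
        if not row or row[0].lstrip().startswith("#"):
            continue
        if len(row) < 2:
            problems.append(f"line {lineno}: expected at least 2 fields")
            continue
        prefix, country = row[0].strip(), row[1].strip()
        try:
            # strict=True rejects prefixes with host bits set, e.g. 192.0.2.1/24.
            ipaddress.ip_network(prefix, strict=True)
        except ValueError:
            problems.append(f"line {lineno}: invalid prefix {prefix!r}")
        # An empty country is tolerated here (an assumption); otherwise
        # expect exactly two ASCII letters (ISO 3166-1 alpha-2 shape).
        if country and not (len(country) == 2 and country.isascii() and country.isalpha()):
            problems.append(f"line {lineno}: invalid country code {country!r}")
    return problems
```

For example, a row such as `192.0.2.1/24,NL,,,` is reported as an invalid prefix because host bits are set.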

Given those two points, I see abuse contacts and geofeed as very different things.

-Cynthia

On Wed, Apr 7, 2021, 00:45 denis walker <ripedenis@gmail.com> wrote:
Hi guys

I've changed the subject as it goes a bit off topic and becomes more
general and reaches out beyond just the DB-WG. I've been going to say
this for a while but never got round to it until now. Apologies for
saying it in response to your email, Job, but it's not directed at you.

There are two phrases that frustrate me every time I see them used:
"The RIPE NCC is not the 'xyz' police"
"It's not the job of the RIPE NCC to do 'abc'"

These are just dramatised ways of saying no to something. But the
drama doesn't really add anything. No one is expecting the RIPE NCC to
investigate any crimes or arrest anyone. They are not the 'geoip
police', the 'internet police', the 'abuse police'. So what are they?
I think everyone would agree that what the RIPE NCC does today is not
the same as they did when they first started in business. So the job
that they do has changed. Their role or mandate has grown, expanded,
contracted, moved sideways, diversified, etc. Every time they started
to do something different or new, it could have been said (and maybe
was said) that it was not their job to do that. But they are doing it
now anyway. So I would rather turn these infamous statements round and
be positive instead of negative. Let's stop saying what is not their
job and instead ask whether it is, or could or should be, their job to
do something helpful or beneficial.

The internet technical infrastructure is like a whole ecosystem now.
Lots of different elements all working together and managed or
controlled by large numbers of organisations. If anyone wants to have
a good life in this cyber world, all parts of this ecosystem need to
be operating well. Many of these elements have no checks or
monitoring. They run on trust. Trust is hard to build and easy to
lose. Once people lose trust in one element they start to call it a
swamp, say it's inaccurate, useless, needs to be replaced. These
comments have often been made about the RIPE Database as a whole,
often by people partly responsible for its content. It's also been
said about parts of the content like abuse contacts. It could end up
being said about geofeed data.

One of the reasons people use to justify these infamous statements is
the cost or complexity of doing something. They think doing checks
needs FTEs sitting behind desks doing laborious tasks. That costs
money for the members. They forget this is the 21st century. We have
learned now how to use computers to do these tasks for us. Abuse
contact checking is a good example. Every proposal to do anything in
this area is repeatedly hit with these infamous statements and more.
Perhaps because the technical checks now being done are done the wrong
way. If an email address fails the checks it triggers manual
intervention requiring an FTE to schedule an ARC with the resource
holder and follow up discussions. This should be fully automated. If a
monthly check fails, software should send an email to the registered
contact for the resource holder. If n monthly checks fail the
ORGANISATION object in the RIPE Database should be tagged as having an
invalid abuse contact. That information should be available for anyone
to see. Public disclosure can be the penalty for failing to handle
abuse. People can then make informed decisions.
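A minimal Python sketch of the fully automated escalation described above, assuming a hypothetical threshold of 3 consecutive failed monthly checks and a hypothetical `invalid-abuse-contact` tag; the `notify` callback stands in for whatever would actually send the email. Clearing the tag once a check passes again is an extra assumption.

```python
from dataclasses import dataclass, field

# Hypothetical "n": consecutive failed monthly checks before public tagging.
FAILURE_THRESHOLD = 3


@dataclass
class OrgStatus:
    org_id: str
    contact_email: str
    consecutive_failures: int = 0
    tags: set = field(default_factory=set)


def process_monthly_check(org: OrgStatus, check_passed: bool, notify) -> None:
    """Apply one month's abuse-contact check result to an organisation.

    `notify(address, message)` is caller-supplied (e.g. sends an email);
    no manual intervention is involved at any step.
    """
    if check_passed:
        # Assumption: a passing check resets the counter and clears the tag.
        org.consecutive_failures = 0
        org.tags.discard("invalid-abuse-contact")
        return
    org.consecutive_failures += 1
    notify(org.contact_email,
           f"Abuse contact check failed for {org.org_id} "
           f"({org.consecutive_failures} month(s) in a row).")
    if org.consecutive_failures >= FAILURE_THRESHOLD:
        # Publicly visible tag: the "public disclosure" penalty.
        org.tags.add("invalid-abuse-contact")
```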

How does this affect geofeed? The same principles apply here. What we
have now is a handful of companies providing geolocation data. I am
sure they put a lot of effort into ensuring their data is accurate.
This geofeed attribute will delegate this information process out to
thousands of organisations. Some of these will put a lot of effort
into ensuring their data is valid and accurate. Some may put less
effort in, especially over time. If a proportion of this data starts
to degrade over time, is shown to be inaccurate or syntactically
invalid, trust in the whole system dies. If checks and tests can be
done to validate the data in any way it may help to keep it up to date
and accurate. If each RIR maintains a list of geofeed URLs in a file
on the FTP site, each RIR can check the availability of the URLs on
all the RIRs' lists each month. I don't know if checks from 5 locations
are enough. Maybe a third-party system can be used for the 'is it up'
check? Any repeated failures can be notified to the resource holder's
contact. If each RIR downloads the files for its region, it can
check the syntax, check for conflicting data in multiple files within
a hierarchy, etc. Any failures can be reported to the contact. All of
this can be automated. If any repeated errors are not fixed the
geofeed data in the RIPE Database can again be tagged as invalid or
suspect. When anyone accesses this data, it comes with a red flag. It
is then up to them whether to trust any of the data in that file.
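The "conflicting data in multiple files within a hierarchy" check could look roughly like this Python sketch. It assumes the files have already been downloaded and parsed into (prefix, location) pairs, and it treats two entries as conflicting when they come from different files, their prefixes overlap, and their locations differ; `find_conflicts` and the input layout are hypothetical.

```python
import ipaddress
from itertools import combinations


def find_conflicts(feeds: dict) -> list:
    """Find overlapping prefixes from different geofeed files that disagree.

    `feeds` maps a feed identifier (e.g. its URL) to a list of
    (prefix, location) pairs parsed from that file.
    """
    entries = [
        (source, ipaddress.ip_network(prefix), location)
        for source, rows in feeds.items()
        for prefix, location in rows
    ]
    conflicts = []
    for (src_a, net_a, loc_a), (src_b, net_b, loc_b) in combinations(entries, 2):
        # Only compare entries from different files and the same address family.
        if src_a == src_b or net_a.version != net_b.version:
            continue
        if net_a.overlaps(net_b) and loc_a != loc_b:
            conflicts.append((str(net_a), src_a, str(net_b), src_b))
    return conflicts
```

The pairwise comparison is quadratic, which is fine for a sketch; a service covering a whole region would more likely sort the prefixes or use a radix tree.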

For both abuse contacts and geofeed, a system can be set up for
(trusted) users to report problems. Maybe abuse contacts that are
valid but never resolve any reported issues. Or geofeed data that is
known to be inaccurate. By adding appropriate tags to the metadata in
the RIPE Database, which can be publicly viewed, this becomes a
reputational system. Overall, it would improve the quality of data
available in or through the RIPE Database, which improves the value of
the services. There may be other elements in the database that could
benefit from this type of tagging and reporting.

I see the RIPE NCC as being in a good position to do these types of
checks and tests. It would not be the RIPE Database software doing the
checks, but an additional RIPE NCC service. Minimal costs with fully
automated checks can give added benefits. I think it is their job to
do this for the good of the internet.

cheers
denis
co-chair DB-WG


On Tue, 6 Apr 2021 at 19:50, Job Snijders <job@sobornost.net> wrote:
>
> Thanks for the extensive note Denis, thanks Cynthia for being
> first-responder. I wanted to jump in on a specific subthread.
>
> On Tue, Apr 06, 2021 at 06:38:29PM +0200, Cynthia Revström via db-wg wrote:
> > > Questions:
> >
> > > -Should the database software do any checks on the
> > > existence/reachability of the url as part of the update with an error
> > > if the check fails?
> >
> > I would say yes as this is not a new concept to the DB as I believe
> > this is already done with domain objects.
>
> I disagree on this one point, what is the RIPE DB supposed to do when it
> discovers one state or another? Should the URIs be probed from many
> vantage points to compare? Once you try to monitor if something is up or
> down it can quickly become complicated.
>
> The content that the 'geofeed:' attribute value references is
> outside the RIPE DB, which means the RIPE DB software should not be
> crawling it.
>
> All RIPE NCC's DB software needs to check is whether the string's syntax
> conforms to the HTTPS URI scheme.
>
> > > -Should the RIPE NCC do any periodic repeat checks on the continued
> > > existence/reachability of the url?
> >
> > I would say that checking once a month or so could be fine, as long as
> > it just results in a nudge email.
> > Like don't enforce it, but nudge people if it is down.
>
> It seems an unnecessary burden for RIPE NCC's business to check whether
> a given website is up or down. What is such nudging supposed to
> accomplish? It might end up being busy work if done by an individual RIR.
>
> > > -Should the RIPE NCC do any periodic checks on the content structure
> > > of the csv file referenced by the url?
> >
> > I don't have a strong opinion either way here but I feel like that is
> > not really something the NCC is responsible for checking.
> > But if the NCC should check then my comments about the repeat
> > reachability checks above apply here too.
>
> The RIPE NCC should not check random URIs, they are not the GeoIP police ;-)
>
> Kind regards,
>
> Job
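Job's suggestion that the DB software only check whether the 'geofeed:' value conforms to the HTTPS URI scheme needs no network access at all. A rough Python sketch using only `urllib.parse` follows; it is a simplification of full RFC 3986 validation, and `is_https_uri` is a hypothetical name.

```python
from urllib.parse import urlsplit


def is_https_uri(value: str) -> bool:
    """Rough syntax check that a 'geofeed:' value is an https:// URI.

    Nothing is fetched; the string is only parsed.
    """
    if not value or any(ch.isspace() for ch in value):
        return False
    try:
        parts = urlsplit(value)
        # Accessing .port raises ValueError for a non-numeric or out-of-range port.
        _ = parts.port
    except ValueError:
        return False
    # urlsplit lowercases the scheme; require a non-empty host.
    return parts.scheme == "https" and bool(parts.hostname)
```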