Re: [db-wg] Removing personal data from bulk output from the RIPE Database
* denis@ripe.net
Everyone is correct in what they have said. But when it comes to the "changed:" attribute you have to rely on assumptions, estimates and 'good enough' because the dates are not precisely defined. It is possible 90% of changed dates are spot on, but you don't know that.
No one would suggest that anyone would deliberately enter the wrong date. And of course if you don't add a date the software will generate the correct date for you. But if you wrote a template some time ago for creating assignments and that template has a date in it, then every time you fill in the template to create a new assignment, it will have the wrong date. Or if you 'change' the date on a single "changed:" attribute every time you update the object, it will only show the 'last updated' date. Or as Kaveh pointed out, maybe the object was created 10 years ago, but you updated the information last week, without altering the changed date. Do you consider the object to be stale data? There are so many ways this date can represent something different to what you assume it to be.
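To illustrate the ambiguity described above, here is a minimal sketch that pulls the dates out of an object's "changed:" attributes. It assumes the usual `changed: <email> <YYYYMMDD>` layout; the sample object, its addresses and the function name `changed_dates` are made up for illustration and do not come from the RIPE Database or its software.

```python
from datetime import date, datetime


def changed_dates(rpsl_object: str) -> list[date]:
    """Return the dates found in "changed:" attributes, in object order."""
    dates = []
    for line in rpsl_object.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() != "changed":
            continue
        parts = value.split()
        # Assumed layout: "changed: <email> <YYYYMMDD>"
        if len(parts) >= 2 and len(parts[1]) == 8 and parts[1].isdigit():
            dates.append(datetime.strptime(parts[1], "%Y%m%d").date())
    return dates


# Hypothetical object using documentation address space (192.0.2.0/24).
sample = """\
inetnum:  192.0.2.0 - 192.0.2.255
netname:  EXAMPLE-NET
changed:  hostmaster@example.net 20030512
changed:  hostmaster@example.net 20130503
source:   RIPE
"""

dates = changed_dates(sample)
earliest, latest = min(dates), max(dates)
print(earliest, latest)
```

The earliest date is only a guess at the creation date and the latest only a guess at the last update: a stale template date, or a single "changed:" line that is overwritten on every update, would produce equally plausible values that mean something else.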
So while I have no illusions that I'd get an exact number from looking at the "changed:" attribute, what I was really trying to say is that I tried to use the database dumps on ftp.ripe.net for some purpose, but due to excessive dummification I couldn't. Hence I support the proposal to reduce the amount of dummification. Whether or not my desired query would stand up to any "scientific" scrutiny is really beside the point.
That said, going slightly off topic wrt. the proposal itself: I'm still no closer to finding an answer to my question, which could be stated as:
«How many IPv4 PA Assignments have been made by RIPE NCC LIRs since the 14th of September 2012? (If possible, but less important: How many addresses did these assignments consist of, in total?)»
Even a very rough estimate is useful to me. Even just the order of magnitude would be interesting. "A dozen", "about five thousand", or "hundreds of thousands" - these are all results I could put in my slides to underline the point I want to make.
While I think the use of the "changed:" attribute I originally planned would give me a useful enough result, I'm in no way married to that particular approach. So if there's some other way to get the number, possibly by having any of you RIPE NCC guys fetch it for me from some internal data source, that would be much appreciated! :-) I'd need it before Wednesday (well, preferably ASAP) if I'm going to be able to make use of it though.
As Kaveh also pointed out, you can query an object with the '--list-versions' option and get a more accurate, reliable list of dates for when an object was created and for each update. But if you want to find which PA assignments were created within a time period, you would have to query them all to get the accurate historical dates. There are 3.7 million assignment objects. For this type of analysis it might be easier if the split file had one or more well-defined dates you could rely on.
No, I really wouldn't want to script 3.7M queries to your whois server... Having an auto-generated "created:" attribute in the dumps (and in the whois service too for that matter) would certainly be better.
Tore
On May 10, 2013, at 9:37 AM, Tore Anderson <tore@fud.no> wrote:
«How many IPv4 PA Assignments have been made by RIPE NCC LIRs since the 14th of September 2012? (If possible, but less important: How many addresses did these assignments consist of, in total?)»
Hi Tore,
Under 20,549 Allocations provided by RIPE NCC (total of 594,469,888 addresses), there are 3,596,598 first level child assignment objects (total of 459,629,744 addresses) and out of those, 251,254 objects (total of 26,147,674 addresses) are created on or after 14th of September 2012.
Here is how I generated the numbers using the standard whois:
a) I downloaded the latest version of the RIPE NCC Delegated extended stats file (ftp://ftp.ripe.net/ripe/stats/delegated-ripencc-extended-latest)
b) For each IPv4 resource in the file with the status "allocated", I queried the RIPE Database for one level more specific "inetnum" objects
c) For each of those objects, I ran a "--list-versions" query, picked the first revision and compared the date with 14th of September 2012
Kind Regards, Kaveh.
---
Kaveh Ranjbar, RIPE NCC Database Group Manager
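The offline parts of steps a) to c) above can be sketched roughly as follows. This is not Kaveh's actual script. It assumes the pipe-separated record layout of the delegated extended stats file (`registry|cc|type|start|value|date|status|opaque-id`, where `value` is the number of addresses for IPv4). The whois queries themselves are left out because they need network access, so the first-revision dates are passed in as plain data. The function names and all sample records are hypothetical.

```python
import ipaddress
from datetime import date


def allocated_ipv4_ranges(lines):
    """Yield (first, last) address pairs for IPv4 records with status 'allocated'."""
    for line in lines:
        fields = line.strip().split("|")
        # Assumed layout: registry|cc|type|start|value|date|status[|opaque-id]
        # Header, summary and comment lines do not match and are skipped.
        if len(fields) < 7 or fields[2] != "ipv4" or fields[6] != "allocated":
            continue
        first = ipaddress.IPv4Address(fields[3])
        count = int(fields[4])  # for IPv4, the number of addresses in the block
        yield first, first + (count - 1)


def count_created_since(objects, cutoff):
    """Count objects, and the addresses they cover, first seen on or after cutoff.

    objects: iterable of (first, last, first_revision_date) tuples, where the
    date would come from the first entry of a "--list-versions" query.
    """
    n_objects = n_addresses = 0
    for first, last, created in objects:
        if created >= cutoff:
            n_objects += 1
            n_addresses += int(last) - int(first) + 1
    return n_objects, n_addresses


# Made-up stats records using documentation address space.
sample_stats = [
    "ripencc|NO|ipv4|192.0.2.0|256|20120101|allocated|hypothetical-id",
    "ripencc|NO|ipv4|198.51.100.0|256|20120101|assigned|hypothetical-id",
    "ripencc|NO|ipv6|2001:db8::|32|20120101|allocated|hypothetical-id",
]
ranges = list(allocated_ipv4_ranges(sample_stats))

# Made-up child assignments with their first-revision dates.
A = ipaddress.IPv4Address
assignments = [
    (A("192.0.2.0"), A("192.0.2.127"), date(2011, 3, 1)),
    (A("192.0.2.128"), A("192.0.2.255"), date(2012, 9, 14)),
]
result = count_created_since(assignments, date(2012, 9, 14))
print(ranges, result)
```

Using the first revision date rather than a "changed:" value is what makes the count independent of whatever dates users typed into their objects.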
* Kaveh Ranjbar
Under 20,549 Allocations provided by RIPE NCC (total of 594,469,888 addresses), there are 3,596,598 first level child assignment objects (total of 459,629,744 addresses) and out of those, 251,254 objects (total of 26,147,674 addresses) are created on or after 14th of September 2012.
Perfect! Thank you very much! :-) Tore