RIPE Database history - let's make a decision
Colleagues,

"NWI-2 - displaying history for DB objects where available" has been open for almost 5 years. Every time we raise this we get one or two responses, but not enough to declare a consensus. Although no one has spoken against the idea of making (more) history available, there is no consensus on what historical data to display or how to display it. So let's review where we are and where we 'could' go...then make some decisions.

Available data:

The RIPE Database keeps every version of every object since the current RPSL database was launched in April 2001. The data set is almost complete: due to a technical issue, a small amount of historical data was permanently lost from the database around 2003. This data does include all deleted objects and objects that have been deleted and re-created.

The primary key of any object allows for the correlation of versions across a deletion/re-creation break. However, it should be realised that, even if the primary key is the same, a re-created object may not be related to a previous version that was deleted. For example, an assignment may be made to company A. It is later deleted. At some point the same addresses may be assigned to company B. This 're-creates' an INET(6)NUM object with the same primary key as the deleted one, but the two sets of data are unrelated.

All of this data is technically available.

Excluded data:

Privacy laws are quite strict now in the EU. The 'right to be forgotten', in an age where the internet forgets very little, is also a significant concern these days. Because the RIPE Database is operated by the RIPE NCC, which is a company registered in the EU, all EU privacy laws apply to all the data held in the RIPE Database. This is a fact. There is no point having any further debate on this issue.

So objects that contain only personal data will never be made available through any historical data interface. This includes the PERSON object.
Because the ROLE object has been used interchangeably with the PERSON object over the last 20 years, it has to be assumed that ROLE objects may also contain mostly personal data. These objects are excluded from historical data.

Operational objects include attributes that are considered to hold personal data. For example, the "nic-hdl:" and various email, phone and address attributes are deemed to potentially hold personal data. This data is redacted from historical data in query results.

A question that I don't believe has been put to the legal team is whether any case can be made to allow any of the redacted data to be legally included. In other words, is there any gray area here? Can an argument be made that some data items, such as "nic-hdl:", are of sufficient interest to some groups, perhaps investigators of resource hijacking or LEAs, to overcome the EU's privacy concerns?

Current service:

Every time we discuss this issue some people keep turning the conversation onto the current service: who, why, what... It is what it is. It was explained in detail at the time in a RIPE Labs article:
https://labs.ripe.net/Members/kranjbar/proposal-to-display-history-of-object...

Let's move on and discuss what you want rather than what you now have.

Possibilities:

- The focus will be, as it is now, on operational object data, with the one possible caveat on redacted elements mentioned above.
- We can remove the arbitrary limitation on presenting history beyond a delete/re-create boundary.
- We can allow historical queries on deleted data that does not exist in the database now as a 'current' object (deleted but not re-created).
- How do you want to access historical data?
  * through all RIPE Database query interfaces
  * only through a visual representation in RIPEstat
  * a dedicated whowas service
  * should historical data be available in bulk?
  * could historical data be a part of the NRTM?
    (In theory, if historical operational data is available through unlimited queries, including deleted data, then it would be possible to data mine the entire historical data set.)

With big data technology, it is technically possible to present a view across the database at a given point in time. This would allow you to query the version of all objects which existed at that point in time. Whether this could be technically managed as a public service, I don't know.

Anything else that you would like regarding historical data? We can make this as simple or complex as you wish...if you have a use case, and subject to legal constraints and technical challenges.

Discuss.....

cheers
denis
co-chair DB-WG
did the db-tf really not look at this?

randy

---
randy@psg.com
`gpg --locate-external-keys --auto-key-locate wkd randy@psg.com`
signatures are back, thanks to dmarc header mangling
> did the db-tf really not look at this?

since it seems not, perhaps the wg should not get too far out in front of the tf.

randy
participants (2)
- denis walker
- Randy Bush