Re: discrepancies RIPE - InterNIC

10 Nov 1993

      Laurent & Enke:

First of all thanks for your work guys. Now for my opinion about how to 
sync the data:

First - when RIPE gives us their latest data run, you will see *no*
differences in the 193 data since we will throw out all 193 nets we
currently have. Second, if there are conflicts in the data, they will be in
the 192 nets and class B's.  This would be a more interesting report.

To make this report even more interesting, there are various levels of errors
that are not really covered here and that need to be synced up. IMHO, here are
my levels of error severity (based on an IP network match):

1) Organizational name differences
*tied*
2) Organization addresses
2) netnames

One should easily be able to pull this from any database transfer file
(yours, ours, RIPE's).
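A minimal sketch of such a per-field report, assuming each transfer file has
already been parsed into a dictionary keyed by network number. The record
layout and the field names (org_name, address, netname) are hypothetical and
are not taken from any of the actual file formats:

```python
# Hypothetical record layout (an assumption, not a real transfer-file format):
#   {network_number: {"org_name": ..., "address": ..., "netname": ...}}
FIELDS = ("org_name", "address", "netname")


def normalize(value):
    """Lower-case and collapse whitespace so pure formatting noise is ignored."""
    return " ".join(value.lower().split())


def field_conflicts(ours, theirs):
    """Return {network: [fields that differ]} for networks present in both sets."""
    conflicts = {}
    for network in sorted(ours.keys() & theirs.keys()):
        differing = [
            field
            for field in FIELDS
            if normalize(ours[network].get(field, ""))
            != normalize(theirs[network].get(field, ""))
        ]
        if differing:
            conflicts[network] = differing
    return conflicts
```

Networks that appear in only one of the two sets are skipped here; they would
need a separate "unregistered" report.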

Finally, I think the comparison should be between Merit data and InterNIC
data....

Thanks,
Mark

PS: About the weekly dumps, I think we need to first work out how to point
out the problems that need to be fixed and solve them. You have a good start
with our recent dump of the InterNIC database, which is easily separable.
...
Thanks to the InterNIC folks, the InterNIC data relative to
the network numbers is available in a flat file on merit.edu. We ran
a comparison between the InterNIC and the Merit data, and between the
RIPE data and the InterNIC data. Since RIPE is authoritative for the European
information and the InterNIC is authoritative for everything else, it seems
most important to address those discrepancies (RIPE/InterNIC).
Here is the result of the first comparison between RIPE and InterNIC:
(The data may be old for some networks, and our program may have some bugs...
The results for the Merit/InterNIC data are not provided.)
Number of entries:        10423
Unregistered in NIC DB:    5298    (50%)
No difference:              108    (1%)
Small differences:         2047    (19%)
  Substrings:              1604    (15%)
  Typos:                    510    (4%)
  Order:                    356    (3%)
  Punctuation:               21    (0%)
  Abbreviations:            186    (1%)
  One Word only:              5    (0%)
Differences:               2970    (28%)
  100%:                     758    (7%)
  80%:                      498    (4%)
  60%:                      494    (4%)
  40%:                      723    (6%)
  20%:                      497    (4%)
Remarks:
  - 10423 represents the number of networks and blocks in the RIPE DB;
each block is counted as 1 network.
  - The addresses for the RIPE networks are taken from the 'Administrative
Contact' entries. Of the 10423 networks, 224 do not have an administrative
contact entry in the person database. In such a case we use the 'description'
attribute to get the address. Some entries are duplicated in the person DB,
which creates some strange addresses (e.g. 193.84.64.0 -> Alexandr Modry).
  - 'Unregistered in NIC DB' means the network is part of the RIPE block
in the NIC DB, but does not have its own entry.
  - 'Substrings' means all the words of one address appear in the second
address (e.g. "Celisoft Data AB, Box 718, S-941 28 Piteaa, Sweden" and
"Celisoft Data AB, Celisoft Data AB, BOX 718, S-941 28 PITEAA, SWEDEN").
  - The differences are given as the percentage of words that differ.
100% means 80% to 100% of the words are different.
We also take into account the typos, abbreviations, punctuation, etc.,
which are not a 'difference' per se.
  - The program which compares the addresses takes about 20s for 10000
entries. The program which reads the InterNIC data and formats them for
comparison takes about 20 minutes.
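The 'Substrings' test and the percentage buckets described in the remarks
could be sketched roughly as follows. This is not the comparison program
itself. The word-splitting rule, the formula (words found in only one
address, divided by all distinct words), and the assumption that every
bucket spans 20 points in the same way the 100% bucket does are guesses
made for illustration:

```python
import math
import re


def words(address):
    """Split an address into a set of lower-cased words, dropping punctuation."""
    return set(re.findall(r"[a-z0-9-]+", address.lower()))


def is_substring(a, b):
    """True if every word of one address also appears in the other."""
    wa, wb = words(a), words(b)
    return wa <= wb or wb <= wa


def difference_percent(a, b):
    """Assumed formula: words found in only one address / all distinct words."""
    wa, wb = words(a), words(b)
    union = wa | wb
    if not union:
        return 0.0
    return 100.0 * len(wa ^ wb) / len(union)


def bucket(percent):
    """Map a percentage to a 20/40/60/80/100 label (assumed 20-point bands)."""
    return max(20, math.ceil(percent / 20) * 20)
```

With these definitions the two Celisoft addresses quoted above count as a
'Substrings' match, since case, punctuation and the repeated organization
name are all ignored.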
We'd like to set up a plan with RIPE and the InterNIC to eliminate
the discrepancies. Merit has a list of the 5298 networks without an entry in
the NIC DB and can provide a detailed list of the address differences for the
other inconsistencies.
  Do you all have any suggestions as to how we can proceed to resolve
this problem?
Laurent & Enke.
PS: Could InterNIC provide your flat file periodically? (once a week?)
PS: Any comments are welcome.

markk＠internic.net