Re: [db-wg] Internationalized domain names in the data base?

6 Nov 2019

In message <1AA95AD8-3729-4BBB-A921-1535429A9658@ripe.net>, Edward Shryane <eshryane@ripe.net> wrote:
...
...
...
> DB-WG: should we allow non-ASCII addresses in the RIPE database?
> Do you mean email addresses or street addresses as well?
> I mean to continue to allow non-ASCII (i.e. Latin-1 encoded) IDN email
> addresses, such as the example mentioned. Or do we automatically encode
> non-ASCII characters as punycode?
I want to be crystal clear here.  Street addresses, person names, city
names, or any other data values (except for ASNs, IP addresses, ISO 3166
country codes, and domain names) that are encoded in full 8-bit ISO-8859-1
within the data base do not present any terrific problems for me personally
because, generally speaking, I don't anticipate that I will ever be trying
to parse those person names, street names, city names, etc.  I will just
use them "as is" and in whatever encoding they happen to be in when I
receive them.

Quite certainly, within the RIPE region there are billions upon billions
of person names, street names, and city names that cannot be accurately
represented in US-ASCII, nor even, I must note, in ISO-8859-1.  (I am
thinking of your fellow RIPE members in places where Cyrillic is used,
and also your fellow RIPE members in Israel and elsewhere.)

In ancient times (e.g. prior to the issuance of RFC3490 in March 2003),
7-bit US-ASCII was used fairly exclusively within the
data bases of all of the Regional Internet Registries.  And I, for one,
am greatly appreciative of all of the effort and contortions, over so
many years, that so many people have gone through in order to try, as
best as they could, to anglicize person, street, and city names, especially
those that were not really amenable to that process, and to convert them
all into some 7-bit ASCII approximation of the actual "native" strings.
Even though this conversion process has often rendered the resulting
anglicized versions substantially inaccurate, it has served to keep
processing code simple, at least up until now.

Now, however, I see that 8-bit ISO-8859-1 encodings are creeping in, at
least to the RIPE data base.  I am torn by this.  On the one hand, this
new development augurs a sea change which will likely end by complicating
a lot of tools, and not only my own.  On the other hand, the benefits are
clear: more accurate representations of person, street, and city names
within the data base... BUT still quite limited to names that can be
accurately represented within ISO-8859-1, a character set which excludes
some very large swaths of RIPE territory.

Even at the risk of making my own life more complicated, I have to say
that I personally place a higher value on accuracy than I do on simplicity.
For this reason, it is my feeling that the data base should evolve in
the direction of UTF-8 and *not* in the rather different and far more
limiting direction of ISO-8859-1.
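To make the difference concrete, here is a small Python sketch that checks
which names survive a round trip through ISO-8859-1 and through UTF-8.  The
sample names are illustrative only, not taken from the RIPE data base, and
`fits` is a hypothetical helper.  The last few lines show one way a mix of
encodings complicates tools: a Latin-1 encoded value is generally not
valid UTF-8.

```python
# Sketch only: sample names are illustrative, not actual RIPE data base contents.

SAMPLES = ["Müller", "Zürich", "Москва", "תל אביב"]


def fits(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if `text` can be encoded in `encoding` without loss."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        return False


for name in SAMPLES:
    print(f"{name}: latin-1={fits(name, 'latin-1')}, utf-8={fits(name, 'utf-8')}")

# A tool that assumes UTF-8 cannot blindly read a Latin-1 encoded value.
latin1_bytes = "Müller".encode("latin-1")  # the single byte 0xFC stands for 'ü'
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

The Western European names fit either encoding; the Cyrillic and Hebrew
names fit only UTF-8.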

That having been said however, domain names are a really very special
and different concern.  I personally am not aware of any standard which
suggests that domain names should ever be written in ISO-8859-1.  Rather,
for domain names, the available choices of representation seem to be either
(a) 7-bit US-ASCII or else (b) punycode (RFC3492) or else (c) UTF-8.
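For illustration, the following Python sketch shows one hypothetical name,
`bücher.example`, as UTF-8 bytes and in punycode form.  It assumes Python's
built-in `idna` codec, which implements the original IDNA rules of RFC3490
(nameprep plus the RFC3492 punycode algorithm); the newer IDNA2008 rules
are only available through third-party packages.

```python
# Sketch: one hypothetical internationalized name in two of the representations above.
name = "bücher.example"

as_utf8 = name.encode("utf-8")  # (c) UTF-8: 8-bit bytes, 'ü' becomes two bytes
as_ace = name.encode("idna")    # (b) punycode, wrapped in an "xn--" ASCII-compatible label

print(as_utf8)           # b'b\xc3\xbccher.example'
print(as_ace)            # b'xn--bcher-kva.example'
print(as_ace.isascii())  # True: the punycode form uses only 7-bit US-ASCII characters

# The raw RFC3492 output for the label alone, without the "xn--" prefix:
print("bücher".encode("punycode"))  # b'bcher-kva'
```

Because the punycode form is itself plain ASCII, it can be stored wherever
ASCII domain names are stored today.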

Obviously, plain 7-bit US-ASCII, which cannot represent internationalized
domain names, is really no longer an option, and hasn't been ever since the
publication of RFC3490 in 2003.  At the present moment,
punycode can be used, and can represent all domain names with 100%
accuracy, even while allowing the evolution of the encoding of other
data base fields to proceed and to be debated independently.

The bottom line is that in the short term, and for the immediate future,
I believe that there is no other sensible choice except to decree that
all domain names within the data base shall be represented in punycode
form.
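A rule like that could be enforced by normalizing every domain name value
before it is stored.  The Python sketch below is a minimal illustration
under stated assumptions: `to_punycode_form` is a hypothetical helper, not
part of any RIPE software, and it relies on Python's built-in `idna` codec
(IDNA2003/RFC3490).

```python
def to_punycode_form(domain: str) -> str:
    """Hypothetical helper: return `domain` in the ASCII ("xn--") form a data base
    could store, lowercased and without a trailing dot.

    Uses Python's built-in "idna" codec (IDNA2003 / RFC3490).  Names that are
    already ASCII, including existing "xn--" labels, pass through unchanged
    apart from lowercasing.
    """
    domain = domain.strip().rstrip(".")
    if not domain:
        raise ValueError("empty domain name")
    # The codec raises UnicodeError for empty or over-long labels and for
    # characters that nameprep prohibits.
    return domain.encode("idna").decode("ascii").lower()


for value in ["Bücher.Example.", "xn--bcher-kva.example", "EXAMPLE.NET"]:
    print(value, "->", to_punycode_form(value))
```

Since the function is idempotent on names already in punycode form, it can
be applied to existing objects as well as to new submissions.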
...
> Whois could automatically translate to and from the punycode format, if
> an IDN format address is encountered.
Yes, but please just leave this to the WHOIS *client* to handle.  It is
less desirable, I think, to perform this conversion on the server side.
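A WHOIS client could do the conversion along the lines of this minimal
Python sketch.  The helper names are hypothetical and are not part of any
existing WHOIS client.  The user's Unicode query is converted to punycode
before it is sent, and punycode domains in the response (including the
domain part of e-mail addresses) are converted back to Unicode only for
display, so the server and the data base only ever see ASCII.

```python
def prepare_query(user_input: str) -> str:
    """Hypothetical client helper: turn a domain typed by the user, possibly in
    Unicode, into the ASCII form sent to the WHOIS server."""
    return user_input.strip().encode("idna").decode("ascii")


def for_display(ascii_domain: str) -> str:
    """Hypothetical client helper: turn an ASCII/punycode domain from a server
    response back into Unicode for the user to read."""
    return ascii_domain.encode("ascii").decode("idna")


def display_email(address: str) -> str:
    """Convert only the domain part of an e-mail address; the local part is
    not covered by IDNA and is left untouched."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # no "@": not an e-mail address, leave as-is
    return f"{local}@{for_display(domain)}"


query = prepare_query("bücher.example")
print(query)                                        # xn--bcher-kva.example
print(for_display(query))                           # bücher.example
print(display_email("info@xn--bcher-kva.example"))  # info@bücher.example
```

Keeping this logic in the client means each user can choose whether to see
the punycode or the Unicode form, without any change to what the server
stores or returns.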

Regards,
rfg

Ronald F. Guilmette