Re: [db-wg] Proposal to allow non-ASCII characters in "org-name:", "person:" and "role:" attributes

24 Nov 2023

Dear Edward,

On Fri, Nov 24, 2023 at 10:03:15AM +0100, Edward Shryane via db-wg wrote:
> ...
> Currently the RIPE database only allows a subset of ASCII characters
> in the "org-name:", "person:" and "role:" attributes, for a few
> reasons including:
> * These attributes are also a look-up key and the Whois protocol does
>   not allow specifying character sets in queries.
> * RPSL names are ASCII according to RFC2622
> * Using a normalised name makes the object easier to query
> * Reading a normalised name is easier to interpret
> However there are some drawbacks to forcing names to only use a subset
> of ASCII characters:
> * Organisations, roles and persons cannot use their actual name if it
>   includes characters outside this subset.
> * Normalisation is not standard, but is an interpretation done by each
>   maintainer, e.g. characters could be excluded or converted in
>   different ways.

The above two points are key to making the RIPE database useful and
accessible to everyone; I too would love to see them addressed.

> ...
> Since we support the Latin-1 character set in the RIPE database, I
> propose we also allow non-ASCII Latin-1 characters in these
> attributes.
> Querying for a name can be done either using the latin-1 characters
> (proposed) or a normalised, ASCII representation (currently). The
> normalised version will be generated by Whois and stored in a database
> index for querying. The primary key will also be generated from the
> normalised version.
> Please let me know your feedback.
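
To illustrate the normalisation step described above, here is a minimal
Python sketch. It assumes one possible approach (Unicode NFKD
decomposition, dropping combining marks, plus a small hand-written table
for Latin-1 letters that have no decomposition). The function name
`normalise_name` and the table are hypothetical; this is not how RIPE NCC
Whois actually implements it.

```python
import unicodedata

# Hypothetical transliterations for Latin-1 letters that NFKD leaves intact.
_LATIN1_EXTRA = str.maketrans({
    "ß": "ss", "Æ": "AE", "æ": "ae", "Ø": "O", "ø": "o",
    "Ð": "D", "ð": "d", "Þ": "TH", "þ": "th",
})


def normalise_name(name: str) -> str:
    """Hypothetical sketch: derive an ASCII lookup key from a Latin-1 name."""
    # Replace letters that have no Unicode decomposition.
    name = name.translate(_LATIN1_EXTRA)
    # NFKD splits accented letters into base letter + combining mark,
    # e.g. "é" becomes "e" followed by U+0301.
    decomposed = unicodedata.normalize("NFKD", name)
    # Keep only the base characters.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Drop anything still outside ASCII (e.g. symbols such as "©").
    return stripped.encode("ascii", "ignore").decode("ascii")


print(normalise_name("Müller"))        # Muller
print(normalise_name("Straße"))        # Strasse
print(normalise_name("Ångström AB"))   # Angstrom AB
```

Without the explicit table, "Straße" would silently become "Strae": exactly
the kind of maintainer-by-maintainer variation the proposal hopes to remove
by having Whois generate the normalised form itself.
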
Wouldn't this be an opportune time to support UTF-8 instead of Latin-1?
As I understand it, UTF-8 would allow names in many more languages to be
supported. UTF-8 seems to be the preferred character encoding in any new
IETF work (for good reason).
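
A quick way to see the coverage gap is to check which names can be encoded
in Latin-1 at all. The helper `fits_latin1` and the sample names below are
made up for illustration. The point is that common European names using
Polish "Ł" or Czech "ř" already fall outside Latin-1, let alone Cyrillic or
CJK scripts, while UTF-8 can encode any Unicode text.

```python
def fits_latin1(name: str) -> bool:
    """Return True if every character of name exists in Latin-1 (ISO 8859-1)."""
    try:
        name.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# Illustrative names: the first two fit in Latin-1, the rest do not,
# but all of them can be encoded as UTF-8.
for name in ["Müller", "Ångström", "Łukasz", "Dvořák", "Иванов", "山田"]:
    print(f"{name}: Latin-1 {fits_latin1(name)}, UTF-8 {len(name.encode('utf-8'))} bytes")
```
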

Have the effects of Latin-1 names on downstream applications such as
NRTM v3 and NRTM v4 been considered?
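
One concrete downstream risk is an encoding mismatch between the database
and a mirror or NRTM client. The sketch below assumes nothing about how
NRTM v3 or v4 actually transmit data; it only shows what happens when
Latin-1 bytes reach a consumer that expects UTF-8, or the reverse.

```python
name = "Müller"
latin1_bytes = name.encode("latin-1")  # b'M\xfcller'
utf8_bytes = name.encode("utf-8")      # b'M\xc3\xbcller'

# A UTF-8 consumer reading Latin-1 bytes: 0xFC is never valid in UTF-8.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)

# A Latin-1 consumer reading UTF-8 bytes: no error, but garbled text.
print(utf8_bytes.decode("latin-1"))  # MÃ¼ller
```

The second case is arguably worse, because nothing fails and the garbled
name can propagate silently into downstream copies.
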

You indicate that Latin-1 is already supported in the RIPE database, so
I imagine you and the team have already weighed the pros and cons of
UTF-8 vs Latin-1 and arrived at this particular recommendation. I just
wanted to make sure to raise these questions. :-)

Some interesting reading material on UTF-8: https://utf8everywhere.org/

Kind regards,

Job

Job Snijders