Re: [db-wg] Proposal to allow non-ASCII characters in "org-name:", "person:" and "role:" attributes

24 Nov 2023

      On 2023 Nov 24 (Fri) at 10:42:11 +0100 (+0100), Edward Shryane via db-wg wrote:
:> On 24 Nov 2023, at 10:21, Job Snijders <job@fastly.com> wrote:
:> On Fri, Nov 24, 2023 at 10:03:15AM +0100, Edward Shryane via db-wg wrote:
:I wrote an impact analysis on UTF-8 in the RIPE database last year:
:https://labs.ripe.net/author/ed_shryane/impact-analysis-for-utf-8-in-the-rip...
:
:We already support UTF-8 in the Whois REST API and on the website, but convert to/from latin-1 in the database.
:
:Switching to UTF-8 in the database is not technically difficult, but we need functional requirements from the community on where to allow UTF-8 characters.
:
:This proposal only adds support for more Latin-1 characters in names, while preserving backwards compatibility for querying (by also normalising to ASCII).
:
:> Have the effects of LATIN-1 on downstream applications such as NRTM v3
:> and NRTM v4 been considered?
:
:Allowing Latin-1 in these name attributes *does* impact NRTMv3 and NRTMv4 (as they will no longer be ASCII only), but these characters are already allowed elsewhere in RPSL (e.g. the workaround of putting the correct name in the "descr:" attribute). Also the object primary key will remain ASCII.
:
:> 
:> You indicate that LATIN-1 already is supported in the RIPE database, so
:> I imagine you and the team already deliberated on the pros and cons of
:> UTF-8 vs LATIN-1; and as such concluded with this particular
:> recommendation. I just wanted to make sure to raise these questions. :-)
:> 
:
:We can switch to UTF-8, but this proposal allows more characters in those attributes without needing to change the database character set.
:

I think it would be best if we migrated the entire database to UTF-8
first, upgrading all LATIN-1 attributes to UTF-8 at that time, and then
decided per attribute whether to allow UTF-8 or keep it as ASCII.
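
To make the migration step concrete, here is a minimal sketch of
re-encoding one stored value from LATIN-1 to UTF-8. The function names
and the sample "org-name:" value are made up for illustration; this is
not how the RIPE database code is actually structured.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode a LATIN-1 byte string as UTF-8 (hypothetical helper)."""
    # Every byte value is a valid LATIN-1 code point, so decode() cannot fail.
    return raw.decode("latin-1").encode("utf-8")


def is_ascii_only(raw: bytes) -> bool:
    """True if the value needs no conversion (ASCII is a subset of both)."""
    return all(b < 0x80 for b in raw)


# Hypothetical attribute value as it might be stored today.
stored = "org-name: Exämple Örg".encode("latin-1")
migrated = stored if is_ascii_only(stored) else latin1_to_utf8(stored)
```

ASCII-only values are byte-identical in both encodings, so only
attributes that already hold LATIN-1 characters would change on disk.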

I'd like to avoid adding more LATIN-1 if we can.
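
Whichever character set ends up being stored, the backwards-compatible
querying mentioned in the proposal needs names folded to ASCII. The
thread does not spell out the normalisation rules, so the following is
only a rough sketch under my own assumptions: strip accents via Unicode
decomposition, plus a small hand-written table (hypothetical) for
letters that do not decompose.

```python
import unicodedata

# Assumed replacements for Latin-1 letters that NFKD leaves unchanged.
_EXTRA = {
    "ß": "ss", "Æ": "AE", "æ": "ae", "Ø": "O", "ø": "o",
    "Ð": "D", "ð": "d", "Þ": "Th", "þ": "th",
}


def to_ascii_key(name: str) -> str:
    """Fold a name to an ASCII lookup key (illustrative only)."""
    replaced = "".join(_EXTRA.get(ch, ch) for ch in name)
    # NFKD splits accented letters into base letter + combining mark;
    # encoding with errors="ignore" then drops the non-ASCII marks.
    decomposed = unicodedata.normalize("NFKD", replaced)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

With this, a query for "Jorg Muller" could still match an object whose
name is stored as "Jörg Müller".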

-- 
43rd Law of Computing:
	Anything that can go wr
fortune: Segmentation violation -- Core dumped

Peter Hessler