Puny code or UTF-8 (or both)?


ripedenis＠yahoo.co.uk

13 Jul 2020

8:12 a.m.

Colleagues

After the recent discussion it seems there is support for the idea of UTF-8, whilst there is also support for Puny code as an interim step.

To implement UTF-8 in the RIPE Database requires much more detailed discussions, including non-technical aspects of the change. This is not likely to happen quickly.

There has been support shown to introduce Puny code as a first step towards internationalisation of the data, which can be done quickly by the RIPE NCC.

Do we therefore have support to introduce Puny code now and then consider how to move forward with UTF-8?

cheers
denis

co-chair DB-WG


Cynthia Revström

13 Jul

8:18 a.m.

Hi Denis,

...

Do we therefore have support to introduce Puny code now and then consider how to move forward with UTF-8?

I think this is probably the best solution.

- Cynthia

On Mon, Jul 13, 2020 at 2:13 PM ripedenis--- via db-wg <db-wg@ripe.net> wrote:

...

Colleagues

After the recent discussion it seems there is support for the idea of UTF-8, whilst there is also support for Puny code as an interim step.

To implement UTF-8 in the RIPE Database requires much more detailed discussions, including non-technical aspects of the change. This is not likely to happen quickly.

There has been support shown to introduce Puny code as a first step towards internationalisation of the data, which can be done quickly by the RIPE NCC.

Do we therefore have support to introduce Puny code now and then consider how to move forward with UTF-8?

cheers denis

co-chair DB-WG

Gert Doering

8:25 a.m.

Hi,

On Mon, Jul 13, 2020 at 12:12:42PM +0000, ripedenis--- via db-wg wrote:

...

Do we therefore have support to introduce Puny code now and then consider how to move forward with UTF-8?

Works for me.

Gert Doering
        -- NetMaster
-- 
have you enabled IPv6 on something today...?

SpaceNet AG                      Vorstand: Sebastian v. Bomhard, Michael Emmer
Joseph-Dollinger-Bogen 14        Aufsichtsratsvors.: A. Grundner-Culemann
D-80807 Muenchen                 HRB: 136055 (AG Muenchen)
Tel: +49 (0)89/32356-444         USt-IdNr.: DE813185279

Nick Hilliard

8:33 a.m.

ripedenis--- via db-wg wrote on 13/07/2020 13:12:

...

There has been support shown to introduce Puny code as a first step towards internationalisation of the data, which can be done quickly by the RIPE NCC.

Do we therefore have support to introduce Puny code now and then consider how to move forward with UTF-8?

Re: punycode, there's nothing to stop anyone using the ASCII form of punycode for email addresses today. So we need to ask what we're actually supporting here. Is this simply that the whois server will automatically normalise all email addresses to a specific format, or are we talking about something more extensive than this?

Either way, we also need to keep an eye on the road ahead. I.e. if there is work planned for future native UTF-8 support, do we aim for using UTF-8 as the normal form now, and ask the DB people to put in a UTF-8->punycode translator for whois presentation? Does this also apply to RDAP / REST API?

There are a bunch of open questions here. It would be great to get input from the NCC staff about their thoughts on this, because I'm sure it's something that they've given a good deal of consideration to.

Nick
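The second option raised above (Unicode as the stored normal form, with translation only at presentation time) could look roughly like the sketch below. It is illustrative only: `present_domain` and its `ace` flag are hypothetical names, and Python's built-in `idna` codec implements the older IDNA2003 rules, so this is not a statement of how the RIPE Database whois, RDAP or REST API would behave.

```python
def present_domain(stored: str, ace: bool) -> str:
    """Render a stored domain name as ASCII (A-labels) or Unicode (U-labels).

    Hypothetical helper: `stored` may be in either form.
    """
    if ace:
        # Unicode -> "xn--" form; plain ASCII names pass through unchanged.
        return stored.encode("idna").decode("ascii")
    if stored.isascii():
        # ASCII input may contain "xn--" labels; turn them back into Unicode.
        return stored.encode("ascii").decode("idna")
    return stored


print(present_domain("münchen.de", ace=True))          # xn--mnchen-3ya.de
print(present_domain("xn--mnchen-3ya.de", ace=False))  # münchen.de
```

Keeping the conversion at the edge means the stored value does not need to change if a different output format is asked for later.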

Peter Koch

9:03 a.m.

On Mon, Jul 13, 2020 at 12:12:42PM +0000, ripedenis--- via db-wg wrote:

...

Do we therefore have support to introduce Puny code now and then consider how to move forward with UTF-8?

I'm not sure I understand the proposal. "punycode" is primarily IDNA2003 speak - and the initial suggestion by Ed was to apply an automatic conversion. How would that system deal with conversion failures and/or with ambiguities between IDNA2003 and IDNA2008?

-Peter

Tony Finch

10:43 a.m.

Peter Koch via db-wg <db-wg@ripe.net> wrote:

...

I'm not sure I understand the proposal.

Me too :-)

...

"punycode" is primarily IDNA2003 speak

AFAIK IDNA2008 uses punycode in exactly the same way as IDNA2003. One of the major changes was to get rid of stringprep.
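A small sketch of that distinction, using only the Python standard library as a stand-in: the Punycode step itself is the same algorithm in both cases, and what differs is the mapping applied before it. Python's `idna` codec follows IDNA2003 (nameprep, a stringprep profile); the raw `punycode` codec applies no mapping at all and is used here only to show what an encoder that keeps the sharp s would emit.

```python
# IDNA2003 path: nameprep runs before Punycode and maps "ß" to "ss",
# so the label becomes plain ASCII and no "xn--" form is produced.
idna2003 = "straße.de".encode("idna")

# The Punycode (Bootstring) step on its own, with no mapping applied first.
raw_label = b"xn--" + "straße".encode("punycode")

print(idna2003)   # b'strasse.de'
print(raw_label)  # b'xn--strae-oqa'
```

The same input therefore yields two different ASCII names depending on which preprocessing rules the converter follows, which is the kind of ambiguity asked about above.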

...

How would that system deal with conversion failures and/or with ambiguities between IDNA2003 and IDNA2008?

My understanding is that we want to support Unicode for lots of fields in the database, and the suggestion is that it might be easier to jam punycode into the existing ISO 8859-1 fields.

I think this will be difficult if the database is going to use punycode for fields that aren't domain names or email addresses, and that don't have standard encoding rules. In particular I wonder how to handle spaces and upper/lower case. It might be easier to use base64 than punycode (but actually I think that's a terrible idea).

There's also George Michaelson's point that the database should have both the original form of the field as well as a latin transliteration if necessary. And this is necessary regardless of how the original form is encoded (UTF-8, punycode, whatever).

So I think it might be worth adding support for transcoding to/from punycode domain names and email addresses without waiting for full UTF-8 support, because that's likely to be useful in the long term. (Maybe something like the DENIC `-T ace` whois option?) But for other fields I doubt there is a stop-gap that will be easy and useful and not enormously regrettable in the future.

Tony.
-- 
f.anthony.n.finch <dot@dotat.at> http://dotat.at/
South Fitzroy: Northerly 5 to 7. Moderate or rough, becoming slight or moderate in northeast. Mainly fair. Good.
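The concern about spaces and upper/lower case can be seen with a short experiment. This is a sketch using Python's standard `punycode` and `idna` codecs, not anything the RIPE Database does; the sample text is made up.

```python
text = "Zürich Nord"  # made-up free-text value, not a domain name

# Raw Punycode keeps ASCII characters (including the space and the capital
# letters) as they are and appends the encoded non-ASCII part after a hyphen.
encoded = text.encode("punycode")
roundtrip = encoded.decode("punycode")

# The domain-name rules (IDNA2003 in Python's codec) fold case first,
# so two spellings that differ only in case give the same ASCII label.
idna_upper = "Zürich".encode("idna")
idna_lower = "zürich".encode("idna")

print(roundtrip == text)         # True
print(idna_upper == idna_lower)  # True
```

Raw Punycode round-trips the text exactly, while the domain-name profile deliberately loses case, so a free-text field would need its own agreed rules for which behaviour applies.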

Edward Shryane

10:47 a.m.

Hello Peter, WG,

...

On 13 Jul 2020, at 15:03, Peter Koch via db-wg <db-wg@ripe.net> wrote:

On Mon, Jul 13, 2020 at 12:12:42PM +0000, ripedenis--- via db-wg wrote:

...

Do we therefore have support to introduce Puny code now and then consider how to move forward with UTF-8?

I'm not sure I understand the proposal. "punycode" is primarily IDNA2003 speak - and the initial suggestion by Ed was to apply an automatic conversion.

Correct, I propose to automatically convert non-ASCII IDN email address domains into Punycode, as per RFC 5891.

...

How would that system deal with conversion failures and/or with ambiguities between IDNA2003 and IDNA2008?

We plan to use a third-party Java library to convert Punycode, rather than implementing one from scratch (doing it ourselves would increase complexity and time taken). Any conversion failures will result in the attribute value being considered invalid. A workaround in that case is to manually convert the IDN to ASCII (Punycode).
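As a rough illustration of the behaviour described above (convert the domain part, treat any conversion failure as an invalid value), here is a minimal sketch. It is an assumption-laden stand-in: it uses Python's built-in `idna` codec, which follows IDNA2003 rather than RFC 5891, it is not the third-party Java library mentioned, and `email_domain_to_ascii` is a hypothetical name.

```python
from typing import Optional


def email_domain_to_ascii(address: str) -> Optional[str]:
    """Return the address with its domain in ASCII form, or None if invalid.

    Hypothetical helper; only the part after the last "@" is converted.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return None  # not of the form local@domain
    try:
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        # Conversion failed (for example an empty or over-long label):
        # the value is considered invalid.
        return None
    return f"{local}@{ascii_domain}"


print(email_domain_to_ascii("user@münchen.de"))    # user@xn--mnchen-3ya.de
print(email_domain_to_ascii("user@bad..example"))  # None
```

An address that is already in ASCII form passes through unchanged, which matches the manual workaround of submitting the converted domain directly.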

...