New subject: Proposal to Allow UTF-8 in “descr:” and “remarks” Attributes

22 Oct 2025

Dear colleagues,

As I presented at RIPE 89 and RIPE 90, I would like to propose allowing UTF-8 encoded characters in the "descr:" and "remarks:" attributes.

Is there support for adding UTF-8 in the RIPE database? Please let me know your feedback. 

Regards
Ed Shryane
RIPE NCC

Problem Definition
------------------

It is currently only possible to store Latin-1 encoded data in the RIPE database. This is an issue for the majority of the RIPE region whose native language is not supported by Latin-1. We should allow regional operators to add notices to their RIPE database objects in their native language, using UTF-8 encoded data, so long as this does not affect interoperability.

Solution Definition
-------------------

In order to allow operators across the RIPE region to add notices in their own local language, we will allow UTF-8 characters in the “descr:” and “remarks:” attributes only. This change reduces the risk of impact to operators, users and the RIPE NCC, and does not affect existing RIPE policy.

We can extend support for UTF-8 in additional existing or new attributes in the future, once we have more operational experience with it, but for now, only “descr:” and “remarks:” will be supported.

Background
----------

Some work has already been done towards internationalization of the RIPE database. For example, in April 2015, Piotr Strzyzewski suggested to the DB-WG that UTF-8 be supported in free-text attributes.

"Proposal to allow UTF8 (April 2015)"
https://mailman.ripe.net/archives/list/db-wg@ripe.net/thread/QEYKOWZBCVA6HNH...

In May 2022, I published a RIPE Labs article on the impact analysis of supporting UTF-8 in the RIPE database.
https://labs.ripe.net/author/ed_shryane/impact-analysis-for-utf-8-in-the-rip...

At RIPE 89 and RIPE 90 I proposed to support UTF-8 in the RIPE database and asked for feedback. 
https://ripe89.ripe.net/wp-content/uploads/presentations/105-RIPE89-DB-WG-UT...
https://ripe90.ripe.net/wp-content/uploads/presentations/120-RIPE90-DB-WG-Op...

Impact Analysis
---------------

Backwards Compatibility
UTF-8 is backwards compatible with ASCII, in the same way as Latin-1. Any RPSL objects solely using ASCII will be compatible with UTF-8 encoding. Approximately 99% of all objects in the RIPE database only contain ASCII characters.
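
As a small illustration of the compatibility claim above, the following Python sketch encodes a made-up, ASCII-only RPSL fragment three ways and compares the bytes. The object content is invented for the example and is not taken from the RIPE database.

```python
# Illustrative RPSL fragment; the attribute values are made up.
ascii_object = "descr:          Example Network\nremarks:        Contact the NOC\n"

# Pure-ASCII text produces identical bytes in all three encodings.
as_ascii = ascii_object.encode("ascii")
as_latin1 = ascii_object.encode("latin-1")
as_utf8 = ascii_object.encode("utf-8")
same_bytes = as_ascii == as_latin1 == as_utf8

# A non-ASCII character is where Latin-1 and UTF-8 diverge.
latin1_e_acute = "é".encode("latin-1")  # a single byte
utf8_e_acute = "é".encode("utf-8")      # a two-byte sequence
```

Objects that stay within ASCII therefore need no conversion; only the non-ASCII values differ between the two encodings.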

Personal Data
Users must not add personal data in “remarks:” or “descr:” attributes, as these attributes are not included in the daily limit accounting, are not validated as they contain free text, and are not filtered by default. This is already the case in the RIPE database and the introduction of UTF-8 encoding does not change this. Personal data with UTF-8 encoding is out of scope.

Interoperability
If interoperability is a concern (i.e. a notice must be readable by a wider community) then it is recommended that only ASCII values are used.

Valid Codepoints
UTF-8 input will be validated against the IDNA 2008 standard to decide whether a Unicode code point is allowed (i.e. only protocol-valid code points are accepted). This standard is used in the implementation of Internationalised Domain Names (IDNs). This allows for consistency (code points will be mapped to a specific set of characters) and improved security (using an inclusion model to only allow certain characters).

Guidelines for the Implementation of Internationalized Domain Names
https://www.icann.org/resources/pages/idn-guidelines-2011-09-02-en
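
To illustrate what an inclusion model looks like in practice, here is a rough Python sketch. Python's standard library does not ship the IDNA 2008 derived-property tables, so the sketch substitutes a much coarser allow-list based on Unicode general categories. The category set and the function names are assumptions made for this example and are not the rules the RIPE database would apply.

```python
import unicodedata

# Assumption: these general categories stand in for the IDNA 2008
# protocol-valid set. This is an approximation, not IDNA 2008 itself.
ALLOWED_CATEGORIES = {"Ll", "Lu", "Lt", "Lo", "Lm", "Mn", "Mc", "Nd"}


def is_allowed(ch: str) -> bool:
    """Inclusion model: a character is rejected unless explicitly allowed."""
    # Printable ASCII (space through tilde) is always accepted.
    if " " <= ch <= "~":
        return True
    return unicodedata.category(ch) in ALLOWED_CATEGORIES


def rejected_codepoints(value: str) -> list[str]:
    """Return the code points in an attribute value that would be refused."""
    return [f"U+{ord(ch):04X}" for ch in value if not is_allowed(ch)]
```

With these assumed rules, a Cyrillic or Polish notice passes, while an invisible character such as U+200B (zero-width space) or an emoji is reported as rejected.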

Transliteration
Transliteration to Latin-1 is only done when necessary to match the default response encoding. Otherwise transliteration is not done (i.e. UTF-8 characters will be returned as-is).
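
The following Python sketch shows one way a value could be reduced to Latin-1 when the response encoding requires it. The transliteration rule used here (decompose the character and drop combining marks) is an assumption for illustration; the proposal does not specify the mapping that Whois would use. Characters that still cannot be represented fall back to “?”.

```python
import unicodedata


def to_latin1(text: str) -> bytes:
    """Encode text as Latin-1, transliterating or substituting '?' as needed."""
    out = bytearray()
    for ch in text:
        try:
            # Characters already in Latin-1 are passed through unchanged.
            out += ch.encode("latin-1")
            continue
        except UnicodeEncodeError:
            pass
        # Hypothetical transliteration: decompose, then drop combining marks.
        base = "".join(
            c for c in unicodedata.normalize("NFKD", ch)
            if not unicodedata.combining(c)
        )
        try:
            out += base.encode("latin-1") if base else b"?"
        except UnicodeEncodeError:
            # No Latin-1 equivalent found: substitute a question mark.
            out += b"?"
    return bytes(out)
```

Under this rule "ź" becomes "z", whereas "Ł" and Cyrillic letters have no decomposition into Latin-1 and are replaced by “?”.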

Impact on RIPE Database Services 
--------------------------------

Whois (Port 43) Query
* The “descr:” and “remarks:” attributes are returned by default on port 43 query responses.
* Port 43 will continue to use Latin-1 by default. With the default encoding, any characters outside the ASCII character set will be transliterated to Latin-1 or, where that is not possible, substituted with a “?” character.
* The client can specify the “-Z utf-8” flag to change the response encoding to UTF-8, in which case no transliteration will be done.
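
A client-side sketch of the two query modes described above, in Python. The “-Z utf-8” flag is taken from this proposal; the host name whois.ripe.net, the helper names and the timeout are assumptions made for the example. The response is decoded with the character set that was requested.

```python
import socket


def build_query(search_term: str, utf8: bool = False) -> bytes:
    """Build the query line sent to the server, optionally asking for UTF-8."""
    flags = "-Z utf-8 " if utf8 else ""
    return f"{flags}{search_term}\r\n".encode("utf-8")


def whois_query(search_term: str, utf8: bool = False,
                host: str = "whois.ripe.net", port: int = 43) -> str:
    """Send one port 43 query and decode the reply in the requested charset."""
    # Default responses are Latin-1; with the flag they are UTF-8.
    charset = "utf-8" if utf8 else "latin-1"
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_query(search_term, utf8))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode(charset, errors="replace")
```

A caller that can handle UTF-8 would use `whois_query("AS3333", utf8=True)`; existing clients that send no flag keep receiving Latin-1 as before.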

NRTMv3 (Port 4444)
* The encoding used by NRTMv3 will continue to be Latin-1. As for port 43, any non-Latin-1 characters will be substituted with a “?” character.

NRTMv4
* No impact. RPSL objects will continue to be returned in UTF-8 encoding in snapshot and delta files.

Whois REST API
* No impact. The Whois REST API already supports UTF-8.

RDAP
* No impact. The RDAP protocol already supports UTF-8.

Web Application
* UTF-8 encoding is already supported on the query page. 
* The create and update page validation will be changed to allow UTF-8 characters in “descr:” and “remarks:” attributes.

Mailupdates
* No impact. UTF-8 encoding is supported.

Syncupdates
* No impact. UTF-8 encoding is supported.

Daily Database Dump and Split Files
* The encoding of the database dump and split files remains Latin-1. The “descr:” and “remarks:” attributes are included unfiltered. Any non-Latin-1 UTF-8 characters will be substituted with a “?” character.
* We will provide a separate UTF-8 encoded database dump and split files, which will include “descr:” and “remarks:” attributes without substitutions.

New LIR Application
* No impact.

Registry Team
* No comments or concerns, as changes are limited to the “descr:” and “remarks:” attributes.
