Proposal to Allow UTF-8 in “descr:” and “remarks” Attributes
Dear colleagues,

As I presented at RIPE 89 and RIPE 90, I'd like to propose to allow UTF-8 encoded characters in "descr:" and "remarks:" attributes.

Is there support for adding UTF-8 in the RIPE database? Please let me know your feedback.

Regards
Ed Shryane
RIPE NCC


Problem Definition
------------------

It is currently only possible to store Latin-1 encoded data in the RIPE database. This is an issue for the majority of the RIPE region whose native language is not supported by Latin-1. We should allow regional operators to add notices to their RIPE database objects in their native language, using UTF-8 encoded data, so long as this does not affect interoperability.

Solution Definition
-------------------

In order to allow operators across the RIPE region to add notices in their own local language, we will allow UTF-8 characters in the “descr:” and “remarks:” attributes only. This change reduces the risk of impact to operators, users and the RIPE NCC, and does not affect existing RIPE policy.

We can extend support for UTF-8 to additional existing or new attributes in the future, once we have more operational experience with it, but for now, only “descr:” and “remarks:” will be supported.

Background
----------

Some work has already been done towards internationalization of the RIPE database. For example, in April 2015, Piotr Strzyzewski suggested to the DB-WG to support UTF-8 in free-text attributes.

"Proposal to allow UTF8 (April 2015)"
https://mailman.ripe.net/archives/list/db-wg@ripe.net/thread/QEYKOWZBCVA6HNH...

In May 2022, I published a RIPE Labs article on the impact analysis of supporting UTF-8 in the RIPE database.
https://labs.ripe.net/author/ed_shryane/impact-analysis-for-utf-8-in-the-rip...

At RIPE 89 and RIPE 90 I proposed to support UTF-8 in the RIPE database and asked for feedback.
https://ripe89.ripe.net/wp-content/uploads/presentations/105-RIPE89-DB-WG-UT...
https://ripe90.ripe.net/wp-content/uploads/presentations/120-RIPE90-DB-WG-Op...

Impact Analysis
---------------

Backwards Compatibility
UTF-8 is backwards compatible with ASCII, in the same way as Latin-1. Any RPSL objects solely using ASCII will be compatible with UTF-8 encoding. Approximately 99% of all objects in the RIPE database only contain ASCII characters.

Personal Data
Users must not add personal data in “remarks:” or “descr:” attributes, as these attributes are not included in the daily limit accounting, are not validated as they contain free text, and are not filtered by default. This is already the case in the RIPE database and the introduction of UTF-8 encoding does not change this. Personal data with UTF-8 encoding is out of scope.

Interoperability
If interoperability is a concern (i.e. a notice must be readable by a wider community) then it is recommended that only ASCII values are used.

Valid Code Points
UTF-8 input will be validated against the IDNA 2008 standard to decide whether a Unicode code point is allowed (i.e. only protocol-valid code points are allowed). This standard is used in the implementation of Internationalised Domain Names (IDNs). This allows for consistency (code points will be mapped to a specific set of characters) and improved security (using an inclusion model to only allow certain characters).

Guidelines for the Implementation of Internationalized Domain Names
https://www.icann.org/resources/pages/idn-guidelines-2011-09-02-en

Transliteration
Transliteration to Latin-1 is only done when necessary to match the default response encoding. Otherwise transliteration is not done (i.e. UTF-8 characters will be returned as-is).

Impact on RIPE Database Services
--------------------------------

Whois (Port 43) Query
* The “descr:” and “remarks:” attributes are returned by default on port 43 query responses.
* Port 43 will continue to use Latin-1 by default. In that case, any UTF-8 characters outside the ASCII character set will be transliterated to Latin-1 or will be substituted with a “?” character.
* The client can specify the “-Z utf-8” flag to change the response encoding to UTF-8, in which case no transliteration will be done.

NRTMv3 (Port 4444)
* The encoding used by NRTMv3 will continue to be Latin-1. As for port 43, any non-Latin-1 characters will be substituted with a “?” character.

NRTMv4
* No impact. RPSL objects will continue to be returned in UTF-8 encoding in snapshot and delta files.

Whois REST API
* No impact. The Whois REST API already supports UTF-8.

RDAP
* No impact. The RDAP protocol already supports UTF-8.

Web Application
* UTF-8 encoding is already supported on the query page.
* The create and update page validation will be changed to allow UTF-8 characters in “descr:” and “remarks:” attributes.

Mailupdates
* No impact. UTF-8 encoding is supported.

Syncupdates
* No impact. UTF-8 encoding is supported.

Daily Database Dump and Split Files
* The encoding of the database dump and split files remains Latin-1. The “descr:” and “remarks:” attributes are included unfiltered. Any non-Latin-1 UTF-8 characters will be substituted with a “?” character.
* We will provide separate UTF-8 encoded database dump and split files, which will include “descr:” and “remarks:” attributes without substitutions.

New LIR Application
* No impact.

Registry Team
* No comments or concerns, as changes are limited to the “descr:” and “remarks:” attributes.
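For illustration, the Latin-1 fallback described above for port 43, NRTMv3 and the Latin-1 dump files could be sketched in Python roughly as follows. This is not the RIPE NCC implementation: the function names are hypothetical, and the transliteration rule used here (drop combining marks after Unicode decomposition) is an assumption, since the proposal does not specify which rules will apply.

```python
import unicodedata


def strip_marks(ch: str) -> str:
    """Decompose a character and drop its combining marks (assumed rule)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def to_latin1(text: str) -> bytes:
    """Encode text as Latin-1; transliterate if possible, else emit '?'."""
    out = bytearray()
    for ch in text:
        # Try the character itself first, then its mark-stripped base form.
        for candidate in (ch, strip_marks(ch)):
            if not candidate:
                continue
            try:
                out += candidate.encode("latin-1")
                break
            except UnicodeEncodeError:
                continue
        else:
            # Neither form fits in Latin-1: substitute a question mark.
            out += b"?"
    return bytes(out)


if __name__ == "__main__":
    # ASCII text is byte-for-byte identical in UTF-8 and Latin-1.
    print("descr: Example".encode("utf-8") == "descr: Example".encode("latin-1"))
    print(to_latin1("remarks: \u010cesk\u00fd text"))
    print(to_latin1("remarks: \u65e5\u672c\u8a9e"))
```

Characters that already exist in Latin-1 pass through unchanged, so only text outside that range is affected by the fallback.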
Edward Shryane wrote on 22/10/2025 06:59:
As I presented at RIPE 89 and RIPE 90, I'd like to propose to allow UTF-8 encoded characters in "descr:" and "remarks:" attributes.
Is there support for adding UTF-8 in the RIPE database? Please let me know your feedback.
yes, definitely. Gradual introduction sounds like a good plan. Nick
Is there support for adding UTF-8 in the RIPE database? Please let me know your feedback.
is this email from the 1990s? ランディ ブッシュ
Hi, On Wed, Oct 22, 2025 at 08:59:41AM +0300, Edward Shryane wrote:
As I presented at RIPE 89 and RIPE 90, I'd like to propose to allow UTF-8 encoded characters in "descr:" and "remarks:" attributes.
Is there support for adding UTF-8 in the RIPE database? Please let me know your feedback.
Support, as proposed, with whois output kept at Latin-1 unless a client signals "yes, I can handle UTF-8", and just using UTF-8 in the more modern protocols (RDAP, Web API, etc).

Gert Doering
        -- NetMaster
--
have you enabled IPv6 on something today...?

SpaceNet AG                      Vorstand: Sebastian v. Bomhard, Karin Schuler, Sebastian Cler
Joseph-Dollinger-Bogen 14        Aufsichtsratsvors.: A. Grundner-Culemann
D-80807 Muenchen                 HRB: 136055 (AG Muenchen)
Tel: +49 (0)89/32356-444         USt-IdNr.: DE813185279
On Oct 22, Gert Doering <gert@space.net> wrote:
Support, as proposed, with whois output kept at Latin-1 unless a client signals "yes, I can handle UTF-8", and just using UTF-8 in the more modern protocols (RDAP, Web API, etc).

FWIW, my client started asking for UTF-8 output from the RIPE server since version 5.6.1. Older versions expect Latin-1 encoding as usual.
It is available in Debian 13, but most other distributions apparently do not care to keep the whois client up to date in stable releases: https://repology.org/project/whois/versions .

--
ciao,
Marco
Thanks Marco,
On 19 Nov 2025, at 12:08, Marco d'Itri <md@Linux.IT> wrote:
On Oct 22, Gert Doering <gert@space.net> wrote:
Support, as proposed, with whois output kept at Latin-1 unless a client signals "yes, I can handle UTF-8", and just using UTF-8 in the more modern protocols (RDAP, Web API, etc).

FWIW, my client started asking for UTF-8 output from the RIPE server since version 5.6.1. Older versions expect Latin-1 encoding as usual.
It is available in Debian 13, but most other distributions apparently do not care to keep the whois client up to date in stable releases: https://repology.org/project/whois/versions .
I used Whois 5.6.3 on Debian and confirmed that the following flags were sent by default to the RIPE database server: "-V Md5.6.3 --charset UTF-8"

Regards
Ed Shryane
RIPE NCC
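As a rough sketch of what such a client does, a port 43 query that opts in to UTF-8 could be written in Python as below. The function names are hypothetical; the “-Z utf-8” flag is the one named in the proposal, and the host name whois.ripe.net and the choice to decode with replacement on errors are assumptions made for the example.

```python
import socket


def build_query(search_key: str, charset: str = "utf-8") -> bytes:
    """Build the query line, prefixing the charset flag from the proposal."""
    # The query line itself is assumed to be plain ASCII.
    return f"-Z {charset} {search_key}\r\n".encode("ascii")


def whois_query(search_key: str, host: str = "whois.ripe.net",
                port: int = 43, charset: str = "utf-8",
                timeout: float = 10.0) -> str:
    """Send one query over port 43 and decode the reply in `charset`."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_query(search_key, charset))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    # Decode defensively so a stray byte cannot abort the whole response.
    return b"".join(chunks).decode(charset, errors="replace")


if __name__ == "__main__":
    print(build_query("AS3333"))
```

A client that does not send the flag would instead decode the reply as Latin-1, matching the default described in the proposal.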
Hello, yes, RIPE database should support UTF-8. - Daniel On 10/22/25 7:59 AM, Edward Shryane wrote:
Dear colleagues,
As I presented at RIPE 89 and RIPE 90, I'd like to propose to allow UTF-8 encoded characters in "descr:" and "remarks:" attributes.
Is there support for adding UTF-8 in the RIPE database? Please let me know your feedback.
----- To unsubscribe from this mailing list or change your subscription options, please visit: https://mailman.ripe.net/mailman3/lists/db-wg.ripe.net/ As we have migrated to Mailman 3, you will need to create an account with the email matching your subscription before you can change your settings. More details at: https://www.ripe.net/membership/mail/mailman-3-migration/
Sounds great! I strongly advocate for UTF-8 support in the ripe database On Wed, Oct 22, 2025, 11:25 Daniel Suchy via db-wg <db-wg@ripe.net> wrote:
Hello, yes, RIPE database should support UTF-8.
- Daniel
On 10/22/25 7:59 AM, Edward Shryane wrote:
Dear colleagues,
As I presented at RIPE 89 and RIPE 90, I'd like to propose to allow UTF-8 encoded characters in "descr:" and "remarks:" attributes.
Is there support for adding UTF-8 in the RIPE database? Please let me know your feedback.
Hi all, ++ support from me... Kind regards, -- Clément Cavadore On Wed, 2025-10-22 at 08:59 +0300, Edward Shryane wrote:
Dear colleagues,
As I presented at RIPE 89 and RIPE 90, I'd like to propose to allow UTF-8 encoded characters in "descr:" and "remarks:" attributes.
Is there support for adding UTF-8 in the RIPE database? Please let me know your feedback.
[speaking as myself, not as co-chair]

I support this, and I like the planned limitations for now. We can always expand this in the future once we have more experience.

-peter

On 2025 Oct 22 (Wed) at 08:59:41 +0300 (+0300), Edward Shryane wrote:
:Dear colleagues,
:
:As I presented at RIPE 89 and RIPE 90, I'd like to propose to allow UTF-8 encoded characters in "descr:" and "remarks:" attributes.
:
:Is there support for adding UTF-8 in the RIPE database? Please let me know your feedback.
Dear all,
Is there support for adding UTF-8 in the RIPE database? Please let me know your feedback.
I definitely support this, especially as I see that the proposal already deals with my immediate concerns (see below). Best regards, Janos
Regards Ed Shryane RIPE NCC
Whois (Port 43) Query
* The “descr:” and “remarks:” attributes are returned by default on port 43 query responses.
* Port 43 will continue to use Latin-1 by default. If so, any UTF-8 characters outside the ASCII character set will be transliterated to Latin-1 or will be substituted with a “?” character.
* The client can specify the “-Z utf-8” flag to change the response encoding to UTF-8, then no transliteration will be done.
Ed, all, On Wed, Oct 22, 2025 at 08:59:41AM +0300, Edward Shryane wrote:
Is there support for adding UTF-8 in the RIPE database? Please let me know your feedback.
makes sense to me.
Valid Codepoints Validate UTF-8 input with the IDNA 2008 standard to decide whether a Unicode codepoint is allowed (i.e. only allow protocol valid code points). This standard is used in the implementation of Internationalised Domain Names (IDNs). This allows for consistency (code points will be mapped to a specific set of characters) and improved security (using an inclusion model to only allow certain characters).
limiting the code points is likely a good thing, finding the exact reference would benefit from a bit more thought, though. Later. Best Peter
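To make the "inclusion model" concrete, here is a small Python sketch of allow-list validation. It is not the IDNA 2008 table: `ALLOWED_CATEGORIES` is a hypothetical stand-in built from Unicode general categories (lowercase letters, other letters, marks, digits), and `rejected_codepoints` and `extra_allowed` are names invented for this example. A real implementation would consult the IDNA 2008 derived property data instead.

```python
import unicodedata

# Hypothetical allow-list, loosely modelled on letter/mark/digit
# categories. This is NOT the real IDNA 2008 PVALID table.
ALLOWED_CATEGORIES = {"Ll", "Lo", "Lm", "Mn", "Mc", "Nd"}


def rejected_codepoints(text, extra_allowed=""):
    """Return the characters in text that the allow-list does not cover."""
    return [
        ch for ch in text
        if unicodedata.category(ch) not in ALLOWED_CATEGORIES
        and ch not in extra_allowed
    ]


# A lowercase Cyrillic word passes, but ordinary free text does not:
# uppercase letters, spaces and punctuation all fall outside the list.
print(rejected_codepoints("сеть"))
print(rejected_codepoints("Terabit Labs Inc."))
print(rejected_codepoints("Terabit Labs Inc.", extra_allowed=" ."))
```

The second call shows why the exact reference matters: a label-oriented allow-list rejects spaces, punctuation and uppercase letters, all of which free-text attributes need, so any IDNA-derived set would probably have to be extended for "descr:" and "remarks:".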
On Wed, 22 Oct 2025 at 09:00, Edward Shryane <eshryane@ripe.net> wrote:
Dear colleagues,
As I presented at RIPE 89 and RIPE 90, I'd like to propose to allow UTF-8 encoded characters in "descr:" and "remarks:" attributes.
Is there support for adding UTF-8 in the RIPE database? Please let me know your feedback.
I support adding some support for UTF-8 as the first step on a path towards more UTF-8. [...]
Personal Data Users must not add personal data in “remarks:” or “descr:” attributes, as these attributes are not included in the daily limit accounting, are not validated as they contain free text, and are not filtered by default. This is already the case in the RIPE database and the introduction of UTF-8 encoding does not change this. Personal data with UTF-8 encoding is out of scope.
What happens if or when someone does include personal data here? Thanks, Leo
On 2025 Oct 22 (Wed) at 14:03:07 +0300 (+0300), Leo Vegoda wrote:
:> Personal Data
:> Users must not add personal data in “remarks:” or “descr:” attributes, as these attributes are not included in the daily limit accounting, are not validated as they contain free text, and are not filtered by default. This is already the case in the RIPE database and the introduction of UTF-8 encoding does not change this. Personal data with UTF-8 encoding is out of scope.
:
:What happens if or when someone does include personal data here?
:
:Thanks,
:
:Leo

I believe this is something that we cannot enforce in the database, but merely advise users to not add this information. However, this is not changing from the pre-UTF-8 system.

-peter

--
Anyone who uses the phrase "easy as taking candy from a baby" has never tried taking candy from a baby.
-- Robin Hood
Dear Ed, DB-WG, On Wed, Oct 22, 2025 at 08:59:41AM +0300, Edward Shryane wrote:
Dear colleagues,
As I presented at RIPE 89 and RIPE 90, I'd like to propose to allow UTF-8 encoded characters in "descr:" and "remarks:" attributes.
Is there support for adding UTF-8 in the RIPE database? Please let me know your feedback.
Yes, this sounds good. Best, Piotr -- Piotr Strzyżewski
I must point out the perfect irony of the subject line of the reply to this email turning the "smart quotes" into ?'s.

Latin-1 will truly never die, it seems.

On Wed, 22 Oct 2025 at 13:32, Piotr Strzyzewski <piotr@internetsailor.net> wrote:
Dear Ed, DB-WG,
On Wed, Oct 22, 2025 at 08:59:41AM +0300, Edward Shryane wrote:
Dear colleagues,
As I presented at RIPE 89 and RIPE 90, I'd like to propose to allow UTF-8 encoded characters in "descr:" and "remarks:" attributes.
Is there support for adding UTF-8 in the RIPE database? Please let me know your feedback.
Yes, this sounds good.
Best, Piotr
-- Piotr Strzyżewski ----- To unsubscribe from this mailing list or change your subscription options, please visit: https://mailman.ripe.net/mailman3/lists/db-wg.ripe.net/ As we have migrated to Mailman 3, you will need to create an account with the email matching your subscription before you can change your settings. More details at: https://www.ripe.net/membership/mail/mailman-3-migration/
Hi Working Group,

Based on the existing comments, the Chairs have decided to put a 2 week deadline for additional comments. Please send your spooky replies by EOD October 31, and we can judge consensus then.

Peter Hessler, On Behalf of the Database WG Chairs.

On 2025 Oct 22 (Wed) at 08:59:41 +0300 (+0300), Edward Shryane wrote:
:Dear colleagues,
:
:As I presented at RIPE 89 and RIPE 90, I'd like to propose to allow UTF-8 encoded characters in "descr:" and "remarks:" attributes.
:
:Is there support for adding UTF-8 in the RIPE database? Please let me know your feedback.
[...]
Colleagues

There is a reason why this took 11+ years. A lot of the work needed for the technical change was considered and in some areas applied to the database years ago. That is why we don't have an NWI on UTF8. What we could not agree on was the policy aspect. Which attributes to allow UTF8 to be used with. Whether we should just allow UTF8 or keep the attribute as Latin1/ASCII and add an optional duplicate attribute in UTF8. This was brought up many times and no one would commit to anything. Now we have an agreement on how to move this forward. But now we don't agree that it should be a policy.

In general where we apply rules or affect behaviour or mindset, we have done it with a policy. If it is a straightforward technical tweak, we can do it with an NWI. I hear what Angela says. These issues with UTF8 and contact methods do not impact on, or require any changes to, existing policies. But that statement does not rule out creating a new RIPE Database Policy to define these rules and behaviours. We seem to be in a position now where any change to the RIPE Database is considered to be a technical tweak in complete isolation of anything else. So everything is an NWI. If that was true, why are the "status:" attribute values defined in the Address Policy? Status is a database thing. If you want to change it then it is just a technical tweak. When we recently added 'ALLOCATED-ASSIGNED' why did we have to change policy? It is just database semantics. Why do we have an Abuse-c Policy? Like "status:", "abuse-c:" is just a database attribute. Either these are all policy issues, or none of them are. Let's not pick and choose so you can rush something through quickly.

Similarly, if we are going to allow users to define their preferred method of being contacted and maybe have a mandatory method like email, or suggest that email is always offered, then this should also be defined in a RIPE Database Policy. Again this is not a technical tweak. It is about rules and behaviour.

All of these issues define how elements of the registry are managed and used. Even if they require some technical tweak in order to implement the rules or enforce some behaviour.

Now let's look in a little more detail about what we are agreeing with UTF8. As with all aspects of life in the 2020s, everyone is in a hurry to just 'get things done'. Headlines and sound bites are what most people make decisions on. Very few people have time for detail. That is always something for other people to look at. But if you like the headlines and your heads start to nod, decisions are made. Then detail becomes irrelevant...to you. We are making a habit these days of looking at issues within small bubbles, in complete isolation of the bigger picture. The consequences of your change can reach far beyond your little bubble.

With regard to "remarks:", this is, and always has been, defined as free text. Absolutely anything can be included here. It has been an attribute in the database since the beginning, about 36 years ago. For most of that time it was never said this should not include any personal data. Some of these may contain personal data in "remarks:" attributes. But data can be written in UTF8 regardless of the data content. So I see no problem allowing UTF8 in "remarks:".

The "descr:" attribute is very different to "remarks:".

In the Impact Analysis it was said:
Personal Data
Users must not add personal data in “remarks:” or “descr:” attributes, as these attributes are not included in the daily limit accounting, are not validated as they contain free text, and are not filtered by default. This is already the case in the RIPE database and the introduction of UTF-8 encoding does not change this. Personal data with UTF-8 encoding is out of scope.

In the Operational update at RIPE-90 it was said:
Allow UTF-8 in “descr:” and “remarks:” Attributes
-Names and addresses NOT affected

It is not correct to say it is already the case that the "descr:" attribute must not include personal data. This is exactly what the "descr:" attribute is. Again this attribute has existed since the beginning of the database. One of the early definitions of it can be found in RIPE-050 RIPE Database Template For Networks from 01 Apr 1992:

inetnum:

descr: Description of the network. Give organisation and place. Postal address is not needed, this can be found via the contacts. You can't send postal mail to a bunch of routers and transceivers, can you? The country is given in country:.

Format: free text, one line per entry, multiple lines in sequence
Example:
descr: Network Bugs Feeding Facility
descr: Terabit Labs Inc.
descr: Northtown
Mandatory

For the last 36 years, operators have been adding the End User's name and location details into the "descr:" attributes. If you check the database for INETNUM and INET6NUM objects created during October 2025, you will see that most still include the name and location in these attributes. Every object type except PERSON, ROLE, KEY-CERT and IRT includes the "descr:" attribute. Across the database, in the applicable objects, there are in total:

objects: 6637650
descr: 7884770

So we have almost 8m "descr:" attributes largely containing name and location details of, mostly, End Users operating public networks. Now the problem I have with this discussion and the conclusions being drawn is the mixing of UTF8 issues with those of personal data and privacy concerns. If we are talking about allowing UTF8 then let's stick to that topic. Do not mix it with privacy concerns. They are completely separate issues. You can apply UTF8 to "descr:" regardless of the data content.

The current definition of "descr:" in the database documentation simply says "A short description related to the object.". But for many years it was more like RIPE-050 above. So resource holders were required to include the name and location details of a network user. They are still doing that today. Some of that is personal information. Probably no one has any idea how much of the "descr:" data is personal information. If we start pushing new rules about not including personal information in these attributes, resource holders may stop putting this information in these attributes. That could be quite damaging for some of the stakeholders of the RIPE Database. Now some may say we should not include personal data in these attributes. But we do not have a Business Requirements Document defining the business case for operating a public registry in the 2020s. So it is impossible to say if any of the exceptions in GDPR allow the registry to process this personal information. So can I suggest that as this conversation continues, and in any conclusions that are drawn, we focus on UTF8 and leave privacy for another thread.

So in conclusion, I agree with allowing UTF8 in "remarks:" and "descr:" attributes, regardless of the data content of those attributes, but I think it should be defined in a RIPE Database Policy.

cheers
denis

On Thu, 23 Oct 2025 at 10:19, Peter Hessler <phessler@theapt.org> wrote:
Hi Working Group,
Based on the existing comments, the Chairs have decided to put a 2 week deadline for additional comments. Please send your spooky replies by EOD October 31, and we can judge consensus then.
Peter Hessler, On Behalf of the Database WG Chairs.
Dear Denis, Thanks for reviewing the UTF-8 proposal, and apologies for the late reply (I was out last week).
On 30 Oct 2025, at 08:33, denis walker <ripedenis@gmail.com> wrote:
With regard to "remarks:", this is, and always has been, defined as free text. Absolutely anything can be included here. It has been an attribute in the database since the beginning, about 36 years ago. For most of that time it was never said this should not include any personal data. Some of these may contain personal data in "remarks:" attributes. But data can be written in UTF8 regardless of the data content. So I see no problem allowing UTF8 in "remarks:".
The "descr:" attribute is very different to "remarks:".
In the Impact Analysis it was said: Personal Data Users must not add personal data in “remarks:” or “descr:” attributes, as these attributes are not included in the daily limit accounting, are not validated as they contain free text, and are not filtered by default. This is already the case in the RIPE database and the introduction of UTF-8 encoding does not change this. Personal data with UTF-8 encoding is out of scope.
In the Operational update at RIPE-90 it was said: Allow UTF-8 in “descr:” and “remarks:” Attributes -Names and addresses NOT affected
It is not correct to say it is already the case that the "descr:" attribute must not include personal data. This is exactly what the "descr:" attribute is. Again this attribute has existed since the beginning of the database. One of the early definitions of it can be found in RIPE-050 RIPE Database Template For Networks from 01 Apr 1992:
See this presentation from RIPE 80 for an example why storing personal data in "descr:" is not a good idea (and the same applies for "remarks:"): https://ripe80.ripe.net/presentations/39-RIPE-Database-and-GDPR-final.pdf I agree that anything at all can be included in "descr:" or "remarks:", but personal data in either attribute will not be subject to the daily limit or filtered in query responses.
inetnum:
descr: Description of the network. Give organisation and place. Postal address is not needed, this can be found via the contacts. You can't send postal mail to a bunch of routers and transceivers, can you? The country is given in country:.
Format: free text, one line per entry, multiple lines in sequence Example: descr: Network Bugs Feeding Facility descr: Terabit Labs Inc. descr: Northtown Mandatory
For the last 36 years, operators have been adding the End User's name and location details into the "descr:" attributes. If you check the database for INETNUM and INET6NUM objects created during October 2025, you will see that most still include the name and location in these attributes. Every object type except PERSON, ROLE, KEY-CERT and IRT includes the "descr:" attribute. Across the database, in the applicable objects, there are in total:
objects: 6637650 descr: 7884770
So we have almost 8m "descr:" attributes largely containing name and location details of, mostly, End Users operating public networks. Now the problem I have with this discussion and the conclusions being drawn is the mixing of UTF8 issues with those of personal data and privacy concerns. If we are talking about allowing UTF8 then let's stick to that topic. Do not mix it with privacy concerns. They are completely separate issues. You can apply UTF8 to "descr:" regardless of the data content.
This proposal is not intended to address issues regarding personal data. However, the intended change may be used to add personal data where it's not protected (for example, non-Latin names and addresses can be more accurately represented in UTF-8 rather than transliterated to Latin-1). That is why the impact analysis states not to store personal data in those attributes. There are other, more specific attributes better suited for personal data, which are also subject to filtering and the daily query limit. I wanted to be clear that storing non-Latin encoded personal data is out of scope, and needs to be considered in a separate proposal.
The current definition of "descr:" in the database documentation simply says "A short description related to the object.". But for many years it was more like RIPE-050 above. So resource holders were required to include the name and location details of a network user. They are still doing that today. Some of that is personal information. Probably no one has any idea how much of the "descr:" data is personal information. If we start pushing new rules about not including personal information in these attributes, resource holders may stop putting this information in these attributes. That could be quite damaging for some of the stakeholders of the RIPE Database. Now some may say we should not include personal data in these attributes. But we do not have a Business Requirements Document defining the business case for operating a public registry in the 2020s. So it is impossible to say if any of the exceptions in GDPR allow the registry to process this personal information. So can I suggest that as this conversation continues, and in any conclusions that are drawn, we focus on UTF8 and leave privacy for another thread.
So in conclusion, I agree with allowing UTF8 in "remarks:" and "descr:" attributes, regardless of the data content of those attributes, but I think it should be defined in a RIPE Database Policy.
Thanks for your feedback. Regards Ed Shryane RIPE NCC
cheers denis
Support from me - an obvious and required enhancement at this point! - Mick On 22/10/2025 08:59, Edward Shryane wrote:
Dear colleagues,
As I presented at RIPE 89 and RIPE 90, I'd like to propose to allow UTF-8 encoded characters in "descr:" and "remarks:" attributes.
Is there support for adding UTF-8 in the RIPE database? Please let me know your feedback.
-- Mick O’Donovan Senior Network Engineer HEAnet CLG Ireland's National Education and Research Network 3rd Floor, North Dock 2 | 93/94 North Wall Quay | Dublin D01 V8Y6 | Ireland +353 1 6609040 | mick.odonovan@heanet.ie | www.heanet.ie Registered in Ireland, No. 275301 | CRA No. 20036270
I'm for this proposal. I would like the NCC to clarify a bit more regarding the allowed code points as the IDNA is about domain names which differ from free form text. -Cynthia On Wed, 22 Oct 2025, 08:00 Edward Shryane, <eshryane@ripe.net> wrote:
Impact Analysis
---------------
Backwards Compatibility
UTF-8 is backwards compatible with ASCII, in the same way as Latin-1. Any RPSL objects solely using ASCII will be compatible with UTF-8 encoding. Approximately 99% of all objects in the RIPE database only contain ASCII characters.
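The compatibility claim above can be checked with a short Python sketch. The helper name and the sample attribute values are made up for illustration; nothing here comes from the RIPE database code.

```python
def same_bytes_in_all_encodings(text: str) -> bool:
    """Return True if text encodes to identical bytes in ASCII, Latin-1 and UTF-8."""
    if not text.isascii():
        # Non-ASCII text cannot be encoded as ASCII at all, so the
        # three encodings cannot agree.
        return False
    return text.encode("ascii") == text.encode("latin-1") == text.encode("utf-8")


# Hypothetical attribute values, for illustration only.
ascii_line = "descr:          Example Network"
accented_line = "descr:          Réseau d'exemple"

print(same_bytes_in_all_encodings(ascii_line))
print(same_bytes_in_all_encodings(accented_line))
# The accented line is valid in both encodings, but the bytes differ:
print(accented_line.encode("latin-1") == accented_line.encode("utf-8"))
```

An ASCII-only object is byte-for-byte the same whichever of the three encodings a client assumes, which is why the change should be invisible for such objects.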
Personal Data
Users must not add personal data in “remarks:” or “descr:” attributes, as these attributes are not included in the daily limit accounting, are not validated as they contain free text, and are not filtered by default. This is already the case in the RIPE database and the introduction of UTF-8 encoding does not change this. Personal data with UTF-8 encoding is out of scope.
Interoperability
If interoperability is a concern (i.e. a notice must be readable by a wider community) then it is recommended that only ASCII characters are used.
Valid Codepoints
UTF-8 input will be validated against the IDNA 2008 standard to decide whether a Unicode code point is allowed (i.e. only protocol-valid code points are accepted). This standard is used in the implementation of Internationalised Domain Names (IDNs). This gives consistency (code points will be mapped to a specific set of characters) and improved security (an inclusion model allows only certain characters).
Guidelines for the Implementation of Internationalized Domain Names
https://www.icann.org/resources/pages/idn-guidelines-2011-09-02-en
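To illustrate what an inclusion model means in practice, here is a minimal Python sketch. It is not IDNA 2008: the Python standard library has no IDNA 2008 implementation, so the allow-list below (letters, marks and decimal digits by Unicode general category, plus a hypothetical set of ASCII punctuation) is only a stand-in for the real protocol-valid table.

```python
import unicodedata

# ASSUMPTION: simplified stand-in for the IDNA 2008 "protocol valid" table.
# Letters, combining marks and decimal digits are allowed by category.
ALLOWED_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Nd"}

# ASSUMPTION: hypothetical extra ASCII characters that free text would need.
ALLOWED_ASCII_EXTRAS = set(" .,;:!?'\"()-/@+_&#%=*[]<>")


def rejected_codepoints(value: str) -> list[str]:
    """Return the code points in value that are not on the allow-list."""
    rejected = []
    for ch in value:
        if ch in ALLOWED_ASCII_EXTRAS:
            continue
        if unicodedata.category(ch) in ALLOWED_CATEGORIES:
            continue
        rejected.append(f"U+{ord(ch):04X}")
    return rejected


print(rejected_codepoints("Пример сети"))           # letters and a space
print(rejected_codepoints("Example\u202e Network"))  # contains a bidi override
```

The point of the inclusion model is that anything not explicitly allowed, such as the invisible right-to-left override above, is rejected without needing a rule of its own.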
Transliteration
Transliteration to Latin-1 is only done when necessary to match the default response encoding. Otherwise transliteration is not done (i.e. UTF-8 characters will be returned as-is).
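As a rough illustration of transliterating to Latin-1 with a "?" fallback, consider the Python sketch below. The transliteration rule used here (decompose the character and drop combining marks) is an assumption chosen for the example; the proposal does not say which rule the RIPE database will apply.

```python
import unicodedata


def to_latin1(text: str) -> bytes:
    """Encode text as Latin-1, transliterating where possible, else using '?'."""
    out = []
    for ch in text:
        try:
            ch.encode("latin-1")
            out.append(ch)  # already representable in Latin-1
            continue
        except UnicodeEncodeError:
            pass
        # ASSUMPTION: transliterate by decomposing the character and
        # dropping combining marks (e.g. "ź" becomes "z").
        decomposed = unicodedata.normalize("NFKD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        try:
            base.encode("latin-1")
        except UnicodeEncodeError:
            base = "?"  # nothing suitable in Latin-1
        out.append(base or "?")
    return "".join(out).encode("latin-1")


print(to_latin1("Zürich"))
print(to_latin1("Łódź"))
print(to_latin1("Сеть"))
```

For plain substitution without any transliteration step, `text.encode("latin-1", errors="replace")` in the standard library replaces every unencodable character with "?".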
Impact on RIPE Database Services
--------------------------------
Whois (Port 43) Query
* The “descr:” and “remarks:” attributes are returned by default on port 43 query responses.
* Port 43 will continue to use Latin-1 by default. With the default encoding, any UTF-8 characters outside the ASCII character set will be transliterated to Latin-1 or will be substituted with a “?” character.
* The client can specify the “-Z utf-8” flag to change the response encoding to UTF-8, in which case no transliteration will be done.
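From the client side, the port 43 behaviour described above might look like the following Python sketch. The “-Z utf-8” flag and the Latin-1 default are taken from the proposal; the helper names, the placement of the flag before the search term, and sending the query itself as UTF-8 bytes are assumptions.

```python
import socket

WHOIS_HOST = "whois.ripe.net"
WHOIS_PORT = 43


def build_query(search_term: str, utf8: bool = False) -> bytes:
    """Build the query line, optionally asking for a UTF-8 response."""
    flags = "-Z utf-8 " if utf8 else ""
    # ASSUMPTION: the query line itself is sent as UTF-8 bytes.
    return f"{flags}{search_term}\r\n".encode("utf-8")


def decode_response(raw: bytes, utf8: bool = False) -> str:
    """Decode the response with the encoding that was requested."""
    return raw.decode("utf-8" if utf8 else "latin-1")


def query(search_term: str, utf8: bool = False, timeout: float = 10.0) -> str:
    """Send one query to port 43 and return the decoded response."""
    with socket.create_connection((WHOIS_HOST, WHOIS_PORT), timeout=timeout) as conn:
        conn.sendall(build_query(search_term, utf8))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return decode_response(b"".join(chunks), utf8)


# Example (needs network access):
#   print(query("AS3333", utf8=True))
```

A client that does not pass the flag keeps decoding as Latin-1 and would see “?” in place of characters that cannot be represented.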
NRTMv3 (Port 4444)
* The encoding used by NRTMv3 will continue to be Latin-1. As for port 43, any non-Latin-1 characters will be substituted with a “?” character.
NRTMv4
* No impact. RPSL objects will continue to be returned in UTF-8 encoding in snapshot and delta files.
Whois REST API
* No impact. The Whois REST API already supports UTF-8.
RDAP
* No impact. The RDAP protocol already supports UTF-8.
Web Application
* UTF-8 encoding is already supported on the query page.
* The create and update page validation will be changed to allow UTF-8 characters in “descr:” and “remarks:” attributes.
Mailupdates
* No impact. UTF-8 encoding is supported.
Syncupdates
* No impact. UTF-8 encoding is supported.
Daily Database Dump and Split Files
* The encoding of the database dump and split files remains Latin-1. The “descr:” and “remarks:” attributes are included unfiltered. Any non-Latin-1 UTF-8 characters will be substituted with a “?” character.
* We will provide a separate UTF-8 encoded database dump and split files, which will include “descr:” and “remarks:” attributes without substitutions.
New LIR Application
* No impact.
Registry Team
* No comments or concerns, as changes are limited to the “descr:” and “remarks:” attributes.
Dear Cynthia,
On 4 Nov 2025, at 18:00, Cynthia Revström <me@cynthia.re> wrote:
I'm for this proposal.
I would like the NCC to clarify a bit more regarding the allowed code points as the IDNA is about domain names which differ from free form text.
Choosing IDNA2008 seems like a reasonable starting point to handle UTF-8, including normalisation and excluding invalid characters. A lot of work has already been done for IDNs (see RFC 9233) that we can benefit from.
We plan to pass any UTF-8 input through IDNA2008 and accept only "protocol valid" code points.
There is good library support for IDNA2008, which will save us time compared with implementing something similar ourselves from scratch. We will need to turn off some features (we don't want case folding, for example), so we will need to choose a library that gives us some flexibility.
IDNA2008 also allows us to use Punycode encoding to ASCII for compatibility, like we did for IDN in email addresses (see NWI-11), but return UTF-8 where supported.
I suggest we implement UTF-8 support using IDNA2008, and if something is lacking that the community needs, then we adjust the implementation accordingly.
Regards
Ed Shryane
RIPE NCC
-Cynthia
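The normalisation and Punycode steps mentioned in the reply above can be tried out with Python's standard library. This is only a sketch: it uses NFC normalisation and the built-in "punycode" codec on a single word, and the helper names are invented here. How Punycode would be applied to free-text attributes is not specified in the thread.

```python
import unicodedata

ACE_PREFIX = "xn--"  # the prefix IDNA uses for Punycode-encoded labels


def normalise(value: str) -> str:
    """Normalise to NFC so equivalent sequences compare equal."""
    return unicodedata.normalize("NFC", value)


def to_ascii_label(word: str) -> str:
    """Encode one non-ASCII word as an ASCII-only Punycode label."""
    return ACE_PREFIX + normalise(word).encode("punycode").decode("ascii")


def from_ascii_label(label: str) -> str:
    """Decode a label produced by to_ascii_label back to Unicode."""
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


# "u" followed by a combining diaeresis normalises to the single letter "ü",
# so both spellings produce the same label.
print(to_ascii_label("bücher"))
print(to_ascii_label("bu\u0308cher"))
print(from_ascii_label(to_ascii_label("сеть")))
```

Normalising before encoding matters because visually identical strings can otherwise be stored as different code point sequences.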
participants (16): Ben Cartwright-Cox, Clement Cavadore, Cynthia Revström, Daniel Suchy, denis walker, Edward Shryane, Gert Doering, Janos Zsako, Leo Vegoda, Marco d'Itri, Mick O'Donovan, Nick Hilliard, Peter Hessler, Peter Koch, Piotr Strzyzewski, Randy Bush