Dear colleagues,

Based on the discussion regarding UTF-8 in the RIPE database during the interim meeting yesterday, I suggest that we implement support for UTF-8 in the database (i.e. convert the schema and add a flag to allow a client to choose a character set), but we do not allow additional characters for now, pending further DB-WG discussion. Our intention is to lay the groundwork for future support, without breaking existing functionality.

If you have any concerns or objections please let me know. We will now prepare an implementation plan / impact analysis of these changes.

Regards
Ed Shryane
RIPE NCC

On 24 Nov 2023, at 10:03, Edward Shryane via db-wg <db-wg@ripe.net> wrote:
Dear colleagues,
Currently the RIPE database only allows a subset of ASCII characters in the "org-name:", "person:" and "role:" attributes, for a few reasons including:
* These attributes are also a look-up key and the Whois protocol does not allow specifying character sets in queries.
* RPSL names are ASCII according to RFC2622
* Using a normalised name makes the object easier to query
* Reading a normalised name is easier to interpret
However, there are some drawbacks to forcing names to only use a subset of ASCII characters:
* Organisations, roles and persons cannot use their actual name if it includes characters outside this subset.
* Normalisation is not standard, but is an interpretation done by each maintainer, e.g. characters could be excluded or converted in different ways.
Since we support the Latin-1 character set in the RIPE database, I propose we also allow non-ASCII Latin-1 characters in these attributes.
Querying for a name can be done either using the Latin-1 characters (as proposed) or using a normalised ASCII representation (as is currently the case). The normalised version will be generated by Whois and stored in a database index for querying. The primary key will also be generated from the normalised version.
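
As an illustration only, the normalisation step described above could look roughly like the following Python sketch. The function names (normalise_name, index_key) and the fallback table are hypothetical and are not taken from the Whois implementation; as noted earlier, normalisation is not standardised, so this is just one possible interpretation. It strips accents via Unicode decomposition and uses an assumed mapping for Latin-1 letters that have no decomposition.

```python
import unicodedata

# Latin-1 letters with no canonical decomposition.
# These replacements are an assumption, not a Whois-defined mapping.
_FALLBACK = {
    "ß": "ss", "Æ": "AE", "æ": "ae", "Ø": "O", "ø": "o",
    "Ð": "D", "ð": "d", "Þ": "Th", "þ": "th",
}


def normalise_name(name: str) -> str:
    """Return an ASCII approximation of a Latin-1 name (hypothetical scheme)."""
    # Raises UnicodeEncodeError if the name is outside Latin-1.
    name.encode("latin-1")
    out = []
    for ch in name:
        if ch in _FALLBACK:
            out.append(_FALLBACK[ch])
            continue
        # Split accented letters into base letter + combining marks,
        # then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    result = "".join(out)
    if not result.isascii():
        # e.g. symbols such as "×" that have no obvious ASCII form
        raise ValueError(f"cannot normalise {name!r} to ASCII")
    return result


def index_key(name: str) -> str:
    """Case-insensitive look-up key built from the normalised name."""
    return normalise_name(name).lower()
```

With this sketch, a query for either "José Müller" or "Jose Muller" would produce the same look-up key, which is the behaviour the proposal relies on.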
Please let me know your feedback.
Regards
Ed Shryane
RIPE NCC
---
Whois attribute verbose description (copied from the help text).
org-name
--------
Specifies the name of the organisation that this organisation object represents in the RIPE Database. This is an ASCII-only text attribute. The restriction is because this attribute is a look-up key and the whois protocol does not allow specifying character sets in queries. The user can put the name of the organisation in non-ASCII character sets in the "descr:" attribute if required.
A list of 1 to 30 words separated by white space. A word is made up of ASCII alphanumeric characters and additionally:
][)(._"*@,&:!'`+/-
A word may have up to 64 characters and is not case sensitive. Each word can have any combination of the above characters with no restriction on the start or end of a word.
person
------
Specifies the full name of an administrative, technical or zone contact person for other objects in the database.
It should contain 2 to 10 words. A word is made up of ASCII alphanumeric characters and additionally:
.`'_-
The first word should begin with a letter. At least one other word should also begin with a letter. Max 64 characters can be used in each word.
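
For illustration, the "person:" rules above can be expressed as a small Python check. This is a sketch under assumptions: the function name is_valid_person_name is hypothetical, and the "should begin with a letter" wording is treated here as a hard requirement, which may be stricter than what Whois actually enforces.

```python
import re

# One word: 1 to 64 ASCII alphanumerics or any of . ` ' _ -
_PERSON_WORD = re.compile(r"[A-Za-z0-9.`'_-]{1,64}")


def is_valid_person_name(value: str) -> bool:
    """Check a "person:" value against the help-text rules (hypothetical helper)."""
    words = value.split()
    if not 2 <= len(words) <= 10:
        return False
    if not all(_PERSON_WORD.fullmatch(w) for w in words):
        return False
    # Assumption: "should begin with a letter" is enforced strictly.
    starts_with_letter = [w[0].isascii() and w[0].isalpha() for w in words]
    return starts_with_letter[0] and any(starts_with_letter[1:])
```

Under these rules "Jean-Luc O'Neil" passes, while "José Garcia" is rejected because of the non-ASCII "é", which is the limitation the proposal is about.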
role
----
Specifies the full name of a role entity, e.g. RIPE DBM.
A list of 1 to 30 words separated by white space. A word is made up of ASCII alphanumeric characters and additionally:
][)(._"*@,&:!'`+/-
A word may have up to 64 characters and is not case sensitive. Each word can have any combination of the above characters with no restriction on the start or end of a word.
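
The "org-name:" and "role:" attributes share the same syntax, so a single check can cover both. The Python sketch below is illustrative only; the function name is_valid_org_or_role_name is hypothetical and the code is not taken from the Whois source.

```python
import re

# One word: 1 to 64 ASCII alphanumerics or any of ][)(._"*@,&:!'`+/-
_NAME_WORD = re.compile(r"""[A-Za-z0-9\]\[)(._"*@,&:!'`+/-]{1,64}""")


def is_valid_org_or_role_name(value: str) -> bool:
    """Check an "org-name:" or "role:" value against the help-text rules."""
    words = value.split()  # words are separated by white space
    return 1 <= len(words) <= 30 and all(_NAME_WORD.fullmatch(w) for w in words)
```

For example, "RIPE DBM" and "Example Org (EU) & Co." are accepted, whereas "Müller GmbH" is rejected today because "ü" is outside the permitted ASCII subset.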