Re: [db-wg] [opensource-wg] Question for RPSL Parser
Shane is right: I don't intend to mix Bison and Flex into the project because 1) I don't know these technologies, 2) I don't have the time to learn them, and 3) I want to Pythonize as much as I can and be as independent as possible (OK, maybe a little bit dependent on other Python libraries, but this is the boundary). I am sure that you guys did a great job 14 years ago, but now or soon RPSL will be replaced with something else, so I want my toolset to stay alive by easily and quickly replacing 1-2 Python libraries.

What my prototype does now is extract the attributes and values by using Regular Expressions, a solution that has already been proposed in this discussion, and I think this is the way to go. In theory it can be an easy parser based on Python's RE, but complexity starts when, for example, RPSL says "accept as-set BLABLA" instead of "accept 10.100.200.0/16" and that as-set needs to be resolved (another query to the RIPE DB).

But anyway, thanks for your ideas. Crazy or not, I don't mind; everyone is welcome. I still need to reply to Tomas' last e-mail, but I am quite busy today.

Kind Regards
Stavros
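A minimal sketch of the as-set resolution step, to show where the extra complexity comes from. The names `expand_as_set`, `lookup` and `FAKE_DB` are hypothetical: `lookup` stands in for whatever query is sent to the RIPE DB and here just reads an in-memory dict. It also assumes that any member whose name starts with "AS-" is itself an as-set, which ignores hierarchical set names.

```python
def expand_as_set(name, lookup, _seen=None):
    """Return the set of AS numbers reachable from the as-set `name`.

    `lookup` is a hypothetical callable returning the list of members
    of an as-set; a real tool would query the database here.
    """
    seen = set() if _seen is None else _seen
    if name in seen:
        # as-sets may reference each other; do not loop forever.
        return set()
    seen.add(name)

    asns = set()
    for member in lookup(name):
        if member.upper().startswith("AS-"):
            # Assumption: an "AS-" prefix marks a nested as-set.
            asns |= expand_as_set(member, lookup, seen)
        else:
            asns.add(member)
    return asns


# Made-up data standing in for database answers.
FAKE_DB = {
    "AS-EXAMPLE": ["AS64500", "AS-CUSTOMERS"],
    "AS-CUSTOMERS": ["AS64501", "AS-EXAMPLE"],  # circular on purpose
}


def fake_lookup(name):
    return FAKE_DB.get(name, [])
```

Each nested set costs one more lookup, so a single "accept" line can fan out into many queries; caching the answers would be the obvious next step.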
On 20 Aug 2015, at 13:05, Shane Kerr <shane@time-travellers.org> wrote:
Denis,
There are two problems with this:
1. Lex and YACC (actually flex and bison for the RIPE Database) are tightly integrated into C (or C++ if you are feeling brave).
2. Stavros does state a goal of avoiding reuse of past or legacy software, which surely means the RIPE Database code. (Okay, I think it was only 12 or 14 years ago when we wrote these parts, but that surely counts as "past"...)
:)
Cheers,
-- Shane
On Thu, 20 Aug 2015 10:56:20 +0000 (UTC) denis walker <dw100uk@yahoo.co.uk> wrote:
Hi,

The RIPE Database, I believe, still uses LEX and YACC to parse the routing policy information from the RPSL attribute/value pairs. Is it possible to just 'drop' this into a Python script and extend it to do any extra parsing you want? (I know nothing about Python so I may be way off the mark here :) but it is just a thought.)

cheers
denis

From: Shane Kerr <shane@time-travellers.org>
To: Stavros Konstantaras <stavros@nlnetlabs.nl>
Cc: db-wg@ripe.net
Sent: Thursday, 20 August 2015, 5:21
Subject: Re: [db-wg] [opensource-wg] Question for RPSL Parser
Stavros,
On Wed, 19 Aug 2015 16:00:59 +0200 Stavros Konstantaras <stavros@nlnetlabs.nl> wrote:
To make things clearer, our project at NLnet Labs is the development of "a modern IRRtoolset" written in Python and targeting any operator, not just another homemade tool for our own needs. Or, in more detail: the creation of a tool that is able to configure BGP routers directly and automatically by extracting the policies from the RIPE DB. This means that I am trying hard to avoid the (re)use of any past or legacy software that exists, otherwise we lose independence and inherit restrictions from the past.
I don't know of any Python RPSL parser you can use.
Your goals make a certain amount of sense. Unfortunately RPSL is an ugly language... it's actually more like several languages in one, all bad. :P
* Start with simple attribute/value, one per line
* Oh, but then "RPSL-ize", which allows line continuation and end-of-line comments
* Oh, actually we want to make a separate grammar for each attribute
* But be sure to make all of this "extensible", to ensure maximum confusion!
My approach in the past has been to start with a simple generic parser that splits text into objects, splits objects into attributes, and extracts attribute names & values (handling line continuation and end-of-line comments). You can do all of this with Python regular expressions, something like r"\n(?![ \t+])" to split objects into attributes; cleaning end-of-line comments & line continuation characters is straightforward. Then you can deal with the grammar for the attributes you know, and ignore the rest. You can extend the parsing to be more and more comprehensive until you are able to parse the data set you care about.
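A minimal sketch of that generic first pass, assuming objects are separated by blank lines and that a continuation line starts with a space, a tab or "+". Only the attribute-splitting pattern comes from the text above; the function name `parse_objects`, the other patterns and the sample data are illustrative assumptions.

```python
import re

# Assumption: objects are separated by one or more blank lines.
OBJECT_SPLIT = re.compile(r"\n\s*\n")
# From the message: a new attribute starts at a newline that is NOT
# followed by a space, a tab or "+".
ATTRIBUTE_SPLIT = re.compile(r"\n(?![ \t+])")
# End-of-line comment: "#" up to the end of the line.
COMMENT = re.compile(r"#[^\n]*")
# Line continuation: a newline followed by a space, a tab or "+".
CONTINUATION = re.compile(r"\n[ \t+]")


def parse_objects(text):
    """Split text into objects, each a list of (name, value) pairs."""
    objects = []
    for chunk in OBJECT_SPLIT.split(text.strip()):
        attributes = []
        for raw in ATTRIBUTE_SPLIT.split(chunk.strip("\n")):
            if not raw.strip() or raw.lstrip().startswith(("#", "%")):
                continue  # skip empty pieces and whole-line comments
            name, sep, value = raw.partition(":")
            if not sep:
                continue  # not an "attribute: value" line, ignore it
            value = COMMENT.sub("", value)        # drop comments first
            value = CONTINUATION.sub(" ", value)  # then join the lines
            value = " ".join(value.split())       # normalise whitespace
            attributes.append((name.strip().lower(), value))
        if attributes:
            objects.append(attributes)
    return objects


SAMPLE = """\
aut-num: AS64500
import:  from AS64501   # upstream
         accept ANY
export:  to AS64501 announce AS-EXAMPLE
+        AS64500

as-set:  AS-EXAMPLE
members: AS64500,
\tAS64502
"""

for obj in parse_objects(SAMPLE):
    print(obj)
```

The values come out as flat strings; the per-attribute grammars can then be layered on top for the attributes that matter and skipped for the rest.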
I'm not sure this is worth doing... but if there was a good, easy-to-use Python library then maybe interesting things would result?
Cheers,
-- Shane
On 20/08/2015 13:11, Stavros Konstantaras wrote:
What my prototype does now is extract the attributes and values by using Regular Expressions
RPSL is, afaik, a Chomsky type 1 or 2 grammar, and regular expressions are a Chomsky type 3 grammar. Not wanting to rain on anyone's parade, but you cannot parse a type 1 or type 2 Chomsky grammar with a type 3 grammar. It's a bit like this, except worse:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtm...
A partial solution is obviously possible; however, it will only ever be an approximation.

Nick
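To make the nesting point concrete, here is a small sketch that uses a regular expression only to cut the input into tokens and leaves the nested parentheses to a recursive-descent parser. It covers just a simplified subset of RPSL filter expressions (AND, OR, NOT, parentheses and bare names); the name `parse_filter` and the tuple-based tree are illustrative assumptions, not part of any existing tool.

```python
import re

# The regex only tokenises: "(", ")" or a run of other characters.
TOKEN = re.compile(r"\s*(\(|\)|[^\s()]+)")
KEYWORDS = {"AND", "OR", "NOT"}


def parse_filter(text):
    """Parse a simplified filter expression into a nested tuple tree."""
    tokens = TOKEN.findall(text)
    pos = 0

    def peek():
        return tokens[pos].upper() if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():  # lowest precedence
        node = parse_and()
        while peek() == "OR":
            advance()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() == "AND":
            advance()
            node = ("AND", node, parse_not())
        return node

    def parse_not():  # highest precedence
        if peek() == "NOT":
            advance()
            return ("NOT", parse_not())
        return parse_atom()

    def parse_atom():
        if peek() is None:
            raise ValueError("unexpected end of input")
        token = advance()
        if token == "(":
            node = parse_or()  # recursion handles any nesting depth
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return node
        if token == ")" or token.upper() in KEYWORDS:
            raise ValueError(f"unexpected token {token!r}")
        return token

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_filter("AS64500 OR (AS64501 AND NOT (AS64502 OR AS-EXAMPLE))"))
```

The recursion in `parse_atom` is the part a single regular expression cannot express; the regex stays useful, but only for the tokenising step.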
Participants (2): Nick Hilliard, Stavros Konstantaras