boost spirit or antlr for qif parse?

cbbrowne@acm.org
Sun, 20 Oct 2002 18:46:45 -0400


On Sun, 20 Oct 2002 14:31:01 MDT, the world broke into rejoicing as
Larry Evans <jcampbell3@prodigy.net>  said:
> >Larry Evans wrote:
> [snip]
> >Both of those parsers are written in totally different languages from those 
> >used for GnuCash.  It would be equally appropriate to propose parsers 
> >requiring reimplementing GnuCash in Objective CAML, Java, or Ada.

> C++ is not as inappropriate as Java or Ada. After all, the current
> qif-import files use C code and C++ can certainly do the same with hardly
> any effort.

It /is/ just as inappropriate, as none of these languages (C++, Java,
Ada) is used for implementing GnuCash.  There are already quite
enough problems getting GnuCash to build without introducing another
language to the mix.

C++ is no more an implementation language for GnuCash than Java or Ada.
Ada may be a frivolous comparison, but Java certainly isn't, as Java
has the same sort of "C-like" syntax as C++.  But both are demonstrably
not C, and it is significantly challenging to link C++ or Java code to
applications written in C.

> >In any case, the challenges in parsing QIF files would not be addressed by 
> >looking into a full-scale LALR-1 style parser.  The "complexities" of QIF 
> >could more than likely be fully plumbed with nothing more sophisticated than
> >FLEX.

> Can FLEX handle the type of ambiguity handled by parse-number/format?
> Wouldn't some back-tracking be required in case decimal radix was tried
> and found not to work? The "dynamic parsing" shown at the post:
> http://aspn.activestate.com/ASPN/Mail/Message/1396631
> reminded me of the parse-number/format. After all, you could parse
> a number with either "comma" or "decimal" radix and decide whether
> it was ambiguous similar to the way it's currently done in the calls to
> check-and-parse-field in qif-parse-fields in qif-file.scm.

The parsing can frankly be done on a line-by-line basis, with no need
to worry about backtracking.

If it takes 72 iterations of reading and re-reading through a line to
get the results, that's not fundamentally a problem (assuming
performance doesn't suffer too badly :-)).
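To make the "line by line, no backtracking" point concrete, here is a
minimal sketch in C (the language GnuCash's core is written in) of
deciding an amount's radix by simply reading the field twice: once with
'.' as the decimal point and ',' as the thousands separator, and once the
other way round.  It is NOT a translation of parse-number/format in
qif-file.scm; the names parse_with_radix, classify_number and
number_format are hypothetical, and the rule that thousands groups hold
exactly three digits is an assumption.

```c
#include <ctype.h>
#include <stdbool.h>

/* Possible readings of one amount field (hypothetical names). */
typedef enum { FMT_NONE, FMT_PERIOD, FMT_COMMA, FMT_AMBIGUOUS } number_format;

/* Try to read s as a number whose decimal separator is `radix` and whose
   optional thousands separator is `group`.  Assumed rule: the first group
   holds 1-3 digits and every later group exactly 3.  Returns true and
   stores the value in *out on success.  A double keeps the sketch short;
   real money code would want an exact representation. */
bool parse_with_radix(const char *s, char radix, char group, double *out)
{
    double value = 0.0, scale = 0.1;
    bool negative = false, seen_radix = false, seen_digit = false;
    bool seen_group = false;
    int digits_in_group = 0;   /* integer digits since the last separator */

    if (*s == '-') { negative = true; s++; }
    for (; *s != '\0'; s++) {
        if (isdigit((unsigned char)*s)) {
            seen_digit = true;
            if (seen_radix) {
                value += (*s - '0') * scale;   /* fractional digit */
                scale /= 10.0;
            } else {
                value = value * 10.0 + (*s - '0');
                digits_in_group++;
            }
        } else if (*s == group && !seen_radix) {
            if (digits_in_group == 0 || digits_in_group > 3 ||
                (seen_group && digits_in_group != 3))
                return false;
            seen_group = true;
            digits_in_group = 0;
        } else if (*s == radix && !seen_radix) {
            if (seen_group && digits_in_group != 3)
                return false;
            seen_radix = true;
        } else {
            return false;   /* any other character rules this reading out */
        }
    }
    if (!seen_digit || (seen_group && !seen_radix && digits_in_group != 3))
        return false;
    *out = negative ? -value : value;
    return true;
}

/* Two independent left-to-right passes over the field; nothing is
   undone or retried, so no backtracking is involved. */
number_format classify_number(const char *s)
{
    double ignored;
    bool period = parse_with_radix(s, '.', ',', &ignored);
    bool comma  = parse_with_radix(s, ',', '.', &ignored);

    if (period && comma) return FMT_AMBIGUOUS;
    if (period) return FMT_PERIOD;
    if (comma)  return FMT_COMMA;
    return FMT_NONE;
}
```

With these rules "1,234" comes back FMT_AMBIGUOUS (1234 or 1.234),
while "1,234.56" and "1.234,56" each have exactly one reading.  An
ambiguous field would presumably be settled by looking at the other
amounts in the file, which is a separate pass over already-read lines
rather than backtracking inside the parser.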

The main point is that, given QIF's simple format, recursive descent
is completely unnecessary.  All that is needed is a sort of
"tokenization," which is exactly what FLEX is for.
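A rough idea of how little that "tokenization" involves, sketched by
hand in C rather than generated by FLEX: in the usual QIF layout every
line starts with a one-character field code (D for date, T for amount,
P for payee, ^ to end a record, ! for a header such as !Type:Bank), so
splitting a line into code and value is essentially all the lexing
there is.  The names qif_token, qif_tokenize_line and qif_field_name are
hypothetical, and the code table covers only a few common bank-record
fields; investment records reuse some letters (N, for instance) with
other meanings.

```c
#include <string.h>

/* One QIF "token": the field code plus the rest of the line
   (hypothetical type, not from GnuCash). */
typedef struct {
    char code;           /* 'D', 'T', 'P', 'M', '^', '!', ... */
    const char *value;   /* text after the code, line ending removed */
} qif_token;

/* Split one line in place.  Returns 0 for a blank line, 1 otherwise. */
int qif_tokenize_line(char *line, qif_token *tok)
{
    line[strcspn(line, "\r\n")] = '\0';   /* strip CR and/or LF */
    if (line[0] == '\0')
        return 0;
    tok->code = line[0];
    tok->value = line + 1;
    return 1;
}

/* Map a few bank-record field codes to readable names (partial table). */
const char *qif_field_name(char code)
{
    switch (code) {
    case '!': return "header";
    case 'D': return "date";
    case 'T': return "amount";
    case 'P': return "payee";
    case 'M': return "memo";
    case 'N': return "number";
    case 'C': return "cleared";
    case 'L': return "category";
    case '^': return "end of record";
    default:  return "other";
    }
}
```

A reader loop would call qif_tokenize_line on each line returned by
fgets() and switch on tok.code; the amount in a T line can then be
handed to whatever number-format check is in use.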
--
(reverse (concatenate 'string "ac.notelrac.teneerf@" "454aa"))
http://www3.sympatico.ca/cbbrowne/linuxdistributions.html
Rules of the Evil Overlord #43. "I will maintain a healthy amount of
skepticism when I capture the beautiful rebel and she claims she is
attracted to my power and good looks and will gladly betray her
companions if I just let her in on my plans."
<http://www.eviloverlord.com/>