Bug 756469

Geert Janssens geert.gnucash at kobaltwit.be
Tue Jan 12 11:48:58 EST 2016


On Tuesday 12 January 2016 10:11:25 Derek Atkins wrote:
> Hi,
> 
> On Tue, January 12, 2016 9:52 am, Mike Evans wrote:
> > Hi Geert.
> > 
> > I'd appreciate some advice on this bug, since you were the last
> > person to touch the (makes my head hurt) regex.
> > 
> > In file dialog-bi-import-gui.c, line 328, the regex for description
> > and notes is currently:
> > 
> > ((?<desc>[^\",]*)|\"(?<desc>[^\"]*)\")\"
> 
> This regex is basically looking for anything within double-quotes,
> except for another double-quote.
> 
> The issue would be handling something like:
> 
>   "<some text>""<more text>"
> 
> I.e., in order to escape a double-quote you use a double-double-quote.
> This regex does not handle that case.  So it's basically saying "get
> me everything between the double quotes" (without acknowledging the
> double-double-quote scenario).
> 
> > I'm not a regex guru but it seems to me that losing the [^\"] part
> > and just using . would accept the problem lines. This wouldn't
> > strip the extra " from the escaped quote, but it would at least be
> > imported and editable later.  I'd have thought that just accepting
> > everything inside the quoted field would be the correct behaviour?
> 
> Unfortunately I don't think that would work.  The construct:
> 
>   [^\"]*
> 
> says to match anything but a double-quote.  More likely we need to
> change it to:
> 
>   (?<desc>([^\"]|\"\")*)
> 
> I think this will tell it to match anything but a double-quote, or a
> double-double-quote, as many times as they occur.
> 
> Can you try this?
> 
> > Mike E
> 
> -derek

Wow Derek, you're fast... I saw your response on the list before I even received Mike's original 
question...

Anyway, I would also go for your suggestion. Simply replacing [^\"] with a "." could cause the 
regexp to match too much.
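
To illustrate the difference, here is a minimal sketch in Python rather than the C/GRegex code in dialog-bi-import-gui.c. The three patterns are simplified, single-field stand-ins for the ones discussed above (the real regex covers many more fields), the sample line is made up, and Python spells named groups (?P<desc>...) instead of (?<desc>...). Treat it as an illustration under those assumptions, not as the actual fix.

```python
import re

# Hypothetical single-field patterns, reduced from the ones in this thread.
CURRENT = r'"(?P<desc>[^"]*)"'              # stops at the first double-quote
DOT = r'"(?P<desc>.*)"'                     # greedy: runs to the last double-quote
SUGGESTED = r'"(?P<desc>(?:[^"]|"")*)"'     # accepts "" as an escaped quote


def first_field(pattern, line):
    """Return the desc group matched at the start of line, or None."""
    m = re.match(pattern, line)
    return m.group("desc") if m else None


# Made-up sample: a description with escaped quotes, followed by a notes field.
LINE = '"He said ""hi""","second note"'

print(repr(first_field(CURRENT, LINE)))    # 'He said '
print(repr(first_field(DOT, LINE)))        # 'He said ""hi""","second note'
print(repr(first_field(SUGGESTED, LINE)))  # 'He said ""hi""'

# The doubled quotes still need collapsing after the match.
print(first_field(SUGGESTED, LINE).replace('""', '"'))  # He said "hi"
```

With the "." variant the first field swallows the separator and the following field, which is the "match too much" case. The suggested pattern stops at the real closing quote but leaves the doubled quotes in place, so they would have to be collapsed in a separate step.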

Regards,

Geert

