CSV Import Development Summary

Benjamin Sperisen lasindi at gmail.com
Sat Jun 9 09:24:01 EDT 2007

On 6/4/07, Josh Sled <jsled at asynchronous.org> wrote:
> My go-to modeling here would have been something more like...
>     struct GncDelimParsingResult
>     {
>       const char *file_name;
>       GList /* <GncDelimLine*> */ *lines;
>       GList /* <GncDelimColumnType> */ *column_disposition;
>     }
>     enum GncDelimColumnType { date, description, amount };
>     struct GncDelimLine
>     {
>       int line_number;
>       GList /* <GncDelimCell*> */ *cells;
>     }
>     struct GncDelimCell
>     {
>       int cell_num;
>       char *content;
>     }

I think this could also work. The main reason I said I would use an
array of char*s is that that's what STF's function hands me. I'm not
sure if transforming that array into linked lists is necessary, since
I don't think there will be any insertion and deletion of cells. I do
think that including the "disposition" is a good idea. This would be
useful if the file has a heading at the top of a column like "Date"
which could be used to make a guess at the disposition. This would
probably require adding a "gboolean HasHeaders" parameter to
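
A rough sketch of the array-based layout with a guessed disposition
might look like the following. It assumes the parser hands back each row
as an array of char*s, and every name in it (ColumnType,
guess_dispositions, has_headers) is hypothetical rather than taken from
GnuCash or the STF code. It uses only standard C so it can be compiled
on its own; real code would presumably use GLib types instead.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names throughout; not taken from GnuCash or STF. */
typedef enum {
    COL_UNKNOWN,
    COL_DATE,
    COL_DESCRIPTION,
    COL_AMOUNT
} ColumnType;

/* Case-insensitive string equality using only standard C. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Guess one column's type from the text of its header cell. */
static ColumnType guess_column_type(const char *header)
{
    if (equals_ignore_case(header, "date"))
        return COL_DATE;
    if (equals_ignore_case(header, "description"))
        return COL_DESCRIPTION;
    if (equals_ignore_case(header, "amount"))
        return COL_AMOUNT;
    return COL_UNKNOWN;
}

/* Fill types[0..n_cols-1] from the first row, or with COL_UNKNOWN
 * when the caller says the file has no header row. */
void guess_dispositions(char **first_row, size_t n_cols,
                        int has_headers, ColumnType *types)
{
    size_t i;
    for (i = 0; i < n_cols; i++)
        types[i] = has_headers ? guess_column_type(first_row[i])
                               : COL_UNKNOWN;
}
```

Because the cells stay in plain arrays, the disposition is just a
parallel array indexed by column number, and no list conversion is
needed.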

> Moreover, I'm hopeful to see some separation between all the ui-independent
> application logic you (just) described and the particular widgets and
> controller used to present that application flow to the user.  It'd be nice
> if there was two clear layers: the gnucash-stf-parser-model (probably using
> or derived from utility structures like mentioned above) and the
> gnucash-stf-parser-ui.  They will probably be pretty intimately related, but
> we win (code clarity, bug solubility, testability, &c.) if they're explicitly
> separate.

Yes, I think that's a good idea.

> It'd be nice if all of the failed transaction creations were collected
> together and presented to the user at once ... to prevent the dreaded
> "import, error, back, fix, error again, back again, fix again, ..." cycle.

This is also a very good idea.
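
One way to collect the failures is to append each one to a list while
the import keeps going, then show the whole list at the end. The sketch
below assumes rows arrive as arrays of char*s; RowError, RowErrorList,
import_rows and the stand-in check_row validation are all hypothetical
names made up for illustration.

```c
#include <stdlib.h>

/* Hypothetical record of one row that could not become a transaction. */
typedef struct {
    int line_number;
    const char *reason;   /* static message; not owned by this struct */
} RowError;

typedef struct {
    RowError *items;
    size_t count;
    size_t capacity;
} RowErrorList;

/* Append an error; returns 0 on success, -1 if memory runs out. */
int row_error_list_add(RowErrorList *list, int line_number,
                       const char *reason)
{
    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? list->capacity * 2 : 8;
        RowError *grown = realloc(list->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        list->items = grown;
        list->capacity = new_cap;
    }
    list->items[list->count].line_number = line_number;
    list->items[list->count].reason = reason;
    list->count++;
    return 0;
}

/* Stand-in validation: a real importer would parse dates and amounts.
 * Returns NULL when the row looks usable, else a reason string. */
static const char *check_row(char **cells, size_t n_cells)
{
    if (n_cells < 3)
        return "too few columns";
    if (cells[0][0] == '\0')
        return "missing date";
    if (cells[2][0] == '\0')
        return "missing amount";
    return NULL;
}

/* Check every row, recording failures instead of stopping at the
 * first one. Returns the number of rows that passed. */
size_t import_rows(char ***rows, const size_t *n_cells, size_t n_rows,
                   RowErrorList *errors)
{
    size_t imported = 0, i;
    for (i = 0; i < n_rows; i++) {
        const char *reason = check_row(rows[i], n_cells[i]);
        if (reason == NULL)
            imported++;
        else if (row_error_list_add(errors, (int)i + 1, reason) != 0)
            break;   /* out of memory: stop and report what we have */
    }
    return imported;
}
```

The caller can then present every entry in the list in a single dialog,
so the user fixes all the problems in one pass.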

On 6/8/07, Christian Stimming <stimming at tuhh.de> wrote:
> I'd love to see your code committed to SVN;
> once the svn access for you is up and running, I'd encourage you to commit
> your code into the gnucash svn early and often. You will probably get a
> branch of your own for now, but this gives you the benefit that you really
> don't have to watch out for any other pitfalls - just commit your code :-).

Okay! I'd definitely prefer to work on a branch since I don't want to
screw up the rest of the tree.

> I wonder how some of the more weird error conditions need to be checked for.
> Examples: A completely binary file (no table/CSV structure at all), or one
> where the user has no read permission.  Some instructive error messages might
> be helpful for those cases (as opposed to simply showing an empty GtkTreeView
> widget). Of course this can be deferred to later in the summer.

That's a good point; I overlooked problems with simply opening the
file in the first place - I'd have to add a "GError** error" parameter
to gnc_csv_parse for that, and I could do that without messing with
STF code because I have to do the actual file opening before calling
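
The error-reporting pattern could be sketched roughly as below. To keep
it compilable with only the C library, it uses a hypothetical LoadError
struct as a stand-in for GLib's GError, and load_file is an invented
name; in GLib-based code, g_file_get_contents() with its GError**
argument covers the same ground.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for GLib's GError, so the sketch needs only libc. */
typedef struct {
    int code;            /* errno value from the failed call, if any */
    char message[256];   /* text suitable for showing to the user */
} LoadError;

/* Read a whole file into a NUL-terminated buffer the caller must free.
 * On failure returns NULL and, if error is non-NULL, fills *error. */
char *load_file(const char *path, LoadError *error)
{
    FILE *fp;
    char *buf = NULL;
    long size = 0;
    int saved;

    errno = 0;
    fp = fopen(path, "rb");
    if (fp == NULL)
        goto fail;
    if (fseek(fp, 0, SEEK_END) != 0)
        goto fail;
    size = ftell(fp);
    if (size < 0 || fseek(fp, 0, SEEK_SET) != 0)
        goto fail;
    buf = malloc((size_t)size + 1);
    if (buf == NULL)
        goto fail;
    if (fread(buf, 1, (size_t)size, fp) != (size_t)size)
        goto fail;
    buf[size] = '\0';
    fclose(fp);
    return buf;

fail:
    saved = errno;       /* capture before cleanup can overwrite it */
    free(buf);
    if (fp != NULL)
        fclose(fp);
    if (error != NULL) {
        error->code = saved;
        snprintf(error->message, sizeof error->message,
                 "Could not read '%s': %s", path,
                 saved != 0 ? strerror(saved) : "unknown error");
    }
    return NULL;
}
```

A missing file or one without read permission then produces a message
the UI can display, rather than an empty preview.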

I took a bit of a closer look at the STF parsing function I rely on.
It appears that there is one way for it to fail (besides calling it
with meaningless parameters like NULL data): it tests for valid UTF-8
data and returns NULL if the data is not valid. However, besides this
precaution, it appears to take any file, even one without any CSV
structure, and produce some kind of table, meaningless or not. As a
test, I tried to
import a PNG image as a text file into Gnumeric, and it came out with
a gibberish preview, but it still produced something. So, unless I
start seriously customizing the STF code, I'm doubtful that I can
protect the user against binary files, besides a preview showing
meaningless data.
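
If a check were done before handing the data to STF, a crude heuristic
along these lines is one possibility. This is not something the STF
code does; looks_like_text is a hypothetical helper, and the thresholds
(any NUL byte, or more than one stray control byte per hundred) are
arbitrary assumptions that would need tuning against real files.

```c
#include <stddef.h>

/* Heuristic only: returns 1 if the buffer looks like text, 0 otherwise.
 * Any NUL byte marks the data as binary, as does having more than one
 * control byte per 100 bytes (tab, LF and CR are not counted). */
int looks_like_text(const unsigned char *data, size_t len)
{
    size_t i, suspicious = 0;

    for (i = 0; i < len; i++) {
        unsigned char c = data[i];
        if (c == '\0')
            return 0;
        if (c < 0x20 && c != '\t' && c != '\n' && c != '\r')
            suspicious++;
    }
    return suspicious * 100 <= len;
}
```

A PNG file fails this test within its first few bytes because its
header contains NUL bytes, while ordinary CSV text passes.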

> One more feature that comes to mind is how one selection of parsing options
> could be cached or saved for later re-use. Just imagine people who get one
> particular kind of CSV files regularly from other sources (their bank or
> other finance software).

Yes, I think this would be a useful feature. The main challenge for
this would be storing the options to a configuration file and then
loading it into the GUI, but as you said, this can be done once the
core functionality is in place.
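
Storing and reloading the options could be as simple as the sketch
below, which writes them as key=value lines. The ParseOptions struct,
its two fields, and the file layout are all assumptions made for
illustration; in GLib-based code, GKeyFile might be a more natural fit.

```c
#include <stdio.h>

/* Hypothetical set of parsing options worth remembering. */
typedef struct {
    char delimiter;     /* e.g. ',' or ';' or '\t' */
    int has_headers;    /* nonzero if the first row holds column names */
} ParseOptions;

/* Write the options as key=value lines. The delimiter is stored as a
 * numeric character code so that tab or space survives a round trip.
 * Returns 0 on success, -1 on failure. */
int parse_options_save(const ParseOptions *opts, const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    if (fprintf(fp, "delimiter=%d\nhas_headers=%d\n",
                (int)(unsigned char)opts->delimiter,
                opts->has_headers) < 0) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Read options written by parse_options_save.
 * Returns 0 on success, -1 if the file is missing or malformed. */
int parse_options_load(ParseOptions *opts, const char *path)
{
    FILE *fp = fopen(path, "r");
    int delimiter, has_headers, fields;

    if (fp == NULL)
        return -1;
    fields = fscanf(fp, "delimiter=%d has_headers=%d",
                    &delimiter, &has_headers);
    fclose(fp);
    if (fields != 2)
        return -1;
    opts->delimiter = (char)delimiter;
    opts->has_headers = has_headers != 0;
    return 0;
}
```

The GUI would then only need to copy the loaded struct into its widgets
when the user picks a saved profile.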

Thank you both for these great suggestions!

