CSV Import Development Summary

Benjamin Sperisen lasindi at gmail.com
Mon Jun 4 14:54:49 EDT 2007

Below is the summary of what I've done so far and plan to do on the
CSV importer that you asked for. Sorry that it ended up being a bit
long; I guess there was just a lot of stuff to cram in. :-)

The Gnumeric code that handles the actual parsing of the file, the STF
library, will be placed in gnucash/lib/stf. The CSV importer will be
loaded as an optional module, and a menu entry, "File -> Import ->
Import CSV/Fixed-Width ...", will be added. Clicking on this entry
will call the function
    void gnc_file_csv_import(void).
This function will first prompt the user to select a file. The
filename will be passed to the function
    GPtrArray* gnc_csv_parse(char* filename, StfParseOptions_t*).
StfParseOptions_t is a data type from the STF library that contains
settings for parsing files, e.g., which delimiter to use. Initially, a
default set of options will be passed to gnc_csv_parse, which then
makes the appropriate call to the STF library. gnc_csv_parse will
return a two-dimensional array of char*s. So far as I can tell, there
isn't a way for this parsing to "fail," in the sense that even if the
array isn't what the user wants there will be some way for STF to
parse it (no matter how badly it comes out).
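
To illustrate why delimiter-based parsing always produces some result, here is a minimal sketch in plain C. It is not the STF code and does not use GLib: SketchParseOptions, SketchRow, sketch_split_line and sketch_row_free are hypothetical names made up for this example, and the options struct holds only a delimiter, whereas the real StfParseOptions_t carries many more settings. Quoting is not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for StfParseOptions_t: just a delimiter. */
typedef struct {
    char delimiter;
} SketchParseOptions;

/* One parsed row: an owned array of owned, NUL-terminated strings. */
typedef struct {
    char **fields;
    size_t n_fields;
} SketchRow;

/* Release everything owned by a row and reset it. */
void sketch_row_free(SketchRow *row)
{
    if (row == NULL)
        return;
    for (size_t i = 0; i < row->n_fields; i++)
        free(row->fields[i]);
    free(row->fields);
    row->fields = NULL;
    row->n_fields = 0;
}

/* Split one line at every occurrence of the delimiter.
 * Any input yields at least one field, so the only failure
 * mode is running out of memory (returns -1); otherwise 0. */
int sketch_split_line(const char *line, const SketchParseOptions *opts,
                      SketchRow *out)
{
    assert(line != NULL && opts != NULL && out != NULL);

    /* Count fields: one more than the number of delimiters. */
    size_t n = 1;
    for (const char *p = line; *p != '\0'; p++)
        if (*p == opts->delimiter)
            n++;

    out->fields = calloc(n, sizeof *out->fields);
    out->n_fields = 0;
    if (out->fields == NULL)
        return -1;

    const char *start = line;
    for (size_t i = 0; i < n; i++) {
        /* The last field runs to the end of the line. */
        const char *end = (i + 1 < n) ? strchr(start, opts->delimiter) : NULL;
        size_t len = end ? (size_t)(end - start) : strlen(start);

        out->fields[i] = malloc(len + 1);
        if (out->fields[i] == NULL) {
            sketch_row_free(out);
            return -1;
        }
        memcpy(out->fields[i], start, len);
        out->fields[i][len] = '\0';
        out->n_fields = i + 1;
        start += len + (end ? 1 : 0);
    }
    return 0;
}
```

With the wrong delimiter the whole line simply comes back as a single field, which is the "badly parsed but not failed" case described above.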

After receiving the data from gnc_csv_parse, a dialog will be shown.
This dialog will contain widgets to configure the parsing of the file,
e.g. radio buttons for fixed-width versus CSV. It will also contain a
widget (which I'll call the "preview widget") that shows the parsed
data in columns. If the parsing isn't correct, the user can change one
of the configuration widgets appropriately. This will trigger calls to
STF to edit the StfParseOptions_t struct, which will be passed to
gnc_csv_parse again, and the preview widget will be updated with the
new data that gnc_csv_parse returns. The user also needs to be able to
specify the type of each column (which columns contain the date,
description, amount, etc.) in the file. A row of combo boxes would
appear above the columns, one for each column. The user could select a
type in each, which will be stored in an array.

The preview widget was the cause of some disagreement with my
application. I think it should eventually be a custom widget, but the
very valid point was raised that a GtkTreeView would work. Here is
what the preview widget needs to do:
(1) Display the parsed data in columns.
(2) If the file is fixed-width, the user needs to be able to split and
merge columns at specific locations in the file.
A tree view can do (1) virtually as well as any custom widget could,
but using it for (2) seems a bit more difficult for the user. It can
be done, as Gnumeric's importer shows, but, IMHO, it can be done
better by a custom widget as OpenOffice.org's feature has done.
Gnumeric's importer requires you to kind of guess which columns you are
splitting when you double-click, whereas OOo draws a convenient
vertical line when you mouse over and even shows you column numbers.
It's also slightly easier to merge columns in OOo. The disadvantage of
writing a custom widget is, of course, that it would take a lot more effort
and almost certainly be more bug-prone. In the interest of having
something that works sooner, I will use a tree view first. Moreover,
I'm pretty sure far more people have CSV files than fixed-width files,
so for most people this decision will have little impact (besides
maybe getting the feature sooner). If I finish with a lot of time left
before the end of summer of code, I think it might be worth spending
that time working on a custom widget.
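
Whichever widget ends up drawing the preview, requirement (2) could be backed by something as simple as a sorted list of split offsets. The sketch below is only one possible model, not what Gnumeric or OOo do internally; SketchSplits and the sketch_* functions are hypothetical names, and the fixed capacity of 64 is an arbitrary assumption. Splitting a column adds an offset, merging two neighbouring columns removes the offset between them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_SPLITS 64  /* arbitrary capacity for this sketch */

/* Character offsets at which a new column starts, sorted ascending.
 * n splits describe n + 1 columns. */
typedef struct {
    size_t pos[SKETCH_MAX_SPLITS];
    size_t n;
} SketchSplits;

/* Split a column at the given offset.
 * Returns -1 if the offset is 0, already present, or the list is full. */
int sketch_split_add(SketchSplits *s, size_t offset)
{
    assert(s != NULL);
    if (offset == 0 || s->n == SKETCH_MAX_SPLITS)
        return -1;

    size_t i = 0;
    while (i < s->n && s->pos[i] < offset)
        i++;
    if (i < s->n && s->pos[i] == offset)
        return -1;

    /* Shift the tail right by one slot to keep the list sorted. */
    memmove(&s->pos[i + 1], &s->pos[i], (s->n - i) * sizeof s->pos[0]);
    s->pos[i] = offset;
    s->n++;
    return 0;
}

/* Merge the two columns that meet at the given offset.
 * Returns -1 if there is no split there. */
int sketch_split_remove(SketchSplits *s, size_t offset)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->n; i++) {
        if (s->pos[i] == offset) {
            memmove(&s->pos[i], &s->pos[i + 1],
                    (s->n - i - 1) * sizeof s->pos[0]);
            s->n--;
            return 0;
        }
    }
    return -1;
}

/* Compute the half-open range [*start, *end) of column col within a
 * line of line_len characters. Returns -1 if col does not exist. */
int sketch_column_bounds(const SketchSplits *s, size_t col, size_t line_len,
                         size_t *start, size_t *end)
{
    assert(s != NULL && start != NULL && end != NULL);
    if (col > s->n)
        return -1;

    size_t a = (col == 0) ? 0 : s->pos[col - 1];
    size_t b = (col < s->n) ? s->pos[col] : line_len;
    /* Short lines may end before a split position; clamp to the line. */
    *start = (a < line_len) ? a : line_len;
    *end = (b < line_len) ? b : line_len;
    return 0;
}
```

A double-click in a tree view, or a click on a vertical guide line in a custom widget, would then only need to translate the mouse position into a character offset and call the add or remove function.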

When the user has finished configuring the parsing of the file and
clicks OK, the user is then prompted to select an account. At this
point, each row of the parsed data will be passed to the function
    Transaction* gnc_csv_row_trans(GPtrArray* row, ColType* type_array,
                                   GError **err).
(ColType will be an enumeration containing all the possible column
types.) If gnc_csv_row_trans fails, it returns NULL, and the user will
be warned with a dialog that the transactions couldn't be understood
and why (the user can then either ignore this or go back to the
parsing dialog). Each of the (successful) transactions is then passed
to the generic import GUI using gnc_gen_trans_list_add_trans. The user
can then select the destination accounts for each transaction
and finally commit them to the account.
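
As a rough sketch of the row conversion step, the following plain C shows how an array of column types could drive the interpretation of one row. It deliberately avoids GLib and the GnuCash engine: SketchColType, SketchTrans and sketch_row_to_trans are hypothetical names, the set of column types is an assumption, the date is kept as unparsed text, and a plain message pointer stands in for GError. It also returns 0 or -1 and fills a caller-supplied struct, rather than returning a Transaction* or NULL as the planned function would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical column types; the real ColType may contain others. */
typedef enum {
    SKETCH_COL_NONE,         /* column is ignored */
    SKETCH_COL_DATE,
    SKETCH_COL_DESCRIPTION,
    SKETCH_COL_AMOUNT
} SketchColType;

/* Hypothetical stand-in for Transaction. The strings are borrowed
 * from the row and must outlive this struct. */
typedef struct {
    const char *date;        /* left unparsed in this sketch */
    const char *description;
    double amount;
} SketchTrans;

/* Interpret one row according to the user-selected column types.
 * Returns 0 on success. On failure returns -1 and points *err at a
 * static message explaining why the row could not be understood. */
int sketch_row_to_trans(const char *const *fields, const SketchColType *types,
                        size_t n, SketchTrans *out, const char **err)
{
    assert(fields != NULL && types != NULL && out != NULL && err != NULL);

    int have_date = 0, have_amount = 0;
    out->date = NULL;
    out->description = "";
    out->amount = 0.0;

    for (size_t i = 0; i < n; i++) {
        switch (types[i]) {
        case SKETCH_COL_DATE:
            out->date = fields[i];
            have_date = (fields[i][0] != '\0');
            break;
        case SKETCH_COL_DESCRIPTION:
            out->description = fields[i];
            break;
        case SKETCH_COL_AMOUNT: {
            char *end;
            double value = strtod(fields[i], &end);
            /* Reject empty fields and trailing garbage. */
            if (end == fields[i] || *end != '\0') {
                *err = "amount column does not contain a number";
                return -1;
            }
            out->amount = value;
            have_amount = 1;
            break;
        }
        case SKETCH_COL_NONE:
            break;
        }
    }

    if (!have_date) {
        *err = "no date found in row";
        return -1;
    }
    if (!have_amount) {
        *err = "no amount found in row";
        return -1;
    }
    *err = NULL;
    return 0;
}
```

The error message is what the warning dialog would show before letting the user either ignore the row or return to the parsing dialog.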

So far, I have completed the extraction of STF from Gnumeric, created
a CSV module that adds a menu entry, and gotten the importer to import
a test CSV file with hard-coded meanings for each of the columns. It
appears to work fine (with the exception of the possibility of
unbalanced transactions getting through, which is discussed in
another thread). I'm now starting work on the parsing dialog.

Let me know if you have any comments or questions!

