[PATCH] Repair automatch_store_transactions

Simon Ruggier simon80 at gmail.com
Sun Feb 8 18:07:46 EST 2009


On Sun, Feb 8, 2009 at 3:51 PM, David Reiser <dbreiser at earthlink.net> wrote:
> This patch has some nasty side effects. It tries to change assigned target
> accounts on the fly within a single import. If I have several ATM deposits
> in one import, most of them will come up assigned to the last assignment
> from a prior import. When I change the first one to its proper account, your
> patch results in the rest of the deposits also being assigned to the account
> just entered. That's not so good, but not the end of the world (and
> definitely not what gnucash has done in the past).

So basically, before, you'd have to be doing a new import for the
automatching to happen, where now it happens each time the user
updates a transaction split destination. The current implementation of
the function makes no sense, since it tries to automatch the account
that was just set by the user, which should end up doing nothing,
since it was manually set. The reason for my choice of words is that
the automatch_store_transactions function is named and commented in a
way that implies that it should iterate over all the rows, and in a
previous commit, this seems to be what it did. Revision 14382 does
this to it:

 /* Iterate through the rows of the clist and try to automatch each of them */
-static void automatch_clist_transactions(GNCImportMainMatcher *info,
GtkCList *clist, int starting_row)
+static void
+automatch_store_transactions (GNCImportMainMatcher *info,
+			      GtkTreeModel *model,
+			      GtkTreeIter *iter,
+			      GNCImportTransInfo *trans_info)
 {
-  int row;
-  GNCImportTransInfo *trans_info;
-
-  gtk_clist_freeze(clist);	/* prevent a lot of visual updates at once */
-  for(row = starting_row+1; row < clist->rows; row++)
-    {
-      trans_info = gtk_clist_get_row_data(clist, row);
-
       /* returns TRUE if we changed this row, so update it */
       if(gnc_import_TransInfo_refresh_destacc(trans_info, NULL))
 	{
-	  refresh_clist_row(info, row, trans_info);
+	  refresh_model_row(info, model, iter, trans_info);
 	}
-    }
-  gtk_clist_thaw(clist);	/* let all the updates be shown */
 }

We can clearly see that  a change that was supposed to port to a newer
API actually inadvertently leaves out the for loop and only does the
current row.

> The problem comes when I
> try to correct the account matched to the second deposit. Sometimes (most
> but not all) the matcher goes back and reassigns the first deposit to the
> wrong account.
>
> I don't think the problem is locked in a circular change cycle, but it sure
> makes the matching process a pain.

Are you sure that the first deposit gets reassigned in the case where
it is a manual deposit? The matching code clearly attempts to avoid
matching transactions that were manually set (in import-backend.c at
gnc_import_TransInfo_refresh_destacc).

There definitely is a problem here with automatched transactions that
the user thinks are correct. When the user manually sets a split, a
formerly correct automatch may get changed because gnucash no longer
thinks it is the best match. This is less of a problem if the
automatching code is picking good matches, but the behaviour could be
changed to support a less error-prone workflow. The import dialog
shouldn't show automatches with a green background (i.e. they could be
in between yellow and green, to indicate that they're not imbalanced,
but not set in stone either). This would encourage the user to verify
them and set them manually. To make it usable, there should be a
clickable element to allow the user to quickly sign off on an
automatch, turning it into a manual selection. A new control could be
added, or perhaps the add checkbox could have tristate behaviour
(cycling between skip, auto, and manual).

Another issue is the weird behaviour of the automatching code. It
maintains counts of associations between tokens and accounts (at least
in the bayes case) in each account kvp tree that are incremented
often, but never decremented. If you open the import dialog, select
something, and then hit cancel, future automatch behaviour is changed,
because the selection causes a bunch of
/IMAP_FRAME_BAYES/<token>/<destaccountname> KvpValues of type gint64
to get incremented in the import account's KvpFrame, without
decrementing them when the import is cancelled. The counts are also
incremented multiple times if you manually select the same account
multiple times. This seems to suggest that whenever a split changes,
the token counts should be decremented for the account that is no
longer the destination of the split. This behaviour is no more
expensive, but it results in each account's /IMAP_FRAME* KvpFrame
being synchronized with the transaction data in that account. Another
possibility would be to generate these trees of counts while
automatching (obviously not once per row automatch, but enough that
the counts don't persist as the user manually sets a split), instead
of caching them all the time, but that wouldn't scale well, so it may
not be a good idea.


More information about the gnucash-devel mailing list