[PATCH] Repair automatch_store_transactions

David Reiser dbreiser at earthlink.net
Sun Feb 8 19:40:00 EST 2009


On Feb 8, 2009, at 6:07 PM, Simon Ruggier wrote:

> On Sun, Feb 8, 2009 at 3:51 PM, David Reiser  
> <dbreiser at earthlink.net> wrote:
>> This patch has some nasty side effects. It tries to change assigned  
>> target
>> accounts on the fly within a single import. If I have several ATM  
>> deposits
>> in one import, most of them will come up assigned to the last  
>> assignment
>> from a prior import. When I change the first one to its proper  
>> account, your
>> patch results in the rest of the deposits also being assigned to  
>> the account
>> just entered. That's not so good, but not the end of the world (and
>> definitely not what gnucash has done in the past).
>
> So basically, before, you'd have to be doing a new import for the
> automatching to happen,

That isn't what happens on my system, but my data file has been  
accumulating for so long, I'm not sure what was already there when the  
switch to the GTKTreeModel happened.

> where now it happens each time the user
> updates a transaction split destination. The current implementation of
> the function makes no sense, since it tries to automatch the account
> that was just set by the user, which should end up doing nothing,
> since it was manually set. The reason for my choice of words is that
> the automatch_store_transactions function is named and commented in a
> way that implies that it should iterate over all the rows, and in a
> previous commit, this seems to be what it did. Revision 14382 does
> this to it:
>
> /* Iterate through the rows of the clist and try to automatch each  
> of them */
> -static void automatch_clist_transactions(GNCImportMainMatcher *info,
> GtkCList *clist, int starting_row)
> +static void
> +automatch_store_transactions (GNCImportMainMatcher *info,
> +			      GtkTreeModel *model,
> +			      GtkTreeIter *iter,
> +			      GNCImportTransInfo *trans_info)
> {
> -  int row;
> -  GNCImportTransInfo *trans_info;
> -
> -  gtk_clist_freeze(clist);	/* prevent a lot of visual updates at  
> once */
> -  for(row = starting_row+1; row < clist->rows; row++)
> -    {
> -      trans_info = gtk_clist_get_row_data(clist, row);
> -
>       /* returns TRUE if we changed this row, so update it */
>       if(gnc_import_TransInfo_refresh_destacc(trans_info, NULL))
> 	{
> -	  refresh_clist_row(info, row, trans_info);
> +	  refresh_model_row(info, model, iter, trans_info);
> 	}
> -    }
> -  gtk_clist_thaw(clist);	/* let all the updates be shown */
> }
>
> We can clearly see that  a change that was supposed to port to a newer
> API actually inadvertently leaves out the for loop and only does the
> current row.
>
>> The problem comes when I
>> try to correct the account matched to the second deposit. Sometimes  
>> (most
>> but not all) the matcher goes back and reassigns the first deposit  
>> to the
>> wrong account.
>>
>> I don't think the problem is locked in a circular change cycle, but  
>> it sure
>> makes the matching process a pain.
>
> Are you sure that the first deposit gets reassigned in the case where
> it is a manual deposit? The matching code clearly attempts to avoid
> matching transactions that were manually set (in import-backend.c at
> gnc_import_TransInfo_refresh_destacc).

Yes, I'm sure. I watched it happen in an import.

>
>
> There definitely is a problem here with automatched transactions that
> the user thinks are correct. When the user manually sets a split, a
> formerly correct automatch may get changed because gnucash no longer
> thinks it is the best match. This is less of a problem if the
> automatching code is picking good matches, but the behaviour could be
> changed to support a less error-prone workflow. The import dialog
> shouldn't show automatches with a green background (i.e. they could be
> in between yellow and green, to indicate that they're not imbalanced,
> but not set in stone either). This would encourage the user to verify
> them and set them manually. To make it usable, there should be a
> clickable element to allow the user to quickly sign off on an
> automatch, turning it into a manual selection. A new control could be
> added, or perhaps the add checkbox could have tristate behaviour
> (cycling between skip, auto, and manual).
>
>
> Another issue is the weird behaviour of the automatching code. It
> maintains counts of associations between tokens and accounts (at least
> in the bayes case) in each account kvp tree that are incremented
> often, but never decremented. If you open the import dialog, select
> something, and then hit cancel, future automatch behaviour is changed,
> because the selection causes a bunch of
> /IMAP_FRAME_BAYES/<token>/<destaccountname> KvpValues of type gint64
> to get incremented in the import account's KvpFrame, without
> decrementing them when the import is cancelled. The counts are also
> incremented multiple times if you manually select the same account
> multiple times. This seems to suggest that whenever a split changes,
> the token counts should be decremented for the account that is no
> longer the destination of the split. This behaviour is no more
> expensive, but it results in each account's /IMAP_FRAME* KvpFrame
> being synchronized with the transaction data in that account. Another
> possibility would be to generate these trees of counts while
> automatching (obviously not once per row automatch, but enough that
> the counts don't persist as the user manually sets a split), instead
> of caching them all the time, but that wouldn't scale well, so it may
> not be a good idea.

I certainly would like to see better matching during imports. The code  
is way over my head, though.

--
David Reiser
dbreiser at earthlink.net






More information about the gnucash-devel mailing list