Generic transaction import
Benoit Grégoire
bock@step.polymtl.ca
Thu, 6 Jun 2002 18:24:15 -0400
As I discussed with several people on #gnucash, this is the draft of my design
document for a generic import. Comments will be appreciated from everyone,
especially in 3 areas:
-Specific needs for HBCI and QIF (I have OFX covered ;))
-Improvements for transaction matching, or pathological real world cases I
might have overlooked.
-Describing how we want to complete an unbalanced split. It might be as simple
as describing the current gnucash functionality.
Time is becoming a factor here, since both OFX (me for now) and HBCI
(Christian et al) are going to need this real soon now, and it would be a pity
to duplicate efforts.
----------------------------------------------------------------------------------------------
This is a draft of a design proposal for a generic import architecture. The
objective is to maximize code sharing between the QIF, HBCI and OFX modules.
The most important area of potential code sharing is the account and
transaction matching code. This code has 3 distinct roles:
-Find the source account.
-Find and eliminate transactions downloaded twice in the source account.
-Find the destination account(s), and find the matching transaction(s) if
it/they exist(s).
The online-system-specific module, in addition to any steps necessary for
obtaining and processing the data, should be responsible for:
-Identifying and, if necessary, creating the source account: The account is
identified using the account number for OFX and HBCI, and the account
description for QIF, if available. This identifier is stored in a kvp_string
with key account_online_id. The format of this string is
bankID/branchID/accountID for a bank account. If one of these fields isn't
present, it is omitted, so for a credit card, we would have //cardnumber. If
no account is found with a matching online_id, the user is asked to select
an account, or create a new one. The account_online_id is then stored in the
selected or created account's kvp_frame.
-Creating the transaction and adding the source split (associated with the
source account, possibly created above), and filling it with as much
information as it has at its disposal (much info is available for OFX, little
for QIF). If a unique transaction id is available from the online system, it
is stored in the split's kvp_frame, using key transaction_online_id. No
transaction matching is done at this stage.
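
As an illustration of the account_online_id format described above, here is a
minimal sketch in C. The helper name build_account_online_id and its signature
are hypothetical and not part of gnucash; it only shows how missing fields
could be omitted while keeping the separators, so that a credit card gives
//cardnumber.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of gnucash: builds the
 * "bankID/branchID/accountID" string.  A NULL field is written as an
 * empty string but its separator is kept, so a credit card with only a
 * card number gives "//cardnumber".
 * Returns 0 on success, -1 if the output buffer is too small. */
static int
build_account_online_id(char *out, size_t out_size,
                        const char *bank_id,
                        const char *branch_id,
                        const char *account_id)
{
    int n;

    assert(out != NULL);
    n = snprintf(out, out_size, "%s/%s/%s",
                 bank_id ? bank_id : "",
                 branch_id ? branch_id : "",
                 account_id ? account_id : "");
    /* snprintf returns the length it wanted to write; anything that does
     * not fit (including the terminating NUL) is reported as an error. */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Called with a NULL bank and branch and the account "4111", this fills the
buffer with "//4111". Storing the result in the account's kvp_frame is left
out, since that depends on the kvp API.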
The generic module receives the Transaction from the online-system-specific
module using the function:
void gnc_import_add_trans(Transaction *trans);
(We do not use a GUID, because in all cases the transaction was just created.)
The module defines the following enum:
enum gnc_match_probability {
    CERTAIN,
    PROBABLE,
    LIKELY,
    POSSIBLE,
    UNLIKELY,
    IMPOSSIBLE
};
Here is the pseudocode of the gnc_import_add_trans function:
Variables: matches (a list of possible matches with likelihood)
           split_to_match = trans's first split.
In split_to_match's parent account; for each split where date >=
split_to_match.date - 2 months:
    if transaction_online_id match
        add to matches using CERTAIN
        if preferences dictate: end search here
    if amount match
        if memo match and date within 4 days
            add to matches using PROBABLE
        else if date within 24 hours
            add to matches using LIKELY
        else if date within 10 days
            add to matches using POSSIBLE
        else
            add to matches using UNLIKELY
Present the list of matches to the user in decreasing order of likelihood.
User has the option of selecting one of the matches or creating a new
transaction.
Add transaction_online_id to selected split
Erase from other CERTAIN splits
if transaction not balanced
    TODO: gnc_balance_transaction(Transaction *trans)
commit changes
return
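
The grading rules in the pseudocode above could be sketched in C roughly as
follows. This is only an illustration under several assumptions: struct
sketch_split and classify_match are hypothetical names, not gnucash types or
functions; amounts are held as integer cents; "within N days" is read as an
absolute difference of at most N days; each candidate gets a single grade; and
a split whose amount differs is reported as IMPOSSIBLE, where the pseudocode
simply does not add it to the list. The 2-month search window and the user
preferences are left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

enum gnc_match_probability {
    CERTAIN, PROBABLE, LIKELY, POSSIBLE, UNLIKELY, IMPOSSIBLE
};

/* Hypothetical, simplified stand-in for a gnucash split. */
struct sketch_split {
    const char *online_id;   /* transaction_online_id, or NULL if none */
    long amount_cents;       /* assumption: amount as integer cents */
    const char *memo;        /* may be NULL */
    time_t date;
};

#define SECONDS_PER_DAY (24.0 * 60.0 * 60.0)

/* Two strings match only if both are present and equal. */
static int
same_string(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/* Grade how likely it is that `existing` is the same as `imported`. */
static enum gnc_match_probability
classify_match(const struct sketch_split *imported,
               const struct sketch_split *existing)
{
    double age;

    assert(imported != NULL && existing != NULL);
    age = difftime(imported->date, existing->date);
    if (age < 0)
        age = -age;

    if (same_string(imported->online_id, existing->online_id))
        return CERTAIN;
    if (imported->amount_cents != existing->amount_cents)
        return IMPOSSIBLE;   /* assumption: such splits are not listed */
    if (same_string(imported->memo, existing->memo)
        && age <= 4 * SECONDS_PER_DAY)
        return PROBABLE;
    if (age <= 1 * SECONDS_PER_DAY)
        return LIKELY;
    if (age <= 10 * SECONDS_PER_DAY)
        return POSSIBLE;
    return UNLIKELY;
}
```

Because the enum is declared from most to least likely, sorting candidates by
ascending enum value gives the decreasing order of likelihood wanted for the
list shown to the user.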
gnc_balance_transaction(Transaction *trans) adds or matches other splits
until the transaction is balanced, using whatever user interaction and
heuristics are appropriate. Since I haven't really used gnucash's current
transaction matching before, I would like someone else to contribute the
description of the process to match unbalanced transactions.
Remarks and things to remember:
-Credit card transactions can sometimes appear over a month after the
purchase (clerk lost the paper, international transactions are not always
fast, etc.)
-void gnc_import_add_trans(Transaction *trans) should return as soon as
possible (BEFORE user interaction) so that for systems that maintain a
connection (such as HBCI) the user won't run into timeouts. For example,
gnc_import_add_trans could check if its main dialog is open, and open it if
it isn't, add to the list and return immediately. The dialog is closed
automatically once the list is empty.
-We may want to implement the function in such a way that it won't match any
transactions that have been added as part of the current import process (flag
them volatile or something). This will solve the problem of multiple
Interac withdrawals in the same day for QIF (possibly HBCI too?).
-The transaction passed to gnc_import_add_trans will have only one split for
OFX and HBCI, but 1 or more for QIF.
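
To make the "return before user interaction" remark above more concrete, here
is a minimal sketch of a pending list that gnc_import_add_trans could append to
before returning. Everything in it is an assumption for illustration: the
names import_queue, import_queue_add and import_queue_take are hypothetical,
Transaction is treated as an opaque type, and the dialog_open flag merely
stands in for opening and closing the real dialog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Opaque here; the sketch never looks inside a transaction. */
typedef struct Transaction Transaction;

struct pending_node {
    Transaction *trans;
    struct pending_node *next;
};

/* Hypothetical list of transactions waiting for the user's decision. */
struct import_queue {
    struct pending_node *head;
    struct pending_node *tail;
    int dialog_open;   /* stand-in for the state of the main dialog */
};

/* Append a transaction and return at once, without user interaction.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int
import_queue_add(struct import_queue *q, Transaction *trans)
{
    struct pending_node *node;

    assert(q != NULL);
    node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->trans = trans;
    node->next = NULL;
    if (q->tail != NULL)
        q->tail->next = node;
    else
        q->head = node;
    q->tail = node;
    q->dialog_open = 1;   /* open the dialog if it was not already open */
    return 0;
}

/* Remove the oldest pending transaction, or return NULL if none is left.
 * The dialog is marked closed once the list becomes empty. */
static Transaction *
import_queue_take(struct import_queue *q)
{
    struct pending_node *node;
    Transaction *trans;

    assert(q != NULL);
    node = q->head;
    if (node == NULL)
        return NULL;
    trans = node->trans;
    q->head = node->next;
    if (q->head == NULL) {
        q->tail = NULL;
        q->dialog_open = 0;
    }
    free(node);
    return trans;
}
```

The online module would only ever call the add function, so a live HBCI
connection is never kept waiting on the user; the dialog code would call the
take function each time the user has dealt with one transaction.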
----------------------------------------------------------------------------------------
/me needs feedback...
--
Benoit Grégoire
LibOFX http://step.polymtl.ca/~bock/libofx/