need help understanding import options

rs rs123 at rochester.rr.com
Sun Jun 7 19:03:42 EDT 2009


Thanks, Derek. Definitely helpful. I see there are occasional .txt
files in the src directory, which may help a bit as well.

As for sending in a patch -- you wouldn't find that useful, as I
haven't written C in 22 years. If, on the other hand, there's a
place to discuss _possible_ requirements rather than coding details,
I could take a high-level shot at it.





On Jun 5, 2009, at 8:45 AM, Derek Atkins wrote:

> Hi,
>
> rs <rs123 at rochester.rr.com> writes:
>
>> 1.  There are two potential match issues: a) matching new
>> transactions in the downloaded file to already existing transactions,
>> and b) deciding what income/expense category a new transaction belongs
>> to.  Which of these matching problems (or both?) is the QIF or
>> generic (Bayesian or not) matcher concerned with?  Which, or both,
>> are the Bayesian config options concerned with?
>
> The Bayesian matching is only concerned with account mapping, i.e.
> deciding which GnuCash account a new transaction belongs to (your b).
>
> The duplicate matching is done later in the process and uses
> the account, amount, and date (and the FITID in OFX).
>
> In QIF there is a mapping from Payee/Memo to GnuCash account for
> transactions that don't have a Category or QIF Account attached to
> them.  The importer remembers your mappings and re-applies them
> on future imports; however, the matching to previous imports is done
> on the FULL TEXT of the Payee or Memo.
>
> OFX w/o Bayesian matching does effectively the same thing.  However,
> if you turn ON Bayesian matching, then instead of using the full string
> the importer breaks it up into different tokens and performs matching
> based on the filtering of the tokens.  When you manually map a
> transaction to an account, GnuCash increases the values of the mappings
> for each token to that account.  On future imports it runs an
> algorithm that computes the most likely mapping based on the various values
> for each token and, if the match % is high enough, suggests the same
> target account.  This works much better in cases where you have
> consistent partial payee info with a variable tag, e.g.:
>
>  WHOLEFOODS #1523 20090523
>
>> 2.  Is there any consensus as to which matcher is best: QIF, QFX
>> without Bayesian, or QFX with Bayesian?
>
> Depends on what you're trying to do.  I'd ignore OFX w/o Bayesian matching.
> In fact, I think in 2.4.x we should turn on Bayesian matching by default.
>
>> 3.  What are these matchers keying on?
>
> Depends on what you're talking about, but generally the payee and/or memo.
>
>>   a) Do they ignore the transaction ID numbers when assigning income/
>> expense categories (as they should)?  For instance, when I look at
>> the QIF files, I see (useful) text as well as long
>> transaction ID numbers that are different for every transaction and
>> thus useless.
>
> Yes.
>
>>   b) When I download a credit card QIF, it has categories (e.g.,
>> "restaurant") embedded in the file.  Are these categories part of the
>> match?  The QFX version for my credit card does not include this
>> category field, unfortunately.
>
> The QIF Importer lets you map the QIF Category to a GnuCash account.
> Then later it uses that mapping as part of the duplicate checking.
>
>> 5.  Has there been any serious consideration to allowing the user to
>> specify the rules of income/expense category assignment [e.g. anything
>> called "restaurant" should match to the "dining out" category]?  I
>> would certainly prefer that, in most cases, to hoping the software can
>> figure it out -- and confine the automatic matching (of whatever type)
>> to transactions without rules and to the problem of identifying
>> existing transactions.
>
> Sure, send in a patch.
>
>> Any help would be appreciated.  If someone makes the effort to answer
>> some of these questions well, it would really help if the answers
>> found their way into the documentation.  NOTE: if the best
>> documentation is the code, that's a problem.  But if the code
>> documentation is readable (and doesn't require deep knowledge of C to
>> read), that might be an adequate answer for now.  Where would I
>> find the appropriate code for the matchers?
>
> src/import-export
>
>> thanks much.
>>
>> - Rick
>
> -derek
>
> -- 
>       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
>       Member, MIT Student Information Processing Board  (SIPB)
>       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
>       warlord at MIT.EDU                        PGP key available
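To make the token-based (Bayesian) account mapping described above more concrete, here is a minimal Python sketch. It is not GnuCash's code (that is C, under src/import-export). The whitespace tokenizer, the class name TokenAccountMap, the naive-Bayes-style way per-token probabilities are combined, and the 0.90 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical cutoff for "match % is high enough"; the real value may differ.
MATCH_THRESHOLD = 0.90


def tokenize(description):
    """Split a payee/memo string into lowercase tokens (assumed rule: whitespace)."""
    return description.lower().split()


class TokenAccountMap:
    """Hypothetical token -> account tally, learned from manual mappings."""

    def __init__(self):
        # token -> account -> number of times the user mapped that token there
        self.counts = defaultdict(lambda: defaultdict(int))

    def learn(self, description, account):
        """Record a manual mapping by bumping each token's count for `account`."""
        for token in set(tokenize(description)):
            self.counts[token][account] += 1

    def suggest(self, description, threshold=MATCH_THRESHOLD):
        """Return (account, score) for the best account if score >= threshold, else None."""
        probs = defaultdict(list)  # account -> list of P(account | token)
        for token in set(tokenize(description)):
            by_account = self.counts.get(token)  # .get() does not create entries
            if not by_account:
                continue  # unseen token (e.g. a new store number): no evidence
            total = sum(by_account.values())
            for account, n in by_account.items():
                probs[account].append(n / total)

        best = None
        for account, ps in probs.items():
            # Combine per-token evidence: prod(p) / (prod(p) + prod(1 - p))
            prod_p, prod_not_p = 1.0, 1.0
            for p in ps:
                prod_p *= p
                prod_not_p *= 1.0 - p
            score = prod_p / (prod_p + prod_not_p)
            if best is None or score > best[1]:
                best = (account, score)

        if best is not None and best[1] >= threshold:
            return best
        return None


if __name__ == "__main__":
    m = TokenAccountMap()
    m.learn("WHOLEFOODS #1523 20090523", "Expenses:Groceries")
    m.learn("WHOLEFOODS #0877 20090601", "Expenses:Groceries")
    m.learn("SHELL OIL 5521 20090602", "Expenses:Auto:Fuel")
    print(m.suggest("WHOLEFOODS #1523 20090615"))  # ('Expenses:Groceries', 1.0)
```

A full-text lookup would miss "WHOLEFOODS #1523 20090615" because the date tag never repeats, while the stable "wholefoods" token still carries the earlier mappings. For simplicity the sketch scores an account only on the tokens it has been seen with.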
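Derek's point that duplicate detection keys on the account, amount, and date (plus the FITID in OFX) can also be sketched in a few lines of Python. This is a hypothetical illustration, not the importer's actual logic: the Txn record, the 3-day date window, and treating matching FITIDs as decisive are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Iterable, Optional

# Hypothetical tolerance: how many days apart the posted dates may be.
DATE_WINDOW_DAYS = 3


@dataclass(frozen=True)
class Txn:
    """Hypothetical minimal transaction record."""
    account: str
    amount: Decimal
    posted: date
    fitid: Optional[str] = None  # OFX only; QIF carries no FITID


def is_probable_duplicate(new: Txn, old: Txn, window: int = DATE_WINDOW_DAYS) -> bool:
    """Same account, then FITID if both have one, else amount plus nearby date."""
    if new.account != old.account:
        return False
    # Assumption: when both sides carry an OFX FITID, it decides the question.
    if new.fitid is not None and old.fitid is not None:
        return new.fitid == old.fitid
    return new.amount == old.amount and abs((new.posted - old.posted).days) <= window


def find_duplicate(new: Txn, existing: Iterable[Txn]) -> Optional[Txn]:
    """Return the first existing transaction that looks like `new`, or None."""
    for old in existing:
        if is_probable_duplicate(new, old):
            return old
    return None
```

Using Decimal keeps amount comparison exact, which matters when equality of amounts is one of the matching keys.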



More information about the gnucash-user mailing list