[GNC-dev] Normalizing live data, a suggestion for discussion

Wm wm_o_o_o at yahoo.co.uk
Sat Feb 2 16:36:18 EST 2019

On 02/02/2019 15:24, Geert Janssens wrote:

> Yes, if you use business features, you may have entered business identifying
> data in File->Properties. I think that's what David is referring to.

I agree, the third party should not be identified.

> Similarly there may be customer and vendor data (names, addresses) in the book
> that should equally be obfuscated. Just random data is fine.


Geert, at the moment I am putting GUIDs in place of random data. Do you
think that is the wrong way to approach this?
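
As a rough illustration of the GUID-for-random idea (not the actual
script; the record layout and the field names "name" and "addr" are
hypothetical), each identifying field is replaced by a value derived
from the record's own GUID, so the result is anonymous but still unique
and traceable within the file:

```python
def obfuscate(records, fields=("name", "addr")):
    """Return copies of records with identifying fields replaced by a
    value derived from each record's own GUID (hypothetical layout)."""
    cleaned = []
    for rec in records:
        new = dict(rec)  # copy, so the original records stay untouched
        for field in fields:
            if field in new:
                new[field] = f"{field}-{rec['guid']}"
        cleaned.append(new)
    return cleaned


# Made-up sample data, for illustration only.
customers = [
    {"guid": "3f2a9c0e5b7d4e1f8a6b0c1d2e3f4a5b",
     "name": "Acme Ltd", "addr": "1 High St"},
    {"guid": "9b8c7d6e5f4a3b2c1d0e9f8a7b6c5d4e",
     "name": "Bob's Bikes", "addr": "2 Low Rd"},
]
cleaned = obfuscate(customers)
```

Because the replacement is derived from the GUID rather than drawn at
random, running it twice on the same book gives the same output.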

Actually, the nearer we get to completely random data the less useful
the file becomes.  Truly random data is harder to produce than most
people think and pretty much defeats the purpose if you think about it.

> Continuing in that vein, if you have bills and invoices, aside from
> randomizing the transaction's split amounts and values you'll also have to do
> the same for invoice entries.

I don't think that is true in most situations and even if what you say 
is true, I don't see it as a good argument against *attempting* a 
normalized book for most people.

> And to make the book useful for detecting
> business data bugs this should happen in such a way that invoice tax and
> discount amounts remain consistent after multiplying with random numbers *and*
> that the invoice totals continue to match the business transactions amounts in
> AR/AP accounts.

There will be situations that involve the person doing the triage 
needing to see actual transactions, I have already commented on that.

> And to make that one level more complicated, after that the payment
> transactions *also* have to continue to match the new randomized invoice
> amount (if the invoice was paid in full).

Ummmm, I don't think that is true.  If the munged numbers match (and 
they will, that is what the script will do) the transaction stream will 
be OK.
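
A minimal sketch of why matching munged numbers keep the stream
consistent, assuming (my assumption, not a description of the script)
that every split value is multiplied by one book-wide factor. The split
layout and the sample figures are made up. Exact fractions are used so
no rounding creeps in:

```python
from fractions import Fraction


def munge_splits(splits, factor):
    """Scale every split value by the same factor."""
    return [{**s, "value": s["value"] * factor} for s in splits]


def tx_totals(splits):
    """Sum the split values per transaction."""
    totals = {}
    for s in splits:
        totals[s["tx"]] = totals.get(s["tx"], Fraction(0)) + s["value"]
    return totals


# Made-up invoice (with tax) and the payment that settles it in full.
splits = [
    {"tx": "invoice-1", "account": "A/R", "value": Fraction(12100, 100)},
    {"tx": "invoice-1", "account": "Income", "value": Fraction(-10000, 100)},
    {"tx": "invoice-1", "account": "Tax", "value": Fraction(-2100, 100)},
    {"tx": "payment-1", "account": "Bank", "value": Fraction(12100, 100)},
    {"tx": "payment-1", "account": "A/R", "value": Fraction(-12100, 100)},
]

munged = munge_splits(splits, Fraction(7, 3))
totals = tx_totals(munged)
ar_balance = sum(s["value"] for s in munged if s["account"] == "A/R")
```

Every transaction still sums to zero and the A/R balance is still zero,
so a fully paid invoice stays fully paid. Independent random factors per
split or per transaction would not have this property.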

It is possible I have missed your point, Geert, but it looks like I
understand the contents of the gnc files better than you :(

> It doesn't end there, payments can be split over multiple invoices, so again
> when one randomizes invoice amounts care must be taken to adjust the payments
> in proportion to the invoice amount change or fully paid invoices suddenly can
> become partially paid or overpaid.

Not true.

Geert, I don't want to say this but I believe you are actually wrong, 
for once.

> While this is probably all possible I believe the resulting script will be so
> complex that it will become a source of bugs in itself which would divert
> developer time to debugging and maintaining this script rather than working on
> the effectively reported bug for which a sample data file was asked in the
> first place...

Hmmmm, I accept your point and disagree.

> Up until a book with only transactions, no business data at all it sounded
> like a useful tool.

Be a brave man, Geert, most people don't use the business functions :)

> Oh and we haven't mentioned SXs and budgets yet...

Unless they are material to the file being investigated I suggest we 
just delete all SXs and budget stuff.
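
For a book saved in SQLite format, the delete could be sketched roughly
as below. The table names are my assumption about the schema and should
be checked against a real file; the demo runs against a stand-in
in-memory database, not a real book. A real book may also hold related
data elsewhere (for example template transactions) that this sketch
does not touch.

```python
import sqlite3

# Assumed table names; verify against the actual file before use.
TABLES_TO_CLEAR = ("schedxactions", "recurrences", "budgets", "budget_amounts")


def strip_sx_and_budgets(conn, tables=TABLES_TO_CLEAR):
    """Delete all rows from the listed tables that exist in the database.
    Returns the names of the tables that were cleared."""
    existing = {
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    cleared = []
    for table in tables:
        if table in existing:
            # Names come from the fixed tuple above, not from user input.
            conn.execute(f'DELETE FROM "{table}"')
            cleared.append(table)
    conn.commit()
    return cleared


# Stand-in database with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schedxactions (guid TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE budgets (guid TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE transactions (guid TEXT PRIMARY KEY, description TEXT)")
conn.execute("INSERT INTO schedxactions VALUES ('a1', 'Rent')")
conn.execute("INSERT INTO budgets VALUES ('b1', '2019')")
conn.execute("INSERT INTO transactions VALUES ('t1', 'Groceries')")

cleared = strip_sx_and_budgets(conn)
```

Work on a copy of the file, never the original.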

> As for Colin's question: on Windows and MacOS sqlite is supported out of the
> box. On linux it may require the additional installation of a libdbi driver.
> Most distros I know have packages for this driver but they may not be
> installed by default.

It would be an odd distro that excluded SQLite; it is a prerequisite for
a lot of other software, such as browsers.  Thinking aloud: a server-only
install might not have it, and someone stupid enough to put their data
on Amazon might not have it available.  The question then becomes, why
was the person so stupid?

As far as I am concerned this conversation is ongoing, if only because 
Geert says he still needs a file from me to replicate a basic problem 
that I don't think needs any data from me at all.

