[GNC-dev] Normalizing live data, a suggestion for discussion

Sat Feb 2 10:24:11 EST 2019

On Saturday 2 February 2019 10:19:02 CET, Wm via gnucash-devel wrote:
> On 02/02/2019 00:16, David Cousens wrote:
> > As well as the account names you might also want to munge data in the
> > description/memo fields. This can contain identifying information for
> > customers/vendors.
> 
> How about we just zap the stuff in description/memo fields by default?
> They're not mathematically significant and rarely cause double entry
> problems unless someone introduces unusual UI stuff in which case they
> should be able to provide an example.
> 
> > Also possible any data relating to the owner of the file
> > which is stored in the file/database.
> 
> Does your file/database have an obvious owner?  Mine doesn't apart from
> the name of the file which is the first and obvious thing to change
> before you send it off for someone else to look at.
> 
> If you mean bits of text in reports they wouldn't be included in an
> SQLite file.
> 
> If you mean bits of text in outbound documents I think we've already
> zapped them.
> 
> Have I missed your point?
> 

Yes, if you use business features, you may have entered business identifying 
data in File->Properties. I think that's what David is referring to. 
Similarly, there may be customer and vendor data (names, addresses) in the 
book that should equally be obfuscated. Just random data is fine.

Continuing in that vein, if you have bills and invoices, aside from 
randomizing the transaction's split amounts and values you'll also have to do 
the same for invoice entries. And to make the book useful for detecting 
business data bugs this should happen in such a way that invoice tax and 
discount amounts remain consistent after multiplying with random numbers *and* 
that the invoice totals continue to match the business transactions amounts in 
AR/AP accounts.
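
To illustrate the consistency requirement, here is a minimal sketch. It 
assumes a deliberately simplified invoice model (percentage discount, then 
percentage tax on the discounted amount, rounded to cents per entry). This is 
not GnuCash's actual data model or tax logic, and the names Entry, 
entry_total, invoice_total and obfuscate_invoice are hypothetical. The idea 
is to scale only the unit prices and then recompute tax, discount and the 
total from the scaled entries, rather than multiplying each derived amount 
separately and hoping the rounding still agrees.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


@dataclass
class Entry:
    """Hypothetical, simplified invoice entry (not the GnuCash schema)."""
    quantity: Decimal
    price: Decimal         # unit price
    discount_pct: Decimal  # percentage off the line, e.g. 10 for 10%
    tax_pct: Decimal       # percentage tax on the discounted line


def entry_total(e):
    """Discounted line amount plus tax, rounded to cents."""
    net = e.quantity * e.price * (1 - e.discount_pct / 100)
    tax = net * e.tax_pct / 100
    return (net + tax).quantize(CENT, rounding=ROUND_HALF_UP)


def invoice_total(entries):
    """Sum of the rounded entry totals."""
    return sum((entry_total(e) for e in entries), Decimal("0"))


def obfuscate_invoice(entries, factor):
    """Scale unit prices by one factor and recompute the invoice total.

    The returned total is what the AR/AP posting would have to be
    changed to, so that the invoice and its transaction still agree.
    """
    scaled = [
        Entry(
            e.quantity,
            (e.price * factor).quantize(CENT, rounding=ROUND_HALF_UP),
            e.discount_pct,
            e.tax_pct,
        )
        for e in entries
    ]
    return scaled, invoice_total(scaled)
```

Even in this toy model the AR/AP split has to be rewritten from the 
recomputed total; in a real book the tax tables and discount options add 
further cases on top of this.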

And to make that one level more complicated, after that the payment 
transactions *also* have to continue to match the new randomized invoice 
amount (if the invoice was paid in full).

It doesn't end there, payments can be split over multiple invoices, so again 
when one randomizes invoice amounts care must be taken to adjust the payments 
in proportion to the invoice amount change or fully paid invoices suddenly can 
become partially paid or overpaid.
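
As a rough sketch of that proportional adjustment, assuming payments are 
available as a flat list of (payment, invoice, amount) allocations, which is 
a hypothetical simplification and not how GnuCash stores lots: each 
allocation is scaled by the ratio of the invoice's new total to its old 
total, and for invoices that were fully paid the rounding remainder is pushed 
into the last allocation so they stay exactly paid. The function name 
rescale_allocations and its arguments are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def rescale_allocations(allocations, old_totals, new_totals):
    """Scale payment allocations in proportion to each invoice's change.

    allocations: list of (payment_id, invoice_id, amount) tuples.
    old_totals / new_totals: dicts mapping invoice_id to invoice total.
    """
    # Amount paid per invoice before obfuscation.
    paid = {}
    for _, inv, amt in allocations:
        paid[inv] = paid.get(inv, Decimal("0")) + amt

    # Index of the last allocation that touches each invoice.
    last_index = {inv: i for i, (_, inv, _) in enumerate(allocations)}

    running = {}
    result = []
    for i, (pay, inv, amt) in enumerate(allocations):
        ratio = new_totals[inv] / old_totals[inv]
        new_amt = (amt * ratio).quantize(CENT, rounding=ROUND_HALF_UP)
        fully_paid = paid[inv] == old_totals[inv]
        if fully_paid and i == last_index[inv]:
            # Absorb rounding drift so the invoice stays exactly paid.
            new_amt = new_totals[inv] - running.get(inv, Decimal("0"))
        running[inv] = running.get(inv, Decimal("0")) + new_amt
        result.append((pay, inv, new_amt))
    return result
```

Each payment transaction's own total would then have to be recomputed as the 
sum of its new allocations, which is yet another place where the script can 
go wrong.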

While this is probably all possible, I believe the resulting script would be 
so complex that it would become a source of bugs in itself, diverting 
developer time to debugging and maintaining the script rather than working on 
the bug that was actually reported and for which a sample data file was 
requested in the first place...

As long as it was about a book with only transactions and no business data at 
all, it sounded like a useful tool.

Oh, and we haven't mentioned scheduled transactions (SXs) and budgets yet...

As for Colin's question: on Windows and macOS, SQLite is supported out of the 
box. On Linux it may require the additional installation of a libdbi driver. 
Most distros I know have packages for this driver, but they may not be 
installed by default.

Geert