[GNC-dev] Normalizing live data, a suggestion for discussion
Geert Janssens
geert.gnucash at kobaltwit.be
Sat Feb 2 10:24:11 EST 2019
On Saturday 2 February 2019 10:19:02 CET, Wm via gnucash-devel wrote:
> On 02/02/2019 00:16, David Cousens wrote:
> > As well as the account names you might also want to munge data in the
> > description/memo fields. This can contain identifying information for
> > customers/vendors.
>
> How about we just zap the stuff in description/memo fields by default?
> They're not mathematically significant and rarely cause double entry
> problems unless someone introduces unusual UI stuff in which case they
> should be able to provide an example.
>
> > Also possible any data relating to the owner of the file
> > which is stored in the file/database.
>
> Does your file/database have an obvious owner? Mine doesn't apart from
> the name of the file which is the first and obvious thing to change
> before you send it off for someone else to look at.
>
> If you mean bits of text in reports they wouldn't be included in an
> SQLite file.
>
> If you mean bits of text in outbound documents I think we've already
> zapped them.
>
> Have I missed your point?
>
Yes. If you use business features, you may have entered business identifying
data in File->Properties. I think that's what David is referring to.
Similarly, there may be customer and vendor data (names, addresses) in the book
that should equally be obfuscated. Just random data is fine.
Continuing in that vein, if you have bills and invoices, aside from
randomizing the transactions' split amounts and values you'll also have to do
the same for invoice entries. And to make the book useful for detecting
business data bugs, this should happen in such a way that invoice tax and
discount amounts remain consistent after multiplying with random numbers *and*
that the invoice totals continue to match the business transaction amounts in
the AR/AP accounts.
And to make that one level more complicated, after that the payment
transactions *also* have to continue to match the new randomized invoice
amount (if the invoice was paid in full).
It doesn't end there: payments can be split over multiple invoices, so again
when one randomizes invoice amounts care must be taken to adjust the payments
in proportion to the invoice amount change. Otherwise fully paid invoices can
suddenly become partially paid or overpaid.
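
To give a feel for even the simplest case, here is a minimal Python sketch of
rescaling one invoice together with the payments applied to it. This is not
GnuCash code: the function names, the (net, tax rate) tuple layout and the use
of integer cents are all assumptions made for illustration, and discounts,
lots and the real AR/AP transactions are left out entirely.

```python
from fractions import Fraction


def invoice_total(entries):
    """Total in cents; tax is rounded per entry (an assumed convention)."""
    return sum(net + round(net * rate) for net, rate in entries)


def rescale_invoice(entries, payments, factor):
    """Scale one invoice and the payment amounts applied to it.

    entries  -- list of (net_cents, tax_rate) tuples, tax_rate a Fraction
    payments -- list of amounts in cents applied to this invoice
    factor   -- random multiplier, as a Fraction to avoid float drift
    All of these shapes are hypothetical, not GnuCash's data model.
    """
    # Scale each entry's net amount; tax is recomputed from the new net
    # amount so the tax stays consistent with its entry.
    new_entries = [(round(net * factor), rate) for net, rate in entries]

    old_total = invoice_total(entries)
    new_total = invoice_total(new_entries)
    if old_total == 0:
        return new_entries, new_total, list(payments)

    # Scale payments by the ratio of the totals, not by `factor`:
    # per-entry rounding means the total rarely scales by exactly `factor`.
    ratio = Fraction(new_total, old_total)
    new_payments = [round(p * ratio) for p in payments]

    # A fully paid invoice must stay fully paid, so push any leftover
    # rounding difference into the last payment.
    if new_payments and sum(payments) == old_total:
        new_payments[-1] += new_total - sum(new_payments)

    return new_entries, new_total, new_payments


entries = [(10000, Fraction(21, 100)), (2550, Fraction(6, 100))]
new_entries, new_total, new_payments = rescale_invoice(
    entries, [8000, 6803], Fraction(137, 100)
)
```

Even this toy version needs a special case for rounding, and it only covers
the invoice side. A payment transaction that is split over several invoices
would still have to be rebuilt from the rescaled portions of each invoice, and
a partially paid invoice that was within a cent of being paid in full could
still tip over.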
While this is probably all possible, I believe the resulting script will be so
complex that it will become a source of bugs in itself. That would divert
developer time to debugging and maintaining this script rather than working on
the bug that was actually reported, and for which a sample data file was asked
in the first place...
As long as it was about a book with only transactions and no business data at
all, it sounded like a useful tool.
Oh, and we haven't mentioned scheduled transactions (SXs) and budgets yet...
As for Colin's question: on Windows and macOS, SQLite is supported out of the
box. On Linux it may require the additional installation of a libdbi driver.
Most distros I know have packages for this driver, but they may not be
installed by default.
Geert