[GNC-dev] Normalizing live data, a suggestion for discussion
wm_o_o_o at yahoo.co.uk
Fri Feb 1 08:36:52 EST 2019
Situation: someone reports a problem with gnc, and at triage it becomes
clear that some data is going to be required to identify or solve the
problem. The normal question: can you give us a file?
Problem: for any number of reasons, ranging from plain old personal
privacy through to people who live in supposedly liberal societies
avoiding tax and people in supposedly conservative societies avoiding
persecution, sending live data isn't always appropriate. The USA has
become very weird about this, and most of our development people are in
the USA, so hopefully they'll understand the politics of privacy, eventually.
Suggestion: we try to make providing a file easier for people.
My suggestion is that we ask people to save a *copy* of their data in
SQLite and then run a script across that copy that munges and obfuscates:
1. account names
2. numbers
People following this will probably be aware that gnc doesn't know much
about account names beyond broad classes, in spite of providing lots
of names and not accommodating other accounting concepts, such as the
fact that there is a level one up. My point here is that account names
are important to people but not to gnc, so why not just randomize them?
Obvious way? Copy the actual account name (the guid) to the user-visible
one. This is a one-way change unless someone has unusual settings on
their SQLite file; if someone has those settings, it seems reasonable to
presume they also know how to turn them off and save the file again.
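
A minimal sketch of that renaming step, to be run on the *copy* only. It
assumes the SQLite file has an `accounts` table with `guid` and `name`
columns; that schema detail and the function name are assumptions for
illustration, not something agreed on the list.

```python
import sqlite3


def anonymize_account_names(conn: sqlite3.Connection) -> int:
    """Overwrite every user-visible account name with that account's guid.

    Assumed schema (hypothetical here): accounts(guid TEXT, name TEXT, ...).
    Returns the number of rows changed.
    """
    cur = conn.execute("UPDATE accounts SET name = guid")
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real file copy.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE accounts (guid TEXT PRIMARY KEY, name TEXT)")
    demo.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [("a1f3", "Secret savings"), ("b7c9", "Salary")],
    )
    anonymize_account_names(demo)
    print(demo.execute("SELECT guid, name FROM accounts ORDER BY guid").fetchall())
```

Other free-text columns (descriptions, memos and so on) would need the
same kind of treatment; they are left out here to keep the sketch short.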
As long as the transaction stream balances, the actual numbers don't
matter (there will be occasions where the numbers are important, but
these tend to be number extremes related to commodities rather than
anyone using gnc to do a Mr Putin vs Mr Trump sports bet). In most
cases, multiplying any matching numbers by the same semi-random factor
should produce a good file for examination, so long as it is done
consistently.
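
The scaling idea can be sketched roughly as follows. It assumes a
`splits` table that stores amounts as integer numerator/denominator
pairs (`value_num`, `value_denom`, `quantity_num`, `quantity_denom`)
keyed by `tx_guid`; those column names and both function names are
assumptions for illustration. Using one whole-number factor for the
entire file is the simplest reading of "consistently": every
transaction that balanced before still balances afterwards, and the
ratio of value to quantity is unchanged.

```python
import random
import sqlite3
from fractions import Fraction


def scale_amounts(conn, factor=None):
    """Multiply every split's value and quantity numerator by one factor.

    Assumed schema (hypothetical here):
    splits(tx_guid, value_num, value_denom, quantity_num, quantity_denom).
    Returns the factor used so the caller can discard or record it.
    """
    if factor is None:
        factor = random.SystemRandom().randint(2, 999)
    conn.execute(
        "UPDATE splits SET value_num = value_num * ?, "
        "quantity_num = quantity_num * ?",
        (factor, factor),
    )
    conn.commit()
    return factor


def unbalanced_transactions(conn):
    """Return the tx_guids whose split values do not sum to zero."""
    totals = {}
    query = "SELECT tx_guid, value_num, value_denom FROM splits"
    for tx_guid, num, denom in conn.execute(query):
        totals[tx_guid] = totals.get(tx_guid, Fraction(0)) + Fraction(num, denom)
    return sorted(tx for tx, total in totals.items() if total != 0)
```

Running `unbalanced_transactions` before and after scaling gives a quick
check that the munged copy is still consistent. Numbers stored elsewhere
in the file (prices, for example) are not touched by this sketch.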
That is a long argument I am interested in conceptually rather than
personally; it doesn't affect me as a UK person, but it makes me think.
I don't think a reductive discussion of true vs near-true random
is appropriate; the significant point is that the person viewing the data
won't be able to work out the original number without significant effort,
and in most cases simply won't be able to work it out at all. We're
talking about computing assets I doubt anyone here has access to in
order to get the numbers back, *and* I believe the gnc people are
actually motivated by solving problems, belief in the project and
ordinary stuff like that, so they won't even be looking.
Random is fun, if only because there are so many ways of doing it.
Questions: why SQLite rather than XML? Because if a person runs an
agreed script across their file, we can be sure of the outcome. Editing
an XML file informally is scary; it immediately raises questions about
consistency of data. Other SQL formats are not widely used, so my
proposal is that we go for the lowest common denominator (LCD) where we
can achieve normalization.
Normalization will have to be balanced: privacy vs contribution to the
I definitely want contribution from other people who work well with
SQL. Let's think about this together, people. I have written some
scripts that confuse *my* data, and I know that Geert is still waiting
for me to send him a file.
Geert is a good person; I just don't want to show him very personal
stuff in my file.
I have a plan for making showing a file easier. Is anyone interested?
This is the *start* of a conversation, I welcome thoughts.