Limits of gnucash

Mike or Penny Novack mpnovack at mtdata.com
Wed Jan 11 10:25:48 EST 2017


On 1/10/2017 8:16 PM, AC wrote:
> On 2017-01-10 13:01, Mike or Penny Novack wrote:

>> A GOOD encryption algorithm would result in data that was not
>> compressible (or barely so). The compressibility of data is inversely
>> related to its randomness. Random data is not compressible at all.
>> Encrypted data will be at least pseudo-random.
> Almost.  This would apply to compression using data from within a single
> file to generate the compression table.  However, when using a
> system-wide compression algorithm, encryption does not affect
> compression efficiency that much.  The reason is that the algorithm has
> the ability to pull from many files across the entire system, where the
> chances of repeated sequences are greater.  Note that this does not
> necessarily mean that the repeated sequences represent the same clear
> text; that depends on the specific encryption algorithm.
Oh dear. Look, this is really for the computer science folks among us, 
and I am not really one of those, since I never went back for that 
degree. That doesn't mean I didn't keep up. When teaching myself C, I 
chose as my "case problem" a direct-from-definition finite 
implementation of LZ2 (the 2nd Lempel-Ziv universal compression 
algorithm, also known as LZ78). Finite because the theoretical 
definition has no cutoff on the length of string matched, and I used a 
64K limit to keep the size of the dictionary finite*. Back in my 
working days, one of the things I was responsible for maintaining was a 
CUSTOM compression routine which, by having "knowledge" of the data 
being compressed, could do a FAR better job of compression than any 
general-purpose compression routine.
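
For anyone curious what that parse looks like, here is a minimal Python 
sketch of the LZ78 idea (not my old C code, and with an unbounded 
dictionary for simplicity). Each step emits a pair of (index of the 
longest prefix already in the dictionary, next byte) and adds the 
extended phrase as a new dictionary entry:

# Minimal LZ78 sketch (unbounded dictionary, illustration only).
def lz78_compress(data: bytes):
    dictionary = {b"": 0}          # index 0 is the empty string
    out, phrase = [], b""
    for byte in data:
        candidate = phrase + bytes([byte])
        if candidate in dictionary:
            phrase = candidate     # keep extending the current match
        else:
            out.append((dictionary[phrase], byte))
            dictionary[candidate] = len(dictionary)  # learn new phrase
            phrase = b""
    if phrase:                     # flush a trailing partial match
        out.append((dictionary[phrase], None))
    return out

# Runs of one symbol compress fast: phrases a, aa, aaa, aaaa cover 10 bytes
print(lz78_compress(b"aaaaaaaaaa"))

Repeated structure is what feeds the dictionary; feed it data with no 
repeated structure and the output tokens never get shorter than the input.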

YES, the larger the amount of data, the more likely multiple matches 
for any given string. But that only helps (makes the data compressible) 
if matches are MORE likely than in the same amount of random data. 
That's why I said a GOOD encryption algorithm. If an encryption 
algorithm does not have the property that its output (the encrypted 
data) has a distribution that appears random, it would be crackable 
based on the repeated patterns.
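
You can see this for yourself with a quick Python sketch, with zlib's 
DEFLATE standing in for a general-purpose compressor and os.urandom 
standing in for the output of a good cipher. The exact ratios will 
vary, but the pattern holds: structured text shrinks, pseudo-random 
bytes do not.

import os
import zlib

text = b"the quick brown fox jumps over the lazy dog " * 1000
rand = os.urandom(len(text))   # stand-in for well-encrypted output

for label, data in (("text", text), ("random", rand)):
    ratio = len(zlib.compress(data, 9)) / len(data)
    print(f"{label}: compresses to {ratio:.2%} of original size")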

Michael

* But that also made my design problem more interesting, since I had to 
include processes to keep updating the dictionary, removing the least 
recently used entries when adding new strings. Infinite LZ2 is provably 
"perfect", while my finite version would take infinitely compressible 
data (say, an infinite string of a single character) down to 3/64K of 
its size.
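
To make that bookkeeping concrete, here is a rough Python sketch of the 
least-recently-used eviction (OrderedDict is used for brevity; a C 
version would need its own linked structure for the same effect):

from collections import OrderedDict

MAX_ENTRIES = 65536  # the 64K cap

class LRUDictionary:
    def __init__(self, capacity=MAX_ENTRIES):
        self.capacity = capacity
        self.entries = OrderedDict()   # phrase -> index, oldest first

    def lookup(self, phrase):
        if phrase in self.entries:
            self.entries.move_to_end(phrase)   # mark as recently used
            return self.entries[phrase]
        return None

    def add(self, phrase, index):
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        self.entries[phrase] = index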
