GDA: string lengths (was Re: GDA save missing records)

Phil Longstaff plongstaff at rogers.com
Mon Feb 18 13:19:04 EST 2008


Graham Leggett wrote:
> Keith Bellairs wrote:
>
>> Speaking as a user and not someone busting his butt on this, I hate 
>> the idea of "unlimited" everything when we go to a DB. Most of our 
>> databases have a mechanism (BLOB/CLOB) to store really big things, 
>> usually at the cost of indexing or searching (other than with special 
>> hacks -- Oracle Text, for example).
>>
>> gnc is not, and should not be, a doc mgmt system. I want fast, fast 
>> retrieval and summarization. Having a place to store a reference to a 
>> doc is a great idea; plugging up the data with the docs, not so much.
>>
>> Of course, it is unforgivable to just drop rows. Even silently 
>> truncating data is pretty dubious. Don't know Postgres and Mysql; 
>> can't we throw an exception so we have a chance to do the right thing 
>> (what the user needs)?
>>
>> I'd ask the developers to pick some reasonable size for each column. 
>> Then publish the schema. Granted this is a big change from the 
>> unlimited everything, but it seems necessary. If I don't like your 
>> column size, I should be able to ALTER TABLE and set my own 
>> favorites, so please do not hard-code the column sizes into the code.
>
> The problem with this is that it introduces inconsistency into the 
> code. The XML backend has no concept of line lengths, and is so 
> "unlimited". The problem was originally found when an attempt was made 
> to import this "unlimited" data into a "limited" system, such as the 
> current DB system.
>
> Suddenly we have introduced the possibility that perfectly valid data 
> in one backend is no longer valid in another. Add to that a user 
> ability to change the line lengths and suddenly all bets are off.
>
> Fixed length string widths are an optimisation that helps if you are 
> manipulating fixed length strings, but if you aren't - such as with a 
> description in a register - the fixed length serves no purpose at all.
>
> As someone who spends a lot of time tracking down nasty problems in 
> software, I can tell you that this is exactly one of those seemingly 
> harmless issues that can cause some very difficult to find, and 
> therefore very expensive bugs in systems. In this case, it was only 
> found because mysql and postgresql have different behaviour when 
> string lengths are too long, and that was found by a very lucky accident.

Well, as I originally said, I can use a TEXT type, which allows strings 
of up to 64K bytes.  Although not unlimited, I assume this is long 
enough for everyone's purposes.  MySQL stores them as a 2-byte length 
followed by the characters.  I will need to check that libgda has some 
good method of creating them.  Of course, I could also just try 
varchar(2048) instead of varchar(50), which should also be sufficient.  
I assume that the db tries to optimize space so that storing a 
1000-char string and storing a 1-char string in a varchar(2048) don't 
use the same amount of space.
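
To make the space assumption concrete, here is a rough sketch in Python 
of how MySQL sizes these values, as far as I understand it: a VARCHAR 
stores a length prefix (1 byte if the column can never exceed 255 bytes, 
2 bytes otherwise) plus only the bytes actually used, and TEXT always 
uses a 2-byte prefix.  The function names are made up for illustration, 
and the sketch assumes a single-byte character set and ignores row and 
page overhead.

```python
# Illustrative model only: hypothetical helpers, not libgda or GnuCash code.
# Assumes a single-byte character set (latin1), so 1 char == 1 byte.

TEXT_MAX_BYTES = 65535  # largest length a 2-byte prefix can describe


def varchar_storage_bytes(value: str, max_chars: int) -> int:
    """Approximate bytes MySQL uses to store `value` in a VARCHAR(max_chars)."""
    if len(value) > max_chars:
        # Refuse rather than silently truncate or drop the row.
        raise ValueError(f"value has {len(value)} chars, column allows {max_chars}")
    data = value.encode("latin-1")
    # 1-byte prefix if the column can never need more than 255 bytes, else 2.
    prefix = 1 if max_chars <= 255 else 2
    return prefix + len(data)


def text_storage_bytes(value: str) -> int:
    """Approximate bytes MySQL uses to store `value` in a TEXT column."""
    data = value.encode("latin-1")
    if len(data) > TEXT_MAX_BYTES:
        raise ValueError(f"value has {len(data)} bytes, TEXT allows {TEXT_MAX_BYTES}")
    return 2 + len(data)


if __name__ == "__main__":
    for label, s in (("1 char", "x"), ("1000 chars", "x" * 1000)):
        print(label, "in varchar(2048):", varchar_storage_bytes(s, 2048), "bytes")
        print(label, "in TEXT:", text_storage_bytes(s), "bytes")
```

Under this model a short string costs only a few bytes in either a wide 
varchar or a TEXT column, so the declared width matters as a limit on 
what is accepted, not as space reserved per row.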

Phil


