GDA: string lengths (was Re: GDA save missing records)
Phil Longstaff
plongstaff at rogers.com
Mon Feb 18 13:19:04 EST 2008
Graham Leggett wrote:
> Keith Bellairs wrote:
>
>> Speaking as a user and not someone busting his butt on this, I hate
>> the idea of "unlimited" everything when we go to a DB. Most of our
>> databases have a mechanism (BLOB/CLOB) to store really big things,
>> usually at the cost of indexing or searching (other than with special
>> hacks -- Oracle Text, for example).
>>
>> gnc is not, and should not be, a doc mgmt system. I want fast, fast
>> retrieval and summarization. Having a place to store a reference to a
>> doc is a great idea; plugging up the data with the docs, not so much.
>>
>> Of course, it is unforgivable to just drop rows. Even silently
>> truncating data is pretty dubious. Don't know Postgres and MySQL;
>> can't we throw an exception so we have a chance to do the right thing
>> (what the user needs)?
>>
>> I'd ask the developers to pick some reasonable size for each column.
>> Then publish the schema. Granted this is a big change from the
>> unlimited everything, but it seems necessary. If I don't like your
>> column size, I should be able to ALTER TABLE and set my own
>> favorites, so please do not hard-code the column sizes into the code.
>
> The problem with this is that it introduces inconsistency into the
> code. The XML backend has no concept of line lengths, and so is
> "unlimited". The problem was originally found when an attempt was made
> to import this "unlimited" data into a "limited" system, such as the
> current DB system.
>
> Suddenly we have introduced the possibility that perfectly valid data
> in one backend is no longer valid in another. Add to that a user
> ability to change the line lengths and suddenly all bets are off.
>
> Fixed length string widths are an optimisation that helps if you are
> manipulating fixed length strings, but if you aren't - such as with a
> description in a register - the fixed length serves no purpose at all.
>
> As someone who spends a lot of time tracking down nasty problems in
> software, I can tell you that this is exactly one of those seemingly
> harmless issues that can cause some very difficult to find, and
> therefore very expensive bugs in systems. In this case, it was only
> found because MySQL and PostgreSQL have different behaviour when
> string lengths are too long, and that was found by a very lucky accident.
Well, as I originally said, I can use a TEXT type, which allows strings
of up to 64K bytes. Although not unlimited, I assume this is long enough
for everyone's purposes. MySQL stores them as a 2-byte length followed
by the chars. I will need to check that libgda has some good method of
creating them.
Of course, I could also just try varchar(2048) instead of varchar(50),
which should also be sufficient. I assume that the db tries to optimize
space so that storing a 1000-char string and storing a 1-char string in
a varchar(2048) don't use the same amount of space.
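
To make the space assumption concrete, here is a rough sketch of how MySQL sizes a varchar value: a length prefix (1 byte if the column can never need more than 255 bytes, otherwise 2 bytes) plus only the bytes actually stored. The function name, the 3-bytes-per-char default (MySQL's utf8) and the decision to raise on overlong input are assumptions made for illustration; per-row overhead and other databases' formats are ignored.

```python
def varchar_storage_bytes(value: str, max_chars: int, max_bytes_per_char: int = 3) -> int:
    """Estimate the bytes MySQL needs for `value` in a VARCHAR(max_chars) column.

    Hypothetical helper for illustration only; it ignores per-row overhead.
    """
    # Fail loudly instead of truncating, so the caller can decide what to do.
    if len(value) > max_chars:
        raise ValueError(f"value has {len(value)} chars; column allows {max_chars}")

    # Only the bytes actually present are stored, not the declared maximum.
    data_len = len(value.encode("utf-8"))

    # One length byte if the column can never exceed 255 bytes, else two.
    prefix_len = 1 if max_chars * max_bytes_per_char <= 255 else 2
    return prefix_len + data_len


if __name__ == "__main__":
    for n in (1, 1000):
        print(n, "chars ->", varchar_storage_bytes("x" * n, 2048), "bytes")
```

Under these assumptions the cost follows the data rather than the declared width, so widening varchar(50) to varchar(2048) should cost about one extra byte per stored value.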
Phil