spam,Re: spam,Re: Testing reports

Colin Scott gnucash at double-bars.net
Tue Apr 10 01:53:00 EDT 2012


Yes, I accept that one might argue about whether the output is the HTML or the report.  I would incline to the former, on the grounds that if the HTML doesn't change then neither does the report, and if the HTML does then the report might (or might not).  I guess it depends, as you suggest, on how you create the HTML - and in general I don't disagree with anything you say.  I am sure you similarly understand where I am coming from ...  :-)

If my approach seems simplistic, then it is deliberately so.  I found over many years that it is unhelpful to introduce needless complexities, and that it is always easier to start simple and add the complexities one finds to be necessary, rather than starting complex and then finding that some of the initial complexity is superfluous.  This approach generally gave me more robust results, and got them quicker ...
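
A minimal sketch of that simple approach, assuming Python and assuming the reference and new report HTML are available as strings.  The function name `report_changes` is hypothetical and is not GnuCash code.  It flags *any* textual change and hands the diff to a human, rather than trying to judge whether the change matters:

```python
import difflib


def report_changes(reference_html, new_html):
    """Return a unified diff as a list of lines; an empty list means no change."""
    if reference_html == new_html:
        return []
    # keepends=True so that even a line-ending change shows up in the diff.
    return list(difflib.unified_diff(
        reference_html.splitlines(keepends=True),
        new_html.splitlines(keepends=True),
        fromfile="reference",
        tofile="new",
    ))
```

If the diff is non-empty and a human decides the new output is correct, the new HTML would simply replace the stored reference.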

Colin

-------- Original Message --------

*Subject:* spam,Re: spam,Re: Testing reports
*From:* Colin Law <clanlaw at googlemail.com>
*To:* gnucash at double-bars.net, gnucash-user at gnucash.org
*Date:* Tue, 10 Apr 2012 13:26:49 +0100

On 10 April 2012 04:32, Colin Scott <gnucash at double-bars.net> wrote:
>
>> Can one not find an html parser that will read the html into a DOM
>> tree?  Then one could walk the tree comparing the tags, attributes,
>> and contents with a DOM from a reference page.  That way minor
>> changes in the html that do not change the resulting displayed page
>> will be ignored.
>
> I expect you are right, but I've never worked with HTML at that sort of level, so please pardon my ignorance!
>
> Besides, I was making two points.  The first was that one needs to weigh the cost of what one is doing against the benefits.  If it's easy, then maybe it's worth doing - but maybe not, read on!  :-)
>
> The second was that *ANY* change to the HTML output needs to be flagged and checked - if you have something that works, it shouldn't be changed at all unless there is a very good reason for making the change.  If the output changes, and the changed HTML is determined to be correct, then it should become the reference against which future test output is compared.  IMHO one should *NEVER* automatically accept a change in output without it having been thoroughly scrutinised and checked by a human!

It is a bit of a philosophical point really, but one could argue that
the output from the software is a report rather than a chunk of html.
What matters is what the report looks like, and if the Document Object
Model has not changed then it will look the same whatever the details
of the underlying html.  Using the DOM allows the test to relate to
the end result, with tests such as "there should be a paragraph of a
particular class containing particular text", rather than "there
should be a string of the form <p class =......".
I don't know how the html is generated in the code.  If it uses an html
library (or if at some point in the future it were to use one) then
another issue could be that a newer version of the library might, for
example, change the order of attributes in a tag.  A DOM-based test
would not care about this, but for a text-based one it would be a
disaster.
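
A rough sketch of the idea, assuming Python and only its standard `html.parser` module.  The names `DomSketch`, `normalize` and `same_report` are made up for illustration, and this flattens the page into a list of parse events rather than building a real DOM tree, so treat it as a starting point only:

```python
from html.parser import HTMLParser


class DomSketch(HTMLParser):
    """Collect a normalized list of tag and text events from an HTML string."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us;
        # sorting the attributes makes their order irrelevant.
        normalized = sorted((name, value or "") for name, value in attrs)
        self.events.append(("start", tag, tuple(normalized)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Collapse runs of whitespace and skip whitespace-only text.
        text = " ".join(data.split())
        if text:
            self.events.append(("text", text))


def normalize(html):
    parser = DomSketch()
    parser.feed(html)
    parser.close()
    return parser.events


def same_report(html_a, html_b):
    """True if both strings parse to the same tags, attributes and text."""
    return normalize(html_a) == normalize(html_b)
```

With this, `<A HREF="foo">foo</A>` and `<a href="foo">foo</a>` compare equal, a paragraph matches the same paragraph broken across lines, and a changed `href` value still counts as different.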

Colin


>
> Colin
>
> -------- Original Message --------
>
> *Subject:* spam,Re: Testing reports
> *From:* "Colin Scott" <gnucash at double-bars.net>
> *To:* warlord at MIT.EDU
> *CC:* jralls at ceridwen.us, yawar.amin at gmail.com, gnucash-user at gnucash.org, gnucash at double-bars.net
> *Date:* Mon, 9 Apr 2012 10:00 +0100 (BST)
>
> Sorry for the delay - I've been away.
>
>> Define "html text"?  For example, would you consider this:
>>
>>   <A HREF="foo">foo</A>
>>
>> and this:
>>
>>   <a href="foo">foo</a>
>
> For the purposes of the exercise in hand, I would consider them to be different.  Working out whether the difference is significant is not something I would want to leave to a robot - not because it isn't perfectly possible to do robotically, but because it simply isn't worth devoting that much time and effort to inventing a robot that will do it reliably.  After all, even such an apparently minor change as a case inversion shouldn't be made without a damn' good reason!
>
>>  What about:
>>   <a href="foo">foo</a>    and    <a href="bar">foo</a>
>> ?
>
> Different under *any* circumstances!  :-)
>
> Colin
>
> -------- Original Message --------
>
> *Subject:* Re: Testing reports
> *From:* Derek Atkins <warlord at MIT.EDU>
> *To:* spam at spambayes.invalid,gnucash at double-bars.net
> *CC:* jralls at ceridwen.us, yawar.amin at gmail.com, gnucash-user at gnucash.org
> *Date:* Mon, 02 Apr 2012 09:19:14 -0400
>
> "Colin Scott" <gnucash at double-bars.net> writes:
>
>>> So basically what you would like to test is that the normalized HTML
>>> output is "the same".  This would require:
>>
>> Actually, no.  Were I doing this I would automate a test for whether
>> the HTML text has changed.  At the point a change is found, it would
>> probably require human intervention to see what has changed and why -
>> *any* change is an error unless there is a legitimate reason for it,
>> and automating the legitimacy test is probably too expensive a task to
>> be worth doing.
>
> Define "html text"?  For example, would you consider this:
>
>  <A HREF="foo">foo</A>
>
> and this:
>
>  <a href="foo">foo</a>
>
> to be the same or different?   What about:
>  <a href="foo">foo</a>    and    <a href="bar">foo</a>
> ?
>
> To me I would normalize in such a way that the first two would be
> considered "the same" but the second two would be considered
> "different".
>
> I'm not sure how I would implement that, tho.  But those would be my
> requirements.  Similarly, I would want these two to result in a "match":
>
>  <p>This is a paragraph</p>
>
> and
>
>  <p>This
>  is
>  a paragraph</p>
>
> -derek
> --
>       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
>       Member, MIT Student Information Processing Board  (SIPB)
>       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
>       warlord at MIT.EDU                        PGP key available
>
>



-- 
gplus.to/clanlaw


