Image enabling gnucash?

teri teri@superlink.net
Fri, 23 Feb 2001 16:09:48 -0500 (EST)


> 
> ...
> Also, there isn't any open-source OCR software available AFAIK, and it 
> would be most preferable (both for technical and philosophical 
> reasons) to base this system completely around open source, rather 
> than have a key component proprietary. 

Well, the available OCR software might not be up to par, but there is
some out there:

    http://www.cfar.umd.edu/~kia/ocr-faq.html
    http://www.geocities.com/Athens/Olympus/8087/ocr/ocr_resource.html
    http://http.cs.berkeley.edu/~fateman/kathey/ocrchie.html
    http://www.socr.org/
    http://documents.cfar.umd.edu/ocr/      Public Domain OCR Software

Note the title of the last one...

Note that we're not talking about understanding totally unstructured
handwritten text.  Receipts are usually well structured, and only the
relevant parts (which can be identified through manual training) need
to be fed to the OCR software.  A related question is how to identify
receipts from the same establishment/chain without OCR:

    http://www.kudla.org/raindog/perl/findimagedupes-0.1.3.tar.gz

A template could be created the first time a particular receipt is
entered, containing only the unchanging parts such as the name of the
business, the address, etc.  Subsequent receipts are then compared
against the stored templates as they are scanned.  In this way, without
OCR, some parts of the transaction can be deduced.  When you add the
OCR to the actual lines with money amounts in them...
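
To make the template idea a bit more concrete, here is a minimal sketch
in Python.  It does not claim to reproduce what findimagedupes does
internally; it uses a simple "average hash" fingerprint as a stand-in.
Assumptions: scans are already loaded as lists of rows of grayscale
values (0-255), the fixed part of a receipt sits in its top quarter,
and the names average_hash, header, match_template and the max_distance
threshold are all hypothetical, chosen only for illustration.

```python
# Sketch only: match the fixed header of a receipt against stored
# template fingerprints. Images are lists of rows of grayscale values
# (0-255); scanning and loading are out of scope here.

def average_hash(pixels, size=8):
    """Shrink to size x size by block averaging, then set one bit per
    cell: 1 if the cell is brighter than the mean of all cells."""
    h, w = len(pixels), len(pixels[0])
    if h < size or w < size:
        raise ValueError("image is smaller than the fingerprint grid")
    cells = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x]
                     for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if c > mean else 0 for c in cells]


def hamming(a, b):
    """Number of bit positions where two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def header(pixels, fraction=0.25):
    """Top slice of the receipt, assumed (hypothetically) to hold the
    business name and address."""
    rows = max(1, int(len(pixels) * fraction))
    return pixels[:rows]


def match_template(receipt, templates, max_distance=10):
    """Return the name of the closest template whose header fingerprint
    is within max_distance bits of the receipt's, or None."""
    fp = average_hash(header(receipt))
    best_name, best_dist = None, max_distance + 1
    for name, template_fp in templates.items():
        d = hamming(fp, template_fp)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name
```

The first receipt from a business would be stored with something like
templates["grocer"] = average_hash(header(first_scan)), and later scans
passed to match_template.  Real scans would presumably need deskewing
and cropping first, and the threshold would have to be tuned on actual
receipts.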

I realize that all this is pie in the sky/castles in Spain/totally
unrealistic vaporware, but hey, I'm just throwing some ideas out.

> If you *are* looking for a challenging project to make your name, 
> open-source OCR is probably a good one :) 

I wish I had unlimited time...

A.