Receipt scanners, recommendations?

G. Paul Ziemba pz-gnucash-user at ziemba.us
Fri Dec 22 00:22:08 EST 2017


beastmaster126 at hotmail.com (jeffrey black) writes:

> I am buried up to my ears in receipts and would like to go paperless

> budget would have to be a maximum of about $400 USD. I need to scan
> everything from 2 inch wide thermal receipts up to full size
> 8 1/2 x 11 inch receipts.

I use a Fujitsu ScanSnap. It retails for about US $400, scans double-sided,
and has an automatic document feeder (ADF).

Receipts are often printed on tissue-thin paper, so I have to feed them
into the scanner carefully. The scanner can handle very long receipts,
but the operator has to hold the button for a long time (15 sec?) to
tell the scanner not to abort after 20 inches or so.

I wish I had a fully unix/linux workflow, but alas, I have not been able to
assemble a reliable auto-crop, auto-rotate, OCR pipeline that works as
well as the Windows drivers/applications that ship with the scanner.

My current setup has W2K (!) running in a VirtualBox VM, with the scanner's
USB device forwarded from the unix host into the W2K VM. I installed the
Fujitsu-provided driver and OCR software in the VM client. The VM is
configured to make a host directory visible to the VM client as a Windows
"share".

In the VM, ScanSnap and the bundled OCR software are configured to save
output to the shared folder, so the files end up on the unix host. The
bundled OCR software outputs PDF.

The Windows driver maintains a continual conversation with the scanner
even when it is not scanning. Excessive USB latency seems to interrupt
this conversation, and the driver then declares the scanner "gone"
forevermore (until the VM client is rebooted). Setting the VM to
"realtime" priority on the host (FreeBSD) seems to have ameliorated the
problem.

As for integration with GnuCash, I have a clunky homegrown PDF browser tool
that more or less runs pdfgrep on the OCR'ed files to guess at transaction
fields, then presents them to me for editing alongside a rendering of the
image.
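
For anyone wanting to try something similar, the field-guessing step might
look roughly like the sketch below. It is not the tool described above; it
assumes the OCR text has already been pulled out of the PDF (for example
with pdfgrep or pdftotext), and the regexes, the guess_fields() name, and
the "largest TOTAL line wins" rule are all hypothetical heuristics.

```python
import re
from datetime import date
from decimal import Decimal

# Heuristic patterns (assumptions, not a standard): US-style M/D/Y dates,
# a line starting with TOTAL or GRAND TOTAL, and a masked card number.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{2,4})\b")
TOTAL_RE = re.compile(
    r"^\s*(?:GRAND\s+)?TOTAL\b[^\d\n-]*(-?\d[\d,]*\.\d{2})",
    re.IGNORECASE | re.MULTILINE,
)
LAST4_RE = re.compile(r"(?:X{2,}|\*{2,})[\s-]*(\d{4})\b", re.IGNORECASE)


def guess_fields(text):
    """Guess date, amount and card digits from OCR text; None means 'ask the user'."""
    fields = {"date": None, "amount": None, "card_last4": None}

    m = DATE_RE.search(text)
    if m:
        month, day, year = (int(g) for g in m.groups())
        if year < 100:
            year += 2000  # assume two-digit years mean 20xx
        try:
            fields["date"] = date(year, month, day)
        except ValueError:
            pass  # e.g. a D/M/Y receipt; leave the field for manual entry

    totals = [Decimal(t.replace(",", "")) for t in TOTAL_RE.findall(text)]
    if totals:
        # Lines such as "TOTAL TAX" can also match; the largest is usually the total due.
        fields["amount"] = max(totals)

    m = LAST4_RE.search(text)
    if m:
        fields["card_last4"] = m.group(1)

    return fields
```

Anything left as None is exactly what the editing screen would have to ask
about, which matches the "guess, then let me edit" flow.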

I edit the description, category, etc. and indicate "done", and the
tool generates a QIF transaction record and moves the PDF file to a
directory/filename based on the transaction info. After I have done a
batch of receipts this way, I import the QIF file into GnuCash.
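
The "done" step (write a QIF record, then file the PDF under a name derived
from the transaction) could be sketched as below. The Txn class, the
<base>/<year>/<date>_<payee>_<amount>.pdf naming scheme, the MM/DD/YYYY
date format, and writing purchases as negative amounts in a CCard-type file
are all assumptions; check how the GnuCash QIF importer interprets dates
and signs for your accounts.

```python
import re
import shutil
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from pathlib import Path


@dataclass
class Txn:
    """Hypothetical container for the fields confirmed in the editor."""
    posted: date
    amount: Decimal   # positive purchase amount as printed on the receipt
    payee: str
    category: str     # GnuCash account name, e.g. "Expenses:Hardware"
    memo: str = ""


def qif_record(txn):
    """Format one QIF transaction: D, T, P, L (and optional M) lines ending in '^'."""
    lines = [
        "D" + txn.posted.strftime("%m/%d/%Y"),
        "T" + str(-txn.amount),  # purchase recorded as an outflow (assumed convention)
        "P" + txn.payee,
        "L" + txn.category,
    ]
    if txn.memo:
        lines.append("M" + txn.memo)
    lines.append("^")
    return "\n".join(lines) + "\n"


def slug(text):
    """Lower-case, filesystem-friendly version of a payee name."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-") or "unknown"


def archive_path(base, txn):
    """Hypothetical naming scheme: <base>/<year>/<YYYY-MM-DD>_<payee>_<amount>.pdf"""
    name = f"{txn.posted.isoformat()}_{slug(txn.payee)}_{txn.amount}.pdf"
    return Path(base) / str(txn.posted.year) / name


def file_receipt(pdf, base, qif_file, txn, account_type="CCard"):
    """Append the record to qif_file, then move the scanned PDF into the archive."""
    qif_file = Path(qif_file)
    is_new = not qif_file.exists()
    with qif_file.open("a") as f:
        if is_new:
            f.write(f"!Type:{account_type}\n")  # QIF header, written once per file
        f.write(qif_record(txn))
    dest = archive_path(base, txn)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(pdf), str(dest))
    return dest
```

Appending to one QIF file per batch keeps the GnuCash side to a single
import, as described above.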

It is somewhat cumbersome, but having the OCR text of the receipts makes
it possible to partly automate their translation into canonical
transactions. Some cleverness is required to match up credit card numbers
on receipts (which might show only the last 4 digits) or to parse the
transaction amount, and so far it needs a lot of manual oversight.
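
As an illustration of the two fiddly parts mentioned above, a sketch might
look like this. The CARDS table and account names are made up (the card
numbers are standard test numbers), and the accepted amount formats
('$1,234.56', '(5.00)', '5.00-') are assumptions about what receipts print.
Returning every card that matches, rather than picking one, is a deliberate
choice: with only 4 digits, two cards can collide.

```python
import re
from decimal import Decimal

# Hypothetical table of the user's cards: stored number -> GnuCash account.
CARDS = {
    "4111111111111111": "Liabilities:Credit Card:Visa",
    "5500000000000004": "Liabilities:Credit Card:MasterCard",
}


def match_card(last4, cards=CARDS):
    """Return every account whose card number ends in last4.

    More than one hit means the receipt alone cannot decide, so the
    candidates should be shown to the user instead of picked silently.
    """
    return sorted(acct for number, acct in cards.items() if number.endswith(last4))


def parse_amount(raw):
    """Parse receipt-style amounts such as '$1,234.56', '(5.00)' or '5.00-'.

    Returns a Decimal (negative for refunds/credits), or None when the OCR
    text is not a clean number (e.g. 'S' misread for '5').
    """
    s = raw.strip().replace("$", "").replace(",", "").replace(" ", "")
    negative = False
    if s.startswith("(") and s.endswith(")"):
        negative, s = True, s[1:-1]
    elif s.endswith("-"):
        negative, s = True, s[:-1]
    elif s.startswith("-"):
        negative, s = True, s[1:]
    if not re.fullmatch(r"\d+\.\d{2}", s):
        return None  # leave it for manual oversight
    value = Decimal(s)
    return -value if negative else value
```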

Wouldn't it be nice if there were a standard barcode on receipts that
encoded the relevant information? Unfortunately, we are probably too
far along into digital/online transactions for any innovation in printed
receipts.
-- 
G. Paul Ziemba
FreeBSD unix:
 9:21PM  up 85 days,  8:09, 21 users, load averages: 0.21, 0.32, 0.37

