[GNC-dev] Import PDF to GnuCash

Adrien Monteleone adrien.monteleone at lusfiber.net
Mon Aug 6 06:19:41 EDT 2018

A company I work with just started using QB Pro 2018, so I’ll check on this feature, but a web search turned up this forum topic: https://quickbooks.intuit.com/community/Do-more-with-QuickBooks/Pdf-Conversion-to-QBO/td-p/145466

which seems to indicate that it’s nothing new. You need third-party software to convert an *electronically generated* PDF to QBO format, which QuickBooks can then upload. Scans of paper are not recommended because of OCR issues, so OCR isn’t their method. They seem to be ‘scanning’ the text of the file, but they specify that ALL of the text has to be selectable, and they recommend only PDF statements generated by the bank. So most likely these are programmatically generated plain-text statements that have had styling and formatting applied and are shipped in a PDF container. The rather expensive software reverses the process back to plain text and then interprets what the transactions are. I guess it’s possible the banks are doing EDI and simply offering customers a ‘pretty printable version’ in PDF format with the EDI fields embedded, so the file could still be used with EDI, and this special software is just taking advantage of that to generate an importable format.

Other solutions mentioned for various incarnations of Intuit software are to skip the QBO step and go straight to CSV, which puts us in GnuCash territory. But I’d bet dimes to dollars you don’t need $100+ software to accomplish that task if OCR isn’t part of the workflow.
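
To illustrate why I doubt the expensive software is needed, here is a rough sketch of the "plain text back to CSV" step. It assumes the text has already been pulled out of the PDF (for example with pdftotext -layout from poppler-utils) and that each transaction sits on one line as date, description, amount. That layout, the sample statement, and the function names are made up for illustration; a real bank statement would need its own pattern.

```python
import csv
import io
import re
from decimal import Decimal

# Hypothetical statement layout: "MM/DD/YYYY  description  amount".
# The description is separated from the amount by two or more spaces,
# which is typical of layout-preserving text extraction.
LINE_RE = re.compile(
    r"^\s*(?P<date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<desc>.+?)\s{2,}"
    r"(?P<amount>-?[\d,]+\.\d{2})\s*$"
)


def parse_statement_text(text):
    """Return (date, description, amount) tuples for lines that look like transactions."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # headers, footers, page numbers, blank lines
        amount = Decimal(match.group("amount").replace(",", ""))
        rows.append((match.group("date"), match.group("desc").strip(), amount))
    return rows


def rows_to_csv(rows):
    """Serialize parsed rows as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Date", "Description", "Amount"])
    for date, desc, amount in rows:
        writer.writerow([date, desc, str(amount)])
    return buf.getvalue()


# Invented sample text standing in for the output of a PDF text extractor.
SAMPLE = """\
ACME BANK            Statement for July 2018
Date        Description                  Amount
07/02/2018  GROCERY STORE #12            -54.20
07/05/2018  PAYROLL DEPOSIT            1,250.00
Page 1 of 1
"""

if __name__ == "__main__":
    print(rows_to_csv(parse_statement_text(SAMPLE)), end="")
```

The resulting file has Date, Description and Amount columns, which can then be mapped by hand in the GnuCash CSV importer. Multi-line descriptions or separate debit and credit columns would need more work than this.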


> On Aug 6, 2018, at 4:47 AM, c.holtermann at gmx.de wrote:
> On 2018-07-26 21:56, deltatango wrote:
>> Hello,
>> Very interested in the possibility of importing PDF statements into GnuCash.
>> I know QuickBooks now has this functionality.
>> I searched online and found a few clunky possibilities that would convert
>> the data into Excel, which can then be converted to CSV and then imported
>> into GnuCash.
>> I was envisioning a system where you select a PDF statement to be imported.
>> The program then asks you to select the area of the statement which contains
>> the transactions, much like a photoshop selection. (And perhaps you could
>> save templates of selections for different statements).
>> Then some kind of OCR scanning reads the columns and data and converts them to
>> columns/rows.
>> Is this in the realm of possibility for some future release?
>> It is so common now that CSV, QFX, etc. exports from your bank only
>> go so far back and you have to download PDFs instead...
>> I dream, I hope...
>> But in vain I wish not...
>> --
>> Sent from: http://gnucash.1415818.n4.nabble.com/GnuCash-Dev-f1435356.html
>> _______________________________________________
>> gnucash-devel mailing list
>> gnucash-devel at gnucash.org
>> https://lists.gnucash.org/mailman/listinfo/gnucash-devel
> Hello !
> I haven't heard of PDF statements before. Is it some sort of embedded data ?
> Quick googling led me to https://pdftables.com/blog/convert-bank-statement.
> It seems to be some sort of standard embedding mechanism. Am I right here ?
> Anyway the way would be to extract the data from the PDF. That would either be
> through this statement data or through OCR. The data could be converted to CSV
> and be imported to gnucash.
> The missing link seems to be extraction of data from PDFs.
> Is there a FOSS tool to extract statement data and convert it to CSV ?
> Or, if we go the OCR route, is there a tool capable of extracting tables?
> With OCR you usually only get a text file. It does not recognize that it is
> structured table data. At least with the software I used some years ago that
> had been the case.
> The OCR way would be interesting if there are possible rights issues, as John
> has pointed out, about statement data or reverse engineering.
> regards,
> Christoph
