[GNC-dev] Import PDF to GnuCash

c.holtermann at gmx.de c.holtermann at gmx.de
Mon Aug 6 05:47:16 EDT 2018

On 2018-07-26 21:56, deltatango wrote:
> Hello,
> Very interested in the possibility of importing PDF statements into
> GnuCash.
> I know Quickbooks now has this functionality.
> I searched online and found a few clunky possibilities that would
> convert the data into excel which can then be converted to csv and
> then imported into GnuCash.
> I was envisioning a system where you select a PDF statement to be
> imported. The program then asks you to select the area of the
> statement which contains the transactions, much like a photoshop
> selection. (And perhaps you could save templates of selections for
> different statements).
> Then some kind of OCR scanning reads the columns and data and convert
> it to columns/rows.
> Is this in the realm of possibility for some future release?
> It is so common now that exporting csv or qfx ,etc files from your
> bank only go so far back and you have to download PDFs instead...
> I dream, I hope...
> But in vain I wish not...

Hello!

I haven't heard of PDF statements before. Is it some sort of embedded
data?
Quick googling led me to 
It seems to be some sort of standard embedding mechanism. Am I right 
here ?

Anyway, the way to go would be to extract the data from the PDF, either
through this statement data or through OCR. The data could then be
converted to CSV and imported into GnuCash.

The missing link seems to be extraction of data from PDFs.

Is there a FOSS tool to extract statement data and convert it to CSV?
Or, if we go the OCR route, is there a tool capable of extracting tables?

With OCR you usually only get a text file. It does not recognize that
the text is structured table data. At least that was the case with the
software I used some years ago.
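
To illustrate the step from plain OCR text to table rows, here is a
rough Python sketch. It is only a guess at how this could look, not an
existing GnuCash feature. The line layout (ISO date, description, amount
with two decimals), the column names, and the function name text_to_csv
are all assumptions made for the example. Real statements would each
need their own pattern.

```python
import csv
import io
import re

# Hypothetical layout of one transaction line in the OCR output:
#   "2018-07-02  Grocery store   -23.50"
# Real bank statements differ; this pattern is an assumption.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>-?\d+\.\d{2})$"
)


def text_to_csv(text):
    """Keep only lines that look like transactions and return them as CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Column names are illustrative; they can be mapped in the CSV importer.
    writer.writerow(["Date", "Description", "Amount"])
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # headers, footers and page numbers are skipped
            writer.writerow(
                [match["date"], match["description"], match["amount"]]
            )
    return out.getvalue()


sample = """Account statement July 2018
2018-07-02  Grocery store   -23.50
2018-07-05  Salary          1500.00
Page 1 of 1
"""

print(text_to_csv(sample))
```

The resulting text could be saved to a file and fed to the CSV import.
The weak point stays the same as described above: the pattern has to be
written by hand for each statement layout, because the OCR output itself
carries no table structure.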

The OCR way would be interesting if there are possible rights issues
with statement data or reverse engineering, as has been pointed out.


