Identify a dewarping algorithm / library #2

Open
slifty opened this issue Feb 7, 2020 · 7 comments
Comments

@slifty
Collaborator

slifty commented Feb 7, 2020

One of the exports of this project is a de-warped version of each page photographed. This will be used to improve the OCR as well.

This is not out-of-the-box functionality! Let's find some existing libraries and algorithms that do this (ideally in Java, but if there is no other choice it might be OK to use another language and find a way to run it from within the app).

This issue is ultimately a research issue, to capture and log resources as I find them.

@slifty
Collaborator Author

slifty commented Feb 7, 2020

There does not appear to be a perfect solution for de-warping, and that will be a risk to the project overall (but one we will better understand once the MVP is complete). The two risks are:

  1. Computational intensity (since this is for a mobile device)
  2. Quality of the final result.

Here is a short thread on the DIY Bookscanner forums of someone who appears to have tried to make exactly what we're talking about making here. They are pointed to a few resources, though I am a bit wary of going down the path of completely implementing something from scratch based on academic papers.

That thread does note that there is no single way to do it because it is really just a heuristic. Even the best algorithms produce odd or bogus results with a fair bit of frequency. Especially on pages where the typology does not match the assumptions above. Say, on a map or title page.

Which does raise my concerns -- we may find that some pages simply cannot be reliably scanned / dewarped. Again, MVP will expose that challenge.

Approach A: Modify an existing algorithm

This guide from 2016 and the accompanying code offer what appears to be a fairly compelling de-warping algorithm, though it is in Python. This algorithm takes around 30 seconds to de-warp a page on a 2012 MacBook Pro.

Approach B: OpenCV

There are a few projects (such as OpenNoteScanner) which appear to use OpenCV to handle de-warping.

This article from 2014 shows an example in Python which could be modified to Java.
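
For reference while evaluating this approach, here is a minimal sketch of the math behind a four-point perspective correction. It assumes, without having confirmed it against the linked article or OpenNoteScanner, that the OpenCV route boils down to finding the four page corners and mapping them onto a rectangle (in OpenCV that would be `getPerspectiveTransform` followed by `warpPerspective`). The function names and the corner coordinates below are made up for illustration, and the code uses only the Python standard library so the same steps could be ported to Java. Note that this only flattens a planar page; it does not correct the curvature of a bound book page.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the caller's lists are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corners: no unique transform")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def perspective_transform(src, dst):
    """Return the 3x3 matrix that maps four src points onto four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the eight unknowns.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear_system(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_transform(h, point):
    """Map one (x, y) point through the 3x3 matrix h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


# Hypothetical corner positions of a page in a photo (pixels), listed
# top-left, top-right, bottom-right, bottom-left.
corners = [(120, 80), (980, 140), (1010, 1380), (90, 1300)]
# Target rectangle for the flattened page (also an arbitrary choice).
page = [(0, 0), (850, 0), (850, 1100), (0, 1100)]

h = perspective_transform(corners, page)
for corner in corners:
    print(corner, "->", apply_transform(h, corner))
```

A real implementation would also need to detect the corners and resample every pixel through the inverse mapping, which is the part OpenCV would handle.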

Approach C: TensorFlow

This blog post from 2019 talks about the use of TensorFlow / Machine Learning. Unfortunately they note that "Geometric correction in the second step requires massive computational power, and it is not feasible to conduct it solely on-device at the moment."

slifty self-assigned this Feb 10, 2020
slifty added the "discussion" label (The conversation is the point) Feb 25, 2020
@slifty
Collaborator Author

slifty commented Feb 25, 2020

I spoke with @kfogel on this item and it is understood that (1) dewarping is a preprocessing step that is going to improve the outcome of OCR and (2) neither dewarping nor OCR is a perfectly solved problem.

To that end, we are going to follow the 80/20 rule and see what comes from an initial implementation with the understanding that there will be room for significant improvement, but that improvement should be explored after that first iteration.

@kfogel
Member

kfogel commented Apr 10, 2020

Just heard about another project that might have some useful references or code: https://gitlab.com/rstocker/scanner

@slifty, if there's some place (other than this issue) where you'd like me to put information about related projects, please let me know. We could create a separate document in the tree for that, or make a section in an existing document later, or whatever. I don't want these notes to be distracting, I just want to have a place to keep possibly-useful references. Even after we evaluate them, it's good to keep a record of what we evaluated, so that neither we nor others need to retrace those steps later.

@kfogel
Member

kfogel commented May 22, 2020

Ask HN: OCR framework for extracting formatted text has a lot of links too.

@slifty
Collaborator Author

slifty commented May 24, 2020

Awesome, thank you for these @kfogel -- this is a fine place for them for now, and we can make another place for related projects later.

@kfogel
Member

kfogel commented Jan 3, 2021

One more: https://github.com/Ethereal-Developers-Inc/OpenScan:

"An open source app that enables users to scan hardcopies of documents or notes and convert it to a PDF file. No ads. No data collection. We respect your privacy."

(They don't say anything about OCR; not sure if that's included, or planned for the roadmap, or just not something they're doing.)

@kfogel
Member

kfogel commented Jan 19, 2021

One more: https://wiki.gnome.org/Apps/OCRFeeder
