How to combine the results of multiple OCR tools to get better text recognition
Imagine you have several OCR tools that read text from images, but none of them gives you 100% accurate output. Combined, however, the results could come very close to the ground truth. What would be the best technique to "fuse" the texts together to get good results?
Example:
Actual text
§ 5.1: The contractor is obliged to announce the delay by 01.01.2019 at the latest. The identification-number to be used is OZ-771LS.
OCR tool 1
5 5.1 The contractor is obliged to announce the delay by O1.O1.2019 at the latest. The identification-number to be used is OZ77lLS.
OCR tool 2
§5.1: The contract or is obliged to announce theedelay by 01.O1. 2O19 at the latest. The identification number to be used is O7-771LS
OCR tool 3
§ 5.1: The contractor is oblige to do announced he delay by 01.01.2019 at the latest. T he identification-number ti be used is OZ-771LS.
What could be a promising algorithm to fuse OCR 1, 2 and 3 to get the actual text?
My first idea was to create a "tumbling window" of arbitrary length, compare the words in the window, and for every position take the word that 2 out of 3 tools predict.
For example with window size 3:
[5 5.1 The]
[§5.1: The contract]
[§ 5.1: The]
As you can see, the algorithm doesn't work because all three tools have different candidates for position one (5, §5.1: and §).
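To make the failure concrete, here is a minimal sketch of the positional vote for a single window. The function name `vote_window` and the `None` marker for "no majority" are my own choices for illustration, and the inputs are just the first few words of each OCR output above.

```python
from collections import Counter

def vote_window(outputs, start=0, size=3):
    """Majority-vote each word position in one window; None means no majority."""
    token_lists = [text.split() for text in outputs]
    fused = []
    for pos in range(start, start + size):
        # Collect the word each tool reports at this absolute position.
        candidates = [tokens[pos] for tokens in token_lists if pos < len(tokens)]
        if not candidates:
            break
        word, count = Counter(candidates).most_common(1)[0]
        # Accept a word only if at least 2 tools agree on it.
        fused.append(word if count >= 2 else None)
    return fused

ocr1 = "5 5.1 The contractor is obliged"
ocr2 = "§5.1: The contract or is obliged"
ocr3 = "§ 5.1: The contractor is oblige"

print(vote_window([ocr1, ocr2, ocr3]))
```

For the first window this prints `[None, None, 'The']`: only the third position reaches a 2-out-of-3 majority, because the tools split the leading tokens differently and the positions no longer line up.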
Of course, it would be possible to add tricks like Levenshtein distance to allow some deviations, but I fear this would not be robust enough.
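A rough sketch of what I mean by the Levenshtein trick: treat two candidates as agreeing when their edit distance is at most some threshold. The names `levenshtein` and `fuzzy_vote` and the threshold `max_dist=1` are assumptions made for this example, not a tested solution.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def fuzzy_vote(candidates, max_dist=1):
    """Return the candidate with the most near-matches, or None without a majority."""
    def support(word):
        # Counts the word itself plus every candidate within max_dist edits.
        return sum(levenshtein(word, other) <= max_dist for other in candidates)
    best = max(candidates, key=support)
    return best if support(best) >= 2 else None

print(fuzzy_vote(["5.1", "The", "5.1:"]))  # candidates at position two
print(fuzzy_vote(["5", "§5.1:", "§"]))     # candidates at position one
```

For position two this picks `5.1`, since `5.1` and `5.1:` are one edit apart. For position one it picks `5`, because `5` and `§` are also only one edit apart, which is exactly the kind of wrong agreement that makes me doubt the robustness. It also does nothing about the misaligned positions.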
nlp computer-vision ocr sensor-fusion
asked Mar 26 at 23:28 by Simon Nobel
edited Mar 26 at 23:37
Might be helpful to view this as a merging problem. Not a trivial topic, though.
– afarley
Mar 26 at 23:45
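One possible reading of the merging idea, sketched with Python's standard `difflib`: align the token sequences of two outputs first, so that matching words are paired regardless of their absolute position. The helper name `align` is hypothetical, and this only covers pairwise alignment, not the full three-way fusion.

```python
from difflib import SequenceMatcher

def align(a_tokens, b_tokens):
    """Pair up chunks of two token lists using difflib's opcodes."""
    matcher = SequenceMatcher(None, a_tokens, b_tokens, autojunk=False)
    return [(tag, a_tokens[i1:i2], b_tokens[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]

ocr1 = "5 5.1 The contractor is".split()
ocr2 = "§5.1: The contract or is".split()

for tag, left, right in align(ocr1, ocr2):
    print(tag, left, right)
```

Here "The" and "is" are reported as `equal` chunks even though they sit at different positions in the two outputs, and the differing stretches (`5 5.1` vs `§5.1:`, `contractor` vs `contract or`) come out as `replace` chunks that a voting or edit-distance step could then resolve.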