Open Source Image to Text Conversion Solution

In summary, the thread turns up several open source options for reading in a black and white image and converting it to text: GIMP, Unix command-line utilities, and freeware OCR programs. OCR, however, is not what the original poster needs. The goal is a program that converts a 2-color image into a text file holding a binary or hexadecimal representation of it, with 1's representing black areas and 0's representing white areas. No ready-made tool for that turned up. It could be written with the Python Imaging Library or ImageJ with some customization, though such a program would only be portable to machines that have those installed.
  • #1
Maxwell
Hey guys,

I was wondering if anyone knows of an open source solution for reading in a black and white image and converting it to text (preferably in a text file)?

I've tried Google, but I could only find online generators.

Thanks.
 
  • #2
Will Gimp do what you want?

Otherwise, I have a vague recollection (+10 yrs ago) of using some command line utils in unix like png2pbm. No idea if that kind of stuff is still around.
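Utilities of that kind are still distributed as part of the Netpbm package. Netpbm's plain (ASCII) PBM format, identified by the magic number `P1`, happens to store exactly what this thread asks for: `1` for a black pixel and `0` for a white one. A minimal sketch in Python, assuming the image has already been converted to plain PBM by such a tool; `read_plain_pbm` is a hypothetical helper name:

```python
from typing import List


def read_plain_pbm(text: str) -> List[str]:
    """Parse a plain (P1) PBM image and return its rows as '0'/'1' strings.

    In PBM, 1 means black and 0 means white, which is the encoding
    asked for in this thread.
    """
    # Drop '#' comments, which run to the end of a line.
    lines = [line.split("#", 1)[0] for line in text.splitlines()]
    tokens = " ".join(lines).split()
    if len(tokens) < 3 or tokens[0] != "P1":
        raise ValueError("not a plain PBM (P1) file")
    width, height = int(tokens[1]), int(tokens[2])
    # Raster bits may or may not be separated by whitespace, so glue the
    # remaining tokens together and slice by character.
    bits = "".join(tokens[3:])
    if len(bits) != width * height or set(bits) - {"0", "1"}:
        raise ValueError("raster does not match the declared size")
    return [bits[y * width:(y + 1) * width] for y in range(height)]
```

Comments and whitespace are stripped before the raster is sliced into rows, so files are accepted whether or not the individual bits are separated by spaces.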
 
  • #3
Some things that come up on a google search of: freeware ocr

http://www.simpleocr.com/
http://www.download.com/SimpleOCR/3640-2070_4-10152129.html

http://www.inftyproject.org/en/software.html#InftyReader
http://www.sciaccess.net/en/InftyReader/index.html

http://www.gnu.org/software/ocrad/ocrad.html

http://documents.cfar.umd.edu/ [Broken] (repository)
http://www.adams1.com/pub/russadam/ocr.html [Broken]
http://www.heatonresearch.com/articles/42/page1.html (project)
http://www.codeproject.com/dotnet/simple_ocr.asp [Broken] (project)
http://code.google.com/p/ocropus/
 
  • #4
I've tried some of the open source alternatives, and I must say I was disappointed. :( It might be that some settings needed to be adjusted, though. I gave up after a while.
 
  • #5
Thank you for those links, but they are kind of different from the type of manipulation I need.

Aren't there any algorithms that just take in a black & white image and turn it straight into binary?
 
  • #6
What is the format of the input image?
Once the image is read in, it shouldn't be too hard to complete a short computer program in (say) perl, python, c, java...
 
  • #7
The images can either be PNG or TIFF files. If there is code that only works for one of the previously mentioned file types, that is fine.
 
  • #8
If I had to quickly write something to do this, I would personally choose one of these approaches:

write a program in Python using the Python Imaging Library http://www.pythonware.com/products/pil/

write a program in Java using ImageJ and some selection from its plugins http://rsb.info.nih.gov/ij/ http://rsb.info.nih.gov/ij/plugins/index.html

The above, however, would not be very portable... except to other computers with these installed.

As I hinted above, the hardest part would be to read in the image file... probably best handled by someone else's routines that the rest of your program would call.
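Following the PIL route, the file reading can be delegated to Pillow, the maintained fork of PIL, which opens both of the formats mentioned above (PNG and TIFF). A minimal sketch of such a reading routine, assuming Pillow is installed; `load_gray_rows` is a hypothetical name, not part of any library:

```python
from typing import List


def load_gray_rows(path: str) -> List[List[int]]:
    """Read an image file (e.g. PNG or TIFF) and return its pixels as rows of
    8-bit grayscale values, where 0 is black and 255 is white."""
    # Pillow is a third-party package (pip install Pillow); importing it here
    # keeps the dependency confined to this one routine.
    from PIL import Image

    with Image.open(path) as img:
        # "L" is Pillow's 8-bit grayscale mode; bilevel ("1") images map to
        # 0 and 255, colour images are converted by luminance.
        gray = img.convert("L")
    width, height = gray.size
    pixels = gray.load()  # pixel access object, indexed as pixels[x, y]
    return [[pixels[x, y] for x in range(width)] for y in range(height)]
```

The rest of the program only ever sees a list of rows of numbers, so swapping in a different image library later would only touch this one function.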
 
  • #9
robphy said:
What is the format of the input image?
Once the image is read in, it shouldn't be too hard to complete a short computer program in (say) perl, python, c, java...
Converting an image to text obviously cannot be done with a short computer program. It is actually quite complex.

- Oh, never mind, you were commenting on converting the image to a binary.

What I am waiting for is for some student to write an OCR program that converts formulas into LaTeX. It would be a nice project and would benefit many.
 
  • #10
MeJennifer said:
Converting an image to text obviously cannot be done with a short computer program. It is actually quite complex.

The OP is not looking for an OCR program... but a program which converts a 2-color image into some kind of text file with a binary or hexadecimal representation of the image.

MeJennifer said:
What I am waiting for is for some student to write an OCR program that converts formulas into LaTeX. It would be a nice project and would benefit many.

Did you see the InftyReader project in the links I posted above?
Here are some samples: http://www.inftyproject.org/en/demo.html#0002
 
  • #11
robphy said:
The OP is not looking for an OCR program... but a program which converts a 2-color image into some kind of text file with a binary or hexadecimal representation of the image.
I see, sorry for the confusion.


robphy said:
Did you see the InftyReader project in the links I posted above?
Here are some samples: http://www.inftyproject.org/en/demo.html#0002
Heh, interesting! I am going to check it out!
 
  • #12
robphy said:
The OP is not looking for an OCR program... but a program which converts a 2-color image into some kind of text file with a binary or hexadecimal representation of the image.

Exactly. This conversion from a 2 color image to text is only a small part of an overall project, so if I don't need to mess around with writing the program myself, and could perhaps use an open source solution, that would be fantastic.
 
  • #13
Maxwell said:
Exactly. This conversion from a 2 color image to text is only a small part of an overall project, so if I don't need to mess around with writing the program myself, and could perhaps use an open source solution, that would be fantastic.

The Python PIL and ImageJ solutions are open source platforms... But I doubt you will find an already written program that does what you want. (However, see below.) If this is part of a larger project, these solutions above might be worth looking into.

Here is one question though... are you looking to create a text file comprised of only "0" and "1" corresponding to the 2 colors of an image? [Rather than (say) a text file with a hexadecimal representation, each line corresponding to eight rows of the image.] In other words, are you looking for something like [but not precisely] this: http://www.text-image.com/ or http://ascii.dyne.org/ ?
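The hexadecimal alternative, a text file in which each line covers eight rows of the image, could look like the sketch below. It assumes the image is already a list of equal-length `'0'`/`'1'` row strings (1 = black). The bit order (top row of each band in the least significant bit) and the padding of a short last band with white pixels are arbitrary choices for illustration, since the thread does not specify a layout; `rows_to_hex_bands` is a hypothetical name.

```python
from typing import List


def rows_to_hex_bands(rows: List[str]) -> List[str]:
    """Pack '0'/'1' rows into hex text, one output line per band of 8 rows.

    Each column of a band becomes one byte, written as two hex digits.
    Assumed conventions: bit 0 holds the top row of the band, and a final
    band with fewer than 8 rows is treated as padded with white (0) pixels.
    """
    if not rows:
        return []
    width = len(rows[0])
    lines = []
    for top in range(0, len(rows), 8):
        band = rows[top:top + 8]  # may hold fewer than 8 rows at the end
        hex_cols = []
        for x in range(width):
            byte = 0
            for bit, row in enumerate(band):
                if row[x] == "1":
                    byte |= 1 << bit
            hex_cols.append(f"{byte:02X}")
        lines.append(" ".join(hex_cols))
    return lines
```

For example, eight rows in which only the top-left and bottom-right pixels are black, `["10", "00", "00", "00", "00", "00", "00", "01"]`, pack to the single line `01 80`.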
 
  • #14
robphy said:
The Python PIL and ImageJ solutions are open source platforms... But I doubt you will find an already written program that does what you want. (However, see below.) If this is part of a larger project, these solutions above might be worth looking into.

Here is one question though... are you looking to create a text file comprised of only "0" and "1" corresponding to the 2 colors of an image? [Rather than (say) a text file with a hexadecimal representation, each line corresponding to eight rows of the image.] In other words, are you looking for something like [but not precisely] this: http://www.text-image.com/ or http://ascii.dyne.org/ ?

I'm not sure either one really captures what I'm looking for, but if I had to choose, I'd say the second link depicts what I'm trying to do better.

For the first link, the images are converted to 1's and 0's, but it doesn't seem like those values represent anything.

What I'm looking for is to take a black and white image in, and have the black areas represented by 1's and the white areas represented by 0's.
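A minimal sketch of that mapping, assuming the pixels have already been read into rows of 8-bit grayscale values (0 = black, 255 = white), for example with PIL. The threshold of 128 is an assumption; for a true two-colour image any midpoint works, because every pixel is either 0 or 255. `gray_to_bit_text` and `write_bit_file` are hypothetical helper names.

```python
from typing import Sequence


def gray_to_bit_text(rows: Sequence[Sequence[int]], threshold: int = 128) -> str:
    """Turn rows of grayscale values (0 = black, 255 = white) into text in
    which black pixels are '1' and white pixels are '0', one line per row."""
    lines = ["".join("1" if value < threshold else "0" for value in row)
             for row in rows]
    return "\n".join(lines) + "\n"


def write_bit_file(rows: Sequence[Sequence[int]], out_path: str) -> None:
    """Write the 1/0 text produced by gray_to_bit_text() to out_path."""
    with open(out_path, "w", encoding="ascii") as f:
        f.write(gray_to_bit_text(rows))
```

The threshold only matters for gray pixels: anti-aliased edges in a scanned image are split at the midpoint, and raising or lowering `threshold` shifts which edge pixels count as black.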
 

1. What is an "Open Source Image to Text Conversion Solution"?

An "Open Source Image to Text Conversion Solution" is a software program that allows users to convert images, such as scanned documents or photographs, into editable text. It is open source, meaning that the source code is freely available for anyone to use, modify, and distribute.

2. How does an "Open Source Image to Text Conversion Solution" work?

An "Open Source Image to Text Conversion Solution" uses optical character recognition (OCR) technology to scan an image and identify the characters and words within it. It then converts these characters into editable text, which can be saved as a document or copied and pasted into other programs.

3. What are the benefits of using an "Open Source Image to Text Conversion Solution"?

One of the main benefits of using an "Open Source Image to Text Conversion Solution" is cost savings, as it is typically free to download and use. It also offers convenience, as it allows for easy and quick conversion of images into text, eliminating the need for manual typing. Additionally, open source software often has a strong community of developers who continuously improve and update the program.

4. Are there any limitations to using an "Open Source Image to Text Conversion Solution"?

While "Open Source Image to Text Conversion Solutions" have come a long way in terms of accuracy, they may still struggle with certain fonts, handwriting, or complex layouts. It is also important to note that open source software may not have the same level of technical support as paid software.

5. Can I contribute to the development of an "Open Source Image to Text Conversion Solution"?

Yes, as an open source program, anyone can contribute to the development of an "Open Source Image to Text Conversion Solution." This can include reporting bugs, suggesting improvements, or even writing code. This collaborative effort helps to continuously improve the software and make it more effective for users.
