Open Source Image to Text Conversion Solution

  • Thread starter: Maxwell
  • Tags: Image Reading
An open-source solution is sought for converting black and white images into a text file representation, specifically in binary or hexadecimal format. The discussion highlights various tools and libraries, including GIMP, Python Imaging Library (PIL), and ImageJ, as potential options for handling image input formats like PNG or TIFF. Participants note that while some existing OCR tools were mentioned, they do not meet the specific need for a simple binary representation of the image. The goal is to create a text file where black areas are represented by 1's and white areas by 0's, emphasizing that this task is part of a larger project. There is skepticism about finding a pre-existing program that fulfills these requirements, suggesting that custom programming may be necessary.
Maxwell
Hey guys,

I was wondering if anyone knows of an open source solution for reading in a black and white image and converting it to text (preferably in a text file)?

I've tried Google, but I could only find online generators.

Thanks.
 
Will Gimp do what you want?

Otherwise, I have a vague recollection (+10 yrs ago) of using some command line utils in unix like png2pbm. No idea if that kind of stuff is still around.
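
For what it's worth, the plain PBM format (magic number "P1") is already a text format in which 1 means black and 0 means white. A minimal sketch of reading such a file into rows of bits, assuming some converter has already produced it (that step is not shown, and the function name parse_plain_pbm is made up for illustration):

```python
def parse_plain_pbm(text):
    """Parse plain PBM ("P1") text into a list of '0'/'1' strings, one per row."""
    # Drop comments, which run from '#' to the end of the line.
    lines = [line.split("#", 1)[0] for line in text.splitlines()]
    tokens = " ".join(lines).split()
    if len(tokens) < 3 or tokens[0] != "P1":
        raise ValueError("not a plain PBM (P1) file")
    width, height = int(tokens[1]), int(tokens[2])
    # Whitespace between raster bits is optional, so join what is left.
    bits = "".join(tokens[3:])[: width * height]
    if len(bits) != width * height or set(bits) - {"0", "1"}:
        raise ValueError("raster data is incomplete or not made of 0/1")
    return [bits[y * width:(y + 1) * width] for y in range(height)]
```

Each returned string is one image row, so writing the rows out line by line gives a text file of 1's and 0's.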
 
Some things that come up on a Google search for: freeware ocr

http://www.simpleocr.com/
http://www.download.com/SimpleOCR/3640-2070_4-10152129.html

http://www.inftyproject.org/en/software.html#InftyReader
http://www.sciaccess.net/en/InftyReader/index.html

http://www.gnu.org/software/ocrad/ocrad.html

http://documents.cfar.umd.edu/ (repository)
http://www.adams1.com/pub/russadam/ocr.html
http://www.heatonresearch.com/articles/42/page1.html (project)
http://www.codeproject.com/dotnet/simple_ocr.asp (project)
http://code.google.com/p/ocropus/
 
I've tried some of the open-source alternatives, and I must say I was disappointed. :( It might be that some settings needed to be adjusted, though. I gave up after a while.
 
Thank you for those links, but they are kind of different from the type of manipulation I need.

Aren't there any algorithms that just straight take in a black & white image and turn it into binary?
 
What is the format of the input image?
Once the image is read in, it shouldn't be too hard to complete a short computer program in (say) Perl, Python, C, Java...
 
The images can either be PNG or TIFF files. If there is code that only works for one of the previously mentioned file types, that is fine.
 
If I had to quickly write something to do this, I would personally choose one of these approaches:

write a program in Python using the Python Imaging Library http://www.pythonware.com/products/pil/

write a program in Java using ImageJ and some selection from its plugins http://rsb.info.nih.gov/ij/ http://rsb.info.nih.gov/ij/plugins/index.html

The above, however, would not be very portable... except to other computers with these installed.

As I hinted above, the hardest part would be to read in the image file... probably best handled by someone else's routines that the rest of your program would call.
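
A rough sketch of the Python route, assuming Pillow (the maintained fork of the Python Imaging Library) is installed. The function names load_grayscale_rows and bytes_to_rows are invented for this example. The library does the file reading, and the rest is plain Python:

```python
def bytes_to_rows(data, width, height):
    """Split a flat, row-major byte string into a list of rows of ints."""
    if len(data) != width * height:
        raise ValueError("expected exactly one byte per pixel")
    return [list(data[y * width:(y + 1) * width]) for y in range(height)]


def load_grayscale_rows(path):
    """Read a PNG or TIFF and return rows of 0..255 grayscale values."""
    # Imported here so bytes_to_rows can be used without Pillow installed.
    from PIL import Image

    with Image.open(path) as img:
        gray = img.convert("L")   # 8-bit grayscale, one byte per pixel
        width, height = gray.size
        data = gray.tobytes()     # row-major pixel bytes
    return bytes_to_rows(data, width, height)
```

For a strictly two-color image the values should come out as 0 (black) and 255 (white), so turning them into text afterwards is a simple loop.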
 
robphy said:
What is the format of the input image?
Once the image is read in, it shouldn't be too hard to complete a short computer program in (say) Perl, Python, C, Java...
Converting an image to text obviously cannot be done with a short computer program. It is actually quite complex.

- Oh, never mind, you were commenting on converting the image to a binary.

What I am waiting for is for some student to write an OCR program that converts formulas into latex. Would be a nice project and would benefit many.
 
  • #10
MeJennifer said:
Converting an image to text obviously cannot be done with a short computer program. It is actually quite complex.

The OP is not looking for an OCR program... but a program which converts a 2-color image into some kind of text file with a binary or hexadecimal representation of the image.

MeJennifer said:
What I am waiting for is for some student to write an OCR program that converts formulas into latex. Would be a nice project and would benefit many.

Did you see the InftyReader project in the links I posted above?
Here are some samples: http://www.inftyproject.org/en/demo.html#0002
 
  • #11
robphy said:
The OP is not looking for an OCR program... but a program which converts a 2-color image into some kind of text file with a binary or hexadecimal representation of the image.
I see, sorry for the confusion.


robphy said:
Did you see the InftyReader project in the links I posted above?
Here are some samples: http://www.inftyproject.org/en/demo.html#0002
Heh, interesting! I am going to check it out!
 
  • #12
robphy said:
The OP is not looking for an OCR program... but a program which converts a 2-color image into some kind of text file with a binary or hexadecimal representation of the image.

Exactly. This conversion from a 2 color image to text is only a small part of an overall project, so if I don't need to mess around with writing the program myself, and could perhaps use an open source solution, that would be fantastic.
 
  • #13
Maxwell said:
Exactly. This conversion from a 2 color image to text is only a small part of an overall project, so if I don't need to mess around with writing the program myself, and could perhaps use an open source solution, that would be fantastic.

The Python PIL and ImageJ solutions are open source platforms... But I doubt you will find an already written program that does what you want. (However, see below.) If this is part of a larger project, these solutions above might be worth looking into.

Here is one question though... are you looking to create a text file comprised of only "0" and "1" corresponding to the 2 colors of an image? [Rather than (say) a text file with a hexadecimal representation, each line corresponding to eight rows of the image.] In other words, are you looking for something like [but not precisely] this: http://www.text-image.com/ or http://ascii.dyne.org/ ?
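
To make the hexadecimal alternative concrete, here is a small sketch. The layout is only an assumption for illustration: one text line per image row, eight pixels packed into each byte with the leftmost pixel as the most significant bit, and the last byte padded with 0's. The name bits_to_hex_lines is made up.

```python
def bits_to_hex_lines(bit_rows):
    """Turn rows of '0'/'1' characters into one hex string per row."""
    lines = []
    for row in bit_rows:
        # Pad the row on the right so its length is a multiple of 8.
        padded = row + "0" * (-len(row) % 8)
        # Every group of 8 bits becomes two uppercase hex digits.
        lines.append("".join(
            f"{int(padded[i:i + 8], 2):02X}"
            for i in range(0, len(padded), 8)
        ))
    return lines
```

With this layout, a row of "10000001" comes out as "81", and a three-pixel row "111" is padded to "11100000" and comes out as "E0".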
 
  • #14
robphy said:
The Python PIL and ImageJ solutions are open source platforms... But I doubt you will find an already written program that does what you want. (However, see below.) If this is part of a larger project, these solutions above might be worth looking into.

Here is one question though... are you looking to create a text file comprised of only "0" and "1" corresponding to the 2 colors of an image? [Rather than (say) a text file with a hexadecimal representation, each line corresponding to eight rows of the image.] In other words, are you looking for something like [but not precisely] this: http://www.text-image.com/ or http://ascii.dyne.org/ ?

I'm not sure either really captures what I'm looking for, but if I had to choose one, I'd say the second link depicts what I'm trying to do better.

For the first link, the images are converted to 1's and 0's, but it doesn't seem like those values represent anything.

What I'm looking for is to take a black and white image in, and have the black areas represented by 1's and the white areas represented by 0's.
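
As a sketch of that mapping, assuming the image has already been read into rows of grayscale values from 0 (black) to 255 (white): the cutoff of 128 and the function name rows_to_binary_text are assumptions for the example, not anything from an existing tool.

```python
def rows_to_binary_text(rows, threshold=128):
    """Map each pixel to '1' if it is dark (black) or '0' if it is light (white)."""
    return "\n".join(
        "".join("1" if value < threshold else "0" for value in row)
        for row in rows
    )
```

For example, rows_to_binary_text([[0, 255], [255, 0]]) gives the two lines "10" and "01", and the resulting string can be written to a text file with an ordinary file write.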
 
