Text color detection algorithm given pixels of text

In summary, the speaker is trying to create image processing software that can accurately identify the color of text in a given region. They have tried simple strategies such as averaging the pixels or choosing the most common pixel color, but these methods have produced unreliable results. They are seeking a more foolproof algorithm and are considering factors such as subpixel rendering and OCR libraries.
  • #1
I'm creating a type of image processing software and I need to get the color that best represents text once I have identified the pixels in a region of text. I've tried using simple strategies like "averaging" the pixels or taking the most common pixel, but these produce bad results.

An example I can show is a screenshot of this very message board post. :cool:


Suppose I want to get the color of "PF thrives ...". A good representation is the hex value #4C4C35, which I found by using the color picker in MS Paint and clicking on a pixel that appeared to be the same color as the text appears to the human eye.

However, that color would be chosen by "most common color" just barely, and as you can see, there are many other colors (turquoise, purple, tan, etc.) that might have won if the text was a little thinner. I need a more foolproof algorithm because mine fails in many cases.


  • #2
If the regions you talk about are known to be text only, then it seems logical that the most common color (or hue) would be that of the background and the second most common one would be the "real" color of the text, with smaller peaks where the foreground color mixes with the background color because of the anti-aliased rendering of fonts. Perhaps you can even "filter out" those minor peaks on a second pass (once you know the first and second peak) if their hue lies "between" the background and foreground hue.

I am not sure if you also want to pick up on cases where only some of the text is rendered in a different color, say if a few words are red?
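
A minimal sketch of the two-peak idea above, assuming the whole region (text plus background) is available as a flat list of (R, G, B) tuples. The function name `two_peaks` and the `step` bucket width are hypothetical choices for illustration, not something from the thread. Colors are grouped into coarse bins so that near-identical shades count together; the largest bin is taken as the background and the second largest as the text.

```python
from collections import defaultdict


def two_peaks(region_pixels, step=16):
    """Return (background, text) estimates from the two most populated color bins."""
    bins = defaultdict(list)
    for pixel in region_pixels:
        # Bucket each channel so that slightly different shades share one bin.
        key = tuple(value // step for value in pixel)
        bins[key].append(pixel)
    if len(bins) < 2:
        raise ValueError("need at least two distinct color bins")

    # Largest bin first; ties keep their first-seen order.
    ranked = sorted(bins.values(), key=len, reverse=True)[:2]

    def mean_color(pixels):
        n = len(pixels)
        return tuple(round(sum(p[ch] for p in pixels) / n) for ch in range(3))

    return mean_color(ranked[0]), mean_color(ranked[1])


# Toy region: white background, text in #4C4C35, a few blended edge pixels.
region = [(255, 255, 255)] * 50 + [(76, 76, 53)] * 20 + [(160, 160, 150)] * 5
background, text = two_peaks(region)
```

This only helps when solid text pixels outnumber every single blend shade, which may not hold for small, thin text.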
  • #3
Filip Larsen said:
If the regions you talk about are known to be text only, then it seems logical that the most common color (or hue) would be that of the background

No, I'm saying that I have the coordinates of the pixels, e.g. a 2-d array of booleans where true is a text pixel and false is a background pixel, and I'm choosing the most common color among the true coordinates within a region containing text.

Below is a better example of the pixels of small, thin text on this page when I take a screenshot. The most common colors are actually a blue or gold instead of black as intended. So most common color is not a reliable formula.

  • #4
I'd do some reading about how text is composited onto a background. Presumably you can identify the background colour - given that, can you invert the compositing process?
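
One way to read "invert the compositing process", sketched under an assumed model: each observed channel value is `a * text + (1 - a) * background`, where the coverage `a` is between 0 and 1 and, with subpixel rendering, may differ per channel. Under that model every sample lies between the background and the text value on each channel, so the sample farthest from the background is the best available estimate. The function name `invert_compositing` is hypothetical, and the sketch assumes that for each channel at least one text pixel is fully covered.

```python
def invert_compositing(text_pixels, background):
    """Estimate the text color channel by channel from blended samples.

    Assumed model (not from the thread): observed = a * text + (1 - a) * background,
    with coverage a in [0, 1] that can differ between R, G and B.
    """
    estimate = []
    for ch in range(3):
        bg = background[ch]
        # The sample farthest from the background on this channel has the
        # highest coverage, so it is closest to the unblended text value.
        farthest = max((p[ch] for p in text_pixels), key=lambda v: abs(v - bg))
        estimate.append(farthest)
    return tuple(estimate)


# Toy fringe pixels for #4C4C35 text on white, each fully covered on one channel only.
fringe = [(76, 160, 255), (200, 76, 120), (255, 150, 53)]
recovered = invert_compositing(fringe, background=(255, 255, 255))
```

Taking a single extreme value is sensitive to noise; a high percentile per channel would be a more cautious variant.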
  • #5
Alternatively, measure the spatial distance of each text pixel from the nearest "pure background" pixel, order by distance, keep the top 10%, and take the modal or median colour of those.

Muck around with the distance measure (Euclidean, anisotropic Euclidean, taxicab, etc), percentage to keep, and averaging methodology to see if you can find decent performance.
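
A rough sketch of that distance-based selection, assuming the image is a list of rows of (R, G, B) tuples and the mask is a matching list of rows of booleans (true for text). It uses taxicab distance computed with a breadth-first search from all background pixels, keeps the deepest 10% of text pixels (including ties at the cutoff), and returns the per-channel median. The name `core_text_color` and the `keep` parameter are hypothetical.

```python
from collections import deque
from statistics import median


def core_text_color(image, mask, keep=0.10):
    """Median color of the text pixels lying farthest from any background pixel."""
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    # Seed the search with every background pixel at distance 0.
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                dist[y][x] = 0
                queue.append((y, x))
    if not queue:
        raise ValueError("mask has no background pixels")

    # Multi-source BFS over 4-neighbours gives the taxicab distance to the background.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))

    text = [(dist[y][x], image[y][x]) for y in range(h) for x in range(w) if mask[y][x]]
    if not text:
        raise ValueError("mask has no text pixels")
    text.sort(key=lambda item: item[0], reverse=True)

    # Keep the top fraction, plus any pixels tied with the last one kept.
    n = max(1, round(len(text) * keep))
    cutoff = text[n - 1][0]
    core = [color for d, color in text if d >= cutoff]
    return tuple(round(median(c[ch] for c in core)) for ch in range(3))


W, F, T = (255, 255, 255), (90, 140, 200), (76, 76, 53)  # background, fringe, text
image = [[W] * 5, [W, F, F, F, W], [W, F, T, F, W], [W, F, F, F, W], [W] * 5]
mask = [[color != W for color in row] for row in image]
result = core_text_color(image, mask)
```

For strokes only one or two pixels wide, every text pixel touches the background, so all distances tie and the result falls back to the median of all text pixels.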
  • #6
Apparently subpixel rendering has been invented since the last time I did any work on font rendering (back when only non-colored anti-aliasing was used), so I will venture a guess that the colors you see are an artifact of that particular rendering technique. It is not clear to me how colored text is rendered with this technique, but perhaps it's still possible to apply some kind of averaging filter that extracts the original color of each letter.

Or perhaps you can look into whether any OCR libraries or similar have put effort into adjusting for this effect when "reading" off a screen. A quick search turned up a GitHub project with an accompanying blog post.
  • #7
A somewhat obvious simplification would be to first convert to grey-scale, then perhaps threshold to convert to black and white.

Be aware that the 'characters' will not be perfectly formed no matter the method of obtaining them. There are always missing and extraneous pixels to contend with. Your recognition routine will have to operate on the 'nearest match' approach.
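
A small sketch of the grey-scale and thresholding step, assuming dark text on a light background and an image given as rows of (R, G, B) tuples. The function name `to_black_and_white` and the default threshold of 128 are hypothetical; the luma weights are the common Rec. 601 ones.

```python
def to_black_and_white(image, threshold=128):
    """Map rows of (R, G, B) tuples to rows of booleans; True marks a dark pixel."""

    def luma(color):
        r, g, b = color
        # Rec. 601 luma weights, a common choice for grey-scale conversion.
        return 0.299 * r + 0.587 * g + 0.114 * b

    return [[luma(color) < threshold for color in row] for row in image]


sample = [[(255, 255, 255), (76, 76, 53)], [(0, 0, 0), (200, 200, 200)]]
bw = to_black_and_white(sample)
```

Note that this discards the hue, so it helps with finding character shapes but not directly with recovering the text color.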


Related to Text color detection algorithm given pixels of text

1. What is a text color detection algorithm?

A text color detection algorithm is a computer program designed to identify the pixels of a text within an image or document. It uses a series of mathematical calculations and color analysis to determine the color of the text.

2. How does a text color detection algorithm work?

A text color detection algorithm works by first isolating the text from the rest of the image or document. It then analyzes the color values of each pixel within the text and compares them to known color ranges for different text colors. The algorithm then assigns the most likely color to each pixel and creates a color map of the text.

3. What are the applications of a text color detection algorithm?

A text color detection algorithm has various applications, such as digital image processing, document analysis, and optical character recognition. It can also be used in web development to automatically adjust text color based on the background color of a webpage.

4. How accurate is a text color detection algorithm?

The accuracy of a text color detection algorithm depends on various factors such as the quality of the image or document, the complexity of the text, and the algorithm's design. Generally, a well-designed algorithm can achieve high accuracy in identifying text colors.

5. Can a text color detection algorithm be improved?

Yes, a text color detection algorithm can be improved by constantly testing and refining its design and by incorporating new techniques and technologies. Additionally, training the algorithm with a large and diverse dataset can also improve its accuracy.
