Text color detection algorithm given pixels of text

Discussion Overview

The discussion revolves around developing an algorithm for accurately detecting the color of text pixels in images, particularly in the context of image processing software. Participants explore various strategies for identifying the most representative color of text, considering challenges such as anti-aliasing and background interference.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant suggests that averaging pixel colors or selecting the most common color yields unreliable results due to background interference and anti-aliasing effects.
  • Another participant proposes that the most common color detected might actually be the background color, with the text color being a secondary peak that could be filtered out.
  • A different viewpoint emphasizes the need for a more robust algorithm that accounts for the spatial distribution of text pixels relative to background pixels.
  • One suggestion involves inverting the compositing process of text over a background to identify the original text color.
  • Another participant recommends measuring the distance from text pixels to the nearest background pixel and using this information to refine color detection.
  • There is mention of subpixel rendering techniques affecting the perceived color of text, suggesting that averaging filters might help recover the original text color.
  • A simpler approach is proposed, involving converting the image to grayscale and applying thresholding to isolate text characters, though this may introduce its own challenges.

Areas of Agreement / Disagreement

Participants express various competing views on the best methods for detecting text color, with no consensus reached on a single effective approach. The discussion remains unresolved regarding the optimal algorithm for this task.

Contextual Notes

Participants acknowledge limitations such as the influence of anti-aliasing, the presence of mixed colors, and the challenges posed by imperfect character formation in text recognition.

SlurrerOfSpeech
I'm creating a type of image processing software and I need to get the color that best represents text once I have identified the pixels in a region of text. I've tried simple strategies like "averaging" the pixels or taking the most common pixel color, but these produce bad results.

An example I can show is a screenshot of this very message board post.

pf_forum.PNG


Suppose I want to get the color of "PF thrives ...". A good representation is the hex value #4C4C35, which I found by using the color picker in MS Paint on a pixel that appeared to be the same color as the text appears to the human eye.

However, that color would be chosen by "most common color" just barely, and as you can see, there are many other colors (turquoise, purple, tan, etc.) that might have won if the text were a little thinner. I need a more foolproof algorithm because mine fails in many cases.

pf_zoomed_in.png


 
If the regions you talk about are known to be text only, then it seems logical that the most common color (or hue) would be that of the background and the second most common one would be the "real" color of the text, with smaller peaks caused by the foreground color mixing with the background color through the anti-aliased rendering of fonts. Perhaps you can even "filter out" those minor peaks on a second pass (once you know the first and second peaks) if their hue lies "between" the background and foreground hues.
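
As a rough sketch of this two-peak idea (not the poster's own code): the snippet below counts colors in a region that contains both text and background, treats the most common color as background and the runner-up as text, and then sets aside any color lying close to the straight RGB line between the two as a likely anti-aliasing blend. Testing against an RGB segment instead of comparing hues, the `tol` value, and the function names are all assumptions made for illustration.

```python
from collections import Counter


def is_blend(color, bg, fg, tol=12.0):
    """True if `color` lies strictly between bg and fg, near the RGB segment joining them."""
    d = [f - b for f, b in zip(fg, bg)]
    length_sq = sum(x * x for x in d)
    if length_sq == 0:
        return False
    # Position of the closest point on the bg->fg line (0 = bg, 1 = fg).
    t = sum((c - b) * x for c, b, x in zip(color, bg, d)) / length_sq
    if not 0.0 < t < 1.0:
        return False
    nearest = [b + t * x for b, x in zip(bg, d)]
    dist_sq = sum((c - n) ** 2 for c, n in zip(color, nearest))
    return dist_sq <= tol * tol


def two_peak_colors(pixels):
    """Return (background, text, leftovers) for an iterable of (r, g, b) tuples.

    Assumes the region holds background AND text pixels, with background dominating.
    `leftovers` are colors that are neither peak nor a plausible blend of the two.
    """
    counts = Counter(pixels)
    if len(counts) < 2:
        raise ValueError("need at least two distinct colors")
    (bg, _), (fg, _) = counts.most_common(2)
    leftovers = [c for c in counts if c not in (bg, fg) and not is_blend(c, bg, fg)]
    return bg, fg, leftovers
```

A non-empty `leftovers` list is one possible hint that part of the text is drawn in a different color, or that the rendering is not a simple two-color blend.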

I am not sure if you also want to pick up on cases where only some of the text is rendered in a different color, say if a few words are red?
 
Filip Larsen said:
If the regions you talk about are known to be text only, then it seems logical that the most common color (or hue) would be that of the background

No, I'm saying that I have the coordinates of the pixels, e.g. a 2-D array of booleans where true is a text pixel and false is a background pixel, and I am choosing the most common color among the true coordinates within a region containing text.

Below is a better example of the pixels of small, thin text on this page when I take a screenshot. The most common colors are actually a blue or gold instead of black as intended. So most common color is not a reliable formula.

w.png
 
I'd do some reading about how text is composited onto a background. Presumably you can identify the background colour - given that, can you invert the compositing process?
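
One way to read "invert the compositing process", offered only as a sketch: assume plain alpha compositing, where each observed pixel is `alpha * text + (1 - alpha) * background` with a single coverage value `alpha` per pixel. Under that model every text pixel lies on one straight line in RGB space running from the background color toward the text color, so the text color can be estimated as the point farthest from the background along that line. The function names, and the assumptions that the background is known exactly and that at least one pixel is fully covered, are mine; subpixel rendering uses a separate coverage per channel and would not fit this model.

```python
def blend(fg, bg, alpha):
    """Composite fg over bg with coverage alpha (0 = all background, 1 = all text)."""
    return tuple(alpha * f + (1 - alpha) * b for f, b in zip(fg, bg))


def unblend_text_color(text_pixels, bg):
    """Estimate the uncomposited text color from a non-empty list of blended pixels."""
    # Offset of every pixel from the known background color.
    offsets = [[c - b for c, b in zip(p, bg)] for p in text_pixels]
    # The mean offset points from the background toward the text color.
    mean = [sum(o[i] for o in offsets) / len(offsets) for i in range(3)]
    norm = sum(m * m for m in mean) ** 0.5
    if norm == 0:
        return tuple(bg)  # every pixel equals the background
    direction = [m / norm for m in mean]
    # Projection length along that direction is proportional to coverage;
    # the longest projection is taken as full coverage (alpha = 1).
    reach = max(sum(o[i] * direction[i] for i in range(3)) for o in offsets)
    return tuple(round(b + reach * d) for b, d in zip(bg, direction))
```

For instance, blending #4C4C35, i.e. (76, 76, 53), over white at several coverages with `blend` and passing the results to `unblend_text_color` with `bg=(255, 255, 255)` recovers the same color.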
 
Alternatively, measure the spatial distance of each text pixel from the nearest "pure background" pixel, order by distance, keep the top 10%, and take the modal or median colour of those.

Muck around with the distance measure (Euclidean, anisotropic Euclidean, taxicab, etc), percentage to keep, and averaging methodology to see if you can find decent performance.
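
A minimal sketch of this distance-based approach, using only the standard library: it assumes the image is a list of rows of (r, g, b) tuples and the mask is a same-shaped list of rows of booleans (true for text), as described earlier in the thread. It uses the taxicab distance and a per-channel median; the function names and the `keep` parameter are placeholders, and the other distance measures or the modal color could be swapped in.

```python
from collections import deque
from statistics import median


def taxicab_distance_to_background(mask):
    """Breadth-first search outward from every background (False) cell."""
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                dist[y][x] = 0
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def core_text_color(image, mask, keep=0.10):
    """Median color of the `keep` fraction of text pixels deepest inside the strokes."""
    if all(all(row) for row in mask):
        raise ValueError("mask contains no background pixels")
    dist = taxicab_distance_to_background(mask)
    text = [(dist[y][x], image[y][x])
            for y in range(len(mask)) for x in range(len(mask[0])) if mask[y][x]]
    if not text:
        raise ValueError("mask contains no text pixels")
    text.sort(key=lambda item: item[0], reverse=True)  # deepest pixels first
    core = [color for _, color in text[:max(1, int(len(text) * keep))]]
    return tuple(median(c[i] for c in core) for i in range(3))
```

The intuition is that pixels deep inside a stroke are the least affected by edge blending. For very thin text every pixel may sit at distance 1, in which case the ranking carries no information and the result falls back to a median over arbitrary text pixels.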
 
Apparently subpixel rendering has been invented since the last time I did any work on font rendering (back when only non-colored anti-aliasing was used), so I will venture a guess that the colors you see are an artifact of that particular rendering technique. It is not clear to me how colored text is rendered with this technique, but perhaps it's still possible to apply some kind of averaging filter that extracts the original color of each letter.
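
To illustrate what "some kind of averaging filter" might look like, here is a sketch of a horizontal box filter over one row of pixels. It rests on the assumption that subpixel rendering shifts coverage between the red, green and blue channels of horizontally adjacent pixels, so that averaging neighbours partly cancels the color fringes. The function name and `radius` parameter are illustrative, and this is not claimed to be an exact inverse of any particular renderer.

```python
def horizontal_box_filter(row, radius=1):
    """Average each pixel with its horizontal neighbours within `radius`.

    `row` is a list of (r, g, b) tuples; the result holds float channel values.
    Windows are clipped at the ends of the row.
    """
    out = []
    for x in range(len(row)):
        window = row[max(0, x - radius):x + radius + 1]
        out.append(tuple(sum(p[i] for p in window) / len(window) for i in range(3)))
    return out
```

The filtered pixels are blends of text and background, so a step such as picking the darkest or deepest filtered pixels would still be needed to estimate the text color itself.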

Or perhaps you can look into whether any OCR libraries or similar have put effort into adjusting for this effect when "reading" off a screen. A quick search turned up a GitHub project with an accompanying blog post.
 
A somewhat obvious simplification would be to first convert to grey-scale, then perhaps threshold to convert to black and white.
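
A small sketch of that grey-scale and threshold step, under a few assumptions of mine: the image is a list of rows of (r, g, b) tuples, grey values use the common Rec. 601 luma weights, the text is darker than its background, and the default threshold is simply the midpoint between the darkest and lightest grey value present.

```python
def luma(rgb):
    """Grey value of an (r, g, b) pixel using Rec. 601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def threshold_mask(image, threshold=None):
    """Return rows of booleans: True where a pixel is darker than the threshold."""
    grey = [[luma(p) for p in row] for row in image]
    if threshold is None:
        flat = [v for row in grey for v in row]
        threshold = (min(flat) + max(flat)) / 2  # midpoint of the observed range
    return [[v < threshold for v in row] for row in grey]
```

For light text on a dark background the comparison would need to be flipped, and a data-driven threshold may behave better than the midpoint on noisy screenshots.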

Be aware that the 'characters' will not be perfectly formed no matter the method of obtaining them. There are always missing and extraneous pixels to contend with. Your recognition routine will have to operate on the 'nearest match' approach.

Cheers,
Tom
 