No one has made a program to solve these yet?

Discussion Overview

The discussion revolves around the challenges of developing a program to solve captchas, particularly those that involve identifying letters from noisy images. Participants explore various methodologies, including the use of correlation coefficients, image processing techniques, and machine learning approaches.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant proposes a method involving the calculation of correlation coefficients between binary-masked arrays and known letter images to identify letters.
  • Another participant questions the feasibility of accurately identifying letters given the challenges of spacing, size, and location in the presence of noise and distortions.
  • Some participants note that while humans can easily solve captchas, current algorithms struggle with them, suggesting that it is a complex problem.
  • A participant mentions using image erosion techniques to identify spaces between letters and discusses the difficulties posed by skewed or tilted images.
  • There is mention of using neural networks for optical character recognition (OCR) as a potential approach to solving captchas.
  • One participant highlights that the problem is compounded by the need for robust representations of images that can handle transformations like resizing and skewing.
  • Another participant reflects on the idea that spammers may exploit human solvers through man-in-the-middle attacks instead of attempting to solve captchas algorithmically.

Areas of Agreement / Disagreement

Participants express a range of views on the difficulty of solving captchas, with some acknowledging the challenges while others propose various methods. There is no consensus on a definitive solution or approach, indicating ongoing disagreement and exploration of ideas.

Contextual Notes

Participants note limitations related to identifying letters in images with unknown quantities and distortions, as well as the need for robust image processing techniques. The discussion reflects the complexity of the problem without resolving the various technical challenges presented.

Jamin2112
I'm going to try and be of aid to spammers and hackers by making a program that solves these:


GmQH0g1.png



Can't be too hard. Here's a procedure I'm going to implement:

Given an m x n array of pixels known to contain a letter among a bunch of noise, convert the pixels to their RGB values, threshold them into a binary mask, then calculate the correlation coefficient between that mask and each of the m x n binary arrays that would represent images of upper and lower case letters. The one that yields the greatest correlation coefficient will be assumed to be the letter.

Make sense?
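
A minimal sketch of the procedure above, assuming the letter has already been cropped and resized to the template size. The 5x5 templates and the names `from_rows`, `binary_mask`, `correlation`, and `best_match` are made up for illustration; real templates would come from rendered font images.

```python
from math import sqrt

def from_rows(rows):
    """Turn rows like '#...#' into a flat 0/1 list ('#' = ink). Hypothetical helper."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]

def binary_mask(gray, threshold=128):
    """Flatten a 2-D grid of grayscale values into 0/1 (1 = dark pixel)."""
    return [1 if px < threshold else 0 for row in gray for px in row]

def correlation(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant (e.g. blank) image carries no shape information
    return cov / sqrt(var_a * var_b)

def best_match(mask, templates):
    """Return the label whose template correlates most strongly with the mask."""
    return max(templates, key=lambda label: correlation(mask, templates[label]))

# Toy templates; assumed stand-ins for real letter images.
TEMPLATES = {
    "I": from_rows(["..#.."] * 5),
    "L": from_rows(["#...."] * 4 + ["#####"]),
    "T": from_rows(["#####"] + ["..#.."] * 4),
}

# An "L" with two stray noise pixels added.
noisy_L = from_rows(["#...#", "#....", "#.#..", "#....", "#####"])
print(best_match(noisy_L, TEMPLATES))  # prints L
```

This only covers the "one letter plus noise" case; it says nothing about how to find the m x n window in the first place.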
 
For instance, I look at the correlation coefficient between the values of binary-masked arrays representing

zmdiAUS.png


and

http://www.ourdesigns.com/sites/odi/images/fullsize/REX00LB1S.jpg


after resizing, of course.
 
It's hard. That's why spammers just use man-in-the-middle attacks to solve these.
(i.e., they set up a website where people have to solve these to download pirated movies or something.)
 
A priori it is easy to solve captchas - for humans. We are hard-wired (by evolution of predator avoidance strategies) to see things that are not the way they are normally presented.

As it stands now, captchas are not even remotely simple to solve with any known algorithm. It is a very hard problem. Please try. Maybe you can find something in graph theory that does it.
 
How do you know the spacing, size, and location of the letters so that you can pick out m x n arrays that contain exactly one letter, rather than multiple letters, no letters, or partial letters? I think identifying a letter in an image that is known to contain only one letter plus noise is easier than taking an image with an unknown number of letters at unknown locations, with noise and intentional size and alignment distortions, and dividing it up into groups of single letters.
 
http://deathbycaptcha.com
http://decaptcha.biz/
http://decaptcha.net/

I've also worked on one myself using Mathematica. It's not easy. REALLY. Some use neural networks to learn characters for the OCR.

Floid said:
How do you know the spacing, size, and location of the letters so that you can pick out m x n arrays that contain exactly one letter, rather than multiple letters, no letters, or partial letters? I think identifying a letter in an image that is known to contain only one letter plus noise is easier than taking an image with an unknown number of letters at unknown locations, with noise and intentional size and alignment distortions, and dividing it up into groups of single letters.


That's one of the first problems. There are many ways. I wrote something that erodes the image to find spaces, looks at the average distances between spaces, disregards those beyond some standard deviation from the mean, computes the average character width, and tries to split the image up accordingly.

But then if the image is skewed or tilted, you have to fix that first.
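
A rough sketch of that splitting idea, not the Mathematica code mentioned above. It assumes a binary image given as a list of 0/1 rows, uses a simplified vertical-only erosion as a stand-in for a full morphological erosion, and treats the widths of the inked column runs as the "distances between spaces". All function names and the toy image are hypothetical.

```python
from statistics import mean, pstdev

def erode_vertical(img):
    """Keep a pixel only if it and the pixels directly above and below are set.
    This wipes out one-pixel-thick horizontal noise lines."""
    h, w = len(img), len(img[0])
    return [[1 if 0 < r < h - 1 and img[r - 1][c] and img[r][c] and img[r + 1][c] else 0
             for c in range(w)] for r in range(h)]

def ink_runs(img):
    """Return half-open (start, end) column ranges that contain any ink."""
    w = len(img[0])
    has_ink = [any(row[c] for row in img) for c in range(w)]
    runs, start = [], None
    for c, ink in enumerate(has_ink + [False]):  # sentinel closes a trailing run
        if ink and start is None:
            start = c
        elif not ink and start is not None:
            runs.append((start, c))
            start = None
    return runs

def typical_width(runs, k=1.0):
    """Average run width after dropping widths more than k std devs from the mean."""
    widths = [e - s for s, e in runs]
    mu, sigma = mean(widths), pstdev(widths)
    kept = [x for x in widths if abs(x - mu) <= k * sigma] or widths
    return mean(kept)

def split_runs(runs, char_width):
    """Cut any run that looks like several touching characters into equal pieces."""
    out = []
    for s, e in runs:
        n = max(1, round((e - s) / char_width))
        step = (e - s) / n
        out.extend((s + round(i * step), s + round((i + 1) * step)) for i in range(n))
    return out

# Toy image: four blobs (the third is two touching "letters") plus a noise line.
body = "." + "####" + ".." + "####" + ".." + "########" + ".." + "####" + "."
rows = ["." * 28, body, body, "#" * 28, body, body, "." * 28]
img = [[1 if ch == "#" else 0 for ch in row] for row in rows]

print(ink_runs(img))                   # the noise line hides every gap: [(0, 28)]
runs = ink_runs(erode_vertical(img))
print(runs)                            # [(1, 5), (7, 11), (13, 21), (23, 27)]
print(split_runs(runs, typical_width(runs)))
```

On this toy input the wide run is cut in half, giving five character boxes. A skewed or tilted image would break the column-gap assumption, which is why it has to be corrected first.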
 
If someone could make one, I guess it's AI :confused:
 
To do that you'd first have to identify the sub-matrices that may contain letters, which may not be easy. Even assuming that you could separate letters with high accuracy, I think your solution would perform poorly on most captchas. Other problems are that the letters are usually skewed and placed very close together, so a sub-matrix for one letter may contain parts of other letters. You need a more robust representation of the images that is invariant to transformations such as resizing and skewing.
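
To illustrate the invariance point with a toy case (the 5x5 grids and helper names are assumptions for illustration only): shifting a one-pixel-wide stroke sideways by a single column is enough to take the pixel-wise correlation coefficient from 1 to a negative value.

```python
from math import sqrt

def from_rows(rows):
    """Turn rows like '..#..' into a flat 0/1 list ('#' = ink). Hypothetical helper."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]

def correlation(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 0.0 if var_a == 0 or var_b == 0 else cov / sqrt(var_a * var_b)

centered = from_rows(["..#.."] * 5)   # a vertical stroke in the middle column
shifted = from_rows(["...#."] * 5)    # the same stroke, one column to the right

print(correlation(centered, centered))  # about 1.0
print(correlation(centered, shifted))   # about -0.25
```

Raw pixel correlation has no notion that these are the same shape, which is the gap a transformation-tolerant representation is meant to close.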

This is a computer vision problem, and I think the current best performing methods are convolutional networks with other tricks like dropout, so I'd probably try training those for this problem.

It's a pretty hard problem. I have trouble deciphering captchas myself.
 
DavidSnider said:
It's hard. That's why spammers just use man-in-the-middle attacks to solve these.
(i.e., they set up a website where people have to solve these to download pirated movies or something.)
Thanks for the epiphany. I never heard of this but it makes perfect sense. Why try to solve something difficult when you can let someone else unknowingly solve it for you? :cool:
 
