How can I use a statistical approach to match patterns in a 3x3 grid?

In summary, the conversation is about matching a new sequence to a pre-defined pattern, specifically in the context of optical character recognition. The suggested method is to calculate the distance between the candidate pattern and the pre-defined pattern, and select the candidate pattern with the smallest distance as the best match.
  • #1
squaremeplz

Homework Statement



I have a 3x3 grid (a 3x3 matrix) that records clicked points.
For example:

Pattern T =

[1,1] = 1
[1,2] = 1
[1,3] = 1
[2,1] = -1
[2,2] = 1
[2,3] = -1
[3,1] = -1
[3,2] = 1
[3,3] = -1

or

1 1 1
-1 1 -1
-1 1 -1

What is the best way to statistically match x = 1 vs x = not 1?
 
  • #2
squaremeplease said:

What is the best way to statistically match x = 1 vs x = not 1

What do you mean "statistically match"?
 
  • #3
I.e.

A new sequence is entered, and we wish to decide whether it belongs to group T or group C based on sample data for T and C. Since the database won't hold just 2 examples (1 for each) but rather 1000 examples of T and 1000 examples of C, how does one accurately calculate a match?
 
  • #4
I still don't have a clue what you're trying to do.
 
  • #5
My question is related to optical character recognition. So if I draw a T, how do I match it against a T that accurately reflects the entire training data?

I.e. patterns T1, T2, ..., Tn are the training sequences for n examples of T,
and C1, C2, ..., Cn are the training sequences for n examples of C.

If I feed a single C sequence in now, how do I weigh my decision most accurately?
 
  • #6
OK, now I understand. At the risk of oversimplification, let me define T this way:
Pattern T =

1 1 1
0 1 0
0 1 0

To match this pattern, a candidate pattern A should have a distance of 0 from this pattern, with distance calculated as
[tex]\sqrt{(a_0 - t_0)^2 + (a_1 - t_1)^2 + (a_2 - t_2)^2 + ... + (a_8 - t_8)^2}[/tex]

In this formula I have flattened out your matrix to a one-dimensional array. The t_i values are from pattern T, and the a_i values are from the candidate pattern A.

If you get a "distance" of 0, the two patterns match exactly. If you have several candidate patterns with nonzero distances, pick the one with the smallest "distance."

That's how I would approach it.
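A minimal Python sketch of this distance-based matching, for illustration only. The pattern T is the one given above, flattened row by row. The template for C, the `noisy_t` candidate, and the helper names `distance` and `classify` are assumptions made up for this example and do not come from the thread.

```python
import math

def distance(a, t):
    """Euclidean distance between two equal-length flattened patterns."""
    if len(a) != len(t):
        raise ValueError("patterns must have the same length")
    return math.sqrt(sum((ai - ti) ** 2 for ai, ti in zip(a, t)))

# Pattern T from the post, flattened row by row.
T = [1, 1, 1,
     0, 1, 0,
     0, 1, 0]

# Hypothetical template for C (not given in the thread).
C = [1, 1, 1,
     1, 0, 0,
     1, 1, 1]

templates = {"T": T, "C": C}

def classify(candidate, templates):
    """Return the label of the template with the smallest distance to the candidate."""
    return min(templates, key=lambda label: distance(candidate, templates[label]))

# Hypothetical input: a T with one stray click in the bottom-left cell.
noisy_t = [1, 1, 1,
           0, 1, 0,
           1, 1, 0]

print(distance(T, T))                # 0.0, an exact match
print(classify(noisy_t, templates))  # T (distance 1.0 to T, about 1.73 to C)
```

With many stored examples per letter, as in the 1000 T and 1000 C case asked about earlier, one option is to compute the same distance against every stored example and take the label of the closest one (a nearest-neighbour rule). That is one possible extension, not something the thread itself settles.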
 

What is a Support Vector Machine (SVM)?

A Support Vector Machine is a type of supervised machine learning algorithm used for classification and regression analysis. It works by finding the best possible boundary or hyperplane that can separate data points into different classes.

How does a Support Vector Machine work?

A Support Vector Machine looks for a hyperplane that separates the data points into different classes. When the classes cannot be separated by a hyperplane in the original space, a kernel function can implicitly map the data into a higher-dimensional feature space where such a hyperplane may exist. The algorithm chooses the hyperplane with the largest margin, which is the distance between the hyperplane and the nearest data points of each class, and then uses it to classify new data points. A larger margin tends to generalize better and reduces the likelihood of overfitting.
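The margin idea can be illustrated with a small Python sketch. This does not train an SVM; it only measures the margin of two candidate hyperplanes on a toy data set, to show why the one with the larger margin would be preferred. The points, labels, and the two hyperplanes (written as w·x + b = 0) are hypothetical values chosen for the example.

```python
import math

def signed_distance(point, w, b):
    """Signed distance from a point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm

def margin(points, labels, w, b):
    """Smallest label-signed distance over all points.

    The result is positive only if every point lies on its correct side.
    """
    return min(y * signed_distance(x, w, b) for x, y in zip(points, labels))

# Toy data: two classes (-1 and +1) along the x-axis.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
labels = [-1, -1, 1, 1]

# Two hypothetical separating lines: x = 2 and x = 1.5.
print(margin(points, labels, (1.0, 0.0), -2.0))  # 1.0
print(margin(points, labels, (1.0, 0.0), -1.5))  # 0.5
```

Both lines separate the two classes, but x = 2 leaves twice the margin, so a maximum-margin classifier would favour it.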

What are the advantages of using a Support Vector Machine?

Some advantages of using a Support Vector Machine include its ability to handle high-dimensional data and, with a suitable kernel, its effectiveness on data that is not linearly separable. It also has a strong theoretical foundation, and maximizing the margin helps limit overfitting.

What are the limitations of a Support Vector Machine?

While Support Vector Machines have many advantages, they also have some limitations. These include their sensitivity to noise, the need for proper scaling of data, and the potential for long training times with large datasets. They also do not perform well on datasets with overlapping classes.

How can I choose the right parameters for my Support Vector Machine model?

Choosing the right parameters for a Support Vector Machine model can be done through a process called hyperparameter tuning. This involves testing different combinations of parameters and selecting the ones that give the best performance on a validation dataset. Common parameters to tune include the type of kernel used, the kernel function's parameters, and the regularization parameter.
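The tuning loop described above can be sketched as a plain grid search in Python. The `grid_search` helper is written for this example and is not a library API. The `toy_validation_score` function is a hypothetical stand-in: in practice it would train a model with the given parameters and return its score on a validation set, and the numbers inside it are invented purely so the sketch runs.

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Try every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_validation_score(params):
    """Hypothetical stand-in for 'train the model, score it on validation data'."""
    base = {"linear": 0.80, "rbf": 0.85}[params["kernel"]]
    return base - 0.05 * abs(params["C"] - 1.0)

param_grid = {
    "kernel": ["linear", "rbf"],  # type of kernel
    "C": [0.1, 1.0, 10.0],        # regularization parameter
}

best_params, best_score = grid_search(param_grid, toy_validation_score)
print(best_params, best_score)
```

The number of combinations grows multiplicatively with each added parameter, so an exhaustive grid like this is only practical for small grids.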
