Help with this notation -- some sort of norm?

SUMMARY

This discussion centers on understanding mathematical notation in the context of Machine Learning, specifically the squared 2-norm and its use in minimizing the Euclidean length of a vector. The notation ##\mathbf b = \mathbf{y - Xw}## is introduced, leading to the formulation of the Normal Equations for ordinary least squares estimation. The conversation emphasizes the importance of consistent notation, recommending the book "Learning From Data" as a foundational resource for self-learners in Machine Learning.

PREREQUISITES
  • Understanding of linear algebra concepts, particularly vectors and matrices.
  • Familiarity with ordinary least squares (OLS) regression techniques.
  • Basic knowledge of Machine Learning principles and terminology.
  • Ability to interpret mathematical notation commonly used in statistics and data analysis.
NEXT STEPS
  • Study the Normal Equations in the context of ordinary least squares regression.
  • Learn about the properties and applications of the squared 2-norm in optimization problems.
  • Explore the book "Learning From Data" and its associated resources for a structured approach to Machine Learning.
  • Research the differences between various mathematical notations used in statistics and Machine Learning literature.
USEFUL FOR

Students and professionals in Machine Learning, data scientists, and anyone seeking to clarify mathematical notation and concepts related to regression analysis and optimization techniques.

SELFMADE
I need help understanding this notation, what does this mean?

Is this the square of the 2-norm?

1. Homework Statement

[Attached image: an expression of the form ##\min_{\mathbf w} \|\mathbf{y - Xw}\|_2^2##, i.e. a squared 2-norm.]


Thanks
 
Yes. Say ##\mathbf b = \mathbf{y - Xw}##, and assume ##\mathbf b## is real-valued for this example. Suppose you want to minimize the Euclidean length of ##\mathbf b##. You would write that as ##\min \big(\mathbf b^T \mathbf b\big)^\frac{1}{2}##. But square roots are unpleasant to work with, so you recognize that if you minimize the squared Euclidean length of ##\mathbf b##, then that also minimizes the Euclidean length of ##\mathbf b##. (Why must this be the case? Hint: the square root is strictly increasing on ##[0, \infty)##.) Hence you recover the problem above, which reads ##\min \big(\mathbf b^T \mathbf b\big)##.
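If it helps to see this numerically, here is a minimal NumPy sketch (the data ##\mathbf X##, ##\mathbf y##, ##\mathbf w## are made up for illustration) showing that the squared 2-norm ##\|\mathbf b\|_2^2## is exactly ##\mathbf b^T \mathbf b##:

```python
import numpy as np

# Minimal sketch with made-up data: the squared 2-norm of b = y - Xw
# equals b^T b.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))   # hypothetical design matrix
w = rng.normal(size=2)        # hypothetical weight vector
y = rng.normal(size=5)        # hypothetical targets

b = y - X @ w
squared_norm_a = np.linalg.norm(b, ord=2) ** 2  # compute ||b||_2, then square it
squared_norm_b = b @ b                          # b^T b directly

print(np.isclose(squared_norm_a, squared_norm_b))  # True
```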

What you have there is the setup for the Normal Equations and for ordinary least squares estimation. There are two approaches to deriving the solution for an over-determined system of equations -- one involves calculus and the other involves wielding orthogonality. Both approaches are worth understanding and thinking over.
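As a concrete illustration (not from the thread, and using made-up data), the calculus route ends at the Normal Equations ##\mathbf X^T \mathbf X \, \mathbf w = \mathbf X^T \mathbf y##, which you can solve directly; in practice, library least-squares routines avoid forming ##\mathbf X^T \mathbf X## explicitly. A rough NumPy sketch:

```python
import numpy as np

# Sketch of the Normal Equations X^T X w = X^T y for an over-determined
# system (more rows than columns). Data is made up for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# Direct solve of the Normal Equations (fine for small, well-conditioned X).
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# NumPy's least-squares routine, which does not form X^T X explicitly.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_normal, w_lstsq))  # True, up to numerical precision
```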
 
Thank you for your reply. So my hunch was right.

I am learning Machine Learning by myself. I have BSEE but I am encountering many symbols/notations that I don't understand.

For example, what does the "1 with a vertical line through its back" mean?

I know that E stands for expected value.

[Attached image: an expression involving an expectation ##E## and a blackboard-bold ##\mathbb 1## (indicator) symbol.]


Thanks
 

My recommendation is to find a good text like "Learning From Data" and learn from that text (plus its e-chapters). (The book is quite cheap at $30 in the US, though due to peculiarities with licensing it is $100 in Canada?) There is an associated free course with the same title at work.caltech.edu, and also on the iTunes Store.

More to the point: a good text will have an appendix that lists and defines all the notation that it uses. Unfortunately notation is not standardized or uniform between authors.

A ##\mathbf 1## will tend to mean a ones vector, an indicator function, or sometimes even the identity matrix. Here it is an indicator function. I personally prefer ##\mathbf 1## to mean the ones vector, ##\mathbf I## to mean the identity matrix, and ##\mathbb I(Y = 1)## to denote an indicator function, but the fundamental issue is the non-homogeneity of notation in the field -- again, my solution is that you can homogenize things when starting off by picking one good source to learn from that has its own consistent notation. (Then, once you've mastered that one source, you can much more easily infer or guess other people's notation as you branch out.)
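For reference, here is a small NumPy sketch (with made-up data) of the three objects mentioned above; in particular, the expectation of an indicator ##\mathbb I(Y = 1)## is just the probability that ##Y = 1##, which empirically is a mean of zeros and ones:

```python
import numpy as np

# Made-up data illustrating the three notations discussed above:
# a ones vector, the identity matrix, and an indicator function 1(Y = 1).
ones_vec = np.ones(4)   # the ones vector, often written as a bold 1
identity = np.eye(4)    # the identity matrix I

# Indicator function: 1 where the condition holds, 0 otherwise.
Y = np.array([0, 1, 1, 0, 1])
indicator = (Y == 1).astype(int)   # array([0, 1, 1, 0, 1])

# E[1(Y = 1)] is the probability that Y = 1; empirically, the sample mean.
print(indicator.mean())  # 0.6
```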

Good luck.
 
