# Dictionary Learning and Bases


#### fog37

TL;DR Summary
Dictionary Learning and Bases
Hello Forum,

I am trying to get a grasp of the topic (new to me) of dictionaries and dictionary learning. In general, we express a vector ##A## using a basis.
A basis is a complete set of vectors that we can use to expand any other vector as a linear combination of the basis vectors. For example, a signal ##x(t)## can be expanded in the Fourier basis, in the Dirac delta basis, etc. These are examples of orthogonal bases (every pair of distinct basis vectors has zero inner product). The same signal can also be expanded using wavelets (wavelet transform) or cosines (cosine transform), etc.

Now on to dictionaries: a dictionary is an overcomplete set of vectors. We can use the dictionary atoms to write an arbitrary vector as a weighted sum of the atoms. In general, we seek orthonormal bases because then we can get the expansion coefficients directly via inner products.
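For instance, with an orthonormal basis the coefficients really are just inner products. A quick numpy check (my own illustration; the rotated basis and the vector are arbitrary choices):

```python
import numpy as np

# A rotated orthonormal basis of R^2 (the angle is an arbitrary choice).
theta = np.pi / 6
B1 = np.array([np.cos(theta), np.sin(theta)])
B2 = np.array([-np.sin(theta), np.cos(theta)])

A = np.array([2.0, 1.0])

# For an orthonormal basis the expansion coefficients are inner products.
b1 = A @ B1
b2 = A @ B2

# Reconstruction: A = b1*B1 + b2*B2
A_rec = b1 * B1 + b2 * B2
print(np.allclose(A, A_rec))  # True
```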

What is the point of a dictionary where some of the vectors are correlated and the set is overcomplete?
Does the inner product still provide the coefficients?

Say we have a vector ##A## in ##R^2##. Every basis will have exactly two basis vectors. Using the basis ##B = (B_1, B_2)## we can write $$A = b_1 B_1 + b_2 B_2$$
Say we have a dictionary ##D = (D_1, D_2, D_3, D_4)## which can also be used to express $$A = d_1 D_1 + d_2 D_2 + d_3 D_3 + d_4 D_4$$

Is that correct?
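To make the ##R^2## example concrete, here is a small numpy sketch (my own illustration, not from the thread; the specific atoms are arbitrary choices) showing that an overcomplete dictionary admits more than one coefficient vector for the same ##A##:

```python
import numpy as np

# Four unit-norm atoms in R^2 (columns): an overcomplete dictionary.
# The atoms are arbitrary illustrative choices.
D = np.array([[1.0, 0.0, 1.0,  1.0],
              [0.0, 1.0, 1.0, -1.0]])
D = D / np.linalg.norm(D, axis=0)

A = np.array([2.0, 1.0])

# One valid coefficient vector: the minimum-norm solution via the pseudoinverse.
d_min = np.linalg.pinv(D) @ A

# Another valid one: use only the first two atoms (they already span R^2).
d_other = np.zeros(4)
d_other[:2] = np.linalg.solve(D[:, :2], A)

# Both reproduce A exactly, so the representation is not unique.
print(np.allclose(D @ d_min, A), np.allclose(D @ d_other, A))
```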

What is dictionary "learning" then?

Say we have a dictionary ##D = (D_1, D_2, D_3, D_4)## which can also be used to express $$A = d_1 D_1 + d_2 D_2 + d_3 D_3 + d_4 D_4$$

Is that correct?
Yes, although for an overcomplete dictionary (where the number of atoms is greater than the number of dimensions) we are usually interested in sparse (not necessarily exact) representations where (using your notation) the number of non-zero ## d_k ## values that represent each ## A ## is less than the number of dimensions.
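The kind of sparse representation described above is typically found with a greedy pursuit algorithm. A minimal numpy sketch of orthogonal matching pursuit (my own illustration; the dictionary and signal are contrived so that one atom suffices):

```python
import numpy as np

def omp(D, A, k):
    """Greedy orthogonal matching pursuit: approximate A with at most k
    atoms (unit-norm columns of D)."""
    residual = A.astype(float).copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares refit of the coefficients on all selected atoms
        sol = np.linalg.lstsq(D[:, support], A, rcond=None)[0]
        residual = A - D[:, support] @ sol
    coef[support] = sol
    return coef

# Three unit-norm atoms in R^2; A is exactly twice the third atom,
# so a single atom represents it (sparsity 1 < dimension 2).
D = np.array([[1.0, 0.0, 0.6],
              [0.0, 1.0, 0.8]])
A = np.array([1.2, 1.6])
coef = omp(D, A, k=1)
print(coef)  # only the third entry is non-zero
```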

What is dictionary "learning" then?
Learning, from a set of training signals, a dictionary that for a given number of atoms optimizes the sparsity and the accuracy of the representations.
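A toy sketch of what "learning" means in practice, assuming the classic alternating scheme (fix the dictionary, update the sparse codes; fix the codes, refit the dictionary, as in the MOD algorithm). This is my own illustration with synthetic data, not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 signals in R^5, each a sparse mix of 2 atoms of a hidden dictionary.
n, m, N, k = 5, 8, 200, 2
D_true = rng.normal(size=(n, m))
D_true /= np.linalg.norm(D_true, axis=0)
C_true = np.zeros((m, N))
for i in range(N):
    idx = rng.choice(m, size=k, replace=False)
    C_true[idx, i] = rng.normal(size=k)
X = D_true @ C_true                       # observed signals as columns

def sparse_code(D, X, k):
    """Crude sparse coding: per signal keep the k most correlated atoms,
    then refit their coefficients by least squares."""
    C = np.zeros((D.shape[1], X.shape[1]))
    for i in range(X.shape[1]):
        S = np.argsort(-np.abs(D.T @ X[:, i]))[:k]
        C[S, i] = np.linalg.lstsq(D[:, S], X[:, i], rcond=None)[0]
    return C

# Alternating minimization from a random start:
# (1) fix D, update the sparse codes; (2) fix the codes, refit D (MOD update).
D = rng.normal(size=(n, m))
D /= np.linalg.norm(D, axis=0)
for _ in range(30):
    C = sparse_code(D, X, k)
    D = X @ np.linalg.pinv(C)
    norms = np.linalg.norm(D, axis=0)
    D /= np.where(norms > 0, norms, 1.0)  # keep atoms unit-norm

err = np.linalg.norm(X - D @ sparse_code(D, X, k)) / np.linalg.norm(X)
print(err)  # relative reconstruction error with 2-sparse codes
```

Libraries such as scikit-learn provide more robust versions of this idea (e.g. `sklearn.decomposition.DictionaryLearning`).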

sysprog
Re Fourier series: you may want to consider that they use infinite bases, so you are dealing with convergence issues rather than exact finite equality (i.e., a Schauder basis, rather than the Hamel basis of your finite-dimensional example).

Hello again,
I found this resource on frames and dictionaries: https://www.egr.msu.edu/~aviyente/ece802lecture9.pdf

In what sense are larger dictionaries (overcomplete sets of building vectors called atoms) more suitable for sparsely representing signals that have a lot of structure?

Could anybody provide some clarifications? I am stuck on the idea of a complete basis as the best choice to represent an arbitrary signal...

Representing a signal using a frame, instead of a basis, allows several possible linear representations of the same signal, i.e. the signal no longer has a single unique representation. Apparently this is useful to create a simpler, sparser representation of a signal... but why?

pbuk
You are confusing the two parts of the notes you linked. The first part is about frames and ends with the summary on slide 9:
• Frames are an overcomplete version of a basis set, and tight frames are an overcomplete version of an orthogonal basis set.
• The frames and tight frames have a certain amount of redundancy. In some cases, redundancy is desirable giving a robustness to the representation. In other cases, redundancy is an inefficiency.
• In finite dimensions, vectors can be removed from a frame to get a basis, but in infinite dimensions that is not possible.

The second part starts with the slide you quoted and is about overcomplete dictionaries: these are not the same thing as frames.

A simple example of an overcomplete dictionary is a postal code: I am going to use the UK postcode system. We can represent the location of 10 Downing Street in eight characters using its postcode SW1A 2AA. This is more compact than using the orthogonal basis of latitude and longitude 51.5033N 0.1277W.

fog37
You are confusing the two parts of the notes you linked. The first part is about frames and ends with the summary on slide 9:

The second part starts with the slide you quoted and is about overcomplete dictionaries: these are not the same thing as frames.

A simple example of an overcomplete dictionary is a postal code: I am going to use the UK postcode system. We can represent the location of 10 Downing Street in eight characters using its postcode SW1A 2AA. This is more compact than using the orthogonal basis of latitude and longitude 51.5033N 0.1277W.
Thank you pbuk! I really like your postcode example. Just checking my understanding, hopefully useful for other beginners like me who don't fully appreciate the power of frames and dictionaries...
• From what I understand, a frame is like the generalized version of a basis. A basis, a particular type of frame, is a complete set of independent vectors. Not all frames are bases but all bases are frames. A frame is generally a complete set of dependent vectors (essentially a non-orthogonal basis).
• "Complete" means that the basis or frame can represent any vector in the space. A basis has exactly as many vectors as the dimension of the space, while a frame has at least that many: if our vector ##X## lives in ##R^3##, then any frame or basis for ##R^3## has at least 3 building-block vectors (exactly 3 for a basis).
• Using an analogy, frames are like the letters of an alphabet (and there are many alphabets).
• When we use complete but non-orthogonal bases, a hypothetical vector ##X## still has a unique representation (linear superposition of the non-orthogonal basis vectors), just as with orthogonal bases, but the coefficients cannot be easily calculated via dot products.
• My understanding is that "dictionary" and "frame" are used almost as synonyms, although (as noted above) they are not quite the same thing.
• An overcomplete dictionary seems to be a frame with a number of dependent vectors that is larger than the dimensionality of the vectors we are interested in representing. Using an overcomplete dictionary, a vector ##X## does NOT have a unique representation but multiple valid representations (redundancy). Also, we don't have to use all the atoms in the dictionary to represent the vector (something we need to do when we use a basis).
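On the earlier question of whether inner products still provide the coefficients: for a *tight* frame they do, up to a constant factor. A small numpy illustration using the classic three-vector "Mercedes-Benz" frame in ##R^2## (my own example, not from the thread):

```python
import numpy as np

# The "Mercedes-Benz" frame: 3 unit vectors in R^2, 120 degrees apart.
angles = np.pi / 2 + np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
F = np.stack([np.cos(angles), np.sin(angles)])   # shape (2, 3), atoms as columns

X = np.array([0.7, -1.3])

# This is a tight frame (frame bound 3/2): inner products still give a
# valid reconstruction, just rescaled by 2/3.
coeffs = F.T @ X                                  # <X, f_k> for each frame vector
X_rec = (2.0 / 3.0) * F @ coeffs
print(np.allclose(X, X_rec))  # True
```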
The interesting part (at least for me), is that certain types of vectors can have a sparser representation in an overcomplete dictionary than in a complete basis or frame. For example, let's consider 2 different alphabets.
Alphabet 1 has 5 letters and is an overcomplete dictionary with the following atoms \$,@,%,&,*
Alphabet 2 has 3 letters and is a complete basis with the atoms/letters O,U,W.

We could write the same word using either alphabet, but we would, somehow, get both a redundant (i.e. multiple linear expansions are possible) and sparser representation using alphabet 1, even though alphabet 2 is smaller. That is cool.

Thank you!