Build a Natural Language Processing Transformer from Scratch

Discussion Overview

The discussion revolves around building and training a Natural Language Processing (NLP) transformer from scratch, covering both theoretical understanding and practical implementation. Participants discuss the underlying theory as well as the scarcity of straightforward pure-Python implementations that avoid libraries such as PyTorch or TensorFlow.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • Some participants highlight the importance of understanding the theory behind transformers, suggesting that without this understanding, the technology may seem opaque or "magical."
  • Others express frustration over the lack of basic Python implementations available without using libraries like PyTorch or TensorFlow.
  • Some participants suggest that examining the source code of existing libraries could provide insights into how transformers are constructed, even if one aims to build from scratch.
  • A participant compares the complexity of understanding transformers to the challenges of grasping quantum mechanics, emphasizing the need for foundational knowledge.
  • Another participant requests numerical examples to aid understanding, arguing that practical examples can clarify theoretical concepts.
  • Links to resources and articles are shared by participants as potential aids for those seeking to learn more about transformers.

Areas of Agreement / Disagreement

Participants generally agree on the necessity of understanding the theory behind transformers, but there is disagreement on the availability and need for practical Python implementations without libraries. The discussion remains unresolved regarding the best approach to learning and implementing transformers from scratch.

Contextual Notes

Participants express varying levels of frustration regarding the perceived opacity of transformer technology and the terminology used in deep learning. There is an acknowledgment that while theory is important, practical examples are also sought after to enhance understanding.

jonjacson
TL;DR
I wonder if anybody knows how to build and train one from scratch or if there is any book, video, or website explaining it.
I have read that transformers are the key behind recent success in artificial intelligence but the problem is that it is quite opaque.

Thanks
 
jonjacson said:
I have read that transformers are the key behind recent success in artificial intelligence but the problem is that it is quite opaque.
Then you need to understand the theory.

jonjacson said:
But I don't see a python implementation, just the theory.
You did not ask for python code.
Google: Python code for NLP transformer

There will be more answers from others.
 
jonjacson said:
But I don't see a python implementation, just the theory.
But you didn't ask for a Python implementation, you asked about building one from scratch!

If I wanted to find a Python machine learning algorithm related to [X] I would input "Tensorflow X" into a search engine. Have you tried this?
 
Baluncore said:
Then you need to understand the theory. You did not ask for python code.
Google: Python code for NLP transformer

There will be more answers from others.
I see answers but they use libraries like pytorch or tensorflow. I mean from scratch, pure python.

pbuk said:
But you didn't ask for a Python implementation, you asked about building one from scratch!

If I wanted to find a Python machine learning algorithm related to [X] I would input "Tensorflow X" into a search engine. Have you tried this?
I don't want to use libraries.
 
jonjacson said:
I see answers but they use libraries like pytorch or tensorflow. I mean from scratch, pure python.
Even if you don't use libraries, looking at the source code for the libraries might be a good way of learning how these things are done in Python.

If searching the web doesn't turn up any Python implementations that don't use libraries, that's probably a clue that everyone else who has tried what you are trying has found it easier to use the well-tested implementations in the libraries than to try and roll their own.
 
PeterDonis said:
Even if you don't use libraries, looking at the source code for the libraries might be a good way of learning how these things are done in Python.

If searching the web doesn't turn up any Python implementations that don't use libraries, that's probably a clue that everyone else who has tried what you are trying has found it easier to use the well-tested implementations in the libraries than to try and roll their own.

The problem is that this looks like a magic thing, and I don't know why it is "hidden" behind the bogus language: "deep learning", "encoder", "decoder", "tokenized input embedding", "multi-head self-attention", "layer normalization", "feed-forward network", "residual connection"... and all that stuff.

In the end I guess this will be a whole bunch of vectors, matrices and operations on them.

Hopefully now you understand what I want to know.
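That intuition is right: the core attention step really is just matrix products and a softmax. As a rough illustration (not anyone's official implementation; all names and shapes are made up for the example), scaled dot-product self-attention can be sketched in pure Python with no libraries at all:

```python
import math

def softmax(xs):
    # Shift by the max for numerical stability, then normalize to sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain-Python matrix multiply: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def attention(q, k, v):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)
```

Multi-head attention is essentially several copies of this run in parallel on learned linear projections of the same input, with the results concatenated.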
 
jonjacson said:
The problem is that this looks like a magic thing
That problem doesn't look to me like a "find Python code" problem. It looks to me like a "learn and understand the theory" problem, as @Baluncore has already pointed out.
 
jonjacson said:
The problem is that this looks like a magic thing, ...
“Any sufficiently advanced technology is indistinguishable from magic”.
Arthur C. Clarke's third law.
 
Baluncore said:
“Any sufficiently advanced technology is indistinguishable from magic”.
Arthur C. Clarke's third law.

Nice, but still there is no basic example of this anywhere.
 
jonjacson said:
Nice, but still there is no basic example of this anywhere.
It is only magic because you do not yet understand the theory. If you were given some version of the Python code, you would still not understand the theory. It would still be magic, and a danger to the uninitiated.
 
jonjacson said:
The problem is that this looks like a magic thing, and I don't know why it is "hidden" behind the bogus language: "deep learning", "encoder", "decoder", "tokenized input embedding", "multi-head self-attention", "layer normalization", "feed-forward network", "residual connection"... and all that stuff.
For the same reason that quantum mechanics is hidden behind the bogus language "complex projective space", "Hermitian operators", "Hamiltonians", "eigenstates", "superpositions" and all that stuff.

In the end this is just a whole bunch of vectors, matrices and operations on them.

jonjacson said:
Hopefully now you understand what I want to know.
Yes, you want to do QM without learning the theory. Good luck.

Edit: or is this the kind of thing you are looking for: https://habr.com/en/companies/ods/articles/708672/
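The other named pieces are equally small once written out. A pure-Python sketch of layer normalization, the position-wise feed-forward network, and the residual "Add & Norm" step (function names are illustrative, and the weights here are placeholders rather than trained values):

```python
import math

def layer_norm(x, eps=1e-5):
    # Rescale a vector to zero mean and (roughly) unit variance.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def feed_forward(x, w1, w2):
    # Position-wise feed-forward network: ReLU(x W1) W2, applied to one token.
    hidden = [max(0.0, sum(xi * w1[i][j] for i, xi in enumerate(x)))
              for j in range(len(w1[0]))]
    return [sum(h * w2[j][k] for j, h in enumerate(hidden))
            for k in range(len(w2[0]))]

def add_and_norm(x, sublayer_out):
    # Residual connection followed by layer normalization.
    return layer_norm([a + b for a, b in zip(x, sublayer_out)])
```

A transformer block is then just attention and feed-forward sublayers, each wrapped in such an add-and-norm step.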
 
pbuk said:
For the same reason that quantum mechanics is hidden behind the bogus language "complex projective space", "Hermitian operators", "Hamiltonians", "eigenstates", "superpositions" and all that stuff.

In the end this is just a whole bunch of vectors, matrices and operations on them. Yes, you want to do QM without learning the theory. Good luck.

Edit: or is this the kind of thing you are looking for: https://habr.com/en/companies/ods/articles/708672/

I am not saying that theory is bad or unnecessary. What I am looking for is a numerical example.

The Schrödinger equation is fine, but once you compute the orbitals of the hydrogen atom you get a better understanding.

I don't understand why it is bad to ask for numerical examples and numbers.
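As a tiny worked example of exactly that sort (all numbers made up): given similarity scores between one query and three keys, softmax converts them into attention weights, which then mix the corresponding values:

```python
import math

# Made-up similarity scores between one query and three keys
# (in a real model these are dot products divided by sqrt(d)).
scores = [2.0, 1.0, 0.1]

# Softmax turns the scores into positive weights that sum to 1.
exps = [math.exp(s) for s in scores]
total = sum(exps)
weights = [e / total for e in exps]
print([round(w, 3) for w in weights])  # → [0.659, 0.242, 0.099]

# The output is the weighted mix of the (made-up) values.
values = [10.0, 0.0, 5.0]
output = sum(w * v for w, v in zip(weights, values))
print(round(output, 2))  # → 7.08
```

So the token mostly "attends to" the first value, with a smaller contribution from the third.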

Your edit was great and it is what I was looking for; I'll add the link from the end of that article:

https://jalammar.github.io/illustrated-transformer/

And something I just found:

https://e2eml.school/transformers.html

I hope this helps anybody interested in this topic.

Thanks to all for your replies.
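One more small piece in the same pure-Python spirit: the "tokenized input embedding" stage also adds positional information. A minimal sketch of the sinusoidal positional encoding that the tutorials linked above describe (parameter names are illustrative):

```python
import math

def positional_encoding(position, d_model):
    # Sinusoidal positional encoding from "Attention Is All You Need":
    # even indices use sine, odd indices use cosine, with wavelengths
    # growing geometrically across the embedding dimensions.
    pe = []
    for i in range(d_model):
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe
```

Each token's embedding vector gets the encoding for its position added to it before the first attention layer, so the model can tell word order apart.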

