Build a Natural Language Processing Transformer from Scratch

SUMMARY

The forum discussion revolves around building a Natural Language Processing (NLP) transformer from scratch using pure Python, without relying on libraries like TensorFlow or PyTorch. Participants emphasize the importance of understanding the underlying theory, which includes concepts such as multi-head self-attention, encoder-decoder architecture, and layer normalization. Several resources were shared, including articles that provide numerical examples and visual explanations of transformers. The consensus is that while theory is crucial, practical examples are necessary for a comprehensive understanding.

PREREQUISITES
  • Understanding of NLP concepts and terminology
  • Familiarity with transformer architecture and its components
  • Basic knowledge of Python programming
  • Awareness of machine learning principles
NEXT STEPS
  • Study the theory behind transformers, focusing on components like multi-head self-attention and encoder-decoder structures
  • Explore the resource "Illustrated Transformer" for visual explanations
  • Read "E2E ML School: Transformers" for practical insights and examples
  • Investigate pure Python implementations of transformers to understand the coding aspect
USEFUL FOR

Machine learning enthusiasts, NLP researchers, and Python developers interested in understanding and implementing transformers without relying on external libraries.

jonjacson
TL;DR
I wonder if anybody knows how to build and train one from scratch or if there is any book, video, or website explaining it.
I have read that transformers are the key behind recent successes in artificial intelligence, but the problem is that they are quite opaque.

I wonder if anybody knows how to build and train one from scratch or if there is any book, video, or website explaining it.

Thanks
 
jonjacson said:
I have read that transformers are the key behind recent successes in artificial intelligence, but the problem is that they are quite opaque.
Then you need to understand the theory.

jonjacson said:
But I don't see a python implementation, just the theory.
You did not ask for Python code.
Google: Python code for NLP transformer

There will be more answers from others.
 
jonjacson said:
But I don't see a python implementation, just the theory.
But you didn't ask for a Python implementation, you asked about building one from scratch!

If I wanted to find a Python machine learning algorithm related to [X] I would input "Tensorflow X" into a search engine. Have you tried this?
 
Baluncore said:
Then you need to understand the theory. You did not ask for Python code.
Google: Python code for NLP transformer

There will be more answers from others.
I see answers, but they use libraries like PyTorch or TensorFlow. I mean from scratch, in pure Python.

pbuk said:
But you didn't ask for a Python implementation, you asked about building one from scratch!

If I wanted to find a Python machine learning algorithm related to [X] I would input "Tensorflow X" into a search engine. Have you tried this?
I don't want to use libraries.
 
jonjacson said:
I see answers but they use libraries like pytorch or tensorflow. I mean from scratch, pure python.
Even if you don't use libraries, looking at the source code for the libraries might be a good way of learning how these things are done in Python.

If searching the web doesn't turn up any Python implementations that don't use libraries, that's probably a clue that everyone else who has tried what you are trying has found it easier to use the well-tested implementations in the libraries than to try and roll their own.
 
PeterDonis said:
Even if you don't use libraries, looking at the source code for the libraries might be a good way of learning how these things are done in Python.

If searching the web doesn't turn up any Python implementations that don't use libraries, that's probably a clue that everyone else who has tried what you are trying has found it easier to use the well-tested implementations in the libraries than to try and roll their own.

The problem is that this looks like a magic thing. I don't know why it is "hidden" behind the bogus language: "deep learning", "encoder", "decoder", "tokenized input embedding", "multi-head self-attention", "layer normalization", "feed-forward network", "residual connection"... and all that stuff.

In the end I guess this will be a whole bunch of vectors, matrices, and operations on them.

Hopefully now you understand what I want to know.
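That guess is essentially correct. As an illustration, a single self-attention head is nothing more than a couple of matrix products and a softmax. Here is a minimal pure-Python sketch with toy 2-token, 2-dimensional inputs (the helper names and sizes are made up for the example, not taken from any particular implementation):

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m), using plain nested lists
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = len(Q[0])
    KT = [list(col) for col in zip(*K)]          # transpose of K
    scores = matmul(Q, KT)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, V)

# toy example: 2 tokens, model dimension 2
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))
```

Multi-head attention just runs several of these heads in parallel on learned linear projections of the input and concatenates the results.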
 
jonjacson said:
The problem is that this looks like a magic thing
That problem doesn't look to me like a "find Python code" problem. It looks to me like a "learn and understand the theory" problem, as @Baluncore has already pointed out.
 
jonjacson said:
The problem is that this looks like a magic thing, ...
“Any sufficiently advanced technology is indistinguishable from magic”.
Arthur C. Clarke's third law.
 
Baluncore said:
“Any sufficiently advanced technology is indistinguishable from magic”.
Arthur C. Clarke's third law.

Nice, but still there is no basic example of this anywhere.
 
jonjacson said:
Nice, but still there is no basic example of this anywhere.
It is only magic because you do not yet understand the theory. If you were given some version of the Python code, you would still not understand the theory. It would still be magic, and a danger to the uninitiated.
 
jonjacson said:
The problem is that this looks like a magic thing. I don't know why it is "hidden" behind the bogus language: "deep learning", "encoder", "decoder", "tokenized input embedding", "multi-head self-attention", "layer normalization", "feed-forward network", "residual connection"... and all that stuff.
For the same reason that quantum mechanics is hidden behind the bogus language "complex projective space", "Hermitian operators", "Hamiltonians", "eigenstates", "superpositions" and all that stuff.

At the end this is just a whole bunch of vectors, matrices and operations on them.

jonjacson said:
Hopefully now you understand what I want to know.
Yes, you want to do QM without learning the theory. Good luck.

Edit: or is this the kind of thing you are looking for: https://habr.com/en/companies/ods/articles/708672/
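To make the "vectors and matrices" point concrete: layer normalization, one of the jargon terms listed above, reduces to a few lines of pure Python. This is a sketch, with the learned scale (gamma) and shift (beta) defaulting to the identity:

```python
import math

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    # normalize one vector to zero mean and unit variance,
    # then apply a learned elementwise scale (gamma) and shift (beta)
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    normed = [(v - mean) / math.sqrt(var + eps) for v in x]
    gamma = gamma or [1.0] * n
    beta = beta or [0.0] * n
    return [g * v + b for g, v, b in zip(gamma, normed, beta)]

print(layer_norm([1.0, 2.0, 3.0, 4.0]))
```

In a transformer this is applied to each token's vector independently, typically around the attention and feed-forward sublayers.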
 
pbuk said:
For the same reason that quantum mechanics is hidden behind the bogus language "complex projective space", "Hermitian operators", "Hamiltonians", "eigenstates", "superpositions" and all that stuff.

At the end this is just a whole bunch of vectors, matrices and operations on them.

Yes, you want to do QM without learning the theory. Good luck.

Edit: or is this the kind of thing you are looking for: https://habr.com/en/companies/ods/articles/708672/

I am not saying that theory is bad or unnecessary. What I am looking for is a numerical example.

The Schrödinger equation is fine, but once you compute the orbitals of the hydrogen atom you get a better understanding.

I don't understand why it is bad to ask for numerical examples and numbers.

Your edit was great and it is what I was looking for. I'll add the link from the end of that article:

https://jalammar.github.io/illustrated-transformer/

And something I just found:

https://e2eml.school/transformers.html

I hope this helps anybody interested in this topic.

Thanks to all for your replies.
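In the same pure-Python spirit as those resources, the remaining jargon pair, the position-wise feed-forward network and its residual connection, is also just a few list comprehensions. A sketch with toy dimensions and made-up identity weights (real models use much larger sizes, e.g. 512-dimensional tokens with a 2048-dimensional hidden layer):

```python
def relu(v):
    # elementwise ReLU on a list
    return [max(0.0, x) for x in v]

def linear(x, W, b):
    # length-k input x, k x m weight matrix W, length-m bias b
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def feed_forward(x, W1, b1, W2, b2):
    # position-wise FFN applied to one token's vector,
    # plus the residual connection: x + W2 . relu(W1 . x + b1) + b2
    hidden = relu(linear(x, W1, b1))
    out = linear(hidden, W2, b2)
    return [xi + oi for xi, oi in zip(x, out)]

# toy 2-dimensional example with identity weights and zero biases
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[1.0, 0.0], [0.0, 1.0]]
b1 = b2 = [0.0, 0.0]
print(feed_forward([1.0, -2.0], W1, b1, W2, b2))
```

The same function is applied to every token position independently, which is why it is called "position-wise".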

