Well, to put it more simply, quantum field theory uses a bunch of operators on a Hilbert space to describe "fields", such as the electromagnetic field etc., which give rise to "particles". In essence, these fields are governed by partial differential equations, whose solutions are promoted to linear operators via "quantization". That's all fine when the underlying PDE is simple, as is the case with the Klein-Gordon or Dirac equations. These can generally be solved, and the appropriate Hilbert space to study their "quantized" version is called a Fock space. We generally know how their solutions (called free fields) act on the Fock space.
Unfortunately, when you want to have fields interacting with each other, you end up with PDEs involving multiple different functions coupled in convoluted ways. Just Google "standard model Lagrangian". Almost every different letter you see is a different field. Now apply the Euler-Lagrange equations to that and you get an absolutely insane system of horribly coupled PDEs. Of course nobody works with the whole thing at once, we just look at parts of it. One such part is the famous Yang-Mills equations:
https://en.m.wikipedia.org/wiki/Yang–Mills_equations
The existence problem for quantum Yang-Mills theory (together with a mass gap) is a Millennium Prize problem. Actually, it's not just that one: none of the other realistic interacting QFTs has been rigorously constructed in 4 spacetime dimensions either. The reason the other ones aren't Millennium Prize problems is probably mostly that many people don't think they even really have a solution, for various reasons, even though they are still used.
But to quantize, say, the Klein-Gordon equation, the best way is to just solve the ("classical") PDE first, and then to promote the solution to an operator in a specific sense. So how are we supposed to quantize the interacting ones when we can't solve them? Physicists have some workarounds. Probably the most commonly used one is a type of functional integral called a path integral. Functional integrals are another tool that came out of physics (originally to describe Brownian motion, if I'm not mistaken) and that has been applied to other areas of math. The idea is that you somehow integrate over a space of functions. Of course, to integrate you need a measure. For some specific functional integrals, this measure is known. For the path integrals of QFT, there is no rigorous construction of the measure yet. Nevertheless, it is used.
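To make the "measure is known" case a bit more concrete: for Brownian motion the measure on path space (the Wiener measure) is perfectly well defined, and you can even sample from it. Here is a minimal Python sketch of that, under my own assumptions: paths on [0, 1] are discretized into a fixed number of time steps, the functional being integrated is the time integral of W_t^2 (whose exact expectation is 1/2), and all function names and parameter values are made up for illustration. It says nothing about how the QFT path integral would be handled.

```python
import random


def sample_path_functional(n_steps, rng):
    """Riemann sum of W_t^2 over [0, 1] along one discretized Brownian path."""
    dt = 1.0 / n_steps
    w = 0.0       # Brownian paths start at W_0 = 0
    total = 0.0
    for _ in range(n_steps):
        # Independent Gaussian increment with variance dt.
        w += rng.gauss(0.0, dt ** 0.5)
        total += w * w * dt
    return total


def wiener_expectation(n_paths=4000, n_steps=100, seed=0):
    """Monte Carlo estimate of E[ integral_0^1 W_t^2 dt ] under the Wiener measure."""
    rng = random.Random(seed)
    samples = (sample_path_functional(n_steps, rng) for _ in range(n_paths))
    return sum(samples) / n_paths


estimate = wiener_expectation()
print("Monte Carlo estimate:", estimate)
print("Exact value:         ", 0.5)
```

The estimate should land near 1/2, up to sampling noise and a small discretization bias. The point is only that "integrating over a space of functions" is a well-defined operation here, because we know what the measure is.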
And then on top of all of that, you do perturbation theory. What's that? Well, we have a Hilbert space we don't really know, and operators representing fields that we have "quantized", never mind the fact that we don't rigorously know how they act on that Hilbert space (or even whether these actions can be rigorously defined, because they relate to PDEs we don't know the solutions to). And we want to approximate the solutions to various problems regarding their action on this space, using power series. Great.
Perhaps you will find it amusing to learn that the individual terms of these power series are themselves divergent. But maybe you already heard that, and heard that you can just do renormalization, etc. Indeed, renormalization generally fixes the problem, and Epstein-Glaser theory shows how you do that rigorously, starting from first principles, in a manner that is not ad hoc. Physicists usually don't do that, though, and instead follow a much less rigorous counterterm procedure that is easier to work with. But at least we know we can cure the divergences. Trouble is, even AFTER you cure these divergences in the terms of the series, the series STILL diverges if you include every term, as in, it has ZERO radius of convergence. The physicist's answer to this? "Well I'll just keep the first few terms of the series, which don't diverge". Well, in some cases people use some other summation schemes, like Borel summation etc. But sometimes that doesn't work.
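If you want to see the "zero radius of convergence, but the first few terms are great" phenomenon in a toy setting, here is a minimal Python sketch. It is an illustration under my own assumptions, not an actual QFT computation: the path integral is swapped for the ordinary one-dimensional integral Z(g) = (2 pi)^(-1/2) * integral of exp(-x^2/2 - g x^4) dx, the coupling g = 0.01 is an arbitrary choice, and the function names are made up. Expanding exp(-g x^4) in powers of g and integrating term by term gives the formal series with n-th term (-g)^n (4n)! / (n! (2n)! 4^n), whose coefficients grow factorially.

```python
import math


def exact_z(g, half_width=12.0, steps=24000):
    """Trapezoid-rule value of Z(g) = (2 pi)^(-1/2) * int exp(-x^2/2 - g x^4) dx."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-0.5 * x * x - g * x ** 4)
    return total * h / math.sqrt(2.0 * math.pi)


def series_term(n, g):
    """n-th term of the formal expansion: (-g)^n (4n)! / (n! (2n)! 4^n)."""
    coeff = math.factorial(4 * n) / (
        math.factorial(n) * math.factorial(2 * n) * 4 ** n
    )
    return coeff * (-g) ** n


def partial_sums(g, max_order):
    """Partial sums S_0, S_1, ..., S_max_order of the formal series."""
    sums, running = [], 0.0
    for n in range(max_order + 1):
        running += series_term(n, g)
        sums.append(running)
    return sums


g = 0.01  # arbitrary small coupling, chosen only for illustration
reference = exact_z(g)
errors = [abs(s - reference) for s in partial_sums(g, 40)]
best_order = min(range(len(errors)), key=errors.__getitem__)

for n in sorted({0, 2, best_order, 20, 40}):
    print(f"order {n:2d}: |partial sum - Z(g)| = {errors[n]:.3e}")
print("best truncation order:", best_order)
```

The error shrinks for the first handful of orders, bottoms out, and then blows up as more terms are added, even though Z(g) itself is a perfectly finite number. That is the sense in which keeping only the first few terms can work while the full series does not converge for any g > 0.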
So, to summarize, we start from PDEs that we don't know how to solve or if they even have solutions, we quantize them via integration measures that don't exist, and then we approximate the solutions to various problems using series that don't converge, by just ignoring the rest of the series, at least when we can get each term to converge. And it's not even a cutting-edge theory, it's been around for decades. Not just that, but it is probably THE most successful physical theory ever, the one that has yielded the most precise predictions. This is how physicists learn to be less formal with math.
To learn about QFT, you may be interested in these books, written mostly for mathematicians, by mathematicians:
https://www.amazon.com/dp/0821847058/?tag=pfamazon01-20
https://www.amazon.com/dp/1316510271/?tag=pfamazon01-20
The second one is essentially a more digested version of the first one, covering only the things that are for the most part well understood, yet it is significantly longer. It's also interesting to see Talagrand's comments throughout the text indicating his struggle to understand why various things work. Really that's the main strength of the book imo: when he covers something that is very suspicious but nevertheless works, he says so explicitly. However, I'm not sure how much you would get out of these books without further background in physics. Maybe you could try reading the Arnold book I mentioned, then maybe something like Quantum Theory for Mathematicians by Brian Hall, and then the Talagrand book (or the Folland book if you prefer). You will also see how much of QM and QFT really is just representation theory, and why physics was a huge motivator for the development of representation theory.