If you know C++ and assembly, you're almost there already. I've worked on embedded systems, and normally each source module is entirely C, C++, or assembly rather than a mix (no inline _asm() directives in C code, because they interfere with the optimizer). The amount of assembly code is generally small: mostly parts of the operating system used in the device, and perhaps some time-critical routines. On a CPU with a lot of registers, like an ARM with 16 registers, it's difficult to improve on the compiler's optimized code, but you still need assembly for context switching and interrupt handling (ARM has banked shadow registers for fast interrupt, normal interrupt, supervisor mode, ...), and for any math routine that needs low-level hardware features like the carry bit or paired registers.
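To illustrate why the carry bit matters for math routines, here is a minimal sketch of a two-word addition in portable C++. The names U64Pair and add_pair are made up for this example. C++ has no direct access to the carry flag, so the carry out of the low word has to be recovered with a comparison; in assembly the same job is an add followed by an add-with-carry (ADDS then ADC on ARM).

```cpp
#include <cstdint>

// Hypothetical type for this sketch: a 64-bit value held as two 32-bit words,
// the way it would sit in a register pair on a 32-bit CPU.
struct U64Pair {
    std::uint32_t lo;  // low 32 bits
    std::uint32_t hi;  // high 32 bits
};

// Add two register-pair values. Unsigned arithmetic wraps modulo 2^32,
// so a carry out of the low word shows up as a sum smaller than an operand.
U64Pair add_pair(U64Pair a, U64Pair b) {
    U64Pair r;
    r.lo = a.lo + b.lo;
    std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;  // recover the carry bit
    r.hi = a.hi + b.hi + carry;                     // propagate it to the high word
    return r;
}
```

A good compiler may well turn the comparison back into an add-with-carry, but for longer multi-word arithmetic a hand-written routine is often still where assembly earns its place.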
The other thing to learn is how parameters are passed in calls (registers versus stack), and how names are decorated in C or "mangled" in C++, so you can use the proper parameters and names in assembly. You can usually find this out by writing a test module in C or C++ and having the compiler produce assembly code, which shows you the parameter setup and the names.
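A test module for this can be very small. The sketch below uses two made-up functions, scale and scale_c, that differ only in linkage. Compiling it with a GCC-style compiler and the -S option (for example g++ -S test.cpp) produces an assembly listing showing which registers or stack slots carry the parameters and what symbol name each function gets.

```cpp
// Hypothetical test module: compile to assembly and read the listing.

// C++ linkage: the symbol name is mangled to encode the parameter types.
// With GCC or Clang (Itanium C++ ABI) this one comes out as _Z5scaleii.
int scale(int value, int factor) {
    return value * factor;
}

// C linkage: the symbol is the plain name scale_c (some toolchains add a
// leading underscore), which is much easier to reference from assembly.
extern "C" int scale_c(int value, int factor) {
    return value * factor;
}
```

Declaring the functions you call from assembly as extern "C" is the usual way to avoid dealing with mangled names at all; the listing is still worth reading for the parameter setup.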
Unless you get involved with the operating system itself, there's not much more to learn. Most companies will just buy a basic operating system, or hire a consultant if no one in-house is familiar with the internals of an operating system.