First, if you don't know much about Apache Spark, you can read through this tutorial from tutorialspoint.com. Before reading it, you should have some prior exposure to Scala programming, (at least) basic database concepts, and some experience with a Linux distro.
Then, if you haven't already done so, you should learn how to build ML pipelines with PySpark. There are tutorials for this, like this one from tutorialspoint.com. There are of course other good tutorials as well, which you can find by googling.
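To give a rough idea of what such a pipeline looks like, here is a minimal sketch. The file path and the column names ("feature1", "feature2", "label") are hypothetical placeholders, not part of any tutorial; adapt them to your own data.

```python
# Minimal PySpark ML pipeline sketch. The file path and column names
# ("feature1", "feature2", "label") are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("pipeline-example").getOrCreate()

# Load training data; header/inferSchema assume a plain CSV with named columns.
df = spark.read.csv("train.csv", header=True, inferSchema=True)

# Combine raw columns into the single feature vector Spark ML expects.
assembler = VectorAssembler(inputCols=["feature1", "feature2"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

pipeline = Pipeline(stages=[assembler, lr])
model = pipeline.fit(df)

# Persist the fitted pipeline so a separate scoring script can reload it.
model.write().overwrite().save("my_pipeline_model")
```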
Now, regarding the Java backend you ask about: assuming you are planning to write everything related to your ML model(s) in Python and then call a Python script from Java, you can write Java code to do some processing (for example, some batch-processing task), export the preprocessed data to .csv or .json format, and then call your Python script, e.g. via bash, passing the parameters, to run predictions with a deep learning model. You can take a look at this example of using deep learning models within the Java ecosystem at digital-thinking.de (http://digital-thinking.de/how-to-using-deep-learning-models-within-the-java-ecosystem/) to get an idea of the process.
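As a rough sketch of the Python side of that handoff (not the approach from the linked article), the script below reads the CSV the Java code exported, reloads the pipeline model saved earlier, and writes predictions back out for the Java side to pick up. The paths, argument names, and output column are all assumptions for illustration.

```python
# Hypothetical scoring script, invoked from the Java side, e.g.:
#   python score.py --input preprocessed.csv --output predictions
# Paths, argument names, and the saved-model location are placeholders.
import argparse

from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

parser = argparse.ArgumentParser()
parser.add_argument("--input", required=True)   # CSV exported by the Java code
parser.add_argument("--output", required=True)  # directory to write predictions
args = parser.parse_args()

spark = SparkSession.builder.appName("scoring-example").getOrCreate()

# Reload the fitted pipeline and score the preprocessed data.
model = PipelineModel.load("my_pipeline_model")
df = spark.read.csv(args.input, header=True, inferSchema=True)
predictions = model.transform(df)

# Keep only the prediction column and export as CSV for the Java side.
predictions.select("prediction").write.mode("overwrite").csv(args.output)
```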
Needless to say, in order to accomplish your specific goal(s) you'll need to mix and match these pieces accordingly. I don't think you'll find a start-to-finish tutorial covering the whole thing for the specific goal you have in mind.