Your model is WAY too complex. Four layers with 50 neurons each comes to roughly 10K parameters (the exact count depends on how many input features you have). How large is your dataset? Because the noise level is so high, you’d want thousands of data points for each parameter. So, unless you have on the order of 10 million rows of data, your model will produce pretty random results out of sample.
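To make that arithmetic concrete, here is a rough sketch of the parameter count for a fully connected network. The number of input features (50) and the single output are assumptions on my part, since the post doesn't state them; `mlp_param_count` is just an illustrative helper, not a library function.

```python
def mlp_param_count(n_inputs, hidden_sizes, n_outputs=1):
    """Count weights and biases in a fully connected network."""
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    # Each layer has (fan_in * fan_out) weights plus fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))

N_INPUTS = 50          # assumption: the feature count is not given
ROWS_PER_PARAM = 1000  # low end of "thousands of data points per parameter"

n_params = mlp_param_count(N_INPUTS, [50, 50, 50, 50])
print(n_params)                   # 10251
print(n_params * ROWS_PER_PARAM)  # 10251000
```

With only 10 input features the same architecture still has 8,251 parameters, so the conclusion barely changes: the hidden-to-hidden layers dominate the count.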
Making the model too complex is a pretty common rookie mistake in ML. The temptation is to build the “greatest” model ever, which usually ends up being the most complex one. But there is something called the bias-variance tradeoff, and it is one of the fundamentals of ML: a more complex model fits the training data better, but it also fits more of the noise. Frankly, getting the model complexity right is quite challenging, and it is one of the reasons why ML is sometimes referred to as an art.
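A toy illustration of the tradeoff, using only the standard library. The data here is synthetic (a weak linear signal buried in noise, as a hypothetical stand-in for returns), and the two models are deliberately extreme: a one-parameter mean versus a 1-nearest-neighbour model that memorises every training point. Treat it as a sketch, not a benchmark.

```python
import random

random.seed(0)

def make_data(n, signal=0.05, noise=1.0):
    """Weak linear signal plus Gaussian noise (synthetic, for illustration)."""
    xs = [random.uniform(-1, 1) for _ in range(n)]
    ys = [signal * x + random.gauss(0, noise) for x in xs]
    return xs, ys

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

train_x, train_y = make_data(200)
test_x, test_y = make_data(2000)

# Simple model: a single parameter, the training mean.
mean_y = sum(train_y) / len(train_y)

def predict_mean(x):
    return mean_y

# Complex model: 1-nearest-neighbour, which memorises the training set.
def predict_1nn(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

results = {}
for name, model in [("mean", predict_mean), ("1-NN", predict_1nn)]:
    in_sample = mse([model(x) for x in train_x], train_y)
    out_sample = mse([model(x) for x in test_x], test_y)
    results[name] = (in_sample, out_sample)
    print(f"{name:>5}: in-sample MSE {in_sample:.3f}, "
          f"out-of-sample MSE {out_sample:.3f}")
```

The memorising model has zero error in sample, yet out of sample it does worse than the plain mean, because it has fit the noise in the training set rather than the signal.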
There are lots of books to help you get started with ML, but one of the best ones focused on these foundational concepts is “Learning from Data: A Short Course” by Yaser Abu-Mostafa et al.
Applying ML to trading is non-trivial. If it were easy, everyone would be doing it.

One of the main things it requires is knowing ML very well. I don’t mean knowing how to use the available libraries; that’s not knowing ML.

Even knowing enough ML to implement the algorithms (NNs or gradient boosted trees) yourself is not enough, although it is a necessary step. You really need to know what is going on underneath the covers, because what’s going on there is not magic.

If you don’t, then you’re just throwing shit against the wall. And, there are plenty of people who do that, but their results out of sample look nothing like what they expect.
My other piece of advice would be to forget neural networks and deep learning. Yes, they are very powerful and all that, but they are not the most efficient approach to the trading problem, which is a tabular data problem. Gradient boosted trees are pretty much the state of the art for tabular data problems. This doesn’t mean that you can throw XGBoost or LightGBM at your data and have a tradeable model. It’s just that most people have an easier time understanding what’s going on in a tree model than in a neural network. Gradient boosted trees should also be way faster to train than NNs. And, if you understand what’s happening underneath the covers, then you’ll understand that both approaches should produce very similar models if you have things tuned properly.
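To give a feel for what is happening underneath the covers in a boosted tree model, here is a bare-bones sketch of gradient boosting for squared loss, with depth-1 trees (stumps) on a single feature. This is a teaching toy under my own simplifying assumptions, not how XGBoost or LightGBM are implemented; the function names and the synthetic step-shaped data are made up for illustration.

```python
import random

def fit_stump(xs, residuals):
    """Best single-threshold split on a 1-D feature for squared error."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    total = sum(residuals)
    best = None  # (score, threshold, left_value, right_value)
    left_sum = 0.0
    for rank in range(1, n):
        left_sum += residuals[order[rank - 1]]
        lo, hi = xs[order[rank - 1]], xs[order[rank]]
        if lo == hi:
            continue  # cannot split between identical feature values
        right_sum = total - left_sum
        # Maximising this score is equivalent to minimising squared error.
        score = left_sum ** 2 / rank + right_sum ** 2 / (n - rank)
        if best is None or score > best[0]:
            best = (score, (lo + hi) / 2,
                    left_sum / rank, right_sum / (n - rank))
    if best is None:  # all feature values identical: predict the mean
        return float("inf"), total / n, total / n
    return best[1:]

def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit each stump to the residuals left over by the previous ones."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, left, right = fit_stump(xs, residuals)
        stumps.append((thr, left, right))
        preds = [p + learning_rate * (left if x <= thr else right)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate

def predict(model, x):
    base, stumps, lr = model
    return base + lr * sum(left if x <= thr else right
                           for thr, left, right in stumps)

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(300)]
ys = [(1.0 if x > 0 else -1.0) + random.gauss(0, 0.5) for x in xs]

model = boost(xs, ys)
mean_y = sum(ys) / len(ys)
baseline_mse = sum((y - mean_y) ** 2 for y in ys) / len(ys)
boosted_mse = sum((y - predict(model, x)) ** 2
                  for x, y in zip(xs, ys)) / len(ys)
print(f"in-sample MSE: mean {baseline_mse:.3f}, boosted {boosted_mse:.3f}")
```

The number of rounds, the learning rate, and the tree depth are the complexity knobs here. Note that the comparison printed above is in sample only; it says nothing about how the model would do on new data.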
Anyhow, good luck on your journey. It’s a fascinating one.