When it comes to neural nets, the essential reading is: functional proxies, second dual spaces, Euler-Lagrange geodesic characteristics, Hamiltonians and the generator of the energy kernel, vector proxies for differential operators, familiarity with the basic identities for div, grad and curl on spaces with positive Ricci curvature, and tangent spaces for non-smooth manifolds.
Do not read books on neural nets. That is to say: nets are a children's picture that is supposed to look like a brain, capable of approximating a very small number of functions. Layers are pointless, since the set of possible functions is the same as with one layer, and consciousness is unlikely to appear at layer 10. It is unclear why adding parameters at O(data) is good.
They definitely are useful, though: excellent if you want to invert a deterministic function and cannot do it analytically. For example, for Black-Scholes (BS) vol, delta and the other Greeks from price, in all permutations, they will be better than any series you can find, and very fast, so I use them there.
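To make the inversion idea concrete, here is a minimal sketch, not the setup I actually run: a one-hidden-layer tanh net trained to map a BS call price back to its vol. Everything specific is an assumption for illustration: the fixed contract (S=100, K=105, T=1, r=0), the vol range 0.10 to 0.60, the net size, the learning rate, and the names `bs_call`, `train_inverse_net` and `predict_vol`. It is pure standard-library Python, so it is slow compared with a real framework, and a production version would take strike, maturity and rate as inputs too.

```python
import math
import random


def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
    return S * N(d1) - K * math.exp(-r * T) * N(d2)


# Hypothetical fixed contract and vol range (assumptions for this sketch).
S, K, T, R = 100.0, 105.0, 1.0, 0.0
SIG_LO, SIG_HI = 0.10, 0.60


def train_inverse_net(hidden=8, epochs=2000, lr=0.03, n=50, seed=0):
    """Fit price -> vol with a tiny tanh net using plain SGD on squared error."""
    rng = random.Random(seed)

    # Training set: run the forward (deterministic) function on a vol grid.
    sigmas = [SIG_LO + (SIG_HI - SIG_LO) * i / (n - 1) for i in range(n)]
    prices = [bs_call(S, K, T, R, s) for s in sigmas]
    p_lo, p_hi = prices[0], prices[-1]  # price is increasing in vol

    # Scale inputs and targets to [-1, 1] so SGD behaves.
    scale_x = lambda p: 2.0 * (p - p_lo) / (p_hi - p_lo) - 1.0
    scale_y = lambda s: 2.0 * (s - SIG_LO) / (SIG_HI - SIG_LO) - 1.0
    unscale_y = lambda y: SIG_LO + (y + 1.0) * (SIG_HI - SIG_LO) / 2.0
    data = [(scale_x(p), scale_y(s)) for p, s in zip(prices, sigmas)]

    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            err = sum(w2[j] * h[j] for j in range(hidden)) + b2 - y
            for j in range(hidden):
                # Backprop through tanh, using w2[j] before it is updated.
                g = err * w2[j] * (1.0 - h[j] * h[j])
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * g * x
                b1[j] -= lr * g
            b2 -= lr * err

    def predict_vol(price):
        x = scale_x(price)
        out = sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(hidden)) + b2
        return unscale_y(out)

    return predict_vol


predict_vol = train_inverse_net()
price = bs_call(S, K, T, R, 0.25)
print("price:", round(price, 4), "recovered vol:", round(predict_vol(price), 4))
```

Once trained, the inversion is a handful of multiply-adds and tanh calls with no root-finding loop, which is where the speed comes from. The same pattern (sample the forward function, then fit the reverse map) should carry over to the other permutations, such as delta from price.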