Multilayer neural networks are among the most powerful models in machine learning, yet
the reasons for their success still defy mathematical understanding. Learning a function represented by an artificial
neural network requires optimizing a large nonconvex risk function, a problem
generally attacked with a gradient method called stochastic gradient descent (SGD), which is very close in
spirit to Langevin dynamics in physics.
The reason for the success of SGD optimization is still not understood. Does SGD converge to
a global optimum of the risk or only to a local optimum? In the first case, does this happen because
local minima are absent or because SGD avoids them in some way? In the second case, why do the local minima
reached by SGD have good generalization properties? Answering these questions requires an
understanding of the dynamics in large, complex landscapes, such as those considered in
spin glass theory.
There has been a lot of work on neural networks using statistical physics, in particular spin glass techniques. In a set of pioneering papers, Saad and Solla showed how one can analyze a simple case, namely two-layer neural networks, or committee machines, with physics-style methods: a description in terms of Fokker-Planck and Langevin equations. This description makes it possible to "average" over the complexity of the neural network landscape.
More generally, the study of artificial neural networks has a long history in the statistical mechanics of disordered magnetic systems. The purpose of this thesis is to study the dynamics of learning using these techniques.
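As a rough illustration of the two-layer setting mentioned above, the following is a minimal sketch of online SGD in a teacher-student setup, where a student two-layer network learns from examples labeled by a fixed teacher network. It is not taken from the papers cited; the function names, the erf activation, the eta/N learning-rate scaling, the unit-norm teacher and all parameter values are assumptions chosen for illustration. The matrices Q and R returned at the end are the student-student and student-teacher overlaps, the kind of low-dimensional quantities that a statistical physics description tracks instead of the full weight vectors.

```python
import math
import numpy as np

# Elementwise erf built from the standard library (avoids a SciPy dependency).
_erf = np.vectorize(math.erf)

def g(x):
    """Hidden-unit activation, assumed here to be erf(x / sqrt(2))."""
    return _erf(x / np.sqrt(2.0))

def g_prime(x):
    """Derivative of g."""
    return np.sqrt(2.0 / np.pi) * np.exp(-x ** 2 / 2.0)

def run_online_sgd(N=100, K=2, M=2, eta=0.5, steps=20000, seed=0):
    """Online SGD for a student with K hidden units and a teacher with M.

    Hypothetical helper for illustration. Second-layer weights are fixed
    to 1, and each example is drawn fresh and used exactly once.
    """
    rng = np.random.default_rng(seed)

    # Teacher weights: random directions, rows normalized to unit length.
    B = rng.standard_normal((M, N))
    B /= np.linalg.norm(B, axis=1, keepdims=True)

    # Student weights: small random initialization.
    W = 1e-3 * rng.standard_normal((K, N))

    # Fixed test set for a Monte Carlo estimate of the generalization error.
    X_test = rng.standard_normal((2000, N))

    def gen_error(weights):
        y_student = g(X_test @ weights.T).sum(axis=1)
        y_teacher = g(X_test @ B.T).sum(axis=1)
        return 0.5 * np.mean((y_student - y_teacher) ** 2)

    error_before = gen_error(W)
    for _ in range(steps):
        x = rng.standard_normal(N)                # fresh Gaussian input
        h = W @ x                                 # student pre-activations
        err = g(B @ x).sum() - g(h).sum()         # teacher minus student output
        # Gradient step on the squared error 0.5 * err**2 for this example.
        W += (eta / N) * np.outer(err * g_prime(h), x)
    error_after = gen_error(W)

    Q = W @ W.T   # student-student overlaps, shape (K, K)
    R = W @ B.T   # student-teacher overlaps, shape (K, M)
    return error_before, error_after, Q, R

if __name__ == "__main__":
    e0, e1, Q, R = run_online_sgd()
    print("generalization error before:", e0)
    print("generalization error after: ", e1)
    print("student-teacher overlaps R:\n", R)
```

Running the script prints the estimated generalization error before and after training, together with the overlap matrix R. How quickly the student specializes to the teacher depends on the assumed values of eta, N and the initialization.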