Here are some useful books to get you started on research in theoretical machine learning. Books marked with * are my favorites.

Machine Learning from a Bayesian Perspective:

  • Machine Learning: A Probabilistic Perspective, Kevin Murphy, 2013, Link*

  • Pattern Recognition and Machine Learning, Christopher Bishop, 2006, Link*

  • Information Theory, Inference and Learning Algorithms, David J. C. MacKay, 2003, Link*

  • The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert Tibshirani, and Jerome Friedman, second edition, Jan 2017, Link

  • An Introduction to Statistical Learning with Applications in R, Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, 2017 printing, Link

  • Foundations of Machine Learning, Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar, Second Edition, 2018, Link

  • Bayesian Reasoning and Machine Learning, David Barber, 2012, Link

  • Probabilistic Graphical Models: Principles and Techniques, Daphne Koller and Nir Friedman, 2009

  • Graphical Models, Exponential Families, and Variational Inference, Martin J. Wainwright and Michael I. Jordan, 2008, Link*

  • Graphical Models, Steffen Lauritzen, 1996*

Probability Theory:

  • Probability: Theory and Examples, Rick Durrett, fifth edition, 2019, Link

  • Foundations of Modern Probability, Olav Kallenberg, second edition, 2002, Link*

  • Probability and Stochastics, Erhan Çinlar, 2011, Link

  • Poisson Processes, John Kingman, 1993, Link*

  • Probabilistic Symmetries and Invariance Principles, Olav Kallenberg, 2005, Link

Statistics:

  • Asymptotic Statistics, Aad van der Vaart, 1998, Link*

  • All of Statistics: A Concise Course in Statistical Inference, Larry A. Wasserman, 2004*

  • All of Nonparametric Statistics, Larry A. Wasserman, 2005

  • High-Dimensional Statistics: A Non-Asymptotic Viewpoint, Martin Wainwright, 2019, Link*

  • Testing Statistical Hypotheses, Erich Leo Lehmann and Joseph Romano, third edition, 2005, Link

  • Weak Convergence and Empirical Processes with Applications to Statistics, Aad W. van der Vaart and Jon A. Wellner, 1996, Link

  • Empirical Processes in M-Estimation, Sara van de Geer, 2009

  • Introduction to Nonparametric Estimation, Alexandre B. Tsybakov, 2003, Link

  • Information Geometry and Its Applications, Shun'ichi Amari, 2016*

  • Differential-Geometrical Methods in Statistics, Shun'ichi Amari, 1985

Bayesian Analysis:

  • Bayesian Data Analysis, Andrew Gelman, John Carlin, Aki Vehtari, Hal S. Stern, Donald Rubin, David Dunson, third edition, 2014, Link*

  • A First Course in Bayesian Statistical Methods, Peter D. Hoff, 2009, Link*

  • The Bayesian Choice, Christian P. Robert, second edition, 2007, Link

  • Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams, 2005, Link*

  • Fundamentals of Nonparametric Bayesian Inference, Aad van der Vaart and Subhashis Ghosal, 2017, Link

  • Bayesian Nonparametrics, Jayanta Kumar Ghosh and R. V. Ramamoorthi, 2003, Link*

  • Combinatorial Stochastic Processes, Jim Pitman, 2006, Link*

  • Exchangeability and Related Topics, David Aldous, 1985, Link*

Markov Chain Monte Carlo Methods:

  • Monte Carlo Strategies in Scientific Computing, Jun S. Liu, 2001, Link*

  • Monte Carlo Statistical Methods, Christian P. Robert and George Casella, second edition, 2004

  • Markov Chain Monte Carlo in Practice, Editors: David Spiegelhalter, W. R. Gilks, Sylvia Richardson, 1996

Optimization:

  • Convex Optimization, Stephen P. Boyd and Lieven Vandenberghe, 2004, Link*

  • Optimization Models, G.C. Calafiore and L. El Ghaoui, 2014, Link

  • Convex Optimization Algorithms, Dimitri Bertsekas, 2015

  • Introductory Lectures on Convex Optimization, Yurii Nesterov, 2003, Link

  • Convex Analysis, Tyrrell Rockafellar, 1970, Link*