Nonconvex optimization: Challenges and Recent Successes

ICML2016 Tutorial

Information

  • New York, NY  [ICML2016] 
  • Jun. 19th, 2:30-4:30pm   
  • Presenter: Anima Anandkumar 
  • Slides:  [Slides] 
  • Topic Overview

    Most machine learning tasks require solving non-convex optimization problems. For these problems, the number of critical points can grow exponentially with the data dimension, and local search methods such as gradient descent can get stuck at one of these critical points. Finding the globally optimal solution of a non-convex problem is computationally hard in the worst case. Over the last few years, the focus has therefore shifted to characterizing transparent conditions under which non-convex problems become tractable. In many instances, these conditions turn out to be mild and natural for machine learning applications.

    This tutorial will provide an overview of the recent theoretical success stories in non-convex optimization, including learning latent variable models, dictionary learning, and robust principal component analysis. Simple iterative methods, such as spectral methods and alternating projections, can be proven to learn consistent models with polynomial sample and computational complexity. The tutorial will present the main ingredients for establishing these results. One approach is to first obtain an approximate solution that lands in the basin of attraction of the globally optimal solution, and then run gradient descent to reach the global optimum (a toy sketch of this two-stage recipe is given after this overview). Another approach is to discard the original objective function and instead optimize a different, better-behaved objective. For example, using a moment-fitting objective instead of maximum likelihood for learning latent variable models allows us to establish consistency in estimation (see the moment-matching sketch below).

    For problems where finding the globally optimal solution is not tractable, it is desirable to at least reach a local optimum, but even this is computationally hard in the worst case. The tutorial will therefore cover the important problem of escaping saddle points. Saddle points are critical points that are not local minima, meaning there exist directions along which the objective value decreases (for minimization problems). Saddle points can slow down gradient descent arbitrarily, and Newton's method converges to an arbitrary critical point, without distinguishing between a local minimum and a saddle point. Solutions such as trust-region methods and recent work on noisy stochastic gradient descent will be covered (a small perturbed-gradient-descent example is given below).

    There remain many challenging open problems in non-convex optimization. While guarantees have been established for individual instances, there is no common unifying theme of what makes non-convex problems tractable. Challenging instances such as training multi-layer neural networks, or analyzing regularization techniques such as dropout in the non-convex setting, remain wide open. On the practical side, conversations between theorists and practitioners can help identify which conditions are reasonable for specific applications, and thus lead to practically motivated algorithms for non-convex optimization with rigorous guarantees. The tutorial will cover these issues as well.
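
    As a toy illustration of the two-stage "initialize, then locally optimize" recipe mentioned above, the following sketch (my own example, not code from the tutorial) recovers a planted rank-1 matrix: a spectral initialization from the top eigenvector lands near the global optimum, and plain gradient descent on the non-convex objective ||M - x x^T||_F^2 finishes the job.

        # Sketch: spectral initialization + gradient descent on a toy
        # rank-1 matrix recovery problem (illustrative assumption, not
        # an algorithm taken from the tutorial slides).
        import numpy as np

        def spectral_init(M):
            """Stage 1: approximate solution from the top eigenpair of M."""
            vals, vecs = np.linalg.eigh(M)
            return np.sqrt(max(vals[-1], 0.0)) * vecs[:, -1]

        def gradient_descent(M, x, step=0.05, iters=500):
            """Stage 2: gradient descent on f(x) = ||M - x x^T||_F^2."""
            for _ in range(iters):
                grad = 4.0 * (np.outer(x, x) - M) @ x
                x = x - step * grad
            return x

        rng = np.random.default_rng(0)
        x_true = rng.normal(size=20)
        x_true /= np.linalg.norm(x_true)            # unit-norm planted vector
        noise = 0.01 * rng.normal(size=(20, 20))
        M = np.outer(x_true, x_true) + (noise + noise.T) / 2
        x_hat = gradient_descent(M, spectral_init(M))
        print(np.linalg.norm(np.outer(x_hat, x_hat) - M))   # small residual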
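
    The moment-fitting idea can be made concrete on a very small example (again my own illustration, not from the tutorial): for the symmetric two-component Gaussian mixture 0.5*N(+mu, 1) + 0.5*N(-mu, 1), the likelihood is non-convex in mu, but the single moment identity E[x^2] = mu^2 + 1 yields a closed-form, consistent estimator.

        # Sketch: method-of-moments estimator for a symmetric Gaussian
        # mixture (toy model chosen for illustration).
        import numpy as np

        def moment_estimate(samples):
            # invert E[x^2] = mu^2 + 1; clip at zero for small noisy samples
            return np.sqrt(max(np.mean(samples ** 2) - 1.0, 0.0))

        rng = np.random.default_rng(0)
        mu_true = 2.0
        signs = rng.choice([-1.0, 1.0], size=100_000)
        samples = signs * mu_true + rng.normal(size=100_000)
        print(moment_estimate(samples))   # close to mu_true = 2.0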
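
    Finally, the role of noise in escaping strict saddle points can be seen on a two-dimensional toy function (my own example, not the tutorial's): f(x, y) = (x^2 - 1)^2 + y^2 has a strict saddle at the origin and minima at (+1, 0) and (-1, 0). Exact gradient descent started at the saddle never moves, while a small random perturbation lets it escape.

        # Sketch: perturbed gradient descent escaping a strict saddle point
        # of f(x, y) = (x^2 - 1)^2 + y^2 (toy function, for illustration only).
        import numpy as np

        def grad(p):
            x, y = p
            return np.array([4.0 * x * (x ** 2 - 1.0), 2.0 * y])

        def noisy_gradient_descent(p0, step=0.05, noise=1e-3, iters=300, seed=0):
            rng = np.random.default_rng(seed)
            p = np.array(p0, dtype=float)
            for _ in range(iters):
                # isotropic noise has a component along the escape direction,
                # so the iterate drifts away from the saddle instead of stalling
                p = p - step * (grad(p) + noise * rng.normal(size=2))
            return p

        print(noisy_gradient_descent([0.0, 0.0]))   # ends near (+1, 0) or (-1, 0)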

    References

  • "Non-convex Optimization in Machine Learning: Provable Guarantees Using Tensor Methods"
    By Majid Janzamin. PhD thesis. Link
  • "Discovery of Latent Factors in High-dimensional Data Using Tensor Methods"
    By Furong Huang. PhD thesis. Link
  • "Efficient approaches for escaping higher order saddle points in non-convex optimization"
    By Anima Anandkumar, Rong Ge. Conference on Learning Theory (COLT), New York City, USA, June 2016. Link
  • "Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition"
    By Rong Ge, Furong Huang, Chi Jin, Yang Yuan. Conference on Learning Theory (COLT), Paris, France, June 2015. Link
  • "Tensor Decompositions for Learning Latent Variable Models" by A. Anandkumar, R. Ge, D. Hsu, S.M. Kakade and M. Telgarsky.┬áJournal of Machine Learning Research 15 (2014) 2773-2832. Link