A big part of the utility of math (especially in ML) is having breadth rather than depth. The strategy of picking out specific things you don't know from papers and looking them up is only effective if you have the breadth in your background to understand the answers you find.
Broad knowledge is also what helps you manage the exponential tree of complexity you're encountering.
You won't have seen all the things you come across, but you'll develop the ability to make good judgements about what you need to read to achieve your goals. You'll learn how to recognize when a reference you're reading is more (or less) technical than you need, and how to search for something more appropriate. You'll also learn how and when you can use results without understanding the details.
Finally, as a general grad student strategy trying to learn everything just in time is not a path to success. Even if you had the perfect math oracle that you want it would be setting you up to be left behind. All the oracle gives you is the ability to catch up quickly to the ideas of others. Your job as a grad student is to generate new knowledge and to do that you need to seek things out on your own, not just follow along the latest trend. Part of your job is to go out hunting for ideas that your peers haven't found yet and bring them back to your field.
In supervised learning, you have a bunch of data, a specific question you want to answer, and access to the correct answer to many instances of that question. In unsupervised learning, you have a bunch of data points, and you want to find meaningful patterns in the structure of that data. In reinforcement learning, you have a task you want to take actions to accomplish, and you don't have any access to knowing what the best action is, but after each action you get a rough idea of how good the result was.
AI doesn't need to follow the human model, just like planes don't need to flap their wings like a bird. For most jobs AI will be very different from humans. Even when AI acts as human for entertainment I would imagine them being very different internally, as their job is to mimic aspects of human behaviors, not actually a human as a whole.
Almost all of machine learning is about representing data as vectors and performing linear and non-linear transformations in order to perform classification, regression, etc.
Most of ML is fitting models to data. To fit a model you minimize some error measure as a function of its real valued parameters, e.g. the weights of the connections in a neural network. The algorithms to do the minimization are based on gradient descent, which depends on derivatives, i.e. differential calculus.
Deep Learn - Implementation of research papers on Deep Learning+ NLP+ CV in Python using Keras, TensorFlow and Scikit Learn.
Machine Learning for Humans - Great article.
KubeFlow - Machine Learning Toolkit for Kubernetes.
TL-GAN: transparent latent-space GAN - Use supervised learning to illuminate the latent space of GAN for controlled generation and edit.
Grokking Deep Learning - Repository accompanying "Grokking Deep Learning" book.
Grenade - Deep Learning in Haskell.