Machine learning


  • A big part of the utility of math (especially in ML) is having breadth rather than depth. The strategy of picking out specific things you don't know from papers and looking them up is only effective if you have the breadth in your background to understand the answers you find.

    • Broad knowledge is also what helps you manage the exponential tree of complexity you're encountering.

      • You won't have seen all the things you come across, but you'll develop the ability to make good judgements about what you need to read to achieve your goals. You'll learn how to recognize when a reference you're reading is more (or less) technical than you need, and how to search for something more appropriate. You'll also learn how and when you can use results without understanding the details.

    • Finally, as a general grad student strategy, trying to learn everything just in time is not a path to success. Even if you had the perfect math oracle you want, relying on it would set you up to be left behind. All the oracle gives you is the ability to catch up quickly to the ideas of others. Your job as a grad student is to generate new knowledge, and to do that you need to seek things out on your own, not just follow the latest trend. Part of your job is to go out hunting for ideas that your peers haven't found yet and bring them back to your field.

  • AI doesn't need to follow the human model, just as planes don't need to flap their wings like birds. For most jobs, AI will be very different from humans. Even when an AI plays a human for entertainment, I would imagine it being very different internally, since its job is to mimic aspects of human behavior, not to be a human as a whole.

  • Almost all of machine learning is about representing data as vectors and applying linear and non-linear transformations to them in order to perform classification, regression, etc.

  • Most of ML is fitting models to data. To fit a model, you minimize some error measure as a function of the model's real-valued parameters, e.g. the weights of the connections in a neural network. The algorithms that do the minimization are based on gradient descent, which depends on derivatives, i.e. differential calculus.
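
To make the last two points concrete, here is a minimal sketch in plain Python. It assumes the simplest possible setup: each data point is a short feature vector, the model is a single linear transformation (a dot product plus a bias), and the error measure is mean squared error. The function names (predict, mse_gradients, fit), the toy data, the learning rate, and the step count are all made up for illustration; real ML code would normally use a numerical library and automatic differentiation rather than hand-written derivatives.

```python
# Minimal sketch: fit y ≈ dot(w, x) + b by gradient descent on mean squared error.
# All names and numbers here are hypothetical, chosen only for illustration.

def predict(w, b, x):
    """Linear transformation of a feature vector x: dot(w, x) + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def mse_gradients(w, b, xs, ys):
    """Partial derivatives of the mean squared error with respect to w and b."""
    n = len(xs)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in zip(xs, ys):
        residual = predict(w, b, x) - y
        # d/dw_j of residual**2 is 2 * residual * x_j; average over the data.
        for j, xj in enumerate(x):
            grad_w[j] += 2.0 * residual * xj / n
        grad_b += 2.0 * residual / n
    return grad_w, grad_b

def fit(xs, ys, learning_rate=0.05, steps=5000):
    """Start from zero parameters and repeatedly step against the gradient."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(steps):
        grad_w, grad_b = mse_gradients(w, b, xs, ys)
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2*x1 - 3*x2 + 1 (invented for this example).
xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [2.0 * x1 - 3.0 * x2 + 1.0 for x1, x2 in xs]

w, b = fit(xs, ys)
print(w, b)  # should land close to the generating parameters [2, -3] and 1
```

The only calculus involved is the derivative of a squared residual, which is why differential calculus is enough to follow the basic training loop. A neural network replaces predict with a stack of linear and non-linear transformations, but the loop in fit keeps the same shape.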