Learning Data Science and Machine Learning: First Steps

  • What should one do after learning how to code? Are there topics that help you strengthen your foundations for data science?
  • I hate math, and the tutorials I find are either very basic or too deep for me. Can you recommend a compact yet comprehensive course on math and statistics?
  • How much math is enough to start learning how ML algorithms work?
  • What are some essential statistics topics to get started with data analysis or data science?

The Three Pillars of Data Science & ML

1. Essential Programming

Most data roles are programming-based, except for a few such as business intelligence, market analysis, and product analysis.

  • Common data structures (data types, lists, dictionaries, sets, tuples), writing functions, logic, control flow, searching and sorting algorithms, object-oriented programming, and working with external libraries.
  • Writing Python scripts to extract, format, and store data into files or back to databases.
  • Handling multi-dimensional arrays, indexing, slicing, transposing, broadcasting and pseudorandom number generation using NumPy.
  • Performing vectorized operations using scientific computing libraries like NumPy.
  • Manipulating data with Pandas — Series, DataFrame, indexing in a DataFrame, comparison operators, merging DataFrames, mapping, and applying functions.
  • Wrangling data using Pandas — checking for null values, imputing them, grouping data, describing it, performing exploratory analysis, etc.
  • Data Visualization using Matplotlib — the API hierarchy, adding styles, color, and markers to a plot, knowledge of various plots and when to use them, line plots, bar plots, scatter plots, histograms, boxplots, and seaborn for more advanced plotting.
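To make a few of the NumPy and Pandas items above concrete, here is a minimal sketch. The exam scores and the team/sales table are made-up illustration data, and names such as `scores`, `centered`, and `team_means` are hypothetical, not taken from any particular course.

```python
import numpy as np
import pandas as pd

# NumPy: a seeded generator makes the pseudorandom numbers reproducible.
rng = np.random.default_rng(seed=0)
scores = rng.integers(low=50, high=100, size=(4, 3))  # 4 students x 3 exams

# Indexing, slicing, and transposing.
first_exam = scores[:, 0]  # column 0, shape (4,)
by_exam = scores.T         # shape (3, 4)

# Broadcasting: the (3,) vector of column means is stretched across all
# 4 rows, so the subtraction is vectorized with no explicit Python loop.
centered = scores - scores.mean(axis=0)

# Pandas: a small hypothetical dataset with one missing value.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "sales": [10.0, np.nan, 30.0, 50.0],
})

missing = int(df["sales"].isna().sum())               # count null values
df["sales"] = df["sales"].fillna(df["sales"].mean())  # impute with the column mean
team_means = df.groupby("team")["sales"].mean()       # group, then aggregate
```

Mean imputation is only one of several options; the median or a per-group fill may suit skewed data better.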

2. Essential Mathematics

There are practical reasons why math is essential for anyone who wants a career as a Machine Learning practitioner, Data Scientist, or Deep Learning Engineer.

  • Basic algebra — variables, coefficients, equations, and linear, exponential, logarithmic functions, etc.
  • Linear Algebra — scalars, vectors, tensors, norms (L1 & L2), dot product, types of matrices, linear transformations, representing linear equations in matrix notation, solving a linear regression problem using vectors and matrices.
  • Calculus — derivatives and limits, derivative rules, chain rule (for backpropagation algorithm), partial derivatives (to compute gradients), the convexity of functions, local/global minima, the math behind a regression model, applied math for training a model from scratch.
  • Estimates of location — mean, median, and other variants of these.
  • Estimates of variability
  • Correlation and covariance
  • Random variables — discrete and continuous
  • Data distributions — PMF, PDF, CDF
  • Conditional probability — Bayesian statistics
  • Commonly used statistical distributions — Gaussian, Binomial, Poisson, Exponential
  • Important theorems — Law of large numbers and Central limit theorem.
  • Inferential statistics: a more practical and advanced branch of statistics that helps in designing hypothesis-testing experiments, pushes us to understand the meaning of metrics deeply, and helps us quantify the significance of the results.
  • Important tests: Student’s t-test, Chi-square test, ANOVA, etc.
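As a rough illustration of how the linear algebra and calculus topics above meet in a regression model, the sketch below fits a line in two ways: a least-squares solve in matrix notation, and gradient descent on the mean squared error. The data is synthetic (a line with slope 2 and intercept 1 plus noise), and the learning rate and step count are arbitrary assumptions chosen for this toy problem.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Synthetic data (an assumption for illustration): y = 2x + 1 plus small noise.
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, size=200)

# Matrix notation: y is approximated by X @ w, where X carries a column
# of ones so that w[0] is the intercept and w[1] is the slope.
X = np.column_stack([np.ones_like(x), x])

# Linear algebra route: least-squares solution of X @ w = y.
w_exact, *_ = np.linalg.lstsq(X, y, rcond=None)

# Calculus route: gradient descent on the mean squared error
# L(w) = mean((X @ w - y) ** 2), whose gradient with respect to w
# is (2 / n) * X.T @ (X @ w - y).
w = np.zeros(2)
learning_rate = 0.1
for _ in range(2000):
    gradient = (2.0 / len(y)) * X.T @ (X @ w - y)
    w -= learning_rate * gradient
```

Because the mean squared error of a linear model is convex, both routes should land on essentially the same weights; the iterative version is the one that carries over to models with no closed-form solution.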

Priya Reddy

Hey, this is Priya Reddy. I am a tech writer.