
Introduction to Mathematics and Statistics for Data Science






  Hello and welcome to the Data Science Lessons blog. To perform any data science task, knowledge of mathematics and its applications is really important. In fact, it is indispensable in the data science field.


For the data science field, mathematics can be divided into four parts:


1) Statistics (Descriptive and Inferential)

2) Linear Algebra

3) Probability

4) Optimization 




1) Statistics:

I cannot imagine data science without the evergreen field of statistics and its applications across industries and research fields. Basically, statistical methods help us summarize quantitative data and get insights out of it. It is not easy to gain any insight just by looking at raw numerical data, unless you are a math genius!


Topics in Descriptive Statistics (a short code sketch follows the list):

1) Mean, median, and mode

2) IQR and percentiles

3) Standard deviation and variance

4) Normal distribution

5) Z-statistics and t-statistics

6) Correlation and linear regression
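
To make these concrete, here is a minimal Python sketch (using NumPy; the scores array is made-up sample data, not from any real study) that computes most of the descriptive measures listed above:

    import numpy as np

    # Made-up sample data: exam scores of ten students
    scores = np.array([56, 61, 61, 70, 72, 75, 78, 83, 88, 95])

    print("Mean:", np.mean(scores))
    print("Median:", np.median(scores))

    # Mode: the most frequent value in the sample
    vals, counts = np.unique(scores, return_counts=True)
    print("Mode:", vals[np.argmax(counts)])

    # Percentiles and the interquartile range (IQR)
    q1, q3 = np.percentile(scores, [25, 75])
    print("IQR:", q3 - q1)

    print("Variance:", np.var(scores, ddof=1))        # sample variance
    print("Std deviation:", np.std(scores, ddof=1))   # sample standard deviation

    # Z-score: how many standard deviations an observation lies from the mean
    z = (scores[-1] - scores.mean()) / scores.std(ddof=1)
    print("Z-score of the top score:", z)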


Topics in Inferential Statistics (see the sketch after this list):

1) Sampling distributions

2) Confidence intervals

3) Chi-square tests

4) Advanced regression

5) ANOVA
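
As one example from this list, a 95% confidence interval for a population mean can be computed from a sample with SciPy. This is a minimal sketch with made-up data:

    import numpy as np
    from scipy import stats

    # Made-up sample: daily sales of a shop over 15 days
    sample = np.array([200, 220, 210, 205, 215, 230, 225, 195,
                       240, 210, 208, 222, 218, 199, 216])

    mean = sample.mean()
    sem = stats.sem(sample)   # standard error of the mean

    # 95% confidence interval using the t-distribution,
    # since the population standard deviation is unknown
    low, high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
    print(f"Sample mean: {mean:.2f}")
    print(f"95% confidence interval: ({low:.2f}, {high:.2f})")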




2) Linear Algebra:


Linear algebra is a branch of mathematics for studying systems of equations, which can be one-, two-, or multi-dimensional. It helps us work with numerical data and model the relations between two or more variables by establishing equations between them. For example, here is one basic algebraic equation:
   

            y = a + bx + cx²

Linear algebra has a wide range of applications, such as matrix calculations, linear regression equations, descriptive statistics, image vectors in graphics, Fourier series, and graph and network models.

Machine learning algorithms like linear regression and logistic regression use linear algebra to estimate target variables from the inputs, attributes, or feature vectors given in the data set, as the sketch below illustrates.
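
To make this concrete, here is a minimal NumPy sketch (with made-up data points) that fits a simple linear regression y = a + bx by solving a least-squares system, which is pure linear algebra:

    import numpy as np

    # Made-up data: x is the input feature, y is the target variable
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

    # Design matrix with a column of ones for the intercept term a
    X = np.column_stack([np.ones_like(x), x])

    # Solve the least-squares problem X @ [a, b] ≈ y
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    a, b = coef
    print(f"Intercept a = {a:.3f}, slope b = {b:.3f}")
    print("Prediction at x = 6:", a + b * 6)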




3) Probability:


Oh! What to say about probability; it's everywhere!! We all think in terms of chances, of how likely something is to happen in certain events. Don't we?

There are certain types of probability that we should focus on:

1) Independent events probability
2) Dependent events probability
3) Conditional probability


Based on these, we try to estimate various events and the likelihood of their outcomes, as the sketch below shows. Sometimes we want graphical representations of probable outcomes, which we call probability density functions or density curves.
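
As a small worked example (with made-up counts, purely for illustration), here is how these three kinds of probability can be computed from a simple table:

    # Made-up counts: 100 emails, classified by whether they are spam
    # and whether they contain the word "offer"
    total = 100
    spam = 30              # P(spam) = 0.30
    offer = 25             # P(offer) = 0.25
    spam_and_offer = 15    # P(spam and offer) = 0.15

    p_spam = spam / total
    p_offer = offer / total
    p_both = spam_and_offer / total

    # Independent events would satisfy P(A and B) = P(A) * P(B)
    print("If independent, P(both) would be:", p_spam * p_offer)  # 0.075
    print("Observed P(both):", p_both)                            # 0.15 -> dependent

    # Conditional probability: P(spam | offer) = P(spam and offer) / P(offer)
    print("P(spam | offer):", p_both / p_offer)                   # 0.6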

Concepts of probability help us estimate the expected value of given variables, interpret the confusion matrix in classification algorithms, compute information entropy, weigh the evidence of particular attributes in naive Bayes classification, and even perform hypothesis testing in statistics. There are many more use cases than those mentioned here; we will look at them, application by application, in upcoming blogs.
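
For instance, the expected value of a discrete variable and the information entropy of its distribution are both one-liners once the probabilities are known. Here is a minimal sketch with a made-up distribution:

    import numpy as np

    # Made-up discrete distribution: payoff of a game and its probabilities
    values = np.array([0.0, 10.0, 50.0])
    probs = np.array([0.7, 0.2, 0.1])   # probabilities must sum to 1

    # Expected value: sum of value * probability
    print("Expected value:", np.sum(values * probs))   # 0*0.7 + 10*0.2 + 50*0.1 = 7.0

    # Shannon entropy in bits: H = -sum(p * log2(p))
    print("Entropy (bits):", -np.sum(probs * np.log2(probs)))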





4) Optimization:


Optimization is a subfield of mathematics that deals with optimizing an output based on given input variables. Any data set has various input variables, and during training, machine learning models sometimes overestimate or underestimate the output variable; in some cases, their predictions are biased on the given data set.

To estimate the output and fit the model to the data well, algorithms optimize over the training data set, iterating over and over again to increase accuracy.


Function optimization involves three elements: the input to the function (e.g. x), the objective function itself (e.g. f()), and the output from the function (e.g. cost), as the sketch after the list shows.

  • Input (x): The input to the function to be evaluated, e.g. a candidate solution.
  • Function (f()): The objective function or target function that evaluates inputs.
  • Cost: The result of evaluating a candidate solution with the objective function, minimized or maximized.
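
Putting these three elements together, here is a minimal gradient-descent sketch (not tied to any particular library; the objective function and learning rate are chosen purely for illustration) that minimizes f(x) = (x - 3)²:

    # Objective function f(x) and its derivative; the minimum is at x = 3
    def f(x):
        return (x - 3) ** 2

    def grad_f(x):
        return 2 * (x - 3)

    x = 0.0                # input: initial candidate solution
    learning_rate = 0.1

    for step in range(50):
        x = x - learning_rate * grad_f(x)   # move against the gradient
        if step % 10 == 0:
            print(f"step {step:2d}: x = {x:.4f}, cost = {f(x):.6f}")

    print(f"Optimized x = {x:.4f}")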






NOTE: if you want to contact me, here's my email: avikumar.talaviya@gmail.com


