
Introduction to Mathematics and Statistics for Data Science






  Hello and welcome to the Data Science Lessons blog. To perform any data science task, knowledge of mathematics and its applications is really important; in fact, it is indispensable in the data science field.


Mathematics for data science can be divided into four areas:


1) Statistics (Descriptive and Inferential)

2) Linear Algebra

3) Probability

4) Optimization 




1) Statistics:

I cannot imagine data science without the evergreen field of statistics and its applications across industries and research fields. Basically, statistical methods help us summarize quantitative data and draw insights from it. It is not easy to gain insights by just looking at raw numerical data, unless you are a math genius!


Topics about Descriptive Statistics:

1) Mean, median, and mode

2) Percentiles and IQR

3) Standard deviation and variance

4) Normal distribution

5) Z-statistics and t-statistics

6) Correlation and linear regression
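The basic descriptive measures above can all be computed with Python's standard library. A minimal sketch, using made-up sample data:

```python
import statistics

data = [12, 15, 15, 18, 20, 22, 25, 30, 31, 45]

mean = statistics.mean(data)          # arithmetic average
median = statistics.median(data)      # middle value of the sorted data
mode = statistics.mode(data)          # most frequent value
stdev = statistics.stdev(data)        # sample standard deviation
variance = statistics.variance(data)  # sample variance (stdev squared)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"stdev={stdev:.2f}, variance={variance:.2f}")
```

For larger datasets you would typically reach for NumPy or pandas, but the stdlib `statistics` module is enough to see the definitions at work.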


Topics about Inferential Statistics:

1) Sampling distributions

2) Confidence intervals

3) Chi-square test

4) Advanced regression

5) ANOVA
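To make one of these inferential ideas concrete, here is a minimal sketch of a 95% confidence interval for a mean, using the normal approximation (z = 1.96) and a made-up sample:

```python
import math
import statistics

sample = [4.1, 4.5, 4.8, 5.0, 5.2, 5.5, 5.9, 6.0, 6.3, 6.7]

n = len(sample)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

z = 1.96  # critical value for a 95% confidence level (normal approximation)
lower, upper = mean - z * sem, mean + z * sem
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

For small samples like this one, a t-distribution critical value would be more appropriate than 1.96; the structure of the calculation stays the same.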




2) Linear Algebra:


Linear algebra is a branch of mathematics for studying systems of equations, which can be one-, two-, or multi-dimensional. It helps us model relations between two or more variables by establishing equations between them. For example,

here's one basic algebraic equation:
   

            y = a + bx + cx²

Linear algebra has a wide range of applications, such as statistical and matrix calculations, linear regression equations, descriptive statistics, graphic image vectors, Fourier series, graphs, and network analysis.

Machine-learning algorithms like linear regression and logistic regression use linear algebra to solve for the target variable from the inputs/attributes or feature vectors given in the data set.
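Solving a system of equations is the bread and butter of linear algebra, and NumPy does it in one call. A small sketch with a hypothetical two-equation system:

```python
import numpy as np

# Solve the system:
#   2x + 3y = 8
#   1x + 2y = 5
A = np.array([[2.0, 3.0],
              [1.0, 2.0]])
b = np.array([8.0, 5.0])

solution = np.linalg.solve(A, b)  # the (x, y) satisfying both equations
print(solution)
```

The same `Ax = b` structure underlies the normal equations used to fit linear regression, just with many more rows and columns.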




3) Probability:


Oh! What to say about probability; it's everywhere! We all think in terms of chances, of how likely something is to happen in certain events, don't we?

There are certain types of probability that we should focus on:

1) Independent events probability 
2) Dependent events probability
3) Conditional probability


Based on these, we try to estimate various events and the likelihood of their outcomes. Sometimes we want graphical representations of probable outcomes, which we call probability density functions or density curves.

Concepts of probability help us estimate the expected value of given variables, interpret the confusion matrix in classification algorithms, compute information entropy, weigh the evidence of particular attributes in naive Bayes classification, and even perform hypothesis testing in statistics. There are many more use cases than those mentioned here; we will see them based on the application in upcoming blogs.
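Conditional probability, the third type above and the core of naive Bayes, follows from the definition P(A|B) = P(A and B) / P(B). A minimal sketch with hypothetical email counts:

```python
# Conditional probability from simple event counts (hypothetical numbers):
# out of 100 emails, 15 spam emails contain the word "offer",
# and 10 non-spam emails also contain it.
total = 100
spam_and_offer = 15
ham_and_offer = 10

p_offer = (spam_and_offer + ham_and_offer) / total  # P(offer) = 0.25
p_spam_and_offer = spam_and_offer / total           # P(spam and offer) = 0.15

# P(spam | offer) = P(spam and offer) / P(offer)
p_spam_given_offer = p_spam_and_offer / p_offer
print(p_spam_given_offer)  # 0.6
```

Seeing the word "offer" raises the probability of spam from the overall base rate to 60% — exactly the kind of evidence-weighing that naive Bayes automates across many attributes.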





4) Optimization:


Optimization is a subfield of mathematics that comprises optimizing an output based on given input variables. Any data set has various input variables, and during training of machine-learning algorithms, functions sometimes overestimate or underestimate the output variable; in some cases, functions contain bias in the output prediction for the given data set.

To estimate the output and fit the model to the data well, algorithms optimize over the training dataset, iterating over and over again to increase accuracy.


Function optimization involves three elements: the input to the function (e.g. x), the objective function itself (e.g. f()), and the output from the function (e.g. cost).

  • Input (x): The input to the function to be evaluated, e.g. a candidate solution.
  • Function (f()): The objective function or target function that evaluates inputs.
  • Cost: The result of evaluating a candidate solution with the objective function, minimized or maximized.
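The three elements above can be seen in a minimal gradient-descent sketch. The objective function and values here are chosen purely for illustration:

```python
# Minimize the cost f(x) = (x - 3)**2, whose minimum is at x = 3.
def f(x):          # objective function: evaluates a candidate solution
    return (x - 3) ** 2

def grad(x):       # derivative of f, used to decide which way to step
    return 2 * (x - 3)

x = 0.0            # input: initial candidate solution
learning_rate = 0.1

for _ in range(100):               # iterate to reduce the cost
    x -= learning_rate * grad(x)   # step against the gradient

print(round(x, 4), round(f(x), 8))
```

Each iteration moves the candidate solution a little further downhill on the cost surface; this is the same loop, in miniature, that training a machine-learning model runs over its loss function.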




References:


Math resources

Statistics and probability


NOTE: if you want to contact me, here's my email: avikumar.talaviya@gmail.com


