Hello and welcome to the Data Science Lessons blog. To perform any data science task, knowledge of mathematics and its application is essential; in fact, it is unavoidable in the data science field.
Mathematics can be divided into four parts for the Data Science field:
1) Statistics (Descriptive and Inferential)
2) Linear Algebra
3) Probability
4) Optimization
1) Statistics:
I cannot imagine data science without the evergreen field of statistics and its applications across industries and research fields. Basically, statistical methods help us summarize quantitative data and draw insights from it. It is not easy to gain insights just by looking at raw numerical data, unless you are a math genius!
Topics about Descriptive Statistics:
1) Mean, Median, Mode
2) IQR and percentiles
3) Standard deviation and variance
4) Normal distribution
5) Z-statistics and T-statistics
6) Correlation and linear regression
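Most of the descriptive measures above can be computed with Python's built-in `statistics` module alone. Here is a minimal sketch; the sample data is made up purely for illustration.

```python
import statistics

data = [12, 15, 15, 18, 20, 22, 22, 22, 25, 30]

mean = statistics.mean(data)          # arithmetic average
median = statistics.median(data)      # middle value of the sorted data
mode = statistics.mode(data)          # most frequent value
stdev = statistics.stdev(data)        # sample standard deviation
variance = statistics.variance(data)  # sample variance

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3;
# the IQR is the spread of the middle 50% of the data.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"mean={mean}, median={median}, mode={mode}")
print(f"stdev={stdev:.2f}, variance={variance:.2f}, IQR={iqr}")
```

Note that `stdev`/`variance` are the sample (n−1) versions; `pstdev`/`pvariance` give the population versions.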
Topics about Inferential Statistics:
1) Sampling distributions
2) Confidence intervals
3) Chi-square test
4) Advanced regression
5) ANOVA
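As a taste of inferential statistics, here is a minimal sketch of a 95% confidence interval for a population mean, using a z-interval built from the standard library's `statistics.NormalDist`. The sample values are made up for illustration (with a large sample, the z-interval closely approximates the t-interval).

```python
import math
import statistics

sample = [48, 52, 50, 47, 53, 51, 49, 50, 52, 48]
n = len(sample)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# z critical value for 95% confidence from the standard normal distribution
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96

lower, upper = mean - z * sem, mean + z * sem
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

The interpretation: if we repeated this sampling procedure many times, about 95% of the intervals constructed this way would contain the true population mean.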
2) Linear Algebra:
It is the branch of mathematics that studies systems of equations, which may involve one, two, or many dimensions. It helps us work with numerical data and model relations between two or more variables by establishing equations between them. For example, here is one basic algebraic equation:
y = a + bx + cx²
Linear algebra has a wide range of applications, such as matrix calculations, linear regression equations, descriptive statistics, graphic image vectors, Fourier series, graphs, and networks.
Machine learning algorithms like linear regression and logistic regression use linear algebra to solve for the target variable from the inputs, attributes, or feature vectors given in the dataset.
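To make that connection concrete, here is a minimal sketch of how linear algebra underlies linear regression: the best-fit weights solve the normal equation (XᵀX)w = Xᵀy. It assumes NumPy is installed, and the data points are made up so that the fit is exact.

```python
import numpy as np

# Five samples with one feature; the column of ones provides the intercept.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x  # points lying exactly on the line y = 2 + 3x
X = np.column_stack([np.ones_like(x), x])

# Normal equation: (X^T X) w = X^T y, solved as a small linear system
w = np.linalg.solve(X.T @ X, X.T @ y)
intercept, slope = w
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

In practice libraries use numerically safer routines (such as least-squares solvers) rather than forming XᵀX directly, but the linear-algebra idea is the same.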
3) Probability:
Oh! What to say about probability; it's everywhere! We all think in terms of chances, of how likely something is to happen. Don't we?
There are certain types of probability that we should focus on:
1) Independent events probability
2) Dependent events probability
3) Conditional probability
Based on these, we try to estimate various events and the likelihood of their outcomes. Sometimes we want graphical representations of probable outcomes, which we call probability density functions or density curves.
Concepts of probability help us estimate the expected value of given variables, interpret the confusion matrix in classification algorithms, compute information entropy, weigh the evidence of particular attributes in naive Bayes classification, and even perform hypothesis testing in statistics. There are many more use cases than those mentioned here; we will look at them by application in upcoming blogs.
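The three types of probability listed above can be sketched with simple arithmetic. The scenarios below (coins, cards, rain and traffic) are made-up illustrations using only the standard library.

```python
# Independent events: P(A and B) = P(A) * P(B).
# A coin flip and a die roll do not affect each other.
p_heads = 1 / 2
p_six = 1 / 6
p_heads_and_six = p_heads * p_six

# Dependent events: drawing two aces from a 52-card deck
# without replacement; the second draw depends on the first.
p_first_ace = 4 / 52
p_second_ace_given_first = 3 / 51
p_two_aces = p_first_ace * p_second_ace_given_first

# Conditional probability: P(A | B) = P(A and B) / P(B).
p_rain = 0.3
p_rain_and_traffic = 0.24
p_traffic_given_rain = p_rain_and_traffic / p_rain

print(f"P(heads and six)  = {p_heads_and_six:.4f}")
print(f"P(two aces)       = {p_two_aces:.4f}")
print(f"P(traffic | rain) = {p_traffic_given_rain:.2f}")
```

The same conditional-probability rule, rearranged, is the heart of Bayes' theorem used in the naive Bayes classifier mentioned above.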
4) Optimization:
Optimization is a subfield of mathematics concerned with optimizing an output based on given input variables, and any dataset has many input variables. During training of machine learning algorithms, functions sometimes overestimate or underestimate the output variable, and in some cases they contain bias in their output predictions on the given dataset.
To estimate the output and fit the model to the data well, algorithms optimize over the training dataset, iterating again and again to increase accuracy.
Function optimization involves three elements: the input to the function (e.g. x), the objective function itself (e.g. f()), and the output from the function (e.g. cost).
- Input (x): The input to the function to be evaluated, e.g. a candidate solution.
- Function (f()): The objective function or target function that evaluates inputs.
- Cost: The result of evaluating a candidate solution with the objective function, minimized or maximized.
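The three elements above come together in gradient descent, the iterative optimization routine behind many machine learning algorithms. Here is a minimal sketch; the objective function and step size are chosen purely for illustration.

```python
def f(x):
    """Objective function: a simple bowl, minimized at x = 3 with cost 0."""
    return (x - 3.0) ** 2

def grad_f(x):
    """Derivative of f; it points uphill, so we step the other way."""
    return 2.0 * (x - 3.0)

x = 0.0              # input: the current candidate solution
learning_rate = 0.1  # step size for each update
for step in range(100):
    x -= learning_rate * grad_f(x)  # iterate to reduce the cost f(x)

print(f"x = {x:.4f}, cost = {f(x):.6f}")
```

Each iteration evaluates the cost and nudges the input downhill, which mirrors how training loops refine model parameters pass after pass over the data.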
NOTE: if you want to contact me, here's my email: avikumar.talaviya@gmail.com