I worked on regression problems for three weeks. I read up on the concepts and learned how to implement linear regression in Python by studying the book "Machine Learning using Python" by Dinesh Kumar, published by Wiley India Pvt Ltd.
I built projects and multiple regression models to understand the concepts. In this blog I am sharing everything I learned:
Table of contents:
- Regression fundamentals and concepts
- The regression equation and coefficients (image)
- Python implementation (image)
- Metrics and errors (image)
- Optimization and regularization (image)
- Other points to keep in mind
1) Regression fundamentals and concepts
> Regression is widely used in business applications such as sales prediction, house price prediction, or predicting the cost of a patient's hospital stay
> Regression models fit the best line through the inputs (X) to predict the output (Y)
2) Regression equation and coefficients
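> A linear regression model follows a very simple equation: the output is an intercept plus a weighted sum of the inputs
> Simply put, the model learns one coefficient per independent variable from the training data and uses those coefficients to predict the dependent variable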
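In common textbook notation (the symbols here are an assumption and may differ from the ones used in the image), the model with p independent variables can be written as:

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon
```

Here \beta_0 is the intercept, \beta_1 to \beta_p are the coefficients learned from the data, and \varepsilon is the error term that the line cannot explain. Ordinary least squares chooses the coefficients that minimize the sum of squared differences between the actual and predicted Y.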
3) Python implementation
> In Python you can implement it using the statsmodels API and scikit-learn's linear models
> The statsmodels OLS API gives more detailed regression diagnostics, such as residual plots, P-P plots, Cook's distance, and leverage values
> Scikit-learn has linear as well as tree-based regression models
> Scikit-learn is a more ML-oriented implementation, which is very useful in industry
4) Metrics for regression and errors
> Three metrics are a must for regression
1) Mean squared error
2) Mean absolute error
3) R-squared
> See the image below for the R2 calculation
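The three metrics can also be computed by hand, which makes the R2 calculation easier to follow. The numbers below are hypothetical values chosen only so the arithmetic is easy to check.

```python
import numpy as np

# Hypothetical actual and predicted values, chosen only for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

errors = y_true - y_pred

mse = np.mean(errors ** 2)        # mean squared error
mae = np.mean(np.abs(errors))     # mean absolute error

ss_res = np.sum(errors ** 2)                     # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot                       # R-squared

print(mse, mae, r2)
```

With these numbers the errors are 1, 0, -1, and 0, so MSE and MAE are both 0.5, and R2 is 1 - 2/20 = 0.9. In practice, `mean_squared_error`, `mean_absolute_error`, and `r2_score` from `sklearn.metrics` compute the same quantities.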
5) Optimization and regularization
> Optimization is fundamental to every algorithm: you optimize the loss function to make the error as small as possible
> Linear models can use the gradient descent algorithm to find the regression model with the least error
> Elastic net regularization combines the L1 (lasso) and L2 (ridge) penalties
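Both ideas in this section can be sketched together. The example below is a minimal illustration under assumed settings (synthetic data, a learning rate of 0.1, 500 steps, and arbitrary `alpha` and `l1_ratio` values), not a tuned implementation. It runs plain batch gradient descent on the mean squared error, compares it with scikit-learn's least-squares fit, and then fits an elastic net on the same data.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression

# Synthetic data (assumed for illustration): y = 1 + 4*x1 + noise, x2 is irrelevant
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
y = 1.0 + 4.0 * X[:, 0] + rng.normal(scale=0.3, size=300)

# Batch gradient descent on the mean squared error loss
w = np.zeros(X.shape[1])   # coefficients
b = 0.0                    # intercept
learning_rate = 0.1
for _ in range(500):
    residual = X @ w + b - y
    grad_w = 2.0 * X.T @ residual / len(y)   # gradient of MSE w.r.t. w
    grad_b = 2.0 * residual.mean()           # gradient of MSE w.r.t. b
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Direct least-squares fit for comparison
ols = LinearRegression().fit(X, y)

# Elastic net: alpha sets the penalty strength, l1_ratio mixes L1 and L2
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)

print("gradient descent:", w, b)
print("least squares:   ", ols.coef_, ols.intercept_)
print("elastic net:     ", enet.coef_, enet.intercept_)
```

Gradient descent should land on essentially the same coefficients as the least-squares fit, while the elastic net coefficients are pulled toward zero by the penalty. Note that scikit-learn's `LinearRegression` solves least squares directly; `SGDRegressor` is the scikit-learn estimator that trains with stochastic gradient descent.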
6) Other points to keep in mind
> Encode categorical features and keep only the most effective features
> Linear regression is sensitive to outliers, and its standard statistical tests assume normally distributed residuals, so check for both
> Build a baseline and keep iterating on it
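Encoding and baselining can be sketched as follows. The house-price table is invented purely for illustration, and the column names are hypothetical. The categorical column is one-hot encoded with pandas, and a linear model is compared against a baseline that always predicts the mean.

```python
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Hypothetical house-price data, invented only for illustration
df = pd.DataFrame({
    "area_sqft": [800, 950, 1100, 1300, 1500, 1700, 2000, 2200],
    "city": ["A", "B", "A", "B", "A", "B", "A", "B"],
    "price": [80, 105, 110, 140, 150, 180, 200, 230],
})

# One-hot encode the categorical column; drop_first avoids a redundant column
X = pd.get_dummies(
    df[["area_sqft", "city"]], columns=["city"], drop_first=True, dtype=float
)
y = df["price"]

# Baseline: always predict the mean of the target
baseline = DummyRegressor(strategy="mean").fit(X, y)
model = LinearRegression().fit(X, y)

# Scored on the training data to keep the sketch short;
# a real project would score on a held-out test set
baseline_mae = mean_absolute_error(y, baseline.predict(X))
model_mae = mean_absolute_error(y, model.predict(X))
print(f"baseline MAE: {baseline_mae:.2f}, linear model MAE: {model_mae:.2f}")
```

Any new model or feature then has to beat the baseline error to justify its extra complexity.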
End of the blog