
Linear Regression: everything you need to know 👔




I worked on regression problems for three weeks. I read up on the concepts and learned how to implement linear regression in Python by studying Machine Learning using Python, a book by Dinesh Kumar published by Wiley India Pvt Ltd.

I built projects and multiple regression models to understand the concepts. In this blog I am sharing everything I learned:



Table of contents:

  1. Regression fundamentals and concepts

  2. The regression equation and coefficients (image)

  3. Python implementation (image)

  4. Metrics and errors (Image)

  5. Optimization and regularization (image)

  6. Other points to keep in mind




1) Regression fundamentals and concepts
> Regression is widely used in business applications such as sales prediction, house price prediction, and predicting the cost of a patient's hospital stay.
> A linear regression model fits the best line through the inputs (X) to predict the output (Y).





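2) Regression equation and coefficients
> A linear regression model follows a very simple equation: the prediction is an intercept plus a weighted sum of the independent variables.
> The model learns one coefficient per independent variable from the training data, then uses those coefficients to predict the dependent variable.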
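Written out, the equation described above usually takes the form below. The notation is an assumption made here for illustration (beta for the coefficients, p for the number of independent variables, epsilon for the error term); it is the standard textbook form, not a copy of the original image.

```latex
% Prediction: intercept plus a weighted sum of the independent variables
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p

% Observed value: prediction plus an error term
y = \hat{y} + \varepsilon

% Ordinary least squares picks the coefficients that minimize
% the sum of squared residuals over the n training rows
\min_{\beta_0, \dots, \beta_p} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```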





3) Python implementation
> In Python you can implement linear regression using the statsmodels API or scikit-learn's linear models.
> The statsmodels OLS API gives more detailed diagnostic information, such as residual plots, P-P plots, Cook's distance, and leverage values.
> scikit-learn has linear as well as tree-based regression models.
> scikit-learn is a more ML-oriented implementation, which makes it very useful in industry.
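As a minimal sketch of the scikit-learn route, the snippet below fits `LinearRegression` on synthetic data. The data, the true coefficients (4, 2, -1), and the variable names are all made up for illustration; with a real dataset you would load your own `X` and `y`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for this sketch): y = 4 + 2*x1 - 1*x2 + noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 2))
y = 4.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Hold out a test set so the score reflects unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# Learned intercept and coefficients should land close to 4, 2 and -1
print("intercept:", round(float(model.intercept_), 2))
print("coefficients:", np.round(model.coef_, 2))

# .score() returns R-squared on the data it is given
r2_test = model.score(X_test, y_test)
print("test R-squared:", round(float(r2_test), 3))
```

The statsmodels equivalent is `statsmodels.api.OLS`, which is the better choice when you want the diagnostic summary rather than just predictions.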




4) Metrics for regression and errors
> Three metrics are a must for regression:
1) Mean squared error
2) Mean absolute error
3) R-squared
> The image below shows the R-squared calculation: R-squared = 1 - (residual sum of squares / total sum of squares).
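To make the three metrics concrete, here is a small sketch in plain Python. The function names and the toy `y_true` / `y_pred` values are illustrative assumptions; in practice `sklearn.metrics` provides `mean_squared_error`, `mean_absolute_error`, and `r2_score`.

```python
def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between actual and predicted values
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    # Average of the absolute differences between actual and predicted values
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    # 1 - (residual sum of squares / total sum of squares)
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values, made up for this example
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(round(mse, 3), round(mae, 3), round(r2, 3))
```

For these toy values the residual sum of squares is 0.5 and the total sum of squares is 20, so MSE is 0.125, MAE is 0.25, and R-squared is 0.975.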






5) Optimization and regularization
> Optimization is fundamental to every algorithm: you minimize a loss function to make the error as small as possible.
> Linear models can be optimized with the gradient descent algorithm to find the regression model with the least error (ordinary least squares also has a closed-form solution).
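The gradient descent idea can be sketched for a one-feature model y = w*x + b with mean squared error as the loss. The function name, the learning rate, the step count, and the toy data are assumptions picked for this example, not tuned values.

```python
def fit_gradient_descent(xs, ys, learning_rate=0.05, n_steps=2000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Prediction error for every training point at the current w, b
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to w and b
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient to reduce the loss
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 with no noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_gradient_descent(xs, ys)
print(round(w, 4), round(b, 4))
```

On this noiseless data the loop converges to a slope of about 2 and an intercept of about 1. A learning rate that is too large makes the updates diverge, which is one reason features are often scaled first.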

> When you optimize your models, there is a good chance they will overfit the training data, so there are extended algorithms that regularize the model by adding a penalty to the error function, such as:
> Ridge
> Lasso
> Elastic net
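A rough comparison of the three regularized models against plain least squares, using scikit-learn. The synthetic data (two useful features, three irrelevant ones) and the `alpha` and `l1_ratio` values are assumptions chosen to make the effect visible, not recommended settings.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

# Synthetic data: only the first two of five features influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)

models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=10.0),          # L2 penalty: shrinks all coefficients
    "lasso": Lasso(alpha=0.5),           # L1 penalty: can set some to exactly 0
    "elastic_net": ElasticNet(alpha=0.5, l1_ratio=0.5),  # mix of L1 and L2
}

# Fit every model on the same data and collect the learned coefficients
coefs = {name: model.fit(X, y).coef_ for name, model in models.items()}

for name, coef in coefs.items():
    print(f"{name:12s}", np.round(coef, 3))
```

Ridge pulls every coefficient toward zero, while Lasso tends to zero out the irrelevant features entirely, which is why it doubles as a feature selection tool. In practice `alpha` is usually chosen by cross-validation.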



6) Other key points to note while building regression models
> Check your features for multicollinearity and remove the highly collinear ones.
> Most of the time you will need standardized or normalized data for linear regression, especially with regularized models or gradient descent.
> Encode categorical features and keep only the most effective features.
> Linear regression is sensitive to outliers and assumes roughly Gaussian (normally distributed) errors, so check for both.
> Build a baseline and iterate on it over and over.
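Two of the checks above, standardization and multicollinearity, can be sketched with NumPy. The variance inflation factor (VIF) is one common way to measure multicollinearity; the helper names, the synthetic features, and the rule of thumb that a VIF above roughly 10 is a warning sign are assumptions of this sketch.

```python
import numpy as np


def standardize(X):
    # Rescale each column to zero mean and unit standard deviation
    return (X - X.mean(axis=0)) / X.std(axis=0)


def variance_inflation_factors(X):
    # VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing
    # feature j on all the other features (with an intercept)
    vifs = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(len(others)), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        r2 = 1 - residuals.var() / target.var()
        vifs.append(1 / (1 - r2))
    return np.array(vifs)


# Synthetic features: x3 is nearly a copy of x1, x2 is independent
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)

X = standardize(np.column_stack([x1, x2, x3]))
vifs = variance_inflation_factors(X)
print(np.round(vifs, 1))
```

Here x1 and x3 get very large VIF values because each is almost fully explained by the other, while x2 stays close to 1. Dropping one of the collinear pair is the usual fix.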


End of the blog


