Overview:
1. Overview of Linear Regression.
2. Polynomial Regression.
3. Choose the best degree of polynomial.
4. Overfitting (High variance) and Underfitting (High bias).
4. Overfitting (High variance) and Underfitting (High bias).
5. Errortraining_data Vs Errorcross-validation - determining polynomial degree.
6. Pseudo Code for Ocatve.
Let's Begin:
Linear Regression:
Linear regression is an approach for modeling one or more independent
attributes (x1, x2, x3, ….. xn) and
the dependent attribute (y).
We can also say linear regression is the technique to fit a
best straight line to our data set regardless of the dimension. But sometimes,
a straight line wouldn’t be the best plot for the data given; such situations
may demand a curve as the best fit to the data.
i.e. The squared error with curve << The squared error with the linear line
Hence to plot a curve we use the technique widely known as
polynomial regression
For more info on implementing Linear regression:
Polynomial Regression:
It is a type of linear regression in which the relationship between the independent
attributes (x1, x2, x3, ….. ,xn) and
the dependent attribute (y) is modeled as an nth degree polynomial.
For Example look at the plots below:
Fig-1
In the above plot we see that, for the data set given a straight line (linear) is not the best fit because the error is more, but the curve in the RHS fits the data set good with very less error.
How to Choose the degree of Polynomial:
Now
since we have known what is polynomial regression, it is even more
important to know how to choose the degree of the polynomial.
For
a 2D data set which can be easily visualized- by seeing the data set
one can tell approximately what degree of polynomial will fit the data
best.
For example by seeing the above data set in Fig-1, one can easily say that a 2nd order or 4th order polynomial curve would best fit the above data set, but again that's just a guess.
The polynomial degree for the above data set is easy to guess because its a 2D data set, what about a data set that is higher in dimension probably 4D or 5D (4 or 5 feature dataset), knowing the fact that we can't visualize a 4D or 5D or higher dimensional dataset, guessing the degree of polynomial is not recommended. So what do we do?
In such a case we would train our data set with different polynomial degrees and see- using which degree of polynomial the error or j_theta is minimum for both training data and cross-validation data. Despite such simple solution in doing so we need to take care of overfitting and underfitting which we shall cover below.
Overfitting and Underfitting:
It is best that we understand Overfitting and Underfitting with plots and example.
Let us take a simple example: suppose we have a 2D data set as given below with X as the independent attribute and Y as the dependent attribute on X or the output or class. We divide the data set in two parts
1. Training Data (TR) - The data on which we train the algorithm.
2. Cross-validation data (CV)
- We use the parameters (weight) derived from the training dataset
multiply it to the attributes of the cross-validation data and compare
the hypothesis to the original output.
We use gradient descent or normal equation on the 2D data given above to find the theta value for both training data set and cross-validation data set for different polynomial degrees (2nd, 4th, 6th) and plot the hypothesis curve for each.
Given below are the plots of the hypothesis curve for both training data and the cross-validation data set
Fig-2
Plotting the hypothesis curve with linear, 2nd, 4th and 6th degree polynomial on Cross-validation Data:
Fig-3
What's with the plots:
The 4th plot of Fig-2 is the hypothesis curve for 6th degree polynomial regression, it fits the training data very well (it touches all the point) and gives zero error (j_theta), whereas in the 4th plot of Fig-3 it fails to fit the cross-validation data giving maximum error. This situation is called overfitting- where the hypothesis curve fits the training data very well with almostzero error but fails to give a good hypothesis for a new data.
errortraining_data <<< errorcross-validation
The 3rd plot of Fig-2 is a hypothesis curve for 4th degree polynomial and fits the training data very well with negligible error, and also in Fig-3 it completely fits the cross-validation data with approximately zero error. hence we conclude that using 4th degree polynomial for a new data set would give best hypothesis than any other degree polynomial.
Conclusion drawn- underfitting and overfitting plays an important role while choosing the degree of polynomial.
Errortraining_data Vs Errorcross-validation - determining polynomial degree.
The above 2D data set- by visualizing the data or the hypothesis curve one can easily tell what order of polynomial would best fit the data set, but when it comes to multidimensional data set since the data cannot be visualized it becomes challenging to select a polynomial degree for the data set.
In such a case we find the error for multiple polynomial degrees for both the training data and the cross-validation data and select the one with minimum error on both data set.
Intuition:
Considering the above data set, For training data on which we train our algorithm- the error (cost function j_theta) decreases as the polynomial degree increases. But for the cross-validation data on which we apply the theta value (parameters) obtained from training the training data set- the error (cost function j_theta) decreases till certain polynomial degree and then increases usuriously.
Below plot will make the intuition clear:
If you see the plot in the LHS the blue line shows that the error decreases as the degree of polynomial increases for training data set. But for the cross-validation data set the red line indicates that the error decreases till it reaches its minimum at 4th degree and then the error increases exorbitantly.
The above plot draws a conclusion that 4th degree polynomial best fits our training as well as cross-validation data considering the error driven in both case.
Hence plotting degree of polynomial to error function plot one can decide the degree of polynomial best fit to their data set.
1. Loading the training data set
x=load('tr_x.ext');
X=x;
y = load ('tr_y.ext');
m=length(y);
2. Take polynomial degrees for the X
x1=[ones(m,1),X];
x2=[ones(m,1),X,X.^2];
x4=[ones(m,1),X,X.^2,X.^3,X.^4];
x6=[ones(m,1),X,X.^2,X.^3,X.^4,X.^5,X.^6];
x8=[ones(m,1),X,X.^2,X.^3,X.^4,X.^5,X.^6,X.^7,X.^8];
[m,n]=size(x);
3. Loading the cross validation data set
x_cv= load ('cv_x.ext');
X_cv=x_cv;
y_cv= load ('cv_y.ext');
m_cv=length(y_cv);
4. Take polynomial degrees for the cross_validation data
x1_cv=[ones(m_cv,1),X_cv];
x2_cv=[ones(m_cv,1),X_cv,X_cv.^2];
x4_cv=[ones(m_cv,1),X_cv,X_cv.^2,X_cv.^3,X_cv.^4];
x6_cv=[ones(m_cv,1),X_cv,X_cv.^2,X_cv.^3,X_cv.^4,X_cv.^5,X_cv.^6];
x8_cv=[ones(m_cv,1),X_cv,X_cv.^2,X_cv.^3,X_cv.^4,X_cv.^5,X_cv.^6,X_cv.^7,X_cv.^8];
[m_cv,n_cv]=size(x_cv);
5. Take x vals with to plot curve
X_vals = (min(X):0.05:max(X))';
6. Take polynomial degrees for the cross_validation data
x_vals1=[ones(size(X_vals)),X_vals];
x_vals2=[ones(size(X_vals)),X_vals,X_vals.^2];
x_vals4=[ones(size(X_vals)),X_vals,X_vals.^2,X_vals.^3,X_vals.^4];
x_vals6=[ones(size(X_vals)),X_vals,X_vals.^2,X_vals.^3,X_vals.^4,X_vals.^5,X_vals.^6];
x_vals8=[ones(size(X_vals)),X_vals,X_vals.^2,X_vals.^3,X_vals.^4,X_vals.^5,X_vals.^6,X_vals.^7,X_vals.^8];
7. Algorithm
a={x1,x2,x4,x6,x8};
b={x_vals1,x_vals2,x_vals4,x_vals6,x_vals8};
c={x1_cv,x2_cv,x4_cv,x6_cv,x8_cv};
for i=1:length(a);
x=a{i}; % Training data set
x_cv=c{i}; % Cross validation data set
x_vals=b{i}; % x_vals feature data set
% Finding theta using normal equation
theta=(x'*x)\ x'*y;
h_tr=x * theta;
h_cv=x_cv * theta;
% Capturing the error
j_theta_tr(i)=(0.5/m).* (h_tr - y)' * (h_tr - y);
j_theta_cv(i)=(0.5/m_cv).* (h_cv - y_cv)' * (h_cv- y_cv);
% Calculating hypothesis curve using the theta and x_vals
h_vals_tr=x_vals * theta;
% Plotting hypothesis curve for training dataset 2D
figure;
plot(x(:,2), y, 'o', 'MarkerFacecolor', 'r', 'MarkerSize', 8);
hold on
plot (x_vals(:,2), h_vals_tr,'b', 'MarkerSize', 8)
hold off
% Plotting training data's hypothesis curve for cross-validation dataset 2D
figure;
plot(x_cv(:,2), y_cv, 'o', 'MarkerFacecolor', 'r', 'MarkerSize', 8);
hold on
plot (x_vals(:,2), h_vals_tr,'b', 'MarkerSize', 8)
hold off
end
% Plotting the error for training and cross-validation data set
figure;
plot ([1,2,4,6,8],j_theta_tr,'b', 'MarkerSize',13)
xlabel('Polynomial Degree')
ylabel('j_tr (Cost function training data)')
figure;
plot ([1,2,4,6,8],j_theta_cv, 'r', 'MarkerSize', 13)
xlabel('Polynomial Degree')
ylabel('j_cv (Cost function crossvalidation data)')
hold off
In such a case we find the error for multiple polynomial degrees for both the training data and the cross-validation data and select the one with minimum error on both data set.
Intuition:
Considering the above data set, For training data on which we train our algorithm- the error (cost function j_theta) decreases as the polynomial degree increases. But for the cross-validation data on which we apply the theta value (parameters) obtained from training the training data set- the error (cost function j_theta) decreases till certain polynomial degree and then increases usuriously.
Below plot will make the intuition clear:
The above plot draws a conclusion that 4th degree polynomial best fits our training as well as cross-validation data considering the error driven in both case.
Hence plotting degree of polynomial to error function plot one can decide the degree of polynomial best fit to their data set.
Pseudo Code for Octave:
clear all;clc;1. Loading the training data set
x=load('tr_x.ext');
X=x;
y = load ('tr_y.ext');
m=length(y);
2. Take polynomial degrees for the X
x1=[ones(m,1),X];
x2=[ones(m,1),X,X.^2];
x4=[ones(m,1),X,X.^2,X.^3,X.^4];
x6=[ones(m,1),X,X.^2,X.^3,X.^4,X.^5,X.^6];
x8=[ones(m,1),X,X.^2,X.^3,X.^4,X.^5,X.^6,X.^7,X.^8];
[m,n]=size(x);
3. Loading the cross validation data set
x_cv= load ('cv_x.ext');
X_cv=x_cv;
y_cv= load ('cv_y.ext');
m_cv=length(y_cv);
4. Take polynomial degrees for the cross_validation data
x1_cv=[ones(m_cv,1),X_cv];
x2_cv=[ones(m_cv,1),X_cv,X_cv.^2];
x4_cv=[ones(m_cv,1),X_cv,X_cv.^2,X_cv.^3,X_cv.^4];
x6_cv=[ones(m_cv,1),X_cv,X_cv.^2,X_cv.^3,X_cv.^4,X_cv.^5,X_cv.^6];
x8_cv=[ones(m_cv,1),X_cv,X_cv.^2,X_cv.^3,X_cv.^4,X_cv.^5,X_cv.^6,X_cv.^7,X_cv.^8];
[m_cv,n_cv]=size(x_cv);
5. Take x vals with to plot curve
X_vals = (min(X):0.05:max(X))';
6. Take polynomial degrees for the cross_validation data
x_vals1=[ones(size(X_vals)),X_vals];
x_vals2=[ones(size(X_vals)),X_vals,X_vals.^2];
x_vals4=[ones(size(X_vals)),X_vals,X_vals.^2,X_vals.^3,X_vals.^4];
x_vals6=[ones(size(X_vals)),X_vals,X_vals.^2,X_vals.^3,X_vals.^4,X_vals.^5,X_vals.^6];
x_vals8=[ones(size(X_vals)),X_vals,X_vals.^2,X_vals.^3,X_vals.^4,X_vals.^5,X_vals.^6,X_vals.^7,X_vals.^8];
7. Algorithm
a={x1,x2,x4,x6,x8};
b={x_vals1,x_vals2,x_vals4,x_vals6,x_vals8};
c={x1_cv,x2_cv,x4_cv,x6_cv,x8_cv};
for i=1:length(a);
x=a{i}; % Training data set
x_cv=c{i}; % Cross validation data set
x_vals=b{i}; % x_vals feature data set
% Finding theta using normal equation
theta=(x'*x)\ x'*y;
h_tr=x * theta;
h_cv=x_cv * theta;
% Capturing the error
j_theta_tr(i)=(0.5/m).* (h_tr - y)' * (h_tr - y);
j_theta_cv(i)=(0.5/m_cv).* (h_cv - y_cv)' * (h_cv- y_cv);
% Calculating hypothesis curve using the theta and x_vals
h_vals_tr=x_vals * theta;
% Plotting hypothesis curve for training dataset 2D
figure;
plot(x(:,2), y, 'o', 'MarkerFacecolor', 'r', 'MarkerSize', 8);
hold on
plot (x_vals(:,2), h_vals_tr,'b', 'MarkerSize', 8)
hold off
% Plotting training data's hypothesis curve for cross-validation dataset 2D
figure;
plot(x_cv(:,2), y_cv, 'o', 'MarkerFacecolor', 'r', 'MarkerSize', 8);
hold on
plot (x_vals(:,2), h_vals_tr,'b', 'MarkerSize', 8)
hold off
end
% Plotting the error for training and cross-validation data set
figure;
plot ([1,2,4,6,8],j_theta_tr,'b', 'MarkerSize',13)
xlabel('Polynomial Degree')
ylabel('j_tr (Cost function training data)')
figure;
plot ([1,2,4,6,8],j_theta_cv, 'r', 'MarkerSize', 13)
xlabel('Polynomial Degree')
ylabel('j_cv (Cost function crossvalidation data)')
hold off
Hi Sam,
ReplyDeleteThe blog looks really promising.But could you please post or provide any introduction about Machine Learning for a begginers like me.
Sure dude, will be doing it in the next post. stay tuned
ReplyDeleteThank you Sam sir,
ReplyDeleteIts really helpful to understand the concept of Polynomial Regression, Overfitting, Underfitting, Choosing polynomial degree.Sir could you please give introduction about Big data and its application .thank u