In this assignment, you will investigate a bank dataset containing information on 45,211 clients. Each client in the dataset has 17 features. The 17th feature, y, indicates whether the client will subscribe to a term deposit at the bank. You will model the target variable y from the other 16 input variables using logistic regression and its regularized version. Please note that the target (i.e., outcome) y is categorical, so this is going to be a classification model.
Submission details
A zip archive of the source code. Please do not submit the data.
A README file stating the programming language you used, the operating system name and version, and the computer architecture.
A PDF/DOC/DOCX/ODT file with your answers to the following problems.
You are allowed to use any of the programming languages from the set {Matlab, Python, R, C++}. Assignments must be submitted by 11:59 pm MST on February 14, 2018. You can still turn in the assignment after the deadline; however, you lose 5 points per hour after the due time until your score reaches 0. Each individual assignment is worth 100 points. We cannot waive the penalty unless there is a case of illness or another substantial impediment beyond your control, supported by documentation.
Note
Please do not use any library function(s) that can do the logistic regression for you.
Dataset
Please download the dataset from this link [ bank.csv ].
Problems
Load the data into memory. Then, convert each of the categorical variables to numerical values. For example, the 6th column ("housing") is a categorical variable with two values {"no","yes"}. We can replace "no" with the number 0 and "yes" with the number 1 throughout the 6th column.
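The encoding step can be sketched as follows. This is only an illustration under stated assumptions: the function name encode_categorical and the three sample rows are made up, and the rows are built in memory rather than read from bank.csv, whose delimiter and column order are not specified here. Codes are assigned in sorted order of the distinct values, which reproduces the "no" to 0, "yes" to 1 example.

```python
def encode_categorical(rows, columns):
    """Replace string values in the given columns with integer codes.

    Codes are assigned per column in sorted order of the distinct values,
    so {"no", "yes"} becomes {0, 1}.
    """
    encoded = [dict(row) for row in rows]  # copy so the input is left untouched
    mappings = {}
    for col in columns:
        categories = sorted({row[col] for row in rows})
        mappings[col] = {value: code for code, value in enumerate(categories)}
        for row in encoded:
            row[col] = mappings[col][row[col]]
    return encoded, mappings


# Made-up sample rows (assumption); the real rows would come from bank.csv.
sample = [
    {"age": 30, "housing": "no", "y": "no"},
    {"age": 45, "housing": "yes", "y": "yes"},
    {"age": 52, "housing": "yes", "y": "no"},
]
encoded, mappings = encode_categorical(sample, ["housing", "y"])
```

Keeping the returned mappings makes it possible to report which number stands for which category in the write-up.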
Now, implement logistic regression with the sum of squared errors (SSE) as the loss function. You need to solve it using the gradient descent algorithm.
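A minimal sketch of this step, assuming the loss is scaled as J(θ) = 1/(2m) Σ (h(x) − y)² with h(x) = sigmoid(θᵀx); the 1/(2m) factor, the zero initialization, the fixed iteration count, and all function names are assumptions, not requirements of the assignment. Under that loss, the gradient for θj is (1/m) Σ (h − y)·h·(1 − h)·xj. Plain Python lists are used to keep the example self-contained; a vectorized version would be much faster on the full dataset.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(theta, x):
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def sse_loss(theta, X, y):
    # J(theta) = 1/(2m) * sum (h - y)^2; the 1/(2m) scaling is an assumption.
    m = len(X)
    return sum((predict_proba(theta, x) - yi) ** 2 for x, yi in zip(X, y)) / (2 * m)


def fit_logistic_sse(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent; each row of X must start with the bias term x0 = 1."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(n_iters):
        grad = [0.0] * n
        for x, yi in zip(X, y):
            h = predict_proba(theta, x)
            # Derivative of (h - y)^2 / 2 w.r.t. theta_j is (h - y) * h * (1 - h) * x_j.
            common = (h - yi) * h * (1.0 - h)
            for j in range(n):
                grad[j] += common * x[j]
        theta = [t - alpha * g / m for t, g in zip(theta, grad)]
    return theta


# Made-up one-feature data (assumption): the label is 1 when the feature is positive.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
theta = fit_logistic_sse(X, y, alpha=1.0, n_iters=2000)
```

A class label can then be obtained by thresholding predict_proba at 0.5.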
Perform 10-fold cross-validation to classify the dataset using the logistic regression you developed in step 2. Please report accuracy, precision, recall, and F1-score for each fold of the cross-validation, and also report the average of each metric across the folds. Try 3 different learning rates, α={0.01,0.1,1}.
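One possible shape for the cross-validation loop and the four metrics is sketched below. The names k_fold_indices, classification_metrics, and cross_validate are hypothetical, as are the fit and predict callables, which stand in for whatever training and prediction routines you wrote. The shuffling seed and the choice to report 0.0 when a metric's denominator is zero are also assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def classification_metrics(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Assumption: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and repeat for every fold."""
    per_fold = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [predict(model, X[i]) for i in test_idx]
        per_fold.append(classification_metrics([y[i] for i in test_idx], y_pred))
    average = {name: sum(m[name] for m in per_fold) / k for name in per_fold[0]}
    return per_fold, average


# Made-up data and a trivial stand-in "model" (assumptions) to exercise the loop.
X = [[1.0, float(v)] for v in range(-10, 10)]
y = [1 if row[1] >= 0 else 0 for row in X]
per_fold, average = cross_validate(
    X, y,
    fit=lambda X_train, y_train: None,
    predict=lambda model, x: 1 if x[1] >= 0 else 0,
)
```

Running the same loop once per learning rate gives the per-fold and average numbers for each α.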
Scale the features of the dataset to the [0,1] range using Min-Max scaling, and repeat step 3 (the 10-fold cross-validation with the three learning rates). Please do not scale the target y, and do not scale the added bias column of all 1s (i.e., the x0=1 column).
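As an illustration, Min-Max scaling maps each value v in a column to (v − min) / (max − min). The sketch below assumes X holds only the bias column (index 0) and the input features, with y stored separately; the function name min_max_scale, the skip_columns parameter, and the decision to leave constant columns unchanged are assumptions.

```python
def min_max_scale(X, skip_columns=(0,)):
    """Rescale each column of X to [0, 1], leaving skip_columns untouched."""
    n = len(X[0])
    lows = [min(row[j] for row in X) for j in range(n)]
    highs = [max(row[j] for row in X) for j in range(n)]
    scaled = []
    for row in X:
        new_row = []
        for j, v in enumerate(row):
            span = highs[j] - lows[j]
            if j in skip_columns or span == 0:
                new_row.append(v)  # bias column or constant column: keep as-is
            else:
                new_row.append((v - lows[j]) / span)
        scaled.append(new_row)
    return scaled


# Made-up rows (assumption): bias term x0 = 1 followed by two features.
X = [[1.0, 20.0, 100.0], [1.0, 30.0, 300.0], [1.0, 40.0, 200.0]]
X_scaled = min_max_scale(X)
# X_scaled == [[1.0, 0.0, 0.0], [1.0, 0.5, 1.0], [1.0, 1.0, 0.5]]
```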
Scale the features of the dataset using standardization, and repeat step 3 (the 10-fold cross-validation with the three learning rates). Please do not scale the target y, and do not scale the added bias column of all 1s (i.e., the x0=1 column).
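Standardization replaces each value v in a column with (v − mean) / std, so the column ends up with zero mean and unit standard deviation. The sketch below is one way to do this, assuming the population standard deviation (dividing by m rather than m − 1) and assuming X contains the bias column at index 0 with y held separately; the name standardize and the skip_columns parameter are hypothetical.

```python
import statistics


def standardize(X, skip_columns=(0,)):
    """Give each column of X zero mean and unit variance, except skip_columns."""
    n = len(X[0])
    scaled = [list(row) for row in X]  # copy so the input is left untouched
    for j in range(n):
        if j in skip_columns:
            continue  # e.g. the bias column of all 1s
        column = [row[j] for row in X]
        mean = statistics.fmean(column)
        std = statistics.pstdev(column)  # population standard deviation (assumption)
        if std == 0:
            continue  # constant column: leave unchanged to avoid dividing by zero
        for i, v in enumerate(column):
            scaled[i][j] = (v - mean) / std
    return scaled


# Made-up rows (assumption): bias term x0 = 1 followed by one feature.
X = [[1.0, 2.0], [1.0, 4.0], [1.0, 6.0]]
Z = standardize(X)
```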
Implement regularized logistic regression with the sum of squared errors (SSE) as the loss function. Again, solve it using the gradient descent algorithm.
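The assignment text does not say which regularizer to use, so the sketch below assumes an L2 (ridge) penalty: J(θ) = 1/(2m) [Σ (h(x) − y)² + λ Σ θj²], where the penalty sum runs over j ≥ 1 so the bias θ0 is not penalized. That penalty form, the 1/(2m) scaling, and the function names are all assumptions. With λ = 0 the update reduces to unregularized gradient descent on the SSE loss.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_regularized_logistic_sse(X, y, alpha=0.1, lam=1.0, n_iters=1000):
    """Batch gradient descent on SSE loss plus an L2 penalty (assumed form).

    Each row of X must start with the bias term x0 = 1.
    """
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(n_iters):
        grad = [0.0] * n
        for x, yi in zip(X, y):
            h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
            common = (h - yi) * h * (1.0 - h)
            for j in range(n):
                grad[j] += common * x[j]
        for j in range(1, n):
            grad[j] += lam * theta[j]  # penalty gradient; theta[0] (bias) is skipped
        theta = [t - alpha * g / m for t, g in zip(theta, grad)]
    return theta


# Made-up one-feature data (assumption): the label is 1 when the feature is positive.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
theta_plain = fit_regularized_logistic_sse(X, y, alpha=0.1, lam=0.0, n_iters=2000)
theta_reg = fit_regularized_logistic_sse(X, y, alpha=0.1, lam=10.0, n_iters=2000)
```

On this toy data the penalized weight theta_reg[1] ends up smaller in magnitude than theta_plain[1], which is the shrinking effect the λ sweep is meant to expose.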
On the standardized dataset, repeat step 3 (the 10-fold cross-validation), this time using the regularized logistic regression you developed in step 6 and varying the parameter λ={0,1,10,100,1000}.
Summarize (using a plot or a table) the classification performance metrics (i.e., accuracy, recall, precision, F1-score) you obtained in each of the experiments above.