CRYSTIN COMPUTER SCIENCE
  • Home
  • Courses
  • About
  • Contact

Logistic Regression

In this assignment, you will investigate a bank dataset containing records for 45,211 clients. Each client is described by 17 features. The 17th feature, y, indicates whether the client will subscribe to a term deposit at the bank. You will model the target variable y from the other 16 input variables using logistic regression and its regularized version. Note that the target (i.e., the outcome) y is categorical, so this is a classification model.
Submission details
  1. An archive (zip) of the source code. Please do not submit the data.
  2. A README file stating the programming language you used, the operating system name and version, and the computer architecture.
  3. A PDF/DOC/DOCX/ODT file with your answers to the following problems.
You may use any of the following programming languages: Matlab, Python, R, or C++. Assignments must be submitted by 11:59 pm MST on February 14, 2018. You can still turn in the assignment after the deadline, but you lose 5 points for every hour past the due time until the score reaches 0. Each individual assignment is worth 100 points. We cannot waive the penalty except in cases of illness or other substantial impediments beyond your control, supported by documentation.
Note
  • Please do not use any library function(s) that can do the logistic regression for you. 
Dataset
  • Please download the dataset from this link [ bank.csv ].
Problems
  1. Load the data into memory. Then convert each categorical variable to a numerical one. For example, the 6th column ("housing") is a categorical variable with two values, {"no","yes"}. We can replace "no" with 0 and "yes" with 1 throughout the 6th column.
  2. Now, implement logistic regression with SSE as loss function. You need to solve it using the "Gradient Descent" algorithm. 
  3. Perform 10-fold cross-validation to classify the dataset using the logistic regression you developed in step 2. Report accuracy, precision, recall, and F1-score for each fold of the cross-validation, along with the average of each metric across the folds. Try 3 different learning rates, α={0.01,0.1,1}.
  4. Scale the features of the dataset to the [0,1] range using Min-Max scaling, and repeat step 3. Do not scale the y feature or the added bias column of all 1s (i.e., the x0=1 column).
  5. Scale the features of the dataset using standardization, and repeat step 3. Do not scale the y feature or the added bias column of all 1s (i.e., the x0=1 column).
  6. Implement regularized logistic regression with SSE as loss function. Again, solve using the gradient descent algorithm.
  7. On the standardized dataset, repeat step 3 using the regularized logistic regression you developed in step 6, varying the parameter λ={0,1,10,100,1000}.
  8. Summarize (using a plot or a table) the classification performance metrics (i.e., accuracy, recall, precision, F1-score) you obtained in each of the experiments above.
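
For problems 2 and 6, one possible starting point is sketched below in Python using only the standard library. It assumes the SSE loss is scaled as J(θ) = (1/2m) Σ (h_i − y_i)², with h_i = sigmoid(θ·x_i), and that the regularizer is an L2 term (λ/2m) Σ θ_j² over j ≥ 1, leaving the bias θ_0 unpenalized. The assignment does not fix these conventions, so treat them, together with the names fit_sse_logistic, lam, and n_iters, as assumptions rather than the required formulation. Setting lam=0 gives the unregularized model of problem 2.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function: never exponentiates a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(theta, row):
    # row is one example whose first entry is the bias term x0 = 1.
    return sigmoid(sum(t * x for t, x in zip(theta, row)))


def fit_sse_logistic(X, y, alpha=0.1, lam=0.0, n_iters=1000):
    """Batch gradient descent on the (assumed) loss
    J = 1/(2m) * sum((h - y)^2) + lam/(2m) * sum(theta[1:]^2).
    """
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(n_iters):
        grad = [0.0] * n
        for row, target in zip(X, y):
            h = predict_proba(theta, row)
            # Chain rule: d/dz of (h - y)^2 / 2 is (h - y) * h * (1 - h).
            common = (h - target) * h * (1.0 - h)
            for j in range(n):
                grad[j] += common * row[j]
        for j in range(n):
            # Assumption: the bias weight theta[0] is not regularized.
            penalty = lam * theta[j] if j > 0 else 0.0
            grad[j] = (grad[j] + penalty) / m
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta
```

A class label can then be obtained by thresholding predict_proba at 0.5. Note that with large λ the product α·λ/m can make the update unstable, so a learning rate that works for λ=0 may need to be reduced for λ=1000.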
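
The data preparation and evaluation steps (problems 1, 3, 4, and 5) can be sketched in the same way. The helpers below are hypothetical names, not part of any provided starter code: encode_categorical maps labels to integer codes in sorted order (so "no" becomes 0 and "yes" becomes 1), min_max_scale and standardize operate on a single feature column, k_fold_indices yields train/test index lists, and classification_metrics computes the four requested scores for binary 0/1 labels. Returning 0.0 when a denominator is zero is an assumption made here to keep the sketch simple.

```python
import random


def encode_categorical(column):
    # Assign integer codes in sorted label order, e.g. {"no": 0, "yes": 1}.
    mapping = {label: code for code, label in enumerate(sorted(set(column)))}
    return [mapping[v] for v in column], mapping


def min_max_scale(column):
    lo, hi = min(column), max(column)
    span = hi - lo
    # A constant column has no spread; map it to 0.0 (assumption).
    return [(v - lo) / span if span else 0.0 for v in column]


def standardize(column):
    mean = sum(column) / len(column)
    std = (sum((v - mean) ** 2 for v in column) / len(column)) ** 0.5
    return [(v - mean) / std if std else 0.0 for v in column]


def k_fold_indices(n, k=10, seed=0):
    # Shuffle once, then deal the indices into k nearly equal folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def classification_metrics(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}
```

Apply the scaling helpers only to the 16 input columns, never to y or to the x0=1 bias column. Averaging the per-fold dictionaries key by key gives the summary values that problem 3 asks for.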

Copyright © 2015