Step 1: Function Set
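A minimal sketch of the function set, assuming the usual logistic regression form f(x) = sigmoid(w . x + b), with the output read as the probability of class 1. The names `sigmoid` and `predict_proba` and the sample weights are illustrative, not from the notes.

```python
import math

def sigmoid(z):
    """Logistic function; the two branches avoid overflow in exp for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """One member of the function set: f(x) = sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Every choice of (w, b) picks a different function from the set.
p = predict_proba([1.0, -2.0], 0.5, [2.0, 1.0])  # z = 2 - 2 + 0.5 = 0.5
```

Classifying as class 1 when the output exceeds 0.5 is the same as checking whether w . x + b > 0.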
Step 2: Goodness of a Function
Cross Entropy
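A small sketch of the cross entropy between a 0/1 target and a predicted probability, summed over the training set to give the loss. The function names and the clipping constant `eps` are assumptions added for illustration.

```python
import math

def cross_entropy(target, f):
    """Cross entropy between a 0/1 target and a prediction f in (0, 1)."""
    eps = 1e-12  # assumed clipping value, keeps log() away from 0
    f = min(max(f, eps), 1.0 - eps)
    return -(target * math.log(f) + (1 - target) * math.log(1.0 - f))

def total_loss(targets, preds):
    """Loss of one function: sum of cross entropies over all examples."""
    return sum(cross_entropy(t, p) for t, p in zip(targets, preds))

# A confident correct prediction costs little, a confident wrong one costs a lot.
good = cross_entropy(1, 0.9)
bad = cross_entropy(1, 0.1)
```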
Step 3: Find the best Function (Gradient Descent)
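A sketch of batch gradient descent on the cross-entropy loss, assuming the standard gradient sum over examples of (f(x) - target) * x. The toy data, learning rate, and epoch count are made up for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(xs, ys, lr=0.1, epochs=1000):
    """Batch gradient descent for logistic regression with cross-entropy loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            f = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = f - y  # gradient of the cross entropy w.r.t. z = w . x + b
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy 1-D data (hypothetical): class 0 on the left, class 1 on the right.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [0, 0, 1, 1]
w, b = train(xs, ys)
preds = [1 if sigmoid(w[0] * x[0] + b) > 0.5 else 0 for x in xs]
```

The update is larger when the prediction is further from the target, since the step is proportional to f(x) - target.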
No square error (its gradient is close to zero even when the output is far from the target)
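A quick numeric check of why square error is avoided here, assuming the loss (f - target)^2 with f = sigmoid(z). Its gradient with respect to z carries the extra factor f * (1 - f), which is tiny when f is near 0 or 1, even if the prediction is badly wrong. The function names are illustrative.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def grad_cross_entropy(z, target):
    """d/dz of the cross entropy: f - target."""
    return sigmoid(z) - target

def grad_square_error(z, target):
    """d/dz of (f - target)^2: 2 (f - target) f (1 - f)."""
    f = sigmoid(z)
    return 2.0 * (f - target) * f * (1.0 - f)

# Target is 1 but the model outputs almost 0, so it is far from the target.
z, target = -10.0, 1
g_ce = grad_cross_entropy(z, target)  # magnitude close to 1, a large step
g_se = grad_square_error(z, target)   # magnitude close to 0, training stalls
```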
Discriminative sometimes outperforms Generative (probabilistic model: Naive Bayes)
Multi-class Classification
Softmax ==> 0 < y_i < 1, and the y_i sum to 1
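A minimal softmax sketch. Subtracting the maximum before exponentiating is a common stability trick and does not change the result. The input scores are made up for illustration.

```python
import math

def softmax(zs):
    """Map real scores to outputs that sum to 1 (each in (0, 1) for moderate inputs)."""
    m = max(zs)  # shift by the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

ys = softmax([3.0, 1.0, -3.0])  # roughly [0.88, 0.12, 0.00]
```

The largest score takes most of the probability mass, which is where the "soft max" name comes from.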
Limitation of Logistic Regression
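An illustration, assuming the limitation meant here is that the decision boundary w . x + b = 0 is a straight line. The XOR-style data set and the search grid are assumptions chosen for the demo. No line classifies all four points correctly, and a coarse grid search agrees.

```python
from itertools import product

# XOR-style data: opposite corners share a class.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

def num_correct(w1, w2, b):
    """Count points classified correctly by the boundary w1*x1 + w2*x2 + b = 0."""
    correct = 0
    for (x1, x2), y in zip(points, labels):
        pred = 1 if w1 * x1 + w2 * x2 + b > 0 else 0  # same as sigmoid(z) > 0.5
        correct += int(pred == y)
    return correct

# Try every (w1, w2, b) on a coarse grid from -2 to 2 in steps of 0.5.
grid = [i * 0.5 for i in range(-4, 5)]
best = max(num_correct(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
# best is 3: at least one of the four points is always misclassified.
```

The grid search is only a demonstration, not a proof; the underlying fact is that these four points are not linearly separable.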