Work through Machine_Learning_I_Exam_WS_14_15 and compare your solutions. From the course Machine Learning at Technische Universität Berlin (TU Berlin).
There is only one correct answer per question. Wrong answers are awarded 0 points, the same as no answer.
The Bayes error for classification is:
Independent Component Analysis can be achieved by:
The K-means algorithm:
A biased estimator is sometimes used to:
Which is False? The Restricted Boltzmann machine is:
Sketch a two-dimensional two-class supervised dataset for which Fisher's linear discriminant and the linear hard-margin SVM would learn separating boundaries with different directions. Your example must be stereotypical of the two classification techniques.
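As a hedged numerical illustration (my own construction, not the official solution), the sketch below builds one stereotypical dataset: two long, thin, horizontally elongated classes whose closest points lie along a diagonal. The large within-class scatter along x pulls Fisher's normal towards the vertical, while the hard-margin SVM boundary is determined only by the closest pair of points. The scikit-learn calls and the large-C approximation of the hard margin are assumptions of this sketch.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

# Class 0: points along the segment (-4,0)..(0,0); class 1 along (1,1)..(5,1).
# The strong within-class scatter along x makes Fisher's boundary normal
# almost vertical, while the SVM margin is set by the closest pair of points
# (0,0)/(1,1), giving a boundary normal along the diagonal (1,1).
t = np.linspace(0, 4, 20)
jitter = 0.01 * np.sin(7 * t)          # tiny y-noise so covariances are non-singular
X0 = np.column_stack([-4 + t, jitter])
X1 = np.column_stack([1 + t, 1 + jitter])
X = np.vstack([X0, X1])
y = np.array([0] * 20 + [1] * 20)

lda = LinearDiscriminantAnalysis().fit(X, y)
svm = SVC(kernel="linear", C=1e6).fit(X, y)   # large C approximates a hard margin

# Normal vectors of the two separating hyperplanes; for this stereotypical
# dataset they point in clearly different directions.
print("Fisher normal:", lda.coef_[0] / np.linalg.norm(lda.coef_[0]))
print("SVM normal:   ", svm.coef_[0] / np.linalg.norm(svm.coef_[0]))
```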
Sketch a two-dimensional unsupervised dataset for which K-means (K=3) may get stuck at a local optimum. In your drawing, show (using square markers) the positions of the centroids at the local minimum and (using circles) the positions at the global minimum. Your example must be stereotypical in order to show the difference between the local and the global minimum.
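A minimal numerical sketch of such a situation (again my own construction): three tight, well-separated clusters, one initialization that recovers them (the global optimum, the circles), and one that places two centroids in the same cluster so that Lloyd's iterations get stuck (the local optimum, the squares).

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Three tight, well-separated clusters: two on the left, one far right.
centers = np.array([[0.0, 0.0], [0.0, 5.0], [10.0, 2.5]])
X = np.vstack([c + 0.2 * rng.standard_normal((30, 2)) for c in centers])

# Good initialization: one centroid per cluster -> global optimum (circles).
good = KMeans(n_clusters=3, init=centers, n_init=1).fit(X)

# Bad initialization: two centroids inside the right cluster, one centroid
# between the two left clusters -> K-means converges to a local optimum
# where the right cluster is split and the left clusters share one centroid.
bad_init = np.array([[10.0, 2.0], [10.0, 3.0], [0.0, 2.5]])
bad = KMeans(n_clusters=3, init=bad_init, n_init=1).fit(X)

print("global optimum inertia:", good.inertia_)   # circles in the sketch
print("local  optimum inertia:", bad.inertia_)    # squares in the sketch
```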
In order for a kernel to be positive semi-definite (PSD) it must satisfy $\sum_{i=1}^{n}\sum_{j=1}^{n} c_i c_j\, k(x_i, x_j) \geq 0$ for all sequences of data points and coefficients.
Show that if $k_1$ and $k_2$ are PSD kernels, then $k_3 = \alpha k_1 + \beta k_2$ with $\alpha, \beta \geq 0$ is also a PSD kernel.
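A sketch of the expected argument (my own write-up, not the official solution): expand the PSD condition for $k_3$ by linearity,

$$\sum_{i=1}^{n}\sum_{j=1}^{n} c_i c_j\, k_3(x_i,x_j) = \alpha \sum_{i,j} c_i c_j\, k_1(x_i,x_j) + \beta \sum_{i,j} c_i c_j\, k_2(x_i,x_j) \;\geq\; 0,$$

since both sums are non-negative ($k_1$ and $k_2$ are PSD) and $\alpha, \beta \geq 0$.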
Give an example showing that positive semi-definiteness is not guaranteed if $\alpha < 0$ or $\beta < 0$.
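One possible counterexample (my own choice of kernels and coefficients): take $k_1 = k_2$ to be the linear kernel and $\alpha = -1$, $\beta = 0$, so that

$$k_3(x, x') = -\,x^\top x'.$$

Then for $n = 1$, $c_1 = 1$ and any $x_1 \neq 0$ we get $c_1^2\, k_3(x_1, x_1) = -\|x_1\|^2 < 0$, violating the PSD condition.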
A feature map associated with a kernel $k$ must satisfy $\langle \phi(x_i), \phi(x_j) \rangle = k(x_i, x_j)$ for all $x_i, x_j \in X$. Find the feature map of $k_3 = \alpha k_1 + \beta k_2$ and show that it fulfills the equation above.
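A sketch of the standard construction, assuming $\phi_1$ and $\phi_2$ are feature maps of $k_1$ and $k_2$: stack the scaled feature maps,

$$\phi_3(x) = \big(\sqrt{\alpha}\,\phi_1(x),\; \sqrt{\beta}\,\phi_2(x)\big),$$

so that

$$\langle \phi_3(x_i), \phi_3(x_j)\rangle = \alpha\,\langle \phi_1(x_i), \phi_1(x_j)\rangle + \beta\,\langle \phi_2(x_i), \phi_2(x_j)\rangle = \alpha\,k_1(x_i,x_j) + \beta\,k_2(x_i,x_j) = k_3(x_i,x_j).$$

Note that $\alpha, \beta \geq 0$ is exactly what makes the square roots well defined.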
We consider a discrete probability distribution $p(i)$, $i \in \{1, \dots, 10\}$. We can represent such a probability distribution as a vector $p$ indexed by $i$, subject to the constraints $p_i \geq 0$ and $\sum_{i=1}^{10} p_i = 1$. We would like to find analytically the probability distribution $p$ with maximum entropy. The entropy is given by $H(p) = -\sum_{i=1}^{10} p_i \log p_i$ and is a concave function.
Write down the Lagrangian function associated with this constrained optimization problem.
Show using the Lagrangian method that the optimal probability distribution is uniform with $p(i) = 0.1$ for all $i$.
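A sketch of the expected derivation, enforcing only the equality constraint (the positivity constraints turn out to be inactive at the optimum):

$$L(p, \lambda) = -\sum_{i=1}^{10} p_i \log p_i + \lambda \Big( \sum_{i=1}^{10} p_i - 1 \Big), \qquad \frac{\partial L}{\partial p_i} = -\log p_i - 1 + \lambda = 0 \;\Rightarrow\; p_i = e^{\lambda - 1}.$$

All $p_i$ thus equal the same constant, and the constraint $\sum_i p_i = 1$ forces $p_i = 1/10 = 0.1$. Since $H$ is concave, this stationary point is the maximum.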
Explain briefly why the same Lagrange method cannot be used to find the probability distribution with minimum entropy.
We consider the regularized regression task that is solved by optimizing $\min_w \sum_{k=1}^{N} (y_k - w^\top x_k)^2$ subject to $\|w\|_\infty \leq 1$, where $w$ is the solution and $(x_k, y_k)$ is a dataset of $N$ input-output pairs.
Show that the optimization problem can be rewritten as $\min_w\; w^\top X^\top X w - 2 y^\top X w$ subject to $w_i \leq 1$ and $w_i \geq -1$.
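A sketch of the expansion, writing the residuals in matrix form with $X$ the matrix whose rows are the $x_k^\top$ and $y$ the vector of targets:

$$\sum_{k=1}^{N} (y_k - w^\top x_k)^2 = \|y - Xw\|^2 = y^\top y - 2\, y^\top X w + w^\top X^\top X w.$$

The constant $y^\top y$ does not depend on $w$ and can be dropped, and $\|w\|_\infty \leq 1$ is equivalent to the componentwise constraints $-1 \leq w_i \leq 1$.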
You have at your disposal a quadratic solver quadprog(Q, l, A, b) that solves $\min_v\; v^\top Q v + l^\top v$ subject to $A v \leq b$. Write down the code that builds the numpy arrays Q, l, A, b from the data X and y.
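A minimal numpy sketch of one possible answer (the quadprog signature is taken from the problem statement; the helper name and data shapes are assumptions of this sketch):

```python
import numpy as np

def build_qp(X, y):
    """Build Q, l, A, b so that quadprog(Q, l, A, b), which solves
    min_v v^T Q v + l^T v subject to A v <= b, returns the constrained
    regression weights w."""
    N, d = X.shape

    Q = X.T @ X                # quadratic term:  w^T X^T X w
    l = -2.0 * X.T @ y         # linear term:    -2 y^T X w  ==  l^T w
    # ||w||_inf <= 1 as componentwise constraints: w_i <= 1 and -w_i <= 1
    A = np.vstack([np.eye(d), -np.eye(d)])
    b = np.ones(2 * d)
    return Q, l, A, b

# Usage sketch (hypothetical data shapes):
# X = np.random.randn(100, 5); y = np.random.randn(100)
# Q, l, A, b = build_qp(X, y)
# w = quadprog(Q, l, A, b)   # solver assumed given by the exam
```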