Pedagogic talk, mainly based on
Nanotemper Technologies · Munich · 10 June 2016
F. Alexander Wolf
Institute of Computational Biology
Helmholtz Zentrum München
Machine learning in robotics, natural language processing, neuroscience research, and computer vision.
It's statistics using models with higher complexity.
These might yield higher precision but are less interpretable.
Comments
Learn a function $f: \mathbb{R}^{28 \times 28} \to \{2, 4\}$.
In which way are samples, e.g. those with label $y = 2$, similar to each other?
▷ Strategy: find coordinates = features that reveal the similarity!
▷ Here PCA: diagonalize the covariance matrix $\left(x_i^\top x_j\right)_{i,j=1}^N$.
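A minimal numpy sketch of this strategy, assuming a data matrix of flattened 28×28 images; the variable names and the random placeholder data are illustrative, not from the talk:

```python
import numpy as np

# Hypothetical data: N flattened 28x28 images (placeholder for real digits).
N, D = 1000, 28 * 28
X = np.random.rand(N, D)

# Center the data, then diagonalize the covariance matrix.
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / N                     # (D x D) covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)  # eigh, since C is symmetric

# Features = coordinates along the top-2 principal components.
features = Xc @ eigvecs[:, -2:]       # eigh sorts eigenvalues in ascending order
```

On real digit images, similar samples (e.g. all the 2s) cluster in these feature coordinates.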
Model function $\hat f$: estimator $\hat y_x$ for $Y$ given $X = x$
$$\hat y_x = \hat f_D(x) = \mathrm{E}_{p(y|x,D)}[y] = \frac{1}{k} \sum_{i \in N_k(x,D)} y_i$$
Probabilistic model definition reflects uncertainty
$$p(y \mid x, D) = \frac{1}{k} \sum_{i \in N_k(x,D)} \mathbb{I}(y_i = y)$$
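As a sketch, the estimator and the probabilistic model above could be written in numpy as follows; `X_train`, `y_train`, and `k` are assumed inputs:

```python
import numpy as np

def knn_predict(x, X_train, y_train, k=5):
    """kNN estimator: average the labels of the k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distances to all samples
    nn = np.argsort(dists)[:k]                   # indices N_k(x, D)
    y_hat = y_train[nn].mean()                   # = (1/k) sum over neighbors of y_i
    # p(y|x,D): fraction of the k neighbors carrying label y
    p = {y: np.mean(y_train[nn] == y) for y in np.unique(y_train)}
    return y_hat, p
```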
Estimator for $y$ given $x$
$$\hat y_x = \hat f_\theta(x) = \mathrm{E}_{p(y|x,\theta)}[y] = w_0 + x^\top w$$
Probabilistic model definition
$$p(y \mid x, \theta) = \mathcal{N}(y \mid \hat y_x, \sigma), \qquad \theta = (w_0, w, \sigma)$$
Estimate parameters from data $D$
$$\theta^* = \operatorname*{argmax}_\theta \, p(\theta \mid D)$$
Estimate parameters from data $D$
$$\theta^* = \operatorname*{argmax}_\theta \, p(\theta \mid D, \text{model}, \text{beliefs}) \qquad \text{(optimization)}$$
assuming a model and prior beliefs about the parameters.
Now
$$p(\theta \mid D) = p(D \mid \theta)\, p(\theta) / p(D) \qquad \text{(Bayes' rule)}$$
Evaluate: assume a uniform prior $p(\theta)$ and iid samples $(y_i, x_i)$
$$p(\theta \mid D) \propto p(D \mid \theta) = \prod_{i=1}^N p(y_i, x_i \mid \theta) \propto \prod_{i=1}^N p(y_i \mid x_i, \theta)$$
Linear regression: $\log p(\theta \mid D) \simeq -\sum_{i=1}^N \big(y_i - \hat y_{x_i}\big)^2$ ▷ least squares!
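A minimal sketch of this chain in numpy: with a uniform prior, maximizing $p(\theta \mid D)$ reduces to minimizing the sum of squares. The synthetic data below is purely illustrative:

```python
import numpy as np

# Illustrative 1D data from a hypothetical "true" line plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, size=50)   # y = w0 + w*x + noise

# Maximizing p(theta|D) with a uniform prior = least squares.
A = np.stack([np.ones_like(x), x], axis=1)        # design matrix [1, x]
(w0, w), *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution

y_hat = w0 + w * x                                # estimator y_hat_x = w0 + x^T w
```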
A neural network consists of layered linear regressions (one per neuron), stacked with nonlinear activation functions $\phi$.
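A sketch of that stacking in numpy, with tanh as one example choice for $\phi$; the layer sizes and random weights are arbitrary placeholders:

```python
import numpy as np

def phi(z):
    """Nonlinear activation function (here: tanh)."""
    return np.tanh(z)

def forward(x, layers):
    """Stack linear regressions (W, w0), each followed by phi."""
    for W, w0 in layers[:-1]:
        x = phi(W @ x + w0)     # linear regression, then nonlinearity
    W, w0 = layers[-1]
    return W @ x + w0           # linear output layer

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(16, 784)), np.zeros(16)),  # hidden layer
          (rng.normal(size=(1, 16)), np.zeros(1))]     # output layer
y_hat = forward(np.zeros(784), layers)
```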
▷ Instead of $D^2$, only $D$ independent components (weight sharing).
▷ Instead of $D^2$, only $2d$ nonzero components (locality). ▷ Statistics ☺!
Consider an example ($d = 1$):
$$W = \begin{pmatrix} \ddots & & & \\ -1 & 1 & & 0 \\ & \ddots & \ddots & \\ 0 & & -1 & 1 \\ & & & \ddots \end{pmatrix} \quad\Leftrightarrow\quad \tilde f_t = f_t - f_{t-1},$$
that is, $f \mapsto \tilde f$. [figure: a signal $f$ and the differenced signal $\tilde f$]
▷ Simple edge structures revealed! Just like simple cells in V1!
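A numpy sketch of this $d = 1$ example: $W$ has only two nonzero components per row, and applying it amounts to differencing, which lights up exactly at the edges of an illustrative step signal:

```python
import numpy as np

T = 8
# Banded weight matrix: (W f)_t = f_t - f_{t-1}, two nonzero entries per row.
W = np.eye(T) - np.eye(T, k=-1)

f = np.array([0., 0., 0., 1., 1., 1., 0., 0.])  # step "signal"
f_tilde = W @ f   # nonzero exactly at the edges of the step
print(f_tilde)    # [ 0.  0.  0.  1.  0.  0. -1.  0.]
```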
Thank you!