{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Notebook for Sheet 7\n", "# Cross-validation revisited\n", "\n", "Note:On the lecture's homepage you will find the Python package wissrech2. This is a collection of updated code from previous notebooks and some new stuff that you can use for this notebook. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# some basic setup\n", "%matplotlib inline\n", "import numpy as np \n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# H1. Regularization parameter and cross validation revisited\n", "\n", "In this programming homework exercise you will continue Homework H3 from Sheet 6 and consider the estimation of the regularization parameter via k-fold cross-validation more systematically. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 1: Effective degrees of freedom\n", "\n", "Assume to be given training data $D_{\\text{train}} = \\{(x_1,y_1),\\dots,(x_n,y_n)\\}$. In this programming taks you will examine how the regularization parameter $\\lambda > 0$ is related to the degrees of freedom of the regularized kernel regression method\n", "$$\n", " L_\\lambda: D_{\\text{train}} \\mapsto \\mathrm{argmin}_{f \\in \\mathcal H} \\frac{1}{n} \\sum_{i=1}^n (y_i - f(x_i))^2 + \\lambda \\|f\\|_k^2\n", "$$\n", "\n", "The smoothing matrix associated to $L_\\lambda$ and given training points $x_1,\\dots,x_n$ is given by\n", "\n", "$$\n", " S_{\\lambda} = K(K+\\lambda I_n)^{-1},\n", "$$\n", "\n", "wheer $K=(k(x_i,x_j))$ is the kernel matrix. The smoothing matrix maps values $y = (y_1,\\dots,y_n)^T$ to the values $\\hat y = (\\hat y_1,\\dots,\\hat y_n)^T$ predicted for the training points $x_1,\\dots,x_n$ via $S_\\lambda y = \\hat y$.\n", "\n", "If $\\lambda = 0$, we effectively do interpolation and $S_0$ is just the identity. The ability to interpolate $n$ values is reflected in the trace of $S_0$, which is $\\mathrm{trace}(S_0) = n$. This motivates the following definition for $\\lambda > 0$. We say that $\\text{edf}_\\lambda := \\mathrm{trace}(S_\\lambda)$ are the effective degrees of freedom of the learning method $L_\\lambda$ at given training points $x_1,\\dots,x_n$.\n", "\n", "Task: Write a subroutine effective_degrees_of_freedom which computes for a given eigenvalue decomposition of the kernel matrix $K$ the possible effective degrees of freedom and the corresponding values of the regularization parameter $\\tilde \\lambda = n \\lambda$. To this end, numerically invert\n", "$$\n", " \\mathrm{trace}(S_\\lambda) = \\sum_{i=1}^n \\frac{\\mu_i}{n\\lambda + \\mu_i},\n", "$$\n", "where $\\mu_1,\\dots,\\mu_n$ are the eigenvalues of $K$. We recommend to use scipy.optimize.brentq. Let this subroutine search in the interval $[\\tilde \\lambda_{min}, \\tilde \\lambda_{max}]$, where\n", "$$\n", " \\tilde \\lambda_{min} = 0.0001, \\qquad \\tilde \\lambda_{max} = \\frac{n-3}{3} \\mu_{max}\n", "$$\n", "Start with $\\text{edf}=3$ and search until scipy.optimize.brentq gives you a ValueError." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# your code goes here" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Your implementation must run successfully with the follow code\n", "import wissrech2\n", "\n", "X = np.linspace(0.01,1,100)\n", "eigval, Q = wissrech2.eigenvalue_decomposition(wissrech2.gauss_kernel(X,X, 10))\n", "list_of_edfs, list_of_regparams = effective_degrees_of_freedom(eigval, Q)\n", "print('Checking computed edfs: '+\\\n", " str(np.allclose(list_of_edfs,range(3,10))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now using X = np.linspace(0.01,1,50), plot the computed effective degrees of freedom vs. the computed regularization parameter both for the Gauss kernel with $\\gamma = 10$ and $\\gamma = 100$.\n", "\n", "Question: Why do we obtain maximal effective degrees of freedom $\\text{edf} \\approx 10$ for the Gauss kernel with $\\gamma=10$ although setting $\\lambda = 0$ would in principle allow to interpolate $n=50$ data points.\n", "\n", "Your answer:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# your code goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 2: Cross-validation revisited\n", "\n", "The noisy sample values $\\text{toy}^{\\text{noisy}}$, which you have used for Notebook 5 and Sheet 6, Programming exercise H3, were generated according to the following model: $\\text{toy}^{\\text{noisy}}_i$ is a realization of the random variable\n", "$$\n", " Y_i = \\text{toy}(X^{\\text{sample}}_i) + \\varepsilon_i, \\qquad i=1,\\dots,10\n", "$$\n", "The random variables $\\varepsilon_1,\\dots,\\varepsilon_10$ are i.i.d with $\\varepsilon_i \\sim \\mathcal N(0,0.5^2)$. The sample points $X^{\\text{sample}}_i$ are fixed.\n", "\n", "Question: In the plots of Notebook 5, Task 4, you plotted a rooted mean square error. Which of the error notions discussed in Sheet 7, Exercise G2, do you approximate with that rooted mean square error?\n", "\n", "Your answer:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The package wissrech2 contains two subroutines test_edfs_cv, test_edfs_aic.\n", "\n", "* test_edfs_cv is a standalone, updated version of the class method __find_best_regparam, which you have implemented for the previous notebook. test_edfs_cv is supposed to operate on the potential edfs and corresponding regularization parameter which are computed by your implementation effective_degrees_of_freedom. It is known that error estimates based on cross-validation rather approximate the expected test error $\\mathbb E[\\mathrm{err}(L_\\lambda, X)]$ than the risk $R_{\\ell_2,P}(\\hat f_\\lambda)$, where $\\hat f_\\lambda = L_\\lambda(D_{\\text{train}})$. This holds in particular when the training data used for the different folds of cross-validation have little overlap.\n", "\n", "* test_edfs_aic provides an alternative approach to estimate the best effective degrees of freedom based on an Aikaike information criterion. Instead of cross-validation this routine computes for every potential regularization parameter $\\lambda$ the following:\n", "$$\n", " R_{\\ell_2,\\text{emp}} + \\frac{2}{N} \\mathrm{trace}(S_\\lambda) \\hat \\sigma.\n", "$$\n", "This can be considered a one-sample estimate of the expected in-sample error $\\mathrm{err}(L,x_1,\\dots,x_n)$.\n", "\n", "Task: Reproduce the plot from Notebook 5, Task 4, for the Gauss kernel with $\\gamma=100$; this time, however, you work only with the regularization parameters computed by effective_degrees_of_freedom. Next add to the same diagramme plots for the estimated errors you obtain with test_edfs_cv and effective_degrees_of_freedom. Please interprete the result you obtain. In particular, think about what the findings tell you about the given training data $(X^{\\text{sample}}, \\text{toy}^{\\text{noisy}})$.\n", "\n", "Your answer:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# your code goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# H2. Bone mineral density estimation\n", "\n", "This programming exercise deals with real-world data. You are given relative spinal bone mineral density measurements on 261 North\n", "American adolescents. Each value is the difference in spnbmd\n", "taken on two consecutive visits, divided by the average. The age is\n", "the average age over the two visits. The spinal bone mineral density is related to the body growth of children; high values indicated a phase of strong growth.\n", "\n", "Variables:\n", "\n", "* idnum:\t\tidentifies the child, and hence the repeat measurements\n", "* age:\t\taverage age of child when measurements were taken\n", "* gender:\t\tmale or female\n", "* spnbmd:\t\tRelative Spinal bone mineral density measurement\n", "\n", "Original source:\n", "\n", "Bachrach LK, Hastie T, Wang M-C, Narasimhan B, Marcus R. Bone Mineral\n", "Acquisition in Healthy Asian, Hispanic, Black and Caucasian Youth. A\n", "Longitudinal Study. J Clin Endocrinol Metab (1999) 84, 4702-12." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 1: Loading the data\n", "\n", "Use the Python library pandas to read in the training and test data, which you find in the files\n", "* bone_mineral_density_train.csv\n", "* bone_mineral_density_test.csv\n", "\n", "Extract data such that you obtain numpy.ndarrays for the following separately for female and male: an array of average ages, which becomes the sampling points $X$, and an array of relative spinal bone mineral density measurements, which becomes the sampling values $Y$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# your code goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 2: Fitting the data\n", "\n", "Use wissrech2.RegularizedKernelRegression with the following Gauss kernels: $\\gamma = 100$, $\\gamma = 0.5$. Compute fits for both kernels as well as female and male data; use LOOCV to estimate the best effective degrees of freedom. Generate for each kernel a diagramme, where you plot both the female and male training data as well as the fit $\\hat f$ evaluated at artificially created sample points\n", "\n", "X_test = np.linspace(np.min(X_male),np.max(X_male),100).\n", "\n", "Question: For which kernel does the fit $\\hat f$ look more plausible? Why do you think is this fit more plausible?\n", "\n", "Your answer:\n", "\n", "Question: Do you have an explanation why the fit for the Gaussian kernel $\\gamma = 100$ looks so different from what you have seen in the previous programming exercises?\n", "\n", "Your answer:\n", "\n", "Question:What does the more plausible looking fit tell you about the growth of girls and boys?\n", "\n", "Your answer:\n", "\n", "Question: The original source used a cubic smoothing spline with edf=12, which corresponds to a regularization parameter $\\lambda = 0.00022$. In view of this observation, argue why the effective degrees of freedom is a useful concept.\n", "\n", "Your answer:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# your code goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Task 3: Predictive performance\n", "\n", "Compute the mean square training error and mean square prediction error on the test data for the fits obtained with Gauss kernel $\\gamma=100$ and Gauss kernel $\\gamma=0.5$ and automatically selected edf. Compare the errors with your visual assessment on the plausibility of the fits which you gave before.\n", "\n", "Your answer:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# your code goes here" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }