{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Notebook for Exercise Sheet 6:\n",
    "# Cross-validation, Gaussian processes\n",
    "\n",
    "For all tasks of this notebook, the following conditions apply: (1) You can reuse any code from the previous notebook. (2) You can use any standard python routines, including anything from <tt>numpy</tt> and <tt>scipy</tt>. (3) Except for evaluating the Gauss kernel, you are not allowed to use anything from the <tt>sklearn</tt>-package."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Task: Importing the code from sheet 5\n",
    "\n",
    "Make implemented subroutines and the data from the previous Notebook 5 available here. To this end, create a file <tt>sheet5.py</tt> where you copy the required code and import the python file. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# put setup and required imports here\n",
    "from scipy.linalg import cholesky\n",
    "from scipy.linalg import solve_triangular\n",
    "from scipy.linalg import sove"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Estimating the regularization parameter via k-fold cross-validation\n",
    "\n",
    "In this programming task, you will use a cross-validation scheme based on Theorem 47 to estimate a good the regularization parameter.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Task 1: Class RegularizedKernelRegression\n",
    "\n",
    "Write a Python class <tt>RegularizedKernelRegression</tt> which can be used to compute the solution of\n",
    "$$\n",
    "  \\min_{f \\in \\mathcal H} \\sum_{i=1}^N (f(x_i) - y_i)^2 + \\lambda \\|f\\|_k.  \n",
    "$$\n",
    "The regularization parameter $\\lambda$ is chosen for given data via k-fold cross validation, see https://en.wikipedia.org/wiki/Cross-validation_(statistics)#k-fold_cross-validation . Concretely, given a set of possible regularization parameters $\\Lambda$, the algorithm selects that $\\lambda \\in \\Lambda$ which produces the smallest $L_2$-error\n",
    "$$\n",
    " \\frac{1}{N} \\sum_{i=1}^N (y_i - f^{\\kappa(i)}(x_i))^2.\n",
    "$$\n",
    "Here, $\\kappa: \\{1,\\dots,N\\} \\to \\{1,\\dots,k\\}$ maps the data index $i$ to the index $j$ of the validation set, which the data points has been randomly assigned to. $f^j$ refers to the solution where the points from the $j$th validation set have been removed from the training data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import sheet5\n",
    "from scipy.linalg import solve\n",
    "\n",
    "class RegularizedKernelRegression:\n",
    "    def __init__(self, kernel, regparams):\n",
    "        \"\"\"\n",
    "        Creates a  RegularizedKernelRegression object.\n",
    "        \n",
    "        The regularization parameter best suited to given data is selected\n",
    "        from a user-specific list of regularization parameters via k-fold\n",
    "        cross-validation.\n",
    "        \n",
    "        Parameters:\n",
    "            kernel - function object which takes two array-like arguments X and Y\n",
    "            regparams - list of potential regularization parameters\n",
    "        \"\"\"\n",
    "        # your code goes here\n",
    "    \n",
    "    def __find_best_regparam(self, eigval, Q, y, k):\n",
    "        \"\"\"\n",
    "        Auxiliary method used to estimate the best regularization parameter.\n",
    "        \n",
    "        Parameters:\n",
    "            eigval - array-like with N elements; the eigenvalues\n",
    "            Q      - matrix of shape (N,N); the normalized (unit “length”) eigenvectors,\n",
    "                     such that the column Q[:,i] is the eigenvector corresponding\n",
    "                     to the eigenvalue eigval[i].\n",
    "            y      - array-like with shape (N,); sample values\n",
    "            k      - int; the number validation sets, k=N leads to LOOCV\n",
    "        \"\"\"\n",
    "        # your code goes here\n",
    "        \n",
    "    def fit(self, X, y, cv = True, k='LOOCV', regp = 0):\n",
    "        \"\"\"\n",
    "        Fits the given N data.\n",
    "        \n",
    "        If cv == True, the regularization parameter is estimated using\n",
    "        k-fold cross-validation. By default, LOOCV is used (k=N). If N is not a\n",
    "        multiple of k, then an ValueError is raised. If cv == False,\n",
    "        the user-provided regularization parameter regp is used.\n",
    "        \n",
    "        Parameters:\n",
    "            X    - array-like with shape (N,); sample points \n",
    "            y    - array-like with shape (N,); sample values\n",
    "            cv   - boolean; indicates if the regularization parameter should\n",
    "                   be estimated using cross-validation.\n",
    "            k    - int; the number validation sets, k=N leads to LOOCV\n",
    "            regp - user-specified regularization parameter that is used if\n",
    "                   cross-validation is turned off.\n",
    "        \n",
    "        \"\"\"\n",
    "        # your code goes here  \n",
    "            \n",
    "    def predict(self, X):\n",
    "        \"\"\"\n",
    "        Predicts function values at the given points using the computed fit.\n",
    "        \n",
    "        Parameters:\n",
    "            X - array-like with shape (M,)            \n",
    "        Returns:\n",
    "            y_pred - array-like with shape (M,); the estimated function values\n",
    "        \"\"\"\n",
    "        # your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Your must run successfully through the following test:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "kernel100 = lambda X,Y: sheet5.gauss_kernel(X,Y, 100)\n",
    "reg = RegularizedKernelRegression(kernel100,np.linspace(0.001, 15, 100))\n",
    "\n",
    "target_values = [0.65779935,  0.33784479,  0.89531297,  1.6475294,   1.20401756,  0.69247441,\n",
    " -0.84429611,  0.19573349,  1.32045238, -0.32945063]\n",
    "\n",
    "reg.fit(sheet5.X_sample, sheet5.toy_noisy)\n",
    "print('Checking LOOCV-reg. param: '+\\\n",
    "      str(np.allclose(reg.fitted_regparam, 0.001)))\n",
    "print('Checking predicted values: '+\\\n",
    "      str(np.allclose(reg.predict(sheet5.X_sample),target_values)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Task 2: How good are the estimated regularization parameters?\n",
    "\n",
    "Let <tt>X_sample = np.linspace(0,1,40)</tt> and <tt>X_full</tt> as in Notebook 5.Task 4. Generate four lines of noisy observations <tt>toy_noisy = toy(X_sample) + 0.5*Noise</tt> where <tt>Noise \\sim N(0,1)</tt>. For each of the lines of observations, generate two diagrams, one for the Gauss kernel with $\\gamma=100$, the other for the Gauss kernel with $\\gamma = 10$. Generate plots like in Notebook 5.Task 4 and add the following to each of the plots:\n",
    "\n",
    "* the estimated regularization parameter obtained via LOOCV and the corresponding CV-$L_2$-error (red dot)\n",
    "* $10$ estimates for the regularization parameter + corresponding CV-$L_2$-error using CV with $k=5$ (blue dots)\n",
    "* $10$ estimates for the regularization parameter + corresponding CV-$L_2$-error using CV with $k=2$ (yellow dots)\n",
    "\n",
    "What do the plots tell you? Write some text here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# your code goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Gaussian processes\n",
    "\n",
    "In this progamming task, you will implement a Gaussian process with prior mean $m=0$ and covariance kernel\n",
    "$$\n",
    " k_w(x,y) = \\exp\\left(-\\frac{1}{2w^2}\\|x-y\\|_2^2\\right) \n",
    "$$\n",
    "with length scale $w$. The assumed noise in the data is described via the noise level $\\sigma^2$. \n",
    "\n",
    "\n",
    "### Task 1: Gaussian process implementation\n",
    "Provide the details for the following Python class below. In contrast to previous tasks on regularized regression, work with the Cholesky decomposition here instead of an eigenvalue decomposition."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class GaussianProcess:\n",
    "    def __init__(self, length_scale, noise_level):\n",
    "        \"\"\"\n",
    "        Creates an instance of the GaussianProcess.\n",
    "        \n",
    "        Parameters:\n",
    "            kernel - function object which takes two array-like arguments X and Y\n",
    "            noise_level - float; the noise level\n",
    "        \"\"\"\n",
    "        # your code goes here\n",
    "\n",
    "    def fit(self, X,y):\n",
    "        \"\"\"\n",
    "        Fits the Gaussian process to the given N observations\n",
    "        \n",
    "        Parameters:\n",
    "            X - array-like with shape (N,d) or (N,) if d==1\n",
    "            y - array-like with shape (N,)\n",
    "        \"\"\"\n",
    "        # your code goes here\n",
    "        \n",
    "    def predict(self, X):\n",
    "        \"\"\"\n",
    "        Predicts the Gaussian process at the given points X.\n",
    "        \n",
    "        If the Gaussian process has not been fitted to data yet,\n",
    "        mean and covariance of the prior in the points X are returned\n",
    "        \n",
    "        Parameters:\n",
    "            X - array-like with shape (M,d) or (M,) if d==1\n",
    "            \n",
    "        Returns:\n",
    "            mu - matrix with shape (M,d); posterior mean in the points X\n",
    "            sig - matrix with shape (M,M); posterior covariance in the points X\n",
    "        \"\"\"\n",
    "        # your code goes here\n",
    "    \n",
    "    def generate_sample_path(self, X, n=1):\n",
    "        \"\"\"\n",
    "        Generates sample paths of the posterior distribution\n",
    "        computed in the points X.\n",
    "        \n",
    "        If the Gaussian process has not been fitted to any data yet,\n",
    "        then sample paths of the prior are generated.\n",
    "        \n",
    "        To assure positive definiteness of the posterior covariance,\n",
    "        0.000001 noise is added to the posterior covariance.\n",
    "        \n",
    "        Parameters:\n",
    "            X - array-like with shape (M,)\n",
    "            n - number of sample paths (default n=1)\n",
    "            \n",
    "        Returns:\n",
    "            f - array with shape (M,n) or (M,) if n==1\n",
    "        \"\"\"\n",
    "        # your code goes here\n",
    "    \n",
    "    def plot_sample_paths(self, ax, X, n=1):\n",
    "        \"\"\"\n",
    "        Adds sample paths of the Gaussian process to the plot.\n",
    "        \n",
    "        If the Gaussian process has been fitted to data, then\n",
    "        realizations of the posterior distribution in the points X\n",
    "        are generated, otherwise realizations of the prior distribution\n",
    "        in X.\n",
    "        \n",
    "        Parameters:\n",
    "            X - array-like with shape (M,)\n",
    "            n - number of sample paths (default n=1)\n",
    "        \"\"\"\n",
    "        # your code goes here\n",
    "            \n",
    "        \n",
    "    def plot_prediction(self, ax, X, legend=False):\n",
    "        \"\"\"\n",
    "        Adds a plot to the given axis object which visualizes\n",
    "        the posterior distribution computed in the points X.\n",
    "        \n",
    "        If the Gaussian process has not been fitted to any data yet,\n",
    "        then the prior distribution is visualized.\n",
    "        \n",
    "        The points X are depicted as blue dots, the posterior mean\n",
    "        is depicted as a red line and the 95%-confindence region is\n",
    "        depicted as filled area around the posterior mean with color lightgray.\n",
    "        As title the length scale and the noise level are printed.\n",
    "        \n",
    "        Parameters:\n",
    "            X      - array-like with shape (M,)\n",
    "            ax     - axis object\n",
    "            legend - if True a legend is depicted in the plot\n",
    "        \"\"\"\n",
    "        # your code goes here\n",
    "            "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Your code has to pass the following tests successfully:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "gp = GaussianProcess(np.sqrt(1/20), 0.0001)\n",
    "X_obs = [0,0.25,0.5,0.75,1.0]\n",
    "y_obs = [0,0.8,0.5,0.3, 1.0]\n",
    "\n",
    "target_mu = [[  5.75901382e-05]\n",
    " ,[  7.99894861e-01]\n",
    " ,[  4.99981486e-01]\n",
    " ,[  3.00057226e-01]\n",
    " ,[  9.99871278e-01]]\n",
    "\n",
    "target_sig = [[  9.99842618e-05,   1.17243789e-08,  -6.67412986e-09,   3.35724040e-09,\n",
    "   -1.29059649e-09],\n",
    " [  1.17243789e-08,   9.99756334e-05,   1.64210410e-08,  -8.62783973e-09,\n",
    "    3.35724040e-09],\n",
    " [ -6.67412986e-09,   1.64210410e-08,   9.99735192e-05,   1.64210411e-08,\n",
    "   -6.67412986e-09],\n",
    " [  3.35724040e-09,  -8.62783973e-09,   1.64210411e-08,   9.99756334e-05,\n",
    "    1.17243788e-08],\n",
    " [ -1.29059649e-09,   3.35724040e-09,  -6.67412986e-09,   1.17243788e-08,\n",
    "    9.99842618e-05]]\n",
    "\n",
    "gp.fit(X_obs,y_obs)\n",
    "X = np.linspace(0.0,1.0,5)\n",
    "mu, sig = gp.predict(X)\n",
    "\n",
    "print('Checking posterior mean: '+\\\n",
    "     str(np.allclose(mu,target_mu)))\n",
    "\n",
    "print('Checking posterior covariance: '+\\\n",
    "     str(np.allclose(sig,target_sig)))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Task 2: Meaning of the length scale and the noise level \n",
    "\n",
    "Use your Gaussian Process implementation to produce a number of plots where you vary $w$ and $\\sigma^2$. By refering to these plots, explain the meaning of the length scale and the noise level."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}