rapid_models.gp_diagnostics.cv
Module Contents
Functions
multifold() – Compute multifold CV residuals for GP regression with noiseless or fixed-variance iid Gaussian noise
multifold_cholesky() – Compute multifold CV residuals from the Cholesky factor L of the observation covariance matrix
loo() – Compute Leave-One-Out (LOO) residuals for GP regression with noiseless or fixed-variance iid Gaussian noise
loo_cholesky() – Compute Leave-One-Out (LOO) residuals from the Cholesky factor L of the observation covariance matrix
check_folds_indices() – Check that the list of index subsets (list of lists) is valid
check_lower_triangular() – Check that the argument is a 2d numpy array which is lower triangular
check_numeric_array() – Check that the argument is a numpy array of correct dimension
_multifold_inv() – Compute multifold CV residuals using matrix inverse (for testing)
- rapid_models.gp_diagnostics.cv.multifold(K: nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], Y_train: nptyping.NDArray[nptyping.Shape[N], nptyping.Float], folds: List[List[int]], noise_variance: float = 0.0, check_args: bool = True) → Tuple[None, None, None] | Tuple[nptyping.NDArray[nptyping.Shape[N], nptyping.Float], nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], nptyping.NDArray[nptyping.Shape[N], nptyping.Float]]
Compute multifold CV residuals for GP regression with noiseless (noise_variance = 0) or fixed variance iid Gaussian noise. (residual = observed - predicted)
- Parameters:
K (2d array) – GP prior covariance matrix
Y_train (array) – training observations
folds (list of lists) – The index subsets
noise_variance (float) – variance of the observational noise. Set noise_variance = 0 for noiseless observations
check_args (bool) – Check (assert) that arguments are well-specified before computation
- Returns:
mean: Mean of CV residuals
cov: Covariance of CV residuals
residuals_transformed: The residuals transformed to the standard normal space
- Return type:
tuple (mean, cov, residuals_transformed); per the signature, all three entries may be None
This function just calls ‘multifold_cholesky()’ with the appropriate Cholesky factor. It is based on the formulation derived in:
- [D. Ginsbourger and C. Schaerer (2021). Fast calculation of Gaussian
Process multiple-fold crossvalidation residuals and their covariances. arXiv:2101.03108]
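To make the quantity concrete, the following is a minimal NumPy sketch of multifold CV residuals and their covariance, written from the precision-matrix form of the cited result. It is not the library's implementation: the helper names `multifold_residuals_sketch` and `brute_force_residuals`, the toy kernel, and the data are all assumptions made for illustration, and a zero-mean GP is assumed.

```python
import numpy as np

def multifold_residuals_sketch(K, Y_train, folds, noise_variance=0.0):
    """Hypothetical helper: multifold CV residuals via the precision matrix."""
    n = len(Y_train)
    Q = np.linalg.inv(K + noise_variance * np.eye(n))  # precision of the observations
    QY = Q @ Y_train
    D_inv = np.zeros((n, n))  # inverse of the block-diagonal part of Q
    for idx in folds:
        block = np.ix_(idx, idx)
        D_inv[block] = np.linalg.inv(Q[block])
    mean = D_inv @ QY        # residual = observed - predicted
    cov = D_inv @ Q @ D_inv  # covariance of the residuals
    return mean, cov

def brute_force_residuals(K, Y_train, folds, noise_variance=0.0):
    """Refit on the remaining points for each fold and predict the held-out ones."""
    n = len(Y_train)
    C = K + noise_variance * np.eye(n)
    residuals = np.zeros(n)
    for idx in folds:
        rest = [i for i in range(n) if i not in idx]
        weights = np.linalg.solve(C[np.ix_(rest, rest)], Y_train[rest])
        residuals[idx] = Y_train[idx] - C[np.ix_(idx, rest)] @ weights
    return residuals

# Toy data (assumed): squared-exponential kernel on 8 random 1-D inputs
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=8)
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / 0.3**2)
Y_train = rng.normal(size=8)
folds = [[0, 1, 2], [3, 4], [5, 6, 7]]

mean, cov = multifold_residuals_sketch(K, Y_train, folds, noise_variance=0.1)
reference = brute_force_residuals(K, Y_train, folds, noise_variance=0.1)
```

In this sketch the closed form needs one factorisation of the full matrix, whereas the brute-force version solves a new linear system for every fold; the two should agree up to floating-point error.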
- rapid_models.gp_diagnostics.cv.multifold_cholesky(L: nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], Y_train: nptyping.NDArray[nptyping.Shape[N], nptyping.Float], folds: List[List[int]], check_args: bool = True) → Tuple[nptyping.NDArray[nptyping.Shape[N], nptyping.Float], nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], nptyping.NDArray[nptyping.Shape[N], nptyping.Float]]
Compute multifold CV residuals from the Cholesky factor L of the observation covariance matrix (L L.T = covariance matrix) and the training data Y_train (residual = observed - predicted)
- Parameters:
L (2d array) – lower triangular Cholesky factor of covariance matrix (L L.T = covariance matrix)
Y_train (array) – training observations
folds (list of lists) – The index subsets
check_args (bool) – Check (assert) that arguments are well-specified before computation
- Returns:
mean: Mean of CV residuals
cov: Covariance of CV residuals
residuals_transformed: The residuals transformed to the standard normal space
- Return type:
tuple (mean, cov, residuals_transformed)
Note:
- The matrix L L.T is the covariance matrix of the observations Y_train.
- For observations including Gaussian noise with fixed variance v, L L.T = K + v*I, where K[i, j] is the prior covariance of the latent GP between the i-th and j-th training locations. For noiseless observations, L L.T = K.
- This implementation uses the Cholesky factor instead of the inverse of the covariance matrix (the precision matrix), but is otherwise equivalent to the formulas derived in
[D. Ginsbourger and C. Schaerer (2021). Fast calculation of Gaussian Process multiple-fold crossvalidation residuals and their covariances. arXiv:2101.03108]
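As a rough illustration of the note above, this sketch builds a Cholesky factor L for noisy observations and uses it to solve linear systems without forming an explicit inverse. The helper names `observation_cholesky` and `solve_with_cholesky` and the toy data are assumptions; how the library itself uses L internally is not shown here.

```python
import numpy as np

def observation_cholesky(K, noise_variance=0.0):
    """Hypothetical helper: lower triangular L with L @ L.T = K + noise_variance * I."""
    n = K.shape[0]
    return np.linalg.cholesky(K + noise_variance * np.eye(n))

def solve_with_cholesky(L, b):
    """Solve (L @ L.T) x = b with two triangular systems instead of an inverse."""
    # np.linalg.solve is a general solver; scipy.linalg.solve_triangular
    # would exploit the triangular structure if SciPy is available.
    z = np.linalg.solve(L, b)
    return np.linalg.solve(L.T, z)

# Toy data (assumed): squared-exponential kernel on 6 random 1-D inputs
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=6)
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / 0.3**2)
Y_train = rng.normal(size=6)
v = 0.1  # fixed noise variance

L = observation_cholesky(K, noise_variance=v)
alpha = solve_with_cholesky(L, Y_train)  # equals inv(K + v*I) @ Y_train
```

A factor built this way is what one would pass as the `L` argument when the observations carry noise of variance v.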
- rapid_models.gp_diagnostics.cv.loo(K: nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], Y_train: nptyping.NDArray[nptyping.Shape[N], nptyping.Float], noise_variance: float = 0.0, check_args: bool = True) → Tuple[None, None, None] | Tuple[nptyping.NDArray[nptyping.Shape[N], nptyping.Float], nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], nptyping.NDArray[nptyping.Shape[N], nptyping.Float]]
Compute Leave-One-Out (LOO) residuals for GP regression with noiseless (noise_variance = 0) or fixed variance iid Gaussian noise (residual = observed - predicted). This function just calls ‘loo_cholesky()’ with the appropriate Cholesky factor.
- Parameters:
K (2d array) – GP prior covariance matrix
Y_train (array) – training observations
noise_variance (float) – variance of the observational noise. Set noise_variance = 0 for noiseless observations
check_args (bool) – Check (assert) that arguments are well-specified before computation
- Returns:
mean: Mean of LOO residuals
cov: Covariance of LOO residuals
residuals_transformed: The residuals transformed to the standard normal space
- Return type:
tuple (mean, cov, residuals_transformed); per the signature, all three entries may be None
- rapid_models.gp_diagnostics.cv.loo_cholesky(L: nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], Y_train: nptyping.NDArray[nptyping.Shape[N], nptyping.Float], check_args: bool = True) → Tuple[nptyping.NDArray[nptyping.Shape[N], nptyping.Float], nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], nptyping.NDArray[nptyping.Shape[N], nptyping.Float]]
Compute Leave-One-Out (LOO) residuals from the Cholesky factor L of the observation covariance matrix (L L.T = covariance matrix) and the training data Y_train (residual = observed - predicted)
- Parameters:
L (2d array) – lower triangular Cholesky factor of covariance matrix (L L.T = covariance matrix)
Y_train (array) – training observations
check_args (bool) – Check (assert) that arguments are well-specified before computation
- Returns:
mean: Mean of LOO residuals
cov: Covariance of LOO residuals
residuals_transformed: The residuals transformed to the standard normal space
- Return type:
tuple (mean, cov, residuals_transformed)
Note:
- The matrix L L.T is the covariance matrix of the observations Y_train.
- For observations including Gaussian noise with fixed variance v, L L.T = K + v*I, where K[i, j] is the prior covariance of the latent GP between the i-th and j-th training locations. For noiseless observations, L L.T = K.
- This implementation uses the Cholesky factor instead of the inverse of the covariance matrix (the precision matrix), but is otherwise equivalent to the formulas derived in
[O. Dubrule. Cross validation of kriging in a unique neighborhood. Journal of the International Association for Mathematical Geology, 15 (6):687-699, 1983.]
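For reference, the classical closed form for LOO residuals of a zero-mean GP can be sketched in a few lines of NumPy: with Q the inverse of L L.T, the i-th residual is (Q Y)_i / Q_ii and its variance is 1 / Q_ii. The helper name `loo_residuals_sketch` and the toy data are assumptions, and this sketch forms Q explicitly for readability, which the library note says the actual implementation avoids.

```python
import numpy as np

def loo_residuals_sketch(L, Y_train):
    """Hypothetical helper: LOO residuals from L with L @ L.T = covariance."""
    n = L.shape[0]
    L_inv = np.linalg.solve(L, np.eye(n))
    Q = L_inv.T @ L_inv        # precision matrix, inverse of L @ L.T
    q = np.diag(Q)
    mean = (Q @ Y_train) / q   # residual_i = (Q Y)_i / Q_ii
    cov = Q / np.outer(q, q)   # cov_ij = Q_ij / (Q_ii * Q_jj)
    return mean, cov

# Toy data (assumed): squared-exponential kernel plus noise variance 0.1
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=7)
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / 0.3**2)
C = K + 0.1 * np.eye(7)
Y_train = rng.normal(size=7)

L = np.linalg.cholesky(C)
mean, cov = loo_residuals_sketch(L, Y_train)
```

The diagonal of `cov` in this sketch is the leave-one-out predictive variance at each training point, so dividing `mean` by its square root gives individually standardised residuals.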
- rapid_models.gp_diagnostics.cv.check_folds_indices(folds: List[List[int]], n_max: int)
Check that the list of index subsets (list of lists) is valid
- Parameters:
folds (list of lists) – The index subsets.
n_max (int) – Total number of indices.
- Raises:
AssertionError – if ‘folds’ does not represent the n_max indices 0, …, n_max-1 split into non-overlapping subsets
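The condition described above can be expressed compactly. The following is a sketch of one way to test it, not the library's code: `folds_are_valid` is a hypothetical name, and it returns a boolean rather than raising AssertionError. Whether the library accepts empty folds is not stated; this sketch does.

```python
from typing import List

def folds_are_valid(folds: List[List[int]], n_max: int) -> bool:
    """Hypothetical check: folds partition the indices 0, ..., n_max - 1."""
    flat = [i for fold in folds for i in fold]
    # Sorting and comparing to the full range rules out gaps,
    # out-of-range indices and indices that appear in more than one fold.
    return sorted(flat) == list(range(n_max))

ok = folds_are_valid([[0, 2], [1, 3]], n_max=4)
overlapping = folds_are_valid([[0, 1], [1, 2]], n_max=3)
incomplete = folds_are_valid([[0, 1]], n_max=3)
```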
- rapid_models.gp_diagnostics.cv.check_lower_triangular(arr: nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float] | Any, argname: str = 'arr')
Check that the argument is a 2d numpy array which is lower triangular
- Parameters:
arr – the object to check
- Raises:
AssertionError – if ‘arr’ does not represent a lower triangular matrix
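A lower triangular test can be written with `np.tril`. The sketch below is an assumption about what such a check might look like: `is_lower_triangular` is a hypothetical name, it returns a boolean instead of asserting, and requiring a square matrix is inferred from the `Shape[N, N]` type hint.

```python
import numpy as np

def is_lower_triangular(arr) -> bool:
    """Hypothetical check: arr is a square 2d numpy array with zeros above the diagonal."""
    return (
        isinstance(arr, np.ndarray)
        and arr.ndim == 2
        and arr.shape[0] == arr.shape[1]
        and bool(np.allclose(arr, np.tril(arr)))
    )

lower = np.array([[1.0, 0.0], [2.0, 3.0]])
full = np.array([[1.0, 4.0], [2.0, 3.0]])
```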
- rapid_models.gp_diagnostics.cv.check_numeric_array(arr: nptyping.NDArray[Any, nptyping.Float] | Any, dim: int, argname: str = 'arr')
Check that the argument is a numpy array of correct dimension
- Parameters:
arr – the object to check
dim (int) – the required number of array dimensions
- Raises:
AssertionError – if ‘arr’ does not represent a ‘dim’-dimensional numpy array
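A comparable check for array type and dimension might look like the sketch below. `is_numeric_array` is a hypothetical name, it returns a boolean instead of asserting, and the requirement of a numeric dtype is an assumption drawn from the function name.

```python
import numpy as np

def is_numeric_array(arr, dim: int) -> bool:
    """Hypothetical check: arr is a numpy array with `dim` dimensions and a numeric dtype."""
    return (
        isinstance(arr, np.ndarray)
        and arr.ndim == dim
        and bool(np.issubdtype(arr.dtype, np.number))
    )

vector = np.zeros(3)
matrix = np.zeros((2, 2))
```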
- rapid_models.gp_diagnostics.cv._multifold_inv(K, Y_train, folds)
Compute multifold cv residuals using matrix inverse (for testing) (residual = observed - predicted)
- Parameters:
K (2d array) – covariance matrix
Y_train (array) – training observations
folds (list of lists) – The index subsets.
- Returns:
mean: Mean of CV residuals
cov: Covariance of CV residuals
residuals_transformed: The residuals transformed to the standard normal space
- Return type:
tuple (mean, cov, residuals_transformed)
- [D. Ginsbourger and C. Schaerer (2021). Fast calculation of Gaussian Process
multiple-fold crossvalidation residuals and their covariances. arXiv:2101.03108]