rapid_models.gp_diagnostics.cv
Module Contents
Functions
multifold | Compute multifold CV residuals for GP regression with noiseless or fixed variance iid Gaussian noise |
multifold_cholesky | Compute multifold CV residuals from the Cholesky factor L of the observation covariance matrix |
loo | Compute Leave-One-Out (LOO) residuals for GP regression with noiseless or fixed variance iid Gaussian noise |
loo_cholesky | Compute Leave-One-Out (LOO) residuals from the Cholesky factor L of the observation covariance matrix |
check_folds_indices | Check that the list of index subsets (list of lists) is valid |
check_lower_triangular | Check that the argument is a 2d numpy array which is lower triangular |
check_numeric_array | Check that the argument is a numpy array of correct dimension |
_multifold_inv | Compute multifold CV residuals using matrix inverse (for testing) |
- rapid_models.gp_diagnostics.cv.multifold(K: nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], Y_train: nptyping.NDArray[nptyping.Shape[N], nptyping.Float], folds: List[List[int]], noise_variance: float = 0.0, check_args: bool = True) → Union[Tuple[None, None, None], Tuple[nptyping.NDArray[nptyping.Shape[N], nptyping.Float], nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], nptyping.NDArray[nptyping.Shape[N], nptyping.Float]]]
Compute multifold CV residuals for GP regression with noiseless (noise_variance = 0) or fixed variance iid Gaussian noise. (residual = observed - predicted)
- Parameters:
K (2d array) – GP prior covariance matrix
Y_train (array) – training observations
folds (list of lists) – The index subsets
noise_variance (float) – variance of the observational noise. Set noise_variance = 0 for noiseless observations
check_args (bool) – Check (assert) that arguments are well-specified before computation
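- Returns:
mean – Mean of CV residuals
cov – Covariance of CV residuals
residuals_transformed – The residuals transformed to the standard normal space
- Return type:
tuple (mean, cov, residuals_transformed) of arrays with shapes (N,), (N, N) and (N,); per the signature, (None, None, None) is also possible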
This function just calls ‘multifold_cholesky()’ with the appropriate Cholesky factor. It is based on the formulation derived in:
[D. Ginsbourger and C. Schaerer (2021). Fast calculation of Gaussian Process multiple-fold crossvalidation residuals and their covariances. arXiv:2101.03108]
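The "appropriate Cholesky factor" can be sketched as follows. This is a minimal illustration, assuming the factor is taken of K + noise_variance * I as the note on multifold_cholesky() suggests; the helper name observation_cholesky and the example data are hypothetical and not part of rapid_models.

```python
import numpy as np


def observation_cholesky(K, noise_variance=0.0):
    """Hypothetical helper: lower Cholesky factor of K + noise_variance * I."""
    K = np.asarray(K, dtype=float)
    # Adding v * I turns the prior covariance of the latent GP into the
    # covariance of the noisy observations.
    K_obs = K + noise_variance * np.eye(K.shape[0])
    # numpy.linalg.cholesky returns the lower-triangular L with L @ L.T == K_obs
    # and raises numpy.linalg.LinAlgError if K_obs is not positive definite.
    return np.linalg.cholesky(K_obs)


# Illustrative data: squared-exponential covariance on five 1-d locations.
x = np.linspace(0.0, 1.0, 5)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.3**2)
L = observation_cholesky(K, noise_variance=0.1)
```

A factor built this way has the form that multifold_cholesky() documents for its L argument.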
- rapid_models.gp_diagnostics.cv.multifold_cholesky(L: nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], Y_train: nptyping.NDArray[nptyping.Shape[N], nptyping.Float], folds: List[List[int]], check_args: bool = True) → Tuple[nptyping.NDArray[nptyping.Shape[N], nptyping.Float], nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], nptyping.NDArray[nptyping.Shape[N], nptyping.Float]]
Compute multifold CV residuals from the Cholesky factor L of the observation covariance matrix and the training data Y_train (residual = observed - predicted)
- Parameters:
L (2d array) – lower triangular Cholesky factor of covariance matrix (L L.T = covariance matrix)
Y_train (array) – training observations
folds (list of lists) – The index subsets
check_args (bool) – Check (assert) that arguments are well-specified before computation
- Returns:
mean – Mean of CV residuals
cov – Covariance of CV residuals
residuals_transformed – The residuals transformed to the standard normal space
- Return type:
tuple (mean, cov, residuals_transformed) of arrays with shapes (N,), (N, N) and (N,)
Note:
* The matrix L L.T is the covariance matrix of the observations Y_train
* For observations including Gaussian noise with fixed variance v, L L.T = K + v*I, where K[i, j] is the prior covariance of the latent GP between the i-th and j-th training locations
This implementation uses the Cholesky factor instead of the inverse covariance (precision) matrix, but is otherwise equivalent to the formulas derived in
[D. Ginsbourger and C. Schaerer (2021). Fast calculation of Gaussian Process multiple-fold crossvalidation residuals and their covariances. arXiv:2101.03108]
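To make the role of L concrete, here is a minimal sketch of multifold residuals computed from the Cholesky factor alone. It assumes a zero prior mean and uses the standard block-inverse identity (the residual on fold I equals inv(Q[I, I]) @ (Q y)[I], with Q the inverse of L L.T). The function name and data are hypothetical, and this is not the rapid_models implementation.

```python
import numpy as np


def multifold_residuals_sketch(L, y, folds):
    """Illustrative multifold CV residuals; L @ L.T is the observation covariance."""
    n = len(y)
    # Precision matrix Q = (L L^T)^{-1} = L^{-T} L^{-1}, built from L alone.
    L_inv = np.linalg.solve(L, np.eye(n))
    Q = L_inv.T @ L_inv
    Qy = Q @ y
    residuals = np.zeros(n)
    D = np.zeros((n, n))  # block-diagonal matrix holding inv(Q[I, I]) per fold I
    for fold in folds:
        idx = np.ix_(fold, fold)
        D[idx] = np.linalg.inv(Q[idx])
        # Residual (observed - predicted) on fold I: inv(Q[I, I]) @ (Q y)[I]
        residuals[fold] = D[idx] @ Qy[fold]
    # The residual vector is D Q y, so its covariance is D Q (L L^T) Q D = D Q D.
    cov = D @ Q @ D
    return residuals, cov


# Illustrative data: six noisy observations split into three folds.
x = np.linspace(0.0, 1.0, 6)
K_obs = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.3**2) + 0.1 * np.eye(6)
y = np.sin(2.0 * np.pi * x)
folds = [[0, 1], [2, 3], [4, 5]]
residuals, cov = multifold_residuals_sketch(np.linalg.cholesky(K_obs), y, folds)
```

The sketch uses a generic solve for readability; a triangular solver would exploit the structure of L.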
- rapid_models.gp_diagnostics.cv.loo(K: nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], Y_train: nptyping.NDArray[nptyping.Shape[N], nptyping.Float], noise_variance: float = 0.0, check_args: bool = True) → Union[Tuple[None, None, None], Tuple[nptyping.NDArray[nptyping.Shape[N], nptyping.Float], nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], nptyping.NDArray[nptyping.Shape[N], nptyping.Float]]]
Compute Leave-One-Out (LOO) residuals for GP regression with noiseless (noise_variance = 0) or fixed variance iid Gaussian noise. (residual = observed - predicted) This function just calls ‘loo_cholesky()’ with the appropriate Cholesky factor.
- Parameters:
K (2d array) – GP prior covariance matrix
Y_train (array) – training observations
noise_variance (float) – variance of the observational noise. Set noise_variance = 0 for noiseless observations
check_args (bool) – Check (assert) that arguments are well-specified before computation
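- Returns:
mean – Mean of LOO residuals
cov – Covariance of LOO residuals
residuals_transformed – The residuals transformed to the standard normal space
- Return type:
tuple (mean, cov, residuals_transformed) of arrays with shapes (N,), (N, N) and (N,); per the signature, (None, None, None) is also possible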
- rapid_models.gp_diagnostics.cv.loo_cholesky(L: nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], Y_train: nptyping.NDArray[nptyping.Shape[N], nptyping.Float], check_args: bool = True) → Tuple[nptyping.NDArray[nptyping.Shape[N], nptyping.Float], nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], nptyping.NDArray[nptyping.Shape[N], nptyping.Float]]
Compute Leave-One-Out (LOO) residuals from the Cholesky factor L of the observation covariance matrix and the training data Y_train (residual = observed - predicted)
- Parameters:
L (2d array) – lower triangular Cholesky factor of covariance matrix (L L.T = covariance matrix)
Y_train (array) – training observations
check_args (bool) – Check (assert) that arguments are well-specified before computation
- Returns:
mean – Mean of LOO residuals
cov – Covariance of LOO residuals
residuals_transformed – The residuals transformed to the standard normal space
- Return type:
tuple (mean, cov, residuals_transformed) of arrays with shapes (N,), (N, N) and (N,)
Note:
* The matrix L L.T is the covariance matrix of the observations Y_train
* For observations including Gaussian noise with fixed variance v, L L.T = K + v*I, where K[i, j] is the prior covariance of the latent GP between the i-th and j-th training locations
This implementation uses the Cholesky factor instead of the inverse covariance (precision) matrix, but is otherwise equivalent to the formulas derived in
[O. Dubrule. Cross validation of kriging in a unique neighborhood. Journal of the International Association for Mathematical Geology, 15 (6):687-699, 1983.]
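For the leave-one-out case the fold blocks are scalars, so the computation reduces to elementwise divisions by the diagonal of the precision matrix. The following is a minimal sketch of that identity, assuming a zero prior mean; the function name and data are hypothetical and this is not the rapid_models implementation.

```python
import numpy as np


def loo_residuals_sketch(L, y):
    """Illustrative LOO residuals; L @ L.T is the observation covariance."""
    n = len(y)
    # Precision matrix Q = L^{-T} L^{-1}.
    L_inv = np.linalg.solve(L, np.eye(n))
    Q = L_inv.T @ L_inv
    q_diag = np.diag(Q)
    # LOO residual at point i: (Q y)_i / Q_ii
    residuals = (Q @ y) / q_diag
    # Covariance D Q D with D = diag(1 / Q_ii); its diagonal is 1 / Q_ii,
    # the LOO predictive variance at each point.
    cov = Q / np.outer(q_diag, q_diag)
    return residuals, cov


# Illustrative data: five noisy observations.
x = np.linspace(0.0, 1.0, 5)
K_obs = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.3**2) + 0.1 * np.eye(5)
y = np.cos(3.0 * x)
residuals, cov = loo_residuals_sketch(np.linalg.cholesky(K_obs), y)
```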
- rapid_models.gp_diagnostics.cv.check_folds_indices(folds: List[List[int]], n_max: int)
Check that the list of index subsets (list of lists) is valid
- Parameters:
folds (list of lists) – The index subsets.
n_max (int) – Total number of indices.
- Raises:
AssertionError – if ‘folds’ does not represent the indices 0, ..., n_max-1 split into non-overlapping subsets
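The validity condition can be illustrated with a short sketch. The function below is hypothetical: it returns a bool rather than raising AssertionError, and it may differ from the library's check in edge cases such as empty folds.

```python
def folds_are_valid(folds, n_max):
    """Return True if folds split the indices 0, ..., n_max - 1 without overlap."""
    flat = [i for fold in folds for i in fold]
    # A valid split uses every index exactly once, so the sorted
    # concatenation must equal 0, 1, ..., n_max - 1.
    return sorted(flat) == list(range(n_max))


ok = folds_are_valid([[0, 2], [1, 3]], 4)        # every index used once
overlap = folds_are_valid([[0, 1], [1, 2]], 3)   # index 1 appears twice
missing = folds_are_valid([[0, 1]], 3)           # index 2 is missing
```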
- rapid_models.gp_diagnostics.cv.check_lower_triangular(arr: Union[nptyping.NDArray[nptyping.Shape[N, N], nptyping.Float], Any], argname: str = 'arr')
Check that the argument is a 2d numpy array which is lower triangular
- Parameters:
arr – object to check
- Raises:
AssertionError – if ‘arr’ is not a lower triangular matrix
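One way to express such a check with NumPy is sketched below. The function name is hypothetical, it returns a bool instead of asserting, and it uses an exact comparison; the library may apply a different test or tolerance.

```python
import numpy as np


def is_lower_triangular(arr):
    """Return True if arr is a 2d numpy array with zeros above the diagonal."""
    if not isinstance(arr, np.ndarray) or arr.ndim != 2:
        return False
    # np.tril zeroes everything above the main diagonal, so a
    # lower-triangular matrix is left unchanged by it.
    return bool(np.array_equal(arr, np.tril(arr)))


chol = np.linalg.cholesky(np.array([[2.0, 1.0], [1.0, 2.0]]))
chol_ok = is_lower_triangular(chol)             # Cholesky factors are lower triangular
full_ok = is_lower_triangular(np.ones((2, 2)))  # has a nonzero entry above the diagonal
```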
- rapid_models.gp_diagnostics.cv.check_numeric_array(arr: Union[nptyping.NDArray[Any, nptyping.Float], Any], dim: int, argname: str = 'arr')
Check that the argument is a numpy array of correct dimension
- Parameters:
arr – object to check
dim (int) – required number of dimensions
- Raises:
AssertionError – if ‘arr’ is not a ‘dim’-dimensional numpy array
- rapid_models.gp_diagnostics.cv._multifold_inv(K, Y_train, folds)
Compute multifold cv residuals using matrix inverse (for testing) (residual = observed - predicted)
- Parameters:
K (2d array) – covariance matrix
Y_train (array) – training observations
folds (list of lists) – The index subsets.
- Returns:
mean – Mean of CV residuals
cov – Covariance of CV residuals
residuals_transformed – The residuals transformed to the standard normal space
- Return type:
tuple (mean, cov, residuals_transformed)
[D. Ginsbourger and C. Schaerer (2021). Fast calculation of Gaussian Process multiple-fold crossvalidation residuals and their covariances. arXiv:2101.03108]
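When testing fast formulas like these, an independent reference is useful. The sketch below is a brute-force check, not the library's _multifold_inv: it refits on the complement of each fold and predicts the held-out points directly, assuming a zero prior mean. The function name and data are hypothetical.

```python
import numpy as np


def multifold_brute_force(K, y, folds):
    """Illustrative reference: refit on each fold's complement and predict the fold."""
    n = len(y)
    residuals = np.zeros(n)
    for fold in folds:
        rest = [i for i in range(n) if i not in fold]
        K_rr = K[np.ix_(rest, rest)]  # covariance among the retained points
        K_fr = K[np.ix_(fold, rest)]  # cross-covariance fold vs. retained points
        # GP predictive mean at the held-out points given the retained ones.
        pred = K_fr @ np.linalg.solve(K_rr, y[rest])
        residuals[fold] = y[fold] - pred
    return residuals


# Illustrative data: six observations with a small jitter on the diagonal.
x = np.linspace(0.0, 1.0, 6)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.3**2) + 0.05 * np.eye(6)
y = x**2
folds = [[0, 3], [1, 4], [2, 5]]
reference = multifold_brute_force(K, y, folds)
```

This costs one linear solve per fold, so it is only practical for small test cases.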