pysr3.lme.oracles module

class pysr3.lme.oracles.LinearLMEOracle(problem: LMEProblem | None, n_iter_inner=200, tol_inner=1e-06, warm_start_duals=False, prior=None)

Bases: object

Implements Linear Mixed-Effects Model functional for given problem.

The model is:

Y_i = X_i*β + Z_i*u_i + 𝜺_i,

where

u_i ~ 𝒩(0, diag(𝛄)),

𝜺_i ~ 𝒩(0, Ξ›)

The problem should be provided as LMEProblem.

Creates an oracle on top of the given problem

Parameters:
  • problem (LMEProblem) – set of data and answers. See docs for LinearLMEProblem class for more details.

  • n_iter_inner (int) – maximal number of iterations for the oracle’s numerical subroutines, if any

  • tol_inner (float) – tolerance for stopping criteria of the oracle’s numerical subroutines, if any

  • warm_start_duals (bool) – if to warm-start when numerical subroutines are called sequentially

  • prior – additional prior for the oracle, see skmixed.priors for more info.

static beta_gamma_to_x(beta, gamma)

Merges beta and gamma into one array x = [beta, gamma]

Parameters:
  • beta (np.ndarray, shape = [n]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [k]) – Vector of estimates of random effects, initial point.

Returns:

x – the ndarray of [beta, gamma]

demarginalized_loss(beta: ndarray, gamma: ndarray, **kwargs) float

Evaluates a de-marginalized loss of the model

Parameters:
  • beta (np.ndarray, shape = [n]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [k]) – Vector of estimates of random effects.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

the value of the de-marginalized loss

flip_probabilities_beta(beta: ndarray, gamma: ndarray, **kwargs) ndarray

Estimates the probabilities of coordinates in beta to flip their signs under the normal posterior.

Parameters:
  • beta (np.ndarray, shape = [n]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [k]) – Vector of estimates of random effects, initial point.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

probabilities each coordinates of beta, as ndarray.

forget()

Detaches the problem from the oracle

Returns:

None

get_condition_numbers()

Returns the condition numbers of the data matrices X and Z

Returns:

Tuple of the condition numbers of the data matrices X and Z

get_ic(ic, beta, gamma, **kwargs)

Wrapper for information criteria

Parameters:
  • ic (str) – Can be β€œIC_jones2010bic”, β€œIC_vaida2005aic”

  • beta (np.ndarray, shape = [p]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [q]) – Vector of estimates of random effects.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

The value of the information criterion requested

gradient_beta(beta: ndarray, gamma: ndarray, **kwargs) ndarray

Evaluates the gradient of the loss-function with respect to beta

Parameters:
  • beta (np.ndarray, shape = [n]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [k]) – Vector of estimates of random effects, initial point.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

the gradient of the loss function with respect to beta

gradient_gamma(beta: ndarray, gamma: ndarray, **kwargs) ndarray

Returns the gradient of the loss function with respect to gamma: βˆ‡_𝛄[β„’](Ξ², 𝛄)

Parameters:
  • beta (np.ndarray, shape = [n]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [k]) – Vector of estimates of random effects.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

grad_gamma (np.ndarray, shape = [k]) – The gradient of the loss function with respect to gamma: βˆ‡_𝛄[β„’](Ξ², 𝛄)

gradient_value_function(w)

Returns the gradient of the value function.

Parameters:

w (ndarray, (p+q)) – vector of parameters [beta, gamma]

Returns:

Gradient of the value function

hessian_beta(beta: ndarray, gamma: ndarray, **kwargs) ndarray

Evaluates Hessian of the loss-function with respect to beta

Parameters:
  • beta (np.ndarray, shape = [n]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [k]) – Vector of estimates of random effects, initial point.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

Hessian of the loss function with respect to beta

hessian_beta_gamma(beta: ndarray, gamma: ndarray, **kwargs) ndarray

Evaluates mixed Hessian of the loss-function with respect to (beta, gamma)

Parameters:
  • beta (np.ndarray, shape = [n]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [k]) – Vector of estimates of random effects, initial point.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

Hessian of the loss function with respect to (beta, gamma)

hessian_gamma(beta: ndarray, gamma: ndarray, take_only_positive_part=False, take_expected_value=False, **kwargs) ndarray

Returns the Hessian of the loss function with respect to gamma βˆ‡Β²_𝛄[β„’](Ξ², 𝛄).

Parameters:
  • beta (np.ndarray, shape = [n]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [k]) – Vector of estimates of random effects.

  • take_only_positive_part (bool) – Whether to return only the positive-definite part of Hessian

  • take_expected_value (bool) – Whether to return the expected value of the hessian

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

hessian (np.ndarray, shape = [k, k]) – Hessian of the loss function with respect to gamma βˆ‡Β²_𝛄[β„’](Ξ², 𝛄).

instantiate(problem)

Attach the problem to the oracle

Parameters:

problem (LMEProblem) – problem to attach

Returns:

None

joint_gradient(x, *args, **kwargs)

Takes x = [beta, gamma] and evaluates the gradient of the loss with this beta and gamma.

Parameters:
  • x (ndarray, (p+q)) – vector of parameters [beta, gamma]

  • args – for debugging and passing parameters

  • kwargs – for debugging and passing parameters

Returns:

Gradient of the loss at x = [beta, gamma]

joint_loss(x, *args, **kwargs)

Takes x = [beta, gamma] and evaluates the loss with this beta and gamma.

Parameters:
  • x (ndarray, (p+q)) – vector of parameters [beta, gamma]

  • args – for debugging and passing parameters

  • kwargs – for debugging and passing parameters

Returns:

Loss at x = [beta, gamma]

jones2010bic(beta, gamma, tolerance=0.0, **kwargs)

Implements Bayes Information Criterion (BIC) from (Jones, 2010) # https://www.researchgate.net/publication/51536734_Bayesian_information_criterion_for_longitudinal_and_clustered_data

Parameters:
  • beta (np.ndarray, shape = [p]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [q]) – Vector of estimates of random effects.

  • tolerance (float, positive) – Threshold for absolute values of beta and gamma being considered zero. Should account for the finite tolerance of the numerical solver.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

Value of Jones’s BIC

loss(beta: ndarray, gamma: ndarray, **kwargs) float

Returns the loss function value β„’(Ξ², 𝛄).

Parameters:
  • beta (np.ndarray, shape = [n]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [k]) – Vector of estimates of random effects.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

result (float) – The value of the loss function: β„’(Ξ², 𝛄)

muller_hui_2016ic(beta, gamma, tolerance=0.0, **kwargs)

Implements Information Criterion (IC) from (Muller, 2016) https://www.tandfonline.com/doi/full/10.1080/01621459.2016.1215989 page 1326, equation 3

Parameters:
  • beta (np.ndarray, shape = [p]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [q]) – Vector of estimates of random effects.

  • tolerance (float, positive) – Threshold for absolute values of beta and gamma being considered zero. Should account for the finite tolerance of the numerical solver.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

Value of Mueller’s IC

optimal_beta(gamma: ndarray, _dont_solve_wrt_beta=False, **kwargs)

Returns beta (vector of estimations of fixed effects) which minimizes loss function for a fixed gamma.

The algorithm for computing optimal beta is:

kernel = βˆ‘X_i^TΞ©_iX_i

tail = βˆ‘X_i^TΞ©_iY_i

Ξ² = (kernel)^{-1}*tail

It’s available almost exclusively in linear models. In general one should use gradient_beta and do iterative minimization instead.

Parameters:
  • gamma (np.ndarray, shape = [q]) – Vector of estimates of random effects.

  • _dont_solve_wrt_beta (bool, Optional) – If true, then it does not perform the outer matrix inversion and returns the (kernel, tail) instead. It’s left here for the purposes of use in child classes where both the kernel and the tail should be adjusted to account for regularization.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

beta (np.ndarray, shape = [p]) – Vector of optimal estimates of the fixed effects for given gamma.

optimal_gamma(beta: ndarray, gamma: ndarray, method='pgd', **kwargs) ndarray

Evaluates optimal gamma given beta

Parameters:
  • beta (np.ndarray, shape = [n]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [k]) – Vector of estimates of random effects, initial point.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

  • method (str) – which numerical subroutine to use: β€œpgd” or β€œip”

Returns:

optimal gamma

optimal_gamma_ip(beta: ndarray, gamma: ndarray, log_progress=False, **kwargs)

Evaluates optimal gamma given beta with an Interior Point Method

Parameters:
  • beta (np.ndarray, shape = [n]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [k]) – Vector of estimates of random effects, initial point.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

  • log_progress (bool) – Whether to log the loss function during optimization (for debugging only)

Returns:

optimal gamma

optimal_gamma_pgd(beta: ndarray, gamma: ndarray, log_progress=False, **kwargs)

Evaluates optimal gamma given beta with a Proximal Projected Gradient Descent

Parameters:
  • beta (np.ndarray, shape = [n]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [k]) – Vector of estimates of random effects, initial point.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

  • log_progress (bool) – Whether to log the loss function during optimization (for debugging only)

Returns:

optimal gamma

optimal_random_effects(beta: ndarray, gamma: ndarray, **kwargs) ndarray

Returns set of optimal random effects estimations for given beta and gamma.

Parameters:
  • beta (np.ndarray, shape = [p]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [q]) – Vector of estimates of random effects.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

u (np.ndarray, shape = [m, q]) – Set of optimal random effects estimations for given beta and gamma

vaida2005aic(beta, gamma, tolerance=0.0, marginalized=False, **kwargs)

Calculates Akaike Information Criterion (AIC) from https://www.jstor.org/stable/2673485

Parameters:
  • beta (np.ndarray, shape = [p]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [q]) – Vector of estimates of random effects.

  • tolerance (float, positive) – Threshold for absolute values of beta and gamma being considered zero. Should account for the finite tolerance of the numerical solver.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

Value for Vaida AIC

value_function(w, **kwargs)

Evaluates value function of the problem. If no relaxation is applied then the value function is the loss function.

Parameters:
  • w (ndarray, (p+q)) – vector of parameters [beta, gamma]

  • kwargs – for debugging and passing parameters

Returns:

Value of the value function

x_to_beta_gamma(x)

Takes x = [beta, gamma], splits it, and returns beta and gamma separately

Parameters:

x (ndarray, (p+q)) – vector of parameters [beta, gamma]

Returns:

Tuple of numpy arrays (beta and gamma)

class pysr3.lme.oracles.LinearLMEOracleSR3(problem: LMEProblem | None, lb=0.1, lg=0.1, warm_start=True, take_only_positive_part=True, take_expected_value=False, central_path_neighbourhood_target=0.5, **kwargs)

Bases: LinearLMEOracle

Implements Sparse Relaxed Regularized Regression (SR3) for Linear Mixed-Effects Model. The problem should be provided as LMEProblem.

Creates an oracle on top of the given problem. The problem should be in the form of LinearLMEProblem.

Parameters:
  • problem (LMEProblem) – The set of data and answers. See the docs for LinearLMEProblem for more details.

  • lb (float) – Regularization coefficient (inverse std) for ||Ξ² - tΞ²||^2

  • lg (float) – Regularization coefficient (inverse std) for ||𝛄 - t𝛄||^2

find_optimal_parameters(w, log_progress=False, regularizer=None, increase_lambdas=False, line_search=False, prox_step_len=1.0, update_prox_every=1, **kwargs)

Wrapper around the routine that maximizes the likelihood of the oracle with respect to its non-relaxed parameters. This routine is designed for experiments doing all relaxations of (SR3 and IP) simultaneously (SR3Practical).

Parameters:
  • w (ndarray, (p+q)) – vector of parameters [beta, gamma]

  • log_progress (bool) – whether to log progress of internal numerical routines

  • regularizer (Regularizer) – an instance of Regularizer class

  • increase_lambdas (bool) – whether to iteratively increase the parameters of SR3 relaxation

  • line_search (bool) – whether to perform line-search inside IP method

  • prox_step_len (float) – step-size for numerical subroutines

  • update_prox_every (int) – how frequently algorithm updates sparse variables (tbeta, tgamma)

  • kwargs – for passing extra arguments

Returns:

Five objects (beta, gamma, tbeta, tgamma, logger)

find_optimal_parameters_ip(beta: ndarray, gamma: ndarray, tbeta=None, tgamma=None, regularizer=None, increase_lambdas=False, line_search=False, prox_step_len=1.0, update_prox_every=1, logger=None, mu_decay=0.5, **kwargs)
gradient_beta(beta: ndarray, gamma: ndarray, tbeta: ndarray | None = None, **kwargs) ndarray

Evaluates the gradient of the loss-function with respect to beta

Parameters:
  • beta (np.ndarray, shape = [p]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [q]) – Vector of estimates of random effects, initial point.

  • tbeta (np.ndarray, shape = [p]) – Vector of relaxed estimates of fixed effects.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

the gradient of the loss function with respect to beta

gradient_gamma(beta: ndarray, gamma: ndarray, tgamma: ndarray | None = None, **kwargs) ndarray

Returns the gradient of the loss function with respect to gamma: grad_gamma = βˆ‡_𝛄[β„’](Ξ², 𝛄) + lg*(𝛄 - t𝛄)

Parameters:
  • beta (np.ndarray, shape = [p]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [q]) – Vector of estimates of random effects.

  • tgamma (np.ndarray, shape = [q]) – Vector of relaxed estimates of random effects.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

grad_gamma (np.ndarray, shape = [q]) – The gradient of the loss function with respect to gamma: grad_gamma = βˆ‡_𝛄[β„’](Ξ², 𝛄) + lg*(𝛄 - t𝛄)

gradient_value_function(w, logger=None, **kwargs)

Returns the gradient of the value function.

Parameters:

w (ndarray, (p+q)) – vector of parameters [beta, gamma]

Returns:

Gradient of the value function

hessian_beta(beta: ndarray, gamma: ndarray, **kwargs)

Returns the Hessian of the loss function with respect to gamma

Parameters:
  • beta (np.ndarray, shape = [p]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [q]) – Vector of estimates of random effects.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

hessian (np.ndarray, shape = [q, q]) – Hessian of the loss function with respect to gamma

hessian_gamma(beta: ndarray, gamma: ndarray, **kwargs) ndarray

Returns the Hessian of the loss function with respect to gamma.

Parameters:
  • beta (np.ndarray, shape = [p]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [q]) – Vector of estimates of random effects.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

hessian (np.ndarray, shape = [q, q]) – Hessian of the loss function with respect to gamma.

joint_gradient(x, tbeta=None, tgamma=None)

Takes x = [beta, gamma] and evaluates the gradient of the loss with this beta and gamma.

Parameters:
  • x (ndarray, (p+q)) – vector of parameters [beta, gamma]

  • tbeta (np.ndarray, shape = [p]) – Vector of relaxed estimates of fixed effects.

  • tgamma (np.ndarray, shape = [q]) – Vector of relaxed estimates of random effects.

Returns:

Gradient of the loss at x = [beta, gamma]

joint_loss(x, w=None, *args, **kwargs)

Takes x = [beta, gamma], w = [tbeta, tgamma], and evaluates the loss with this beta and gamma.

Parameters:
  • x (ndarray, (p+q)) – vector of parameters [beta, gamma]

  • w (ndarray, (p+q)) – vector of parameters [tbeta, tgamma]

  • args – for debugging and passing parameters

  • kwargs – for debugging and passing parameters

Returns:

Loss at (x, w)

jones2010bic(beta, gamma, tolerance=0.0, **kwargs)

Implements Bayes Information Criterion (BIC) from (Jones, 2010) # https://www.researchgate.net/publication/51536734_Bayesian_information_criterion_for_longitudinal_and_clustered_data

Parameters:
  • beta (np.ndarray, shape = [p]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [q]) – Vector of estimates of random effects.

  • tolerance (float, positive) – Threshold for absolute values of beta and gamma being considered zero. Should account for the finite tolerance of the numerical solver.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

Value of Jones’s BIC

loss(beta: ndarray, gamma: ndarray, tbeta: ndarray | None = None, tgamma: ndarray | None = None, **kwargs)

Returns the loss function value β„’(Ξ², 𝛄) + lb/2*||Ξ² - tΞ²||^2 + lg/2*||𝛄 - t𝛄||^2

Parameters:
  • beta (np.ndarray, shape = [n]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [k]) – Vector of estimates of random effects.

  • tbeta (np.ndarray, shape = [n]) – Vector of relaxed estimates of fixed effects.

  • tgamma (np.ndarray, shape = [k]) – Vector of relaxed estimates of random effects.

  • kwargs – Not used, left for future and for passing debug/experimental parameters.

Returns:

result (float) – The value of the loss function: β„’(Ξ², 𝛄) + lb/2*||Ξ² - tΞ²||^2 + lg/2*||𝛄 - t𝛄||^2

muller_hui_2016ic(beta, gamma, tolerance=0.0, **kwargs)

Implements Information Criterion (IC) from (Muller, 2018)

Parameters:
  • beta (np.ndarray, shape = [p]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [q]) – Vector of estimates of random effects.

  • tolerance (float, positive) – Threshold for absolute values of beta and gamma being considered zero. Should account for the finite tolerance of the numerical solver.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

Value of Mueller’s IC

optimal_beta(gamma: ndarray, tbeta: ndarray | None = None, _dont_solve_wrt_beta=False, **kwargs)

Returns beta (vector of estimations of fixed effects) which minimizes loss function for a fixed gamma. The algorithm for computing optimal beta is:

kernel = βˆ‘X_i^TΞ©_iX_i
tail = βˆ‘X_i^TΞ©_iY_i
Ξ² = (kernel + I*lb)^{-1}*(tail + lb*tbeta)

It’s available almost exclusively in linear models. In general one should use gradient_beta and do iterative minimization instead.

Parameters:
  • gamma (np.ndarray, shape = [k]) – Vector of estimates of random effects.

  • tbeta (np.ndarray, shape = [n]) – Vector of (nnz_tbeta)-sparse estimates for fixed effects.

  • _dont_solve_wrt_beta (bool, Optional) – If true, then it does not perform the outer matrix inversion and returns the (kernel, tail) instead. It’s left here for the purposes of use in child classes where both the kernel and the tail should be adjusted to account for regularization.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

beta (np.ndarray, shape = [n]) – Vector of optimal estimates of the fixed effects for given gamma.

vaida2005aic(beta, gamma, tolerance=0.0, marginalized=False, **kwargs)

Calculates Akaike Information Criterion (AIC) from https://www.jstor.org/stable/2673485

Parameters:
  • beta (np.ndarray, shape = [p]) – Vector of estimates of fixed effects.

  • gamma (np.ndarray, shape = [q]) – Vector of estimates of random effects.

  • tolerance (float, positive) – Threshold for absolute values of beta and gamma being considered zero. Should account for the finite tolerance of the numerical solver.

  • kwargs – Not used, left for future and for passing debug/experimental parameters

Returns:

Value for Vaida AIC

value_function(w, **kwargs)

Evaluates value function of the problem.

Parameters:
  • w (ndarray, (p+q)) – vector of parameters [beta, gamma]

  • kwargs – for debugging and passing parameters

Returns:

Value of the value function