pysr3.linear.problems module

class pysr3.linear.problems.LinearProblem(a, b, c=None, obs_std=None, regularization_weights=None)

Bases: object

Helper class that implements linear models’ abstractions over a given dataset.

It can also generate random problems with specific characteristics.

Constructs a LinearProblem, a helper class that abstracts the data for the models.

Parameters:
  • a (ndarray (n, p)) – data matrix

  • b (ndarray (n, )) – target variable

  • c (ndarray (p, p), optional) – matrix C for SR3 relaxation, see the paper.

  • obs_std (ndarray (n, )) – variances of mean-zero Gaussian noise for each observation

  • regularization_weights (ndarray (p, )) – feature-specific weights for the regularizer, inversely proportional to the features’ importance.
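The shape conventions above can be checked before a problem is constructed. The helper below is a hypothetical sketch, not part of pysr3: `check_problem_shapes` is an illustrative name, and it assumes `regularization_weights` holds one weight per feature (p entries), as the `must_include_features` description of `from_dataframe` implies.

```python
import numpy as np


def check_problem_shapes(a, b, c=None, obs_std=None, regularization_weights=None):
    """Hypothetical helper (not pysr3): validate the array shapes that
    LinearProblem's parameters are documented to have.

    Returns (n, p) on success and raises ValueError otherwise.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    if a.ndim != 2:
        raise ValueError(f"a must be 2-D with shape (n, p), got {a.shape}")
    n, p = a.shape
    # One target value per object (row of a).
    if b.shape != (n,):
        raise ValueError(f"b must have shape ({n},), got {b.shape}")
    # C acts on the p-dimensional coefficient vector.
    if c is not None and np.shape(c) != (p, p):
        raise ValueError(f"c must have shape ({p}, {p}), got {np.shape(c)}")
    # One noise level per observation.
    if obs_std is not None and np.shape(obs_std) != (n,):
        raise ValueError(f"obs_std must have shape ({n},), got {np.shape(obs_std)}")
    # Assumption: one regularization weight per feature.
    if regularization_weights is not None and np.shape(regularization_weights) != (p,):
        raise ValueError(
            f"regularization_weights must have shape ({p},), "
            f"got {np.shape(regularization_weights)}"
        )
    return n, p
```

For example, `check_problem_shapes(np.ones((5, 3)), np.zeros(5), regularization_weights=[0, 1, 1])` returns `(5, 3)`, while a length-4 `b` raises `ValueError`.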

static from_dataframe(data: DataFrame, features: List[str], target: str, must_include_features: List[str] | None = None, obs_std: str | None = None, c=None)

Creates a LinearProblem from a pandas DataFrame.

Parameters:
  • data (pd.DataFrame) – pandas DataFrame containing the dataset

  • features (List[str]) – list of column names that should be included as features

  • target (str) – name of the column containing the observations

  • must_include_features (List[str]) – list of column names that are not affected by regularization, i.e. features that receive a regularization weight of 0. All other features receive a weight of 1.

  • obs_std (float | ndarray (num_objects, )) – variances of mean-zero Gaussian noise for each observation (array) OR for all observations (float)

  • c (ndarray (p, p), optional) – matrix C for SR3 relaxation, see the paper. If None, the identity matrix is used.

Returns:

problem (LinearProblem) – problem with the dataset inside
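The rule described for `must_include_features` (weight 0 for listed features, 1 for all others) can be sketched in plain Python. `weights_from_must_include` is a hypothetical name, not a pysr3 function, and the sketch assumes the weights follow the order of `features`, i.e. the column order of the resulting design matrix.

```python
from typing import List, Optional


def weights_from_must_include(features: List[str],
                              must_include_features: Optional[List[str]] = None) -> List[int]:
    """Hypothetical sketch of the documented rule: features listed in
    must_include_features get weight 0 (never penalized), every other
    feature gets weight 1. Order follows `features`."""
    must_include = set(must_include_features or [])
    return [0 if feature in must_include else 1 for feature in features]


weights = weights_from_must_include(["intercept", "age", "dose"], ["intercept"])
# → [0, 1, 1]
```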

static from_x_y(x, y, c=None, regularization_weights=None)

Creates a LinearProblem from the provided dataset.

Parameters:
  • x (ndarray (n, p)) – design matrix whose rows are objects and whose columns are features

  • y (ndarray (n, )) – vector of observations

  • c (ndarray (p, p), optional) – matrix C for SR3 relaxation, see the paper. If None, the identity matrix is used.

  • regularization_weights (ndarray (p, )) – feature-specific weights for the regularizer, inversely proportional to the features’ importance.

Returns:

problem (LinearProblem) – problem with provided data inside
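The documented default for `c` means the relaxation matrix is p × p, sized by the number of columns of `x`. A minimal sketch of that defaulting step, using a hypothetical helper `default_c` that is not part of pysr3:

```python
import numpy as np


def default_c(x, c=None):
    """Hypothetical sketch: resolve the SR3 relaxation matrix C.

    As documented for from_x_y, C defaults to the identity when omitted;
    its size is p x p, where p is the number of columns (features) of x.
    """
    p = np.asarray(x).shape[1]
    return np.eye(p) if c is None else np.asarray(c)
```

With a 4 × 3 design matrix, `default_c(x)` is the 3 × 3 identity; an explicit `c` is returned unchanged.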

static generate(num_objects=100, num_features=10, obs_std=0.1, true_x=None, seed=42)

Generates a random dataset with a linear dependence between observations and features

Parameters:
  • num_objects (int) – number of objects (rows) in the dataset

  • num_features (int) – number of features (columns) in the dataset

  • obs_std (float | ndarray (num_objects, )) – variances of mean-zero Gaussian noise for each observation (array) OR for all observations (float)

  • true_x (ndarray (num_features, )) – true vector of coefficients. If None, a random vector is drawn from U[0, 1]^num_features.

  • seed (int) – random seed

Returns:

problem (LinearProblem) – generated problem
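A data-generating process of the kind described above can be sketched as y = X · true_x + ε. This is not pysr3's implementation: `generate_sketch` is a hypothetical name, the U[0, 1] design matrix is an assumption, and `obs_std` is read literally as a noise variance (per the parameter description), so the noise is scaled by its square root. Only the U[0, 1] draw for `true_x` comes from the documentation.

```python
import numpy as np


def generate_sketch(num_objects=100, num_features=10, obs_std=0.1, true_x=None, seed=42):
    """Hypothetical sketch of a linear data generator (not pysr3's code).

    Assumptions: the design matrix is drawn from U[0, 1], and obs_std is a
    noise variance (scalar or one per observation).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(num_objects, num_features))
    if true_x is None:
        # Documented behavior: coefficients drawn from U[0, 1]^num_features.
        true_x = rng.uniform(0.0, 1.0, size=num_features)
    true_x = np.asarray(true_x, dtype=float)
    # Broadcast a scalar or per-observation variance to shape (num_objects,).
    variance = np.broadcast_to(np.asarray(obs_std, dtype=float), (num_objects,))
    noise = rng.normal(0.0, 1.0, size=num_objects) * np.sqrt(variance)
    y = x @ true_x + noise
    return x, y, true_x
```

Fixing `seed` makes the draw reproducible, and `obs_std=0` yields noise-free observations, which is convenient for checking that a solver recovers `true_x`.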

to_x_y()

Converts a LinearProblem to its array representation.

Returns:
  • x (ndarray (n, p)) – design matrix whose rows are objects and whose columns are features

  • y (ndarray (n, )) – vector of observations
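`from_x_y` and `to_x_y` are described as inverse views of the same data: arrays in, a problem out, and back. The round trip can be sketched with a hypothetical stand-in class, `ToyLinearProblem`, which is not pysr3 and only mirrors the design-matrix and observation parts of that contract.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class ToyLinearProblem:
    """Hypothetical stand-in (not pysr3) mirroring the documented
    from_x_y / to_x_y contract for the design matrix and observations."""
    a: np.ndarray  # design matrix, shape (n, p)
    b: np.ndarray  # observations, shape (n,)

    @staticmethod
    def from_x_y(x, y):
        # Store the arrays as the problem's data.
        return ToyLinearProblem(a=np.asarray(x, dtype=float),
                                b=np.asarray(y, dtype=float))

    def to_x_y(self):
        # Array representation: (design matrix, observation vector).
        return self.a, self.b


x = np.arange(6.0).reshape(3, 2)
y = np.array([1.0, 2.0, 3.0])
x_back, y_back = ToyLinearProblem.from_x_y(x, y).to_x_y()
```

After the round trip, `x_back` and `y_back` hold the same values as `x` and `y`.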