pysr3.linear.problems module

class pysr3.linear.problems.LinearProblem(a, b, c=None, obs_std=None, regularization_weights=None)

Bases: object

Helper class which implements Linear models’ abstractions over a given dataset.

It also can generate random problems with specific characteristics.

Constructs LinearProblem – a helper class that abstracts the data for the models.

Parameters:

a (ndarray (n, p)) – data matrix
b (ndarray (n, )) – target variable
obs_std (ndarray (n, )) – variances of mean-zero Gaussian noise for each observation
regularization_weights (ndarray (n, )) – observation-specific weights for the regularizer. Inverse-proportional to the objects’ importance.

static from_dataframe(data: DataFrame, features: List[str], target: str, must_include_features: List[str] | None = None, obs_std: str | None = None, c=None)

Creates LinearProblem from a Pandas dataframe

Parameters:

data (pd.DataFrame) – pandas dataframe with dataset
features (List[str]) – list of column names that should be included as features
target (str) – name of the column containing the observations
must_include_features (List[str]) – list of column names that are not going to be affected by regularization. In other words, list of features that receive regularization_weight=0. All others receive 1.
obs_std (float | ndarray (num_objects, )) – variances of mean-zero Gaussian noise for each observation (array) OR for all observations (float)
c (ndarray (p, p), optional) – matrix C for SR3 relaxation, see the paper. If None then an identity is used.

Returns:

problem (LinearProblem) – problem with the dataset inside

static from_x_y(x, y, c=None, regularization_weights=None)

Creates a LinearProblem from provided dataset

Parameters:

x (ndarray (n, p)) – design matrix with objects being rows and columns being features
y (ndarray (n, )) – vector of observations
c (ndarray (p, p), optional) – matrix C for SR3 relaxation, see the paper. If None then an identity is used.
regularization_weights (ndarray (n, )) – observation-specific weights for the regularizer. Inverse-proportional to the objects’ importance.

Returns:

problem (LinearProblem) – problem with provided data inside

static generate(num_objects=100, num_features=10, obs_std=0.1, true_x=None, seed=42)

Generates a random dataset with a linear dependence between observations and features

Parameters:

num_objects (int) – number of objects (rows) in the dataset
num_features (int) – number of features (columns) in the dataset
obs_std (float | ndarray (num_objects, )) – variances of mean-zero Gaussian noise for each observation (array) OR for all observations (float)
true_x (ndarray (num_features, )) – true vector of coefficients. If None then generates a random one from U[0, 1]^num_features
seed (int) – random seed

Returns:

problem (LinearProblem) – generated problem

to_x_y()

Converts LinearProblem class to array representation :returns: * x (ndarray (n, p)) – design matrix with objects being rows and columns being features

y (ndarray (n, )) – vector of observations