nnbma.dataset package

Submodules

nnbma.dataset.regression_dataset module

class nnbma.dataset.regression_dataset.RegressionDataset(x: ndarray, y: ndarray, inputs_names: List[str] | None = None, outputs_names: List[str] | None = None)

Bases: Dataset

Dataset dedicated to regression.

Parameters:
  • x (numpy.ndarray) – Array containing the input features of the regression model. Its shape is considered to be \(N \times I\) where \(N\) is the number of entries and \(I\) the number of input features.

  • y (numpy.ndarray) – Array containing the output features of the regression model. Its shape is considered to be \(N \times O\) where \(N\) is the number of entries and \(O\) the number of output features.

  • inputs_names (Optional[List[str]], optional) – List of the names of the input features, by default None.

  • outputs_names (Optional[List[str]], optional) – List of the names of the output features, by default None.

Raises:
  • ValueError – x and y must have the same number of rows \(N\).

  • ValueError – x and inputs_names must have the same number of features \(I\).

  • ValueError – y and outputs_names must have the same number of features \(O\).
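The three shape checks listed above can be sketched with plain NumPy. The function `check_regression_inputs` below is hypothetical: it only restates the documented conditions and is not nnbma's implementation; the error messages are assumptions.

```python
from typing import List, Optional

import numpy as np


def check_regression_inputs(
    x: np.ndarray,
    y: np.ndarray,
    inputs_names: Optional[List[str]] = None,
    outputs_names: Optional[List[str]] = None,
) -> None:
    """Hypothetical restatement of the documented RegressionDataset checks."""
    # x is N x I and y is N x O: both must share the same number of rows N.
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows N")
    # When names are given, there must be exactly one per input column.
    if inputs_names is not None and len(inputs_names) != x.shape[1]:
        raise ValueError("x and inputs_names must have the same number of features I")
    # When names are given, there must be exactly one per output column.
    if outputs_names is not None and len(outputs_names) != y.shape[1]:
        raise ValueError("y and outputs_names must have the same number of features O")


x = np.zeros((100, 3))  # N = 100 entries, I = 3 input features
y = np.zeros((100, 2))  # N = 100 entries, O = 2 output features
check_regression_inputs(x, y, ["a", "b", "c"], ["u", "v"])  # passes silently

try:
    check_regression_inputs(x, y, ["a", "b"], None)  # only 2 names for I = 3
except ValueError as err:
    print(err)
```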

apply_transf(x_op: Callable[[ndarray], ndarray] | None, y_op: Callable[[ndarray], ndarray] | None) → RegressionDataset

Applies x_op to the input features x and y_op to the output features y. A new dataset is returned, so the operators should not modify their argument in place.

Parameters:
  • x_op (Optional[Callable[[numpy.ndarray], numpy.ndarray]]) – Operator to apply to the input features.

  • y_op (Optional[Callable[[numpy.ndarray], numpy.ndarray]]) – Operator to apply to the output features.

Returns:

New dataset with transformed values.

Return type:

RegressionDataset
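The warning about in-place operations matters because an operator that mutates its argument also alters the original arrays, not only the returned dataset. The sketch below uses a hypothetical `apply_transf` function on raw NumPy arrays; treating a None operator as "copy unchanged" is an assumption, since the source does not say what None does.

```python
from typing import Callable, Optional, Tuple

import numpy as np

Op = Optional[Callable[[np.ndarray], np.ndarray]]


def apply_transf(x: np.ndarray, y: np.ndarray, x_op: Op, y_op: Op) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical sketch: apply operators and return the resulting arrays."""
    # Assumption: None means "keep this array as it is" (a copy is returned).
    new_x = x_op(x) if x_op is not None else x.copy()
    new_y = y_op(y) if y_op is not None else y.copy()
    return new_x, new_y


x = np.array([[1.0, 10.0], [100.0, 1000.0]])
y = np.array([[2.0], [4.0]])

# Safe: np.log10 allocates a new array, so x itself is untouched.
log_x, same_y = apply_transf(x, y, np.log10, None)


# Unsafe: the in-place multiplication also rewrites the caller's y.
def double_in_place(a: np.ndarray) -> np.ndarray:
    a *= 2.0
    return a


_, doubled_y = apply_transf(x, y, None, double_in_place)
print(y.ravel())  # the original y has been modified as well
```

Writing the operator as `lambda a: a * 2.0` instead would allocate a new array and leave y intact.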

static from_pandas(df_x: DataFrame, df_y: DataFrame) → RegressionDataset

Converts two pandas DataFrames to a RegressionDataset object.

Parameters:
  • df_x (pd.DataFrame) – DataFrame of the inputs. This DataFrame should contain \(N\) rows (one per entry) and \(I\) columns (one per input feature).

  • df_y (pd.DataFrame) – DataFrame of the outputs. This DataFrame should contain \(N\) rows (one per entry) and \(O\) columns (one per output feature).

Returns:

Associated RegressionDataset object. The x attribute is set to the values of df_x and the inputs_names attribute to its column names. The y attribute is set to the values of df_y and the outputs_names attribute to its column names.

Return type:

RegressionDataset

getall(numpy: Literal[True]) → Tuple[ndarray, ndarray]
getall(numpy: Literal[False]) → Tuple[Tensor, Tensor]

Returns the whole dataset as numpy.ndarray or torch.Tensor objects, depending on the value of the numpy parameter.

Parameters:

numpy (bool, optional) – If True, the returned objects are numpy arrays; otherwise, they are torch tensors.

Returns:

Input and output sets.

Return type:

tuple of torch.Tensor or numpy.ndarray

has_nan() → Tuple[bool, bool]

Returns a tuple of two booleans. The first one is True if the input features contain at least one NaN, else False. The second one is True if the output features contain at least one NaN, else False.

Returns:

Presence of NaNs in the input and output sets.

Return type:

tuple of bool

has_nonfinite() → Tuple[bool, bool]

Returns a tuple of two booleans. The first one is True if the input features contain at least one non-finite value (NaN, +inf or -inf), else False. The second one is True if the output features contain at least one non-finite value, else False.

Returns:

Presence of non-finite values in the input and output sets.

Return type:

tuple of bool
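The two checks differ because NaN is only one kind of non-finite value: +inf and -inf are non-finite but not NaN. A NumPy sketch of the documented semantics, with hypothetical free functions standing in for the methods:

```python
from typing import Tuple

import numpy as np


def has_nan(x: np.ndarray, y: np.ndarray) -> Tuple[bool, bool]:
    # True for an array as soon as it contains at least one NaN.
    return bool(np.isnan(x).any()), bool(np.isnan(y).any())


def has_nonfinite(x: np.ndarray, y: np.ndarray) -> Tuple[bool, bool]:
    # np.isfinite is False for NaN, +inf and -inf alike.
    return bool((~np.isfinite(x)).any()), bool((~np.isfinite(y)).any())


x = np.array([[1.0, np.inf], [3.0, 4.0]])  # contains +inf but no NaN
y = np.array([[np.nan], [0.5]])            # contains a NaN

print(has_nan(x, y))        # (False, True)
print(has_nonfinite(x, y))  # (True, True)
```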

property inputs_names: List[str] | None

List of the names of the input features.

inputs_size() → int

Returns the number of floating-point values in x, i.e., \(N \times I\).

join(other: RegressionDataset) → RegressionDataset

Returns the union of two datasets. Data are copied.

Parameters:

other (RegressionDataset) – Other dataset to join with.

Returns:

New dataset constructed as the union of the two datasets.

Return type:

RegressionDataset
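Joining can be pictured as stacking the rows of both datasets, which gives \(N_1 + N_2\) entries with the same \(I\) and \(O\). The sketch below is hypothetical: the `join` function works on raw arrays and assumes both datasets share the same features in the same order. `np.concatenate` always allocates a new array, which matches the documented copy.

```python
from typing import Tuple

import numpy as np


def join(
    x1: np.ndarray, y1: np.ndarray, x2: np.ndarray, y2: np.ndarray
) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical row-wise union of two (x, y) datasets."""
    # Assumption: both datasets must have matching input and output features.
    if x1.shape[1] != x2.shape[1] or y1.shape[1] != y2.shape[1]:
        raise ValueError("datasets must have the same input and output features")
    # np.concatenate returns new arrays, so no memory is shared with the sources.
    return np.concatenate([x1, x2], axis=0), np.concatenate([y1, y2], axis=0)


x_a, y_a = np.ones((3, 2)), np.ones((3, 1))
x_b, y_b = np.zeros((5, 2)), np.zeros((5, 1))
x_ab, y_ab = join(x_a, y_a, x_b, y_b)
print(x_ab.shape, y_ab.shape)  # (8, 2) (8, 1)
```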

static load(filename: str, path: str | None = None) → RegressionDataset

Loads a regression dataset from a pickle file.

Parameters:
  • filename (str) – Name of the file to be read.

  • path (Optional[str], optional) – Path to the file to be read, by default None.

Returns:

Loaded regression dataset.

Return type:

RegressionDataset

property n_inputs: int

Number of input features \(I\).

property n_outputs: int

Number of output features \(O\).

property outputs_names: List[str] | None

List of the names of the output features.

outputs_size() → int

Returns the number of floating-point values in y, i.e., \(N \times O\).

save(filename: str, path: str | None = None) → None

Saves the dataset to a pickle file.

Parameters:
  • filename (str) – Name of the file to be created.

  • path (Optional[str], optional) – Path to the file to be created, by default None.
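`save` and `load` both take a file name and an optional directory. The following is a hedged sketch of such a pickle round trip; the helpers `save_pickle` and `load_pickle` are hypothetical, and the assumption that `path` is simply joined in front of `filename` (and ignored when None) is not taken from nnbma's source.

```python
import os
import pickle
import tempfile
from typing import Any, Optional

import numpy as np


def save_pickle(obj: Any, filename: str, path: Optional[str] = None) -> None:
    # Assumption: the optional directory is prepended to the file name.
    full = filename if path is None else os.path.join(path, filename)
    with open(full, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(filename: str, path: Optional[str] = None) -> Any:
    full = filename if path is None else os.path.join(path, filename)
    with open(full, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    data = {"x": np.arange(6.0).reshape(3, 2), "y": np.arange(3.0).reshape(3, 1)}
    save_pickle(data, "dataset.pkl", path=tmp)
    restored = load_pickle("dataset.pkl", path=tmp)

print(np.array_equal(restored["x"], data["x"]))  # True
```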

stats() → Tuple[Dict[str, ndarray], Dict[str, ndarray]]

Provides a few statistics on the dataset (the mean, the standard deviation, the min and the max for each column).

Returns:

Tuple of dictionaries, each containing the mean, the standard deviation, the min and the max for each column. The first dictionary corresponds to the input x and the second to the output y.

Return type:

Tuple[Dict[str, np.ndarray], Dict[str, np.ndarray]]
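Per-column statistics reduce along the entry axis (axis 0), producing one value per feature. A NumPy sketch for a single array follows; the helper `column_stats`, the dictionary keys `"mean"`, `"std"`, `"min"` and `"max"`, and the use of the population standard deviation are all assumptions, as the source does not specify them.

```python
from typing import Dict

import numpy as np


def column_stats(a: np.ndarray) -> Dict[str, np.ndarray]:
    """Hypothetical per-column statistics of an N x F array."""
    return {
        "mean": a.mean(axis=0),
        "std": a.std(axis=0),  # population std (ddof=0), an assumption
        "min": a.min(axis=0),
        "max": a.max(axis=0),
    }


x = np.array([[1.0, 10.0], [3.0, 30.0]])
stats_x = column_stats(x)
print(stats_x["mean"].tolist())  # [2.0, 20.0]
```

Applied once to x and once to y, this yields the pair of dictionaries described above.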

substract(other: RegressionSubset) → RegressionSubset

Returns the subtraction of two datasets. Data are copied.

Parameters:

other (RegressionSubset) – Subset of self.

Returns:

New subset of self containing all entries that are not in other.

Return type:

RegressionSubset
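When a subset is described by row indices into its parent, the subtraction amounts to a set difference on those indices. A hypothetical sketch with `np.setdiff1d` follows; how nnbma tracks indices internally is not documented here, and `substract_indices` is an illustrative name.

```python
import numpy as np


def substract_indices(n_entries: int, subset_indices) -> np.ndarray:
    """Hypothetical: indices of the entries of a dataset that are not in a subset."""
    # np.setdiff1d returns the sorted unique values of the first array
    # that do not appear in the second one.
    return np.setdiff1d(np.arange(n_entries), np.asarray(subset_indices))


x = np.arange(12.0).reshape(6, 2)  # N = 6 entries, I = 2 features
train_idx = [0, 2, 3]
rest_idx = substract_indices(len(x), train_idx)
x_rest = x[rest_idx]  # fancy indexing copies the selected rows

print(rest_idx.tolist())  # [1, 4, 5]
```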

to_pandas() → Tuple[DataFrame, DataFrame]

Converts the dataset to two pandas DataFrames.

Returns:

DataFrames of the input x and output y, respectively. Their columns are named after inputs_names and outputs_names, respectively, when these are not None.

Return type:

Tuple[pd.DataFrame, pd.DataFrame]

property x: Tensor

Input tensor.

property y: Tensor

Output tensor.

class nnbma.dataset.regression_dataset.RegressionSubset(dataset: RegressionDataset, indices: Sequence[int])

Bases: RegressionDataset

Subset of RegressionDataset.

dataset

Dataset from which entries are extracted.

Type:

RegressionDataset

indices

Indices of entries to extract.

Type:

Sequence of int
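A subset keeps a reference to its parent dataset together with a sequence of indices, so its x and y correspond to the parent's rows at those indices. A minimal hypothetical sketch of that structure on NumPy arrays follows; the class `ArraySubset` is illustrative, and whether nnbma indexes lazily or eagerly is not stated here.

```python
from dataclasses import dataclass
from typing import Sequence

import numpy as np


@dataclass
class ArraySubset:
    """Hypothetical subset: a parent (x, y) pair plus row indices."""

    parent_x: np.ndarray
    parent_y: np.ndarray
    indices: Sequence[int]

    @property
    def x(self) -> np.ndarray:
        # Rows of the parent inputs selected by the indices, in that order.
        return self.parent_x[list(self.indices)]

    @property
    def y(self) -> np.ndarray:
        # Rows of the parent outputs selected by the same indices.
        return self.parent_y[list(self.indices)]


x = np.arange(8.0).reshape(4, 2)
y = np.arange(4.0).reshape(4, 1)
sub = ArraySubset(x, y, indices=[3, 1])
print(sub.x.tolist())  # [[6.0, 7.0], [2.0, 3.0]]
```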

issubsetof(dataset: RegressionDataset) → bool

Returns True if self is a subset of dataset.

Parameters:

dataset (RegressionDataset) – Dataset of which we want to know if self is a subset.

Returns:

True if self is a subset of dataset, else False.

Return type:

bool

property x: Tensor

Input tensor.

property y: Tensor

Output tensor.

nnbma.dataset.mask_dataset module

class nnbma.dataset.mask_dataset.MaskDataset(m: ndarray, features_names: List[str] | None = None)

Bases: Dataset

Dataset used to ignore specified values during learning.

Parameters:
  • m (np.ndarray) – Array containing the mask on the output features of the regression model. m must be of shape \(N \times F\) where \(N\) is the number of entries and \(F\) the number of features.

  • features_names (Optional[List[str]], optional) – List of feature names, by default None.

Raises:

ValueError – m and features_names must have the same number of features \(F\).
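A mask dataset is meant to be paired with the regression outputs so that selected entries do not contribute to learning. The sketch below computes a masked mean squared error in NumPy. The helper `masked_mse` is hypothetical, and the convention that True in m marks an entry to ignore is an assumption, since the source does not state it.

```python
import numpy as np


def masked_mse(y_pred: np.ndarray, y_true: np.ndarray, m: np.ndarray) -> float:
    """Hypothetical MSE over the entries that are NOT masked.

    Assumption: m has the same N x F shape as the outputs and is True
    where an entry must be ignored.
    """
    keep = ~m.astype(bool)
    if not keep.any():
        raise ValueError("every entry is masked")
    err = (y_pred - y_true) ** 2
    # Only the squared errors of kept entries are averaged.
    return float(err[keep].mean())


y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
y_pred = np.array([[2.0, 3.0], [4.0, 100.0]])
m = np.array([[False, False], [False, True]])  # ignore the last entry
print(masked_mse(y_pred, y_true, m))  # 1.0
```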

property features_names: List[str] | None

Names of the features.

features_size() → int

Returns the number of floating point values in m.

static from_pandas(df_m: DataFrame) → MaskDataset

Converts a pandas DataFrame to a MaskDataset object.

Parameters:

df_m (pd.DataFrame) – DataFrame of the masked outputs. This DataFrame should contain \(N\) rows (one per entry) and \(F\) columns (one per feature).

Returns:

Associated MaskDataset object. The m attribute is set to the values of the DataFrame and the features_names attribute to its column names.

Return type:

MaskDataset

getall(numpy: Literal[True]) → ndarray
getall(numpy: Literal[False]) → Tensor

Returns the whole mask as a numpy.ndarray or a torch.Tensor, depending on the value of the numpy parameter.

Parameters:

numpy (bool, optional) – If True, the mask is returned as a numpy array; otherwise, it is returned as a torch tensor.

Returns:

Mask.

Return type:

torch.Tensor or numpy.ndarray

join(other: MaskDataset) → MaskDataset

Returns the union of two datasets. Data are copied.

Parameters:

other (MaskDataset) – Other dataset to join with.

Returns:

New dataset constructed as the union of the two datasets.

Return type:

MaskDataset

static load(filename: str, path: str | None = None) → MaskDataset

Loads a mask dataset from a pickle file.

Parameters:
  • filename (str) – Name of the file to be read.

  • path (Optional[str], optional) – Path to the file to be read, by default None.

Returns:

Loaded mask dataset.

Return type:

MaskDataset

property m: Tensor

Mask Tensor.

property n_features: int

Number of output features \(F\).

save(filename: str, path: str | None = None) → None

Saves the dataset to a pickle file.

Parameters:
  • filename (str) – Name of the file to be created.

  • path (Optional[str], optional) – Path to the file to be created, by default None.

stats() → Dict[str, ndarray]

Computes the proportion of masked entries for each output column.

Returns:

Dictionary of the proportion of masked entries for each output feature.

Return type:

Dict[str, np.ndarray]
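With a boolean mask, the proportion of masked entries in each column is the column-wise mean of the mask, since True counts as 1. A short NumPy sketch follows; the helper `masked_proportion` is hypothetical, and because the source does not list the dictionary keys, only the per-column array is computed.

```python
import numpy as np


def masked_proportion(m: np.ndarray) -> np.ndarray:
    # The mean over the N entries (axis 0) gives one proportion per feature.
    return m.astype(float).mean(axis=0)


m = np.array([[True, False], [False, False], [True, False], [True, True]])
print(masked_proportion(m).tolist())  # [0.75, 0.25]
```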

substract(other: MaskSubset) → MaskDataset

Returns the subtraction of two datasets. Data are copied.

Parameters:

other (MaskSubset) – Subset of self.

Returns:

New subset of self containing all values that were not in other.

Return type:

MaskDataset

to_pandas() → DataFrame

Converts the mask dataset to a pandas DataFrame.

Returns:

DataFrame of the mask on the output y.

Return type:

pd.DataFrame

class nnbma.dataset.mask_dataset.MaskSubset(dataset: MaskDataset, indices: Sequence[int])

Bases: MaskDataset

Subset of MaskDataset.

Parameters:
  • dataset (MaskDataset) – Dataset from which entries are extracted.

  • indices (Sequence of int) – Indices of entries to extract.

issubsetof(dataset: MaskDataset) → bool

Returns True if self is a subset of dataset.

Parameters:

dataset (MaskDataset) – Dataset of which we want to know if self is a subset.

Returns:

True if self is a subset of dataset else False.

Return type:

bool

property m: Tensor

Mask Tensor.

Module contents