nnbma.dataset package
Submodules
nnbma.dataset.regression_dataset module
- class nnbma.dataset.regression_dataset.RegressionDataset(x: ndarray, y: ndarray, inputs_names: List[str] | None = None, outputs_names: List[str] | None = None)[source]
Bases: Dataset
Dataset dedicated to regression.
- Parameters:
x (numpy.ndarray) – Array containing the input features of the regression model. Its shape is considered to be \(N \times I\) where \(N\) is the number of entries and \(I\) the number of input features.
y (numpy.ndarray) – Array containing the output features of the regression model. Its shape is considered to be \(N \times O\) where \(N\) is the number of entries and \(O\) the number of output features.
inputs_names (Optional[List[str]], optional) – list of the names of the input features, by default None
outputs_names (Optional[List[str]], optional) – list of the names of the output features, by default None
- Raises:
ValueError – x and y must have the same number of rows \(N\).
ValueError – x and inputs_names must have the same number of features \(I\).
ValueError – y and outputs_names must have the same number of features \(O\).
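The three consistency rules above can be illustrated with a small plain-Python sketch. This is not the nnbma implementation: the helper name check_regression_shapes is hypothetical, and nested lists stand in for NumPy arrays so that the snippet runs without any dependency.

```python
from typing import List, Optional, Sequence


def check_regression_shapes(
    x: Sequence[Sequence[float]],
    y: Sequence[Sequence[float]],
    inputs_names: Optional[List[str]] = None,
    outputs_names: Optional[List[str]] = None,
) -> None:
    """Hypothetical helper mirroring the documented ValueError conditions."""
    # x and y must have the same number of rows N.
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows")

    # Number of columns, taken from the first row (0 for an empty dataset).
    n_inputs = len(x[0]) if len(x) > 0 else 0
    n_outputs = len(y[0]) if len(y) > 0 else 0

    # Names are optional, but when given they must match the column counts.
    if inputs_names is not None and len(inputs_names) != n_inputs:
        raise ValueError("x and inputs_names must have the same number of features")
    if outputs_names is not None and len(outputs_names) != n_outputs:
        raise ValueError("y and outputs_names must have the same number of features")


x = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]  # N = 3 entries, I = 2 input features
y = [[10.0], [20.0], [30.0]]              # N = 3 entries, O = 1 output feature
check_regression_shapes(x, y, ["a", "b"], ["out"])  # consistent: no exception
```

Dropping a row from y, or passing a single name for the two columns of x, would raise ValueError in this sketch.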
- apply_transf(x_op: Callable[[ndarray], ndarray] | None, y_op: Callable[[ndarray], ndarray] | None) RegressionDataset[source]
Apply an operator to x and y. A new dataset is returned, so the operators should not use in-place operations.
- Parameters:
x_op (Optional[Callable[[numpy.ndarray], numpy.ndarray]]) – Operator to apply on the input features.
y_op (Optional[Callable[[numpy.ndarray], numpy.ndarray]]) – Operator to apply on the output features.
- Returns:
New dataset with transformed values.
- Return type:
RegressionDataset
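The warning about in-place operations can be made concrete with a plain-Python sketch. Nothing here comes from nnbma: apply_op, scale_out_of_place and scale_in_place are hypothetical names, nested lists stand in for arrays, and treating None as "no transformation" is an assumption, since the description above does not say what None does.

```python
from typing import Callable, List, Optional

Rows = List[List[float]]


def apply_op(data: Rows, op: Optional[Callable[[Rows], Rows]]) -> Rows:
    """Hypothetical stand-in: apply op if given, otherwise return a copy."""
    if op is None:
        # Assumption: None means "leave the values as they are".
        return [row[:] for row in data]
    return op(data)


def scale_out_of_place(data: Rows) -> Rows:
    # Builds new rows; the caller's data is left untouched.
    return [[2.0 * v for v in row] for row in data]


def scale_in_place(data: Rows) -> Rows:
    # Mutates the caller's rows: the pattern the note above warns against.
    for row in data:
        for j in range(len(row)):
            row[j] *= 2.0
    return data


original = [[1.0, 2.0], [3.0, 4.0]]
transformed = apply_op(original, scale_out_of_place)
# original still holds [[1.0, 2.0], [3.0, 4.0]]

corrupted_source = [[1.0, 2.0], [3.0, 4.0]]
also_transformed = apply_op(corrupted_source, scale_in_place)
# corrupted_source now holds the doubled values as well
```

With the out-of-place operator, the source data and the transformed data remain independent, which is what one would expect when a method is documented as returning a new dataset.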
- static from_pandas(df_x: DataFrame, df_y: DataFrame) RegressionDataset[source]
Converts two pandas DataFrames to a RegressionDataset object.
- Parameters:
df_x (pd.DataFrame) – DataFrame of the inputs. This DataFrame should contain \(N\) rows, i.e., number of entries, and \(I\) columns, i.e., features.
df_y (pd.DataFrame) – DataFrame of the outputs. This DataFrame should contain \(N\) rows, i.e., number of entries, and \(O\) columns, i.e., features.
- Returns:
Associated RegressionDataset object. The x attribute is set to the values in the df_x DataFrame, and the inputs_names attribute to its column names. The y attribute is set to the values in the df_y DataFrame, and the outputs_names attribute to its column names.
- Return type:
RegressionDataset
- getall(numpy: Literal[True]) Tuple[ndarray, ndarray][source]
- getall(numpy: Literal[False]) Tuple[Tensor, Tensor]
Returns the whole dataset as numpy.ndarray or torch.Tensor objects, depending on the value of the numpy parameter.
- Parameters:
numpy (bool, optional) – If numpy==True, the returned objects are numpy arrays. Otherwise, they are torch tensors.
- Returns:
Inputs and outputs sets.
- Return type:
tuple of torch.Tensor or numpy.ndarray
- has_nan() Tuple[bool, bool][source]
Returns a tuple of two booleans. The first one is True if the input features contain at least one NaN, else False. The second one is True if the output features contain at least one NaN, else False.
- Returns:
Flags indicating the presence of NaN in the input and output sets, respectively.
- Return type:
tuple of bool
- has_nonfinite() Tuple[bool, bool][source]
Returns a tuple of two booleans. The first one is True if the input features contain at least one non-finite value (including NaNs), else False. The second one is True if the output features contain at least one non-finite value (including NaNs), else False.
- Returns:
Flags indicating the presence of non-finite values in the input and output sets, respectively.
- Return type:
tuple of bool
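The difference between the two checks is that an infinity is non-finite but is not a NaN. The sketch below shows this with the standard math module only; the function names are hypothetical and nested lists stand in for the dataset's arrays.

```python
import math
from typing import List, Tuple

Rows = List[List[float]]


def contains_nan(data: Rows) -> bool:
    # True as soon as one value is NaN.
    return any(math.isnan(v) for row in data for v in row)


def contains_nonfinite(data: Rows) -> bool:
    # True as soon as one value is NaN, +inf or -inf.
    return any(not math.isfinite(v) for row in data for v in row)


def nan_flags(x: Rows, y: Rows) -> Tuple[bool, bool]:
    return contains_nan(x), contains_nan(y)


def nonfinite_flags(x: Rows, y: Rows) -> Tuple[bool, bool]:
    return contains_nonfinite(x), contains_nonfinite(y)


x = [[0.0, 1.0], [math.inf, 2.0]]  # one infinity, no NaN
y = [[1.0], [math.nan]]            # one NaN

print(nan_flags(x, y))        # (False, True)
print(nonfinite_flags(x, y))  # (True, True)
```

A dataset can therefore pass a NaN check and still contain values that are unusable for training.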
- property inputs_names: List[str] | None
List of the names of the input features.
- join(other: RegressionDataset) RegressionDataset[source]
Returns the union of two datasets. Data are copied.
- Parameters:
other (RegressionDataset) – Other dataset to join with.
- Returns:
New dataset constructed as the union of the two datasets.
- Return type:
RegressionDataset
- static load(filename: str, path: str | None = None) RegressionDataset[source]
Loads a regression dataset from a pickle file.
- Parameters:
filename (str) – name of the file to be read.
path (Optional[str], optional) – path to the file to be read, by default None.
- Returns:
Loaded regression dataset.
- Return type:
RegressionDataset
- property n_inputs: int
Number of input features \(I\).
- property n_outputs: int
Number of output features \(O\).
- property outputs_names: List[str] | None
List of the names of the output features.
- save(filename: str, path: str | None = None) None[source]
Saves the dataset to a pickle file.
- Parameters:
filename (str) – name of the file to be created.
path (Optional[str], optional) – path to the file to be created, by default None.
- stats() Tuple[Dict[str, ndarray], Dict[str, ndarray]][source]
Provides a few statistics on the dataset (the mean, the standard deviation, the min and the max for each column).
- Returns:
Tuple of dictionaries, each containing the mean, the standard deviation, the min and the max for each column. The first dictionary corresponds to the input x and the second to the output y.
- Return type:
Tuple[ Dict[str, np.ndarray], Dict[str, np.ndarray] ]
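As an illustration of the four per-column statistics, here is a minimal standard-library sketch. It is not the nnbma code: the function name column_stats and the dictionary keys "mean", "std", "min" and "max" are hypothetical, and using the population standard deviation is an assumption, since the description above does not say which estimator is used.

```python
import statistics
from typing import Dict, List


def column_stats(data: List[List[float]]) -> Dict[str, List[float]]:
    """Hypothetical helper: one value per column for each statistic."""
    columns = list(zip(*data))  # transpose rows into columns
    return {
        "mean": [statistics.fmean(c) for c in columns],
        # Assumption: population standard deviation (divide by N).
        "std": [statistics.pstdev(c) for c in columns],
        "min": [min(c) for c in columns],
        "max": [max(c) for c in columns],
    }


x = [[1.0, 10.0], [3.0, 30.0]]  # N = 2 entries, I = 2 input features
y = [[0.0], [4.0]]              # N = 2 entries, O = 1 output feature

# One dictionary for the inputs, one for the outputs, as in the tuple above.
stats_x, stats_y = column_stats(x), column_stats(y)
```

Each list in the dictionaries has one element per feature, so stats_x holds two values per statistic and stats_y holds one.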
- substract(other: RegressionSubset) RegressionSubset[source]
Returns the difference of two datasets, i.e., the entries of self that are not in other. Data are copied.
- Parameters:
other (RegressionSubset) – Subset of self.
- Returns:
New subset of self containing all values that were not in other.
- Return type:
RegressionSubset
- to_pandas() Tuple[DataFrame, DataFrame][source]
Converts the dataset to two pandas DataFrames.
- Returns:
DataFrames of the input x and output y, respectively. The columns are named with the inputs_names and outputs_names, respectively, if they are not None.
- Return type:
Tuple[pd.DataFrame, pd.DataFrame]
- property x: Tensor
Input tensor.
- property y: Tensor
Output tensor.
- class nnbma.dataset.regression_dataset.RegressionSubset(dataset: RegressionDataset, indices: Sequence[int])[source]
Bases: RegressionDataset
Subset of RegressionDataset.
- dataset
Dataset from which entries are extracted.
- Type:
RegressionDataset
- indices
Indices of entries to extract.
- Type:
Sequence of int
- issubsetof(dataset: RegressionDataset) bool[source]
Returns True if self is a subset of dataset.
- Parameters:
dataset (RegressionDataset) – Dataset of which we want to know if self is a subset.
- Returns:
True if self is a subset of dataset, else False.
- Return type:
bool
- property x: Tensor
Input tensor.
- property y: Tensor
Output tensor.
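The relation between a dataset, a subset defined by indices, and the complement obtained by subtracting that subset can be sketched on plain lists. This is only an index-based illustration under stated assumptions: take and complement are hypothetical helpers, and how nnbma actually stores or compares subsets is not described here.

```python
from typing import List, Sequence

Rows = List[List[float]]


def take(rows: Rows, indices: Sequence[int]) -> Rows:
    # Copy the selected entries, in the order given by indices.
    return [rows[i][:] for i in indices]


def complement(n_rows: int, indices: Sequence[int]) -> List[int]:
    # Indices of the entries that are NOT selected by indices.
    removed = set(indices)
    return [i for i in range(n_rows) if i not in removed]


x = [[0.0], [1.0], [2.0], [3.0]]  # N = 4 entries, I = 1 input feature
subset_indices = [1, 3]

subset_rows = take(x, subset_indices)                    # entries 1 and 3
remaining = take(x, complement(len(x), subset_indices))  # entries 0 and 2
```

Together, subset_rows and remaining cover every entry of x exactly once, which is the property a train/validation split typically relies on.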
nnbma.dataset.mask_dataset module
- class nnbma.dataset.mask_dataset.MaskDataset(m: ndarray, features_names: List[str] | None = None)[source]
Bases: Dataset
Dataset dedicated to ignoring some specified values during learning.
- Parameters:
m (np.ndarray) – Array containing the mask associated with the output features of the regression model. m must be of shape \(N \times F\) where \(N\) is the number of entries and \(F\) the number of features.
features_names (Optional[List[str]], optional) – list of feature names, by default None.
- Raises:
ValueError – m and features_names must have the same number of features \(F\).
- property features_names: List[str] | None
Features names.
- static from_pandas(df_m: DataFrame) MaskDataset[source]
Converts a pandas DataFrame to a MaskDataset object.
- Parameters:
df_m (pd.DataFrame) – DataFrame of the masked outputs. This DataFrame should contain \(N\) rows, i.e., number of entries, and \(F\) columns, i.e., features.
- Returns:
Associated MaskDataset object. The m attribute is set to the values in the DataFrame, and the features_names attribute to the column names.
- Return type:
MaskDataset
- getall(numpy: Literal[True]) ndarray[source]
- getall(numpy: Literal[False]) Tensor
Returns the whole dataset as a numpy.ndarray or a torch.Tensor, depending on the value of the numpy parameter.
- Parameters:
numpy (bool, optional) – If numpy==True, the returned object is a numpy array. Otherwise, it is a torch tensor.
- Returns:
Mask.
- Return type:
torch.Tensor or numpy.ndarray
- join(other: MaskDataset) MaskDataset[source]
Returns the union of two datasets. Data are copied.
- Parameters:
other (MaskDataset) – Other dataset to join with.
- Returns:
New dataset constructed as the union of the two datasets.
- Return type:
MaskDataset
- static load(filename: str, path: str | None = None) MaskDataset[source]
Loads a mask dataset from a pickle file.
- Parameters:
filename (str) – name of the file to be read.
path (Optional[str], optional) – path to the file to be read, by default None.
- Returns:
Loaded mask dataset.
- Return type:
MaskDataset
- property m: Tensor
Mask Tensor.
- property n_features: int
Number of output features.
- save(filename: str, path: str | None = None) None[source]
Saves the dataset to a pickle file.
- Parameters:
filename (str) – name of the file to be created.
path (Optional[str], optional) – path to the file to be created, by default None.
- stats() Dict[str, ndarray][source]
Computes the proportion of masked entries for each output column.
- Returns:
dictionary of masked entry proportion for each output feature.
- Return type:
Dict[str, np.ndarray]
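A per-column masked proportion can be sketched as follows. This is a hypothetical illustration, not the nnbma implementation: the function name masked_proportion and the feature names are invented, and treating the value 0 as "masked" is purely an assumption, since the text above does not state which convention the mask follows.

```python
from typing import Dict, List


def masked_proportion(
    m: List[List[int]],
    features_names: List[str],
    masked_value: int = 0,  # assumption: 0 marks a masked entry
) -> Dict[str, float]:
    """Hypothetical helper: fraction of masked entries in each column."""
    columns = list(zip(*m))  # transpose rows into columns
    return {
        name: sum(1 for v in col if v == masked_value) / len(col)
        for name, col in zip(features_names, columns)
    }


m = [[1, 0], [1, 0], [0, 1], [1, 1]]  # N = 4 entries, F = 2 features
names = ["line_a", "line_b"]          # hypothetical feature names

proportions = masked_proportion(m, names)
print(proportions)  # {'line_a': 0.25, 'line_b': 0.5}
```

If the opposite convention applies, passing masked_value=1 in this sketch yields the complementary proportions.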
- substract(other: MaskSubset) MaskDataset[source]
Returns the subtraction of two datasets. Data are copied.
- Parameters:
other (MaskSubset) – Subset of self.
- Returns:
New subset of self containing all values that were not in other.
- Return type:
MaskDataset
- class nnbma.dataset.mask_dataset.MaskSubset(dataset: MaskDataset, indices: Sequence[int])[source]
Bases: MaskDataset
Subset of MaskDataset.
- Parameters:
dataset (MaskDataset) – Dataset from which entries are extracted.
indices (Sequence of int) – Indices of entries to extract.
- issubsetof(dataset: MaskDataset) bool[source]
Returns True if self is a subset of dataset.
- Parameters:
dataset (MaskDataset) – Dataset of which we want to know if self is a subset.
- Returns:
True if self is a subset of dataset, else False.
- Return type:
bool
- property m: Tensor
Mask Tensor.