Pre- and post-processing operators

The operators module contains helpers to manipulate NumPy ndarrays. These operators are useful for pre- or post-processing data. Note that they cannot handle PyTorch tensors. To nest operators in a neural network, see the following file.

[1]:

import os
import sys

sys.path.append(os.path.join(os.path.abspath(""), ".."))

import numpy as np
import pandas as pd

from torch import nn

from nnbma.networks import FullyConnected
from nnbma.operators import (
    log10,
    pow10,
    asinh,
    Normalizer,
    NormTypes,
    SequentialOperator,
)

from functions import Fexample as F

Introductory examples

[2]:

n_features = 5
n_entries = 10

mean = np.random.normal(0, 1, size=n_features)
std = np.abs(np.random.normal(0, 1, size=n_features)) + 1
x = np.random.normal(mean, std, size=(n_entries, n_features)).astype("float32")

print(f"Data shape: {x.shape}")
print(x)

Data shape: (10, 5)
[[ 3.4063318e+00 -1.1354405e+00  9.8771578e-01  7.0870233e-01
   1.7290703e+00]
 [ 2.1967297e+00 -1.9723588e+00  1.1380010e-03 -2.0811489e+00
   2.4284778e+00]
 [ 1.6692861e+00 -9.6962959e-01  1.2388496e+00  2.8925421e+00
   5.9123330e+00]
 [ 1.6724726e+00  1.3535130e+00  3.5991206e+00  6.4644545e-01
   1.2336152e+00]
 [ 8.2049608e-01 -1.2848368e+00 -6.3905917e-02 -1.0490260e+00
  -2.0751235e+00]
 [ 1.1257614e+00 -1.9223274e-01  1.5219371e+00 -2.8947442e+00
   2.0434897e-01]
 [ 1.9933865e+00  2.5302145e-01 -1.3949286e+00  2.5250411e+00
   1.4091047e+00]
 [ 1.1154927e+00 -1.2004021e+00 -7.3979467e-02  2.7410027e-01
  -2.2426999e+00]
 [-2.4341707e-01  9.9058968e-01  1.4949166e+00 -6.7498440e-01
   1.8345610e+00]
 [-6.9475919e-01 -2.5285777e-01  5.9505379e-01  2.4864352e+00
  -3.8048300e-01]]

Rescaling

You can rescale your data with, for instance, log10 if the values span several orders of magnitude. Note that log10 is undefined for negative values and returns nan for them. Alternatively, you can use asinh if you have both positive and negative values (which is the case here).

[3]:

y = log10(x)
print(y)

[[ 0.532287           nan -0.00536801 -0.14953615  0.23781267]
 [ 0.34177664         nan -2.9438577          nan  0.38533413]
 [ 0.2225308          nan  0.0930186   0.4612797   0.7717589 ]
 [ 0.22335902  0.13146244  0.55619645 -0.18946813  0.0911797 ]
 [-0.08592349         nan         nan         nan         nan]
 [ 0.05144635         nan  0.18239672         nan -0.6896276 ]
 [ 0.29959154 -0.5968427          nan  0.40226847  0.14894328]
 [ 0.04746673         nan         nan -0.5620906          nan]
 [        nan -0.0041062   0.17461695         nan  0.26353216]
 [        nan         nan -0.22544378  0.39557716         nan]]

/home/einig/PHD/projects/ism-model-nn-approximation/nnbma/operators/base.py:10: RuntimeWarning: invalid value encountered in log10
  return np.log10(t)

[4]:

y = asinh(x)
print(y)

[[ 1.9396644e+00 -9.7397798e-01  8.7266058e-01  6.5978122e-01
   1.3154666e+00]
 [ 1.5283064e+00 -1.4312052e+00  1.1380007e-03 -1.4793483e+00
   1.6203358e+00]
 [ 1.2851425e+00 -8.5973459e-01  1.0406084e+00  1.7839062e+00
   2.4772642e+00]
 [ 1.2867789e+00  1.1106617e+00  1.9926003e+00  6.0824162e-01
   1.0373164e+00]
 [ 7.4859309e-01 -1.0691719e+00 -6.3862495e-02 -9.1561884e-01
  -1.4767357e+00]
 [ 9.6756536e-01 -1.9106807e-01  1.2068704e+00 -1.7846254e+00
   2.0295283e-01]
 [ 1.4406739e+00  2.5039664e-01 -1.1350307e+00  1.6564913e+00
   1.1432626e+00]
 [ 9.6072841e-01 -1.0162306e+00 -7.3912151e-02  2.7077913e-01
  -1.5471891e+00]
 [-2.4107517e-01  8.7470382e-01  1.1919401e+00 -6.3205260e-01
   1.3671030e+00]
 [-6.4836782e-01 -2.5023797e-01  5.6457895e-01  1.6421815e+00
  -3.7185386e-01]]
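
The difference between the two rescalings can be illustrated with plain NumPy, independently of the operators module. The following is only a sketch: it assumes that log10 and asinh behave like np.log10 and np.arcsinh, and the array `values` is made up for the illustration. It shows that asinh is defined for every real input, is odd, and grows like a logarithm for large magnitudes.

```python
import numpy as np

# Illustrative values spanning several orders of magnitude, with both signs.
values = np.array([-1e4, -10.0, -0.01, 0.0, 0.01, 10.0, 1e4])

# log10 is undefined for negative inputs (NaN) and diverges at 0 (-inf).
# The warnings are silenced here only to keep the output clean.
with np.errstate(invalid="ignore", divide="ignore"):
    log_scaled = np.log10(values)

# asinh is defined everywhere and is an odd function.
asinh_scaled = np.arcsinh(values)

# For large |x|, asinh(x) is close to sign(x) * ln(2|x|),
# so it compresses large magnitudes much like a logarithm.
large = np.abs(values) >= 10
approx = np.sign(values[large]) * np.log(2 * np.abs(values[large]))

print(log_scaled)
print(asinh_scaled)
print(np.abs(asinh_scaled[large] - approx).max())
```

Near zero, asinh is approximately the identity, so small values keep their sign and their scale instead of being sent to large negative numbers as with a logarithm.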

Normalization

You will probably want to normalize your data. Several options are available:

NONE: No normalization
MEAN0: Center the columns, i.e., set their means to 0
STD1: Scale the columns, i.e., set their variances to 1
MEAN0STD1: Center and scale the columns, i.e., set their means to 0 and their variances to 1
MIN0MAX1: Apply a MinMax normalization, i.e., set the minimum value of each column to 0 and the maximum to 1
MIN1MAX1: Apply an alternative MinMax normalization, i.e., set the minimum value of each column to -1 and the maximum to 1
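
The options listed above can be written out as column-wise formulas. The sketch below uses plain NumPy and is not the Normalizer implementation: the variable names are made up, and details such as the standard deviation convention (population or sample) are assumptions that may differ from the library.

```python
import numpy as np

# Made-up data: 10 entries, 5 features.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=3.0, size=(10, 5))

# Column-wise statistics (axis=0 means one value per feature).
mean = data.mean(axis=0)
std = data.std(axis=0)  # population standard deviation (assumption)
lo = data.min(axis=0)
hi = data.max(axis=0)

mean0 = data - mean                          # MEAN0: column means become 0
std1 = data / std                            # STD1: column variances become 1
mean0std1 = (data - mean) / std              # MEAN0STD1: both at once
min0max1 = (data - lo) / (hi - lo)           # MIN0MAX1: columns span [0, 1]
min1max1 = 2 * (data - lo) / (hi - lo) - 1   # MIN1MAX1: columns span [-1, 1]

print(mean0std1.mean(axis=0), mean0std1.std(axis=0))
print(min1max1.min(axis=0), min1max1.max(axis=0))
```

In every case the statistics are computed per column, so each feature is normalized independently of the others.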

[5]:

norm = Normalizer(pd.DataFrame(x), norm_type=NormTypes.MIN1MAX1)

y = norm(x)
print(y)

[[ 1.00000000e+00 -4.96722460e-01 -4.58065867e-02  2.45297432e-01
  -2.59339809e-02]
 [ 4.10107136e-01 -1.00000000e+00 -4.40907955e-01 -7.18833566e-01
   1.45593882e-01]
 [ 1.52886152e-01 -3.97012770e-01  5.47666550e-02  1.00000000e+00
   1.00000000e+00]
 [ 1.54440045e-01  1.00000000e+00  1.00000000e+00  2.23782420e-01
  -1.47443056e-01]
 [-2.61047721e-01 -5.86561322e-01 -4.66956556e-01 -3.62147272e-01
  -9.58902359e-01]
 [-1.12177372e-01  7.04717636e-02  1.68136597e-01 -1.00000000e+00
  -3.99867833e-01]
 [ 3.10941696e-01  3.38223577e-01 -1.00000000e+00  8.72997165e-01
  -1.04404747e-01]
 [-1.17185175e-01 -5.35786867e-01 -4.70990777e-01  9.51055288e-02
  -1.00000000e+00]
 [-7.79891670e-01  7.81757474e-01  1.57315493e-01 -2.32884109e-01
  -6.27040863e-05]
 [-1.00000000e+00  3.40151787e-02 -2.03058541e-01  8.59655499e-01
  -5.43296337e-01]]

[6]:

norm = Normalizer(pd.DataFrame(x), norm_type=NormTypes.MEAN0STD1)

y = norm(x)
print(y)

[[ 1.771317   -0.6508456   0.1475014   0.21452552  0.30665794]
 [ 0.75111127 -1.4352963  -0.5907225  -1.1924846   0.60300183]
 [ 0.30625358 -0.4954296   0.33541664  1.3159049   2.079136  ]
 [ 0.3089411   1.6820718   2.1015303   0.18312742  0.0967301 ]
 [-0.409635   -0.79087603 -0.6393927  -0.6719524  -1.305206  ]
 [-0.1521674   0.23323111  0.54724175 -1.6028064  -0.33937728]
 [ 0.5796071   0.6505717  -1.6353533   1.1305625   0.17108625]
 [-0.16082823 -0.7117347  -0.6469304  -0.00465803 -1.3762091 ]
 [-1.3069632   1.3419006   0.52702314 -0.4833114   0.3513551 ]
 [-1.6876354   0.17640676 -0.14631471  1.1110923  -0.5871748 ]]

Embedding within a network

These operators can be attached before and after a network. This is convenient because users who did not train the network do not have to check how the data was preprocessed.

[7]:

preprocessing = SequentialOperator([asinh, norm])
postprocessing = pow10

net = FullyConnected(
    [n_features, 10, 10, 1],
    nn.ReLU(),
    inputs_transformer=preprocessing,
    outputs_transformer=postprocessing,
)
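
Chaining operators amounts to applying callables one after another, in the order of the list. The following sketch illustrates that idea with a hypothetical `ChainedOperator` class and a hypothetical `center` function, both written only for this example. It is not the SequentialOperator implementation, whose internals may differ.

```python
import numpy as np


class ChainedOperator:
    """Hypothetical stand-in for a sequential operator: applies callables in order."""

    def __init__(self, operators):
        self.operators = list(operators)

    def __call__(self, t):
        # Feed the output of each operator into the next one.
        for op in self.operators:
            t = op(t)
        return t


def center(t):
    """Hypothetical operator: subtract the column means."""
    return t - t.mean(axis=0)


chain = ChainedOperator([np.arcsinh, center])

x_demo = np.array([[-3.0, 0.5], [1.0, 2.0], [4.0, -1.5]])
out = chain(x_demo)

print(out)
print(out.mean(axis=0))
```

The order matters: rescaling and then centering generally gives a different result from centering and then rescaling, because asinh is nonlinear.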

Usually, you can evaluate the network as if it were a function (net(x)). However, by default this does not apply the pre- and post-processing. To apply them, call the evaluate method, which behaves essentially the same way but offers more options:

[8]:

y = net.evaluate(x, transform_inputs=True, transform_outputs=True)
print(y)

[[1.5116196 ]
 [1.5584499 ]
 [1.4138592 ]
 [1.7413094 ]
 [1.9702724 ]
 [2.2404044 ]
 [0.88946015]
 [1.695295  ]
 [1.290986  ]
 [1.5530138 ]]
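
The distinction between a direct call and evaluate can be sketched with a toy model. Everything below is hypothetical: `ToyModel`, its linear forward pass, and the default values of the flags are assumptions made for the illustration and do not reproduce the FullyConnected class. Only the parameter names transform_inputs and transform_outputs are taken from the call above.

```python
import numpy as np


class ToyModel:
    """Hypothetical model wrapper with optional input and output transforms."""

    def __init__(self, weights, inputs_transformer=None, outputs_transformer=None):
        self.weights = weights
        self.inputs_transformer = inputs_transformer
        self.outputs_transformer = outputs_transformer

    def __call__(self, x):
        # Bare forward pass: no pre- or post-processing.
        return x @ self.weights

    def evaluate(self, x, transform_inputs=False, transform_outputs=False):
        # Optionally preprocess, run the forward pass, optionally postprocess.
        if transform_inputs and self.inputs_transformer is not None:
            x = self.inputs_transformer(x)
        y = self(x)
        if transform_outputs and self.outputs_transformer is not None:
            y = self.outputs_transformer(y)
        return y


weights = np.array([[0.5], [-0.25]])
model = ToyModel(
    weights,
    inputs_transformer=np.arcsinh,
    outputs_transformer=lambda t: 10.0 ** t,
)

x_demo = np.array([[1.0, 2.0], [-3.0, 0.0]])
raw = model(x_demo)
full = model.evaluate(x_demo, transform_inputs=True, transform_outputs=True)

print(raw)
print(full)
```

With a power-of-ten output transform, the model itself works in log10 space and the transformed outputs are always strictly positive.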