Pre and post processing operators
The operators module contains helpers to manipulate NumPy ndarrays. These operators are useful for pre- or post-processing data. Note that they cannot handle PyTorch tensors. To nest operators in a neural network, see the following file.
[1]:
import os
import sys
sys.path.append(os.path.join(os.path.abspath(""), ".."))
import numpy as np
import pandas as pd
from torch import nn
from nnbma.networks import FullyConnected
from nnbma.operators import (
    log10,
    pow10,
    asinh,
    Normalizer,
    NormTypes,
    SequentialOperator,
)
from functions import Fexample as F
Introductory examples
[2]:
n_features = 5
n_entries = 10
mean = np.random.normal(0, 1, size=n_features)
std = np.abs(np.random.normal(0, 1, size=n_features)) + 1
x = np.random.normal(mean, std, size=(n_entries, n_features)).astype("float32")
print(f"Data shape: {x.shape}")
print(x)
Data shape: (10, 5)
[[ 3.4063318e+00 -1.1354405e+00 9.8771578e-01 7.0870233e-01
1.7290703e+00]
[ 2.1967297e+00 -1.9723588e+00 1.1380010e-03 -2.0811489e+00
2.4284778e+00]
[ 1.6692861e+00 -9.6962959e-01 1.2388496e+00 2.8925421e+00
5.9123330e+00]
[ 1.6724726e+00 1.3535130e+00 3.5991206e+00 6.4644545e-01
1.2336152e+00]
[ 8.2049608e-01 -1.2848368e+00 -6.3905917e-02 -1.0490260e+00
-2.0751235e+00]
[ 1.1257614e+00 -1.9223274e-01 1.5219371e+00 -2.8947442e+00
2.0434897e-01]
[ 1.9933865e+00 2.5302145e-01 -1.3949286e+00 2.5250411e+00
1.4091047e+00]
[ 1.1154927e+00 -1.2004021e+00 -7.3979467e-02 2.7410027e-01
-2.2426999e+00]
[-2.4341707e-01 9.9058968e-01 1.4949166e+00 -6.7498440e-01
1.8345610e+00]
[-6.9475919e-01 -2.5285777e-01 5.9505379e-01 2.4864352e+00
-3.8048300e-01]]
Rescaling
You can rescale your data, for instance with log10 if the values span several orders of magnitude. Note that log10 returns NaN for negative inputs. Alternatively, you can use asinh if the data contain both positive and negative values (which is the case here).
[3]:
y = log10(x)
print(y)
[[ 0.532287 nan -0.00536801 -0.14953615 0.23781267]
[ 0.34177664 nan -2.9438577 nan 0.38533413]
[ 0.2225308 nan 0.0930186 0.4612797 0.7717589 ]
[ 0.22335902 0.13146244 0.55619645 -0.18946813 0.0911797 ]
[-0.08592349 nan nan nan nan]
[ 0.05144635 nan 0.18239672 nan -0.6896276 ]
[ 0.29959154 -0.5968427 nan 0.40226847 0.14894328]
[ 0.04746673 nan nan -0.5620906 nan]
[ nan -0.0041062 0.17461695 nan 0.26353216]
[ nan nan -0.22544378 0.39557716 nan]]
/home/einig/PHD/projects/ism-model-nn-approximation/nnbma/operators/base.py:10: RuntimeWarning: invalid value encountered in log10
return np.log10(t)
[4]:
y = asinh(x)
print(y)
[[ 1.9396644e+00 -9.7397798e-01 8.7266058e-01 6.5978122e-01
1.3154666e+00]
[ 1.5283064e+00 -1.4312052e+00 1.1380007e-03 -1.4793483e+00
1.6203358e+00]
[ 1.2851425e+00 -8.5973459e-01 1.0406084e+00 1.7839062e+00
2.4772642e+00]
[ 1.2867789e+00 1.1106617e+00 1.9926003e+00 6.0824162e-01
1.0373164e+00]
[ 7.4859309e-01 -1.0691719e+00 -6.3862495e-02 -9.1561884e-01
-1.4767357e+00]
[ 9.6756536e-01 -1.9106807e-01 1.2068704e+00 -1.7846254e+00
2.0295283e-01]
[ 1.4406739e+00 2.5039664e-01 -1.1350307e+00 1.6564913e+00
1.1432626e+00]
[ 9.6072841e-01 -1.0162306e+00 -7.3912151e-02 2.7077913e-01
-1.5471891e+00]
[-2.4107517e-01 8.7470382e-01 1.1919401e+00 -6.3205260e-01
1.3671030e+00]
[-6.4836782e-01 -2.5023797e-01 5.6457895e-01 1.6421815e+00
-3.7185386e-01]]
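The difference between the two rescalings can be checked with plain NumPy. The sketch below assumes that the log10 and asinh operators behave like np.log10 and np.arcsinh (the RuntimeWarning printed above suggests this for log10); the sample values are made up for illustration.

```python
import numpy as np

# Hypothetical sample spanning several orders of magnitude, with both signs.
x = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])

# log10 is undefined for negative values (NaN) and diverges at 0 (-inf).
# The warnings are silenced here only to keep the output clean.
with np.errstate(invalid="ignore", divide="ignore"):
    y_log = np.log10(x)

# arcsinh is defined everywhere and is an odd function, so the sign is kept.
y_asinh = np.arcsinh(x)

# For large |x|, arcsinh(x) is close to sign(x) * ln(2|x|),
# i.e. it compresses large values logarithmically, like log10 does.
large_x_approx = np.log(2 * 1000.0)

print(y_log)
print(y_asinh)
print(y_asinh[-1] - large_x_approx)
```

Near zero, arcsinh is approximately linear, which is why small values in the output of cell [4] are almost unchanged.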
Normalization
You will probably want to normalize your data. There are several options:
NONE: No normalization
MEAN0: Center the columns, i.e., set their means to 0
STD1: Reduce the columns, i.e., set their variances to 1
MEAN0STD1: Center and reduce the columns, i.e., set their means to 0 and their variances to 1
MIN0MAX1: Apply a MinMax normalization, i.e., set the minimum value of each column to 0 and the maximum to 1
MIN1MAX1: Apply an alternative MinMax normalization, i.e., set the minimum value of each column to -1 and the maximum to 1
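As a rough sketch of what these options compute, the column-wise formulas can be written in plain NumPy. This illustrates the definitions listed above and is not the Normalizer implementation: the helper name normalize and the string keys are hypothetical. The sample standard deviation (ddof=1) is an assumption, chosen because it appears consistent with the MEAN0STD1 output printed in this notebook.

```python
import numpy as np

def normalize(x, norm_type):
    """Column-wise normalization of a 2D array (illustrative helper)."""
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=1)  # assumption: sample standard deviation
    lo, hi = x.min(axis=0), x.max(axis=0)
    if norm_type == "NONE":
        return x
    if norm_type == "MEAN0":
        return x - mean
    if norm_type == "STD1":
        return x / std
    if norm_type == "MEAN0STD1":
        return (x - mean) / std
    if norm_type == "MIN0MAX1":
        return (x - lo) / (hi - lo)
    if norm_type == "MIN1MAX1":
        return 2 * (x - lo) / (hi - lo) - 1
    raise ValueError(f"Unknown normalization: {norm_type}")

# Made-up data with the same shape as in this notebook.
rng = np.random.default_rng(0)
x = rng.normal(size=(10, 5))

y = normalize(x, "MIN1MAX1")   # each column spans [-1, 1]
z = normalize(x, "MEAN0STD1")  # each column has mean 0 and unit variance

print(y.min(axis=0), y.max(axis=0))
print(z.mean(axis=0), z.std(axis=0, ddof=1))
```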
[5]:
norm = Normalizer(pd.DataFrame(x), norm_type=NormTypes.MIN1MAX1)
y = norm(x)
print(y)
[[ 1.00000000e+00 -4.96722460e-01 -4.58065867e-02 2.45297432e-01
-2.59339809e-02]
[ 4.10107136e-01 -1.00000000e+00 -4.40907955e-01 -7.18833566e-01
1.45593882e-01]
[ 1.52886152e-01 -3.97012770e-01 5.47666550e-02 1.00000000e+00
1.00000000e+00]
[ 1.54440045e-01 1.00000000e+00 1.00000000e+00 2.23782420e-01
-1.47443056e-01]
[-2.61047721e-01 -5.86561322e-01 -4.66956556e-01 -3.62147272e-01
-9.58902359e-01]
[-1.12177372e-01 7.04717636e-02 1.68136597e-01 -1.00000000e+00
-3.99867833e-01]
[ 3.10941696e-01 3.38223577e-01 -1.00000000e+00 8.72997165e-01
-1.04404747e-01]
[-1.17185175e-01 -5.35786867e-01 -4.70990777e-01 9.51055288e-02
-1.00000000e+00]
[-7.79891670e-01 7.81757474e-01 1.57315493e-01 -2.32884109e-01
-6.27040863e-05]
[-1.00000000e+00 3.40151787e-02 -2.03058541e-01 8.59655499e-01
-5.43296337e-01]]
[6]:
norm = Normalizer(pd.DataFrame(x), norm_type=NormTypes.MEAN0STD1)
y = norm(x)
print(y)
[[ 1.771317 -0.6508456 0.1475014 0.21452552 0.30665794]
[ 0.75111127 -1.4352963 -0.5907225 -1.1924846 0.60300183]
[ 0.30625358 -0.4954296 0.33541664 1.3159049 2.079136 ]
[ 0.3089411 1.6820718 2.1015303 0.18312742 0.0967301 ]
[-0.409635 -0.79087603 -0.6393927 -0.6719524 -1.305206 ]
[-0.1521674 0.23323111 0.54724175 -1.6028064 -0.33937728]
[ 0.5796071 0.6505717 -1.6353533 1.1305625 0.17108625]
[-0.16082823 -0.7117347 -0.6469304 -0.00465803 -1.3762091 ]
[-1.3069632 1.3419006 0.52702314 -0.4833114 0.3513551 ]
[-1.6876354 0.17640676 -0.14631471 1.1110923 -0.5871748 ]]
Embedding within a network
These operators can be attached before and after a network. This is convenient: users who did not train the network do not have to check how the data was preprocessed.
[7]:
preprocessing = SequentialOperator([asinh, norm])
postprocessing = pow10
net = FullyConnected(
    [n_features, 10, 10, 1],
    nn.ReLU(),
    inputs_transformer=preprocessing,
    outputs_transformer=postprocessing,
)
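Conceptually, chaining operators amounts to function composition. The sketch below is a plain-Python illustration that assumes SequentialOperator applies its operators in list order; compose and center are hypothetical stand-ins, not part of nnbma.

```python
from functools import reduce

import numpy as np

def compose(operators):
    """Return a callable that applies each operator in list order."""
    return lambda t: reduce(lambda acc, op: op(acc), operators, t)

def center(t):
    # Stand-in for a fitted normalizer: subtract the column means.
    return t - t.mean(axis=0)

# Same ordering as above: rescale first, then normalize.
preprocess = compose([np.arcsinh, center])

x = np.array([[-10.0, 0.5], [10.0, 1.5]])
y = preprocess(x)
print(y)
```

The order matters: normalizing before rescaling would generally give a different result, because asinh is nonlinear.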
Usually, you can evaluate the network as if it were a function (net(x)). By default, however, this does not apply the pre- and post-processing. To apply them, call the evaluate method, which does essentially the same thing with more options:
[8]:
y = net.evaluate(x, transform_inputs=True, transform_outputs=True)
print(y)
[[1.5116196 ]
[1.5584499 ]
[1.4138592 ]
[1.7413094 ]
[1.9702724 ]
[2.2404044 ]
[0.88946015]
[1.695295 ]
[1.290986 ]
[1.5530138 ]]
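To make the role of the two flags concrete, here is a minimal sketch of a model that stores its own pre- and post-processing. The class TinyModel and its random linear layer are hypothetical and only mimic the call pattern shown above; the actual evaluate method of nnbma may behave differently.

```python
import numpy as np

class TinyModel:
    """Toy linear model carrying optional input/output transformers."""

    def __init__(self, weights, inputs_transformer=None, outputs_transformer=None):
        self.weights = weights
        self.inputs_transformer = inputs_transformer
        self.outputs_transformer = outputs_transformer

    def __call__(self, x):
        # Raw forward pass: no pre- or post-processing is applied.
        return x @ self.weights

    def evaluate(self, x, transform_inputs=False, transform_outputs=False):
        # Apply each transformer only if requested and available.
        if transform_inputs and self.inputs_transformer is not None:
            x = self.inputs_transformer(x)
        y = self(x)
        if transform_outputs and self.outputs_transformer is not None:
            y = self.outputs_transformer(y)
        return y

rng = np.random.default_rng(0)
model = TinyModel(
    rng.normal(size=(5, 1)),
    inputs_transformer=np.arcsinh,
    outputs_transformer=lambda t: 10.0 ** t,  # plays the role of pow10
)

x = rng.normal(size=(10, 5))
raw = model(x)
full = model.evaluate(x, transform_inputs=True, transform_outputs=True)
print(raw.shape, full.shape)
```

With a power-of-ten postprocessing, the transformed outputs are always positive, which matches the outputs printed above.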