Polynomial expansion of inputs
In this notebook, we introduce the PolynomialExpansion module which is able to create polynomial features of inputs in a differentiable way.
[1]:
import os
import sys
sys.path.append(os.path.join(os.path.abspath(""), ".."))
import torch
from nnbma.networks import FullyConnected, PolynomialNetwork
from nnbma.layers import PolynomialExpansion
PolynomialExpansion module
PolynomialExpansion is a torch Module that creates all possible (non-constant) monomial from a set of inputs. For instance, for degree=2, we have:
\(\mathrm{poly}((x_1,\, x_2,\,x_3)) = (x_1,\,x_2,\,x_3,\,x_1^2,\,x_1x_2,\,x_1x_3,\,x_2^2,\,x_2x_3,\,x_3^2)\).
Here’s the corresponding expansion:
[2]:
input_features = 3
order = 2
layer = PolynomialExpansion(input_features, order)
x = torch.tensor([2.0, 3.0, 5.0]) # Must have x.shape[-1] = input_features
print("Input:", x)
y = layer(x)
print("Output:", y)
Input: tensor([2., 3., 5.])
Output: tensor([ 2., 3., 5., 4., 6., 10., 9., 15., 25.])
As any modules from this package, it works with batched inputs along the first axes:
[3]:
x = torch.tensor(
[
[
[2.0, 3.0, 5.0],
[1.0, 1.0, 1.0],
],
[
[1.0, 0.0, 0.0],
[0.0, 2.0, 3.0],
],
]
) # Must have x.shape[-1] = input_features
print("Input:", x, sep="\n")
y = layer(x)
print("Output:", y, sep="\n")
Input:
tensor([[[2., 3., 5.],
[1., 1., 1.]],
[[1., 0., 0.],
[0., 2., 3.]]])
Output:
tensor([[[ 2., 3., 5., 4., 6., 10., 9., 15., 25.],
[ 1., 1., 1., 1., 1., 1., 1., 1., 1.]],
[[ 1., 0., 0., 1., 0., 0., 0., 0., 0.],
[ 0., 2., 3., 0., 0., 0., 4., 6., 9.]]])
Contrary to its classical use in preprocessing, this expansion is completely differentiable with respect to its inputs, so that it can be integrated into a neural network (and not placed before, a situation where the derivation of the network with respect to the inputs would be performed with respect to the polynomial features, and not with respect to the real inputs).
[4]:
x = torch.tensor(
[2.0, 3.0, 5.0], requires_grad=True
) # Must have x.shape[-1] = input_features
print("Input:", x)
y = layer(x)
print("Output:", y)
Input: tensor([2., 3., 5.], requires_grad=True)
Output: tensor([ 2., 3., 5., 4., 6., 10., 9., 15., 25.],
grad_fn=<SqueezeBackward1>)
This class also has a static method expanded_features that calculate the number of polynomial features depending on the number of real input features and the max order. It can be useful to anticipate the number of input_features the neural network placed after the layer will have.
[5]:
print(
"Number of polynomial features:",
PolynomialExpansion.expanded_features(order, input_features),
)
Number of polynomial features: 9
PolynomialNetwork module
PolynomialNetwork is a convenience class that allows to integrate a PolynomialLayer at the input of a network inheriting from NeuralNetwork.
[6]:
# We use expanded_features to anticipate the number of input_features the FullyConnected network will have
n_poly = PolynomialExpansion.expanded_features(order, input_features)
subnet = FullyConnected(
[n_poly, 10, 10, 1],
torch.nn.ReLU(),
)
net = PolynomialNetwork(
input_features,
order,
subnet,
)
net.eval()
[7]:
x = torch.tensor(
[2.0, 3.0, 5.0], requires_grad=True
) # Must have x.shape[-1] = input_features
print("Input:", x)
y = net(x)
print("Output gradient:", y)
Input: tensor([2., 3., 5.], requires_grad=True)
Output gradient: tensor([0.9394])
Standardization of polynomial features
The outputs of a PolynomialLayer are not standardized, even for standardized inputs. For example, if an input feature \(x\) is standardized (meaning that \(\mu_{x}=0\) and \(\sigma_{x}=1\)), then \(x^2\) is no longer standardized since \(\mu_{x^2}=1\) and generally \(\sigma_{x_2}\neq1\).
There is no analytical way of calculating the moments of polynomial features, as this depends on the distribution of the data.
[8]:
x = torch.normal(0, torch.ones(100, 2))
order = 2
layer = PolynomialExpansion(x.size(-1), order)
print(layer.means, layer.stds, sep="\n")
Parameter containing:
tensor([0., 0., 0., 0., 0.])
Parameter containing:
tensor([1., 1., 1., 1., 1.])
[9]:
print(x.mean(dim=0), layer(x).mean(dim=0))
tensor([-0.1325, 0.0635]) tensor([-0.1325, 0.0635, 0.8942, -0.0248, 0.9705])
[10]:
layer.update_standardization(x, reset=True)
print(layer.means, layer.stds, sep="\n")
Parameter containing:
tensor([-0.1325, 0.0635, 0.8942, -0.0248, 0.9705])
Parameter containing:
tensor([0.9363, 0.9831, 1.4429, 0.7592, 1.8849])
This standardization can also be done by batch in case of large datasets:
[11]:
layer.update_standardization(x[:50], reset=True)
print(layer.means, layer.stds, "", sep="\n")
layer.update_standardization(x[50:])
print(layer.means, layer.stds, sep="\n")
Parameter containing:
tensor([-0.1833, 0.0645, 1.0221, -0.0162, 0.7427])
Parameter containing:
tensor([0.9942, 0.8594, 1.8087, 0.7058, 1.1887])
Parameter containing:
tensor([-0.1325, 0.0635, 0.8942, -0.0248, 0.9705])
Parameter containing:
tensor([0.9363, 0.9831, 1.4429, 0.7592, 1.8849])
Note that there is a convenience method of PolynomialNetwork also called update_standardization.