Polynomial expansion of inputs

In this notebook, we introduce the PolynomialExpansion module which is able to create polynomial features of inputs in a differentiable way.

[1]:
import os
import sys

sys.path.append(os.path.join(os.path.abspath(""), ".."))

import torch

from nnbma.networks import FullyConnected, PolynomialNetwork
from nnbma.layers import PolynomialExpansion

PolynomialExpansion module

PolynomialExpansion is a torch Module that creates all possible (non-constant) monomial from a set of inputs. For instance, for degree=2, we have:

\(\mathrm{poly}((x_1,\, x_2,\,x_3)) = (x_1,\,x_2,\,x_3,\,x_1^2,\,x_1x_2,\,x_1x_3,\,x_2^2,\,x_2x_3,\,x_3^2)\).

Here’s the corresponding expansion:

[2]:
input_features = 3
order = 2

layer = PolynomialExpansion(input_features, order)

x = torch.tensor([2.0, 3.0, 5.0])  # Must have x.shape[-1] = input_features
print("Input:", x)

y = layer(x)
print("Output:", y)
Input: tensor([2., 3., 5.])
Output: tensor([ 2.,  3.,  5.,  4.,  6., 10.,  9., 15., 25.])

As any modules from this package, it works with batched inputs along the first axes:

[3]:
x = torch.tensor(
    [
        [
            [2.0, 3.0, 5.0],
            [1.0, 1.0, 1.0],
        ],
        [
            [1.0, 0.0, 0.0],
            [0.0, 2.0, 3.0],
        ],
    ]
)  # Must have x.shape[-1] = input_features
print("Input:", x, sep="\n")

y = layer(x)
print("Output:", y, sep="\n")
Input:
tensor([[[2., 3., 5.],
         [1., 1., 1.]],

        [[1., 0., 0.],
         [0., 2., 3.]]])
Output:
tensor([[[ 2.,  3.,  5.,  4.,  6., 10.,  9., 15., 25.],
         [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]],

        [[ 1.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
         [ 0.,  2.,  3.,  0.,  0.,  0.,  4.,  6.,  9.]]])

Contrary to its classical use in preprocessing, this expansion is completely differentiable with respect to its inputs, so that it can be integrated into a neural network (and not placed before, a situation where the derivation of the network with respect to the inputs would be performed with respect to the polynomial features, and not with respect to the real inputs).

[4]:
x = torch.tensor(
    [2.0, 3.0, 5.0], requires_grad=True
)  # Must have x.shape[-1] = input_features
print("Input:", x)

y = layer(x)
print("Output:", y)
Input: tensor([2., 3., 5.], requires_grad=True)
Output: tensor([ 2.,  3.,  5.,  4.,  6., 10.,  9., 15., 25.],
       grad_fn=<SqueezeBackward1>)

This class also has a static method expanded_features that calculate the number of polynomial features depending on the number of real input features and the max order. It can be useful to anticipate the number of input_features the neural network placed after the layer will have.

[5]:
print(
    "Number of polynomial features:",
    PolynomialExpansion.expanded_features(order, input_features),
)
Number of polynomial features: 9

PolynomialNetwork module

PolynomialNetwork is a convenience class that allows to integrate a PolynomialLayer at the input of a network inheriting from NeuralNetwork.

[6]:
# We use expanded_features to anticipate the number of input_features the FullyConnected network will have
n_poly = PolynomialExpansion.expanded_features(order, input_features)

subnet = FullyConnected(
    [n_poly, 10, 10, 1],
    torch.nn.ReLU(),
)

net = PolynomialNetwork(
    input_features,
    order,
    subnet,
)
net.eval()
[7]:
x = torch.tensor(
    [2.0, 3.0, 5.0], requires_grad=True
)  # Must have x.shape[-1] = input_features
print("Input:", x)

y = net(x)
print("Output gradient:", y)
Input: tensor([2., 3., 5.], requires_grad=True)
Output gradient: tensor([0.9394])

Standardization of polynomial features

The outputs of a PolynomialLayer are not standardized, even for standardized inputs. For example, if an input feature \(x\) is standardized (meaning that \(\mu_{x}=0\) and \(\sigma_{x}=1\)), then \(x^2\) is no longer standardized since \(\mu_{x^2}=1\) and generally \(\sigma_{x_2}\neq1\).

There is no analytical way of calculating the moments of polynomial features, as this depends on the distribution of the data.

[8]:
x = torch.normal(0, torch.ones(100, 2))
order = 2

layer = PolynomialExpansion(x.size(-1), order)
print(layer.means, layer.stds, sep="\n")
Parameter containing:
tensor([0., 0., 0., 0., 0.])
Parameter containing:
tensor([1., 1., 1., 1., 1.])
[9]:
print(x.mean(dim=0), layer(x).mean(dim=0))
tensor([-0.1325,  0.0635]) tensor([-0.1325,  0.0635,  0.8942, -0.0248,  0.9705])
[10]:
layer.update_standardization(x, reset=True)
print(layer.means, layer.stds, sep="\n")
Parameter containing:
tensor([-0.1325,  0.0635,  0.8942, -0.0248,  0.9705])
Parameter containing:
tensor([0.9363, 0.9831, 1.4429, 0.7592, 1.8849])

This standardization can also be done by batch in case of large datasets:

[11]:
layer.update_standardization(x[:50], reset=True)
print(layer.means, layer.stds, "", sep="\n")
layer.update_standardization(x[50:])
print(layer.means, layer.stds, sep="\n")
Parameter containing:
tensor([-0.1833,  0.0645,  1.0221, -0.0162,  0.7427])
Parameter containing:
tensor([0.9942, 0.8594, 1.8087, 0.7058, 1.1887])

Parameter containing:
tensor([-0.1325,  0.0635,  0.8942, -0.0248,  0.9705])
Parameter containing:
tensor([0.9363, 0.9831, 1.4429, 0.7592, 1.8849])

Note that there is a convenience method of PolynomialNetwork also called update_standardization.