Quantizing a neural network does not necessarily reduce the physical storage size needed to represent the model. The goal is to replace floating-point computations with fixed-point or lookup-table-based activation functions, and to reduce the number of bits needed for activation computation. A detailed definition of quantization and the workflows behind quantizing a network can be found in the documentation below:
The overall goal is therefore to reduce execution memory requirements and the need for specialised hardware, allowing ML and DL models to run on lower-power hardware.
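
To make the idea concrete, here is a minimal sketch of one common scheme, affine (asymmetric) quantization of a float tensor to 8-bit integers. The `quantize`/`dequantize` helpers and the NumPy-based approach are illustrative assumptions, not any specific library's API:

```python
import numpy as np

def quantize(x: np.ndarray, num_bits: int = 8):
    """Map float values to fixed-point integers via an affine transform."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Assumes x has some spread; a real implementation would guard scale == 0.
    scale = float(x.max() - x.min()) / (qmax - qmin)   # step size per integer level
    zero_point = int(round(qmin - x.min() / scale))    # integer code representing 0.0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Approximately recover the original floats from the integer codes."""
    return scale * (q.astype(np.float32) - zero_point)

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize(weights)
print(np.abs(weights - dequantize(q, scale, zp)).max())  # small quantization error
```

Note that the int8 codes occupy a quarter of the memory of float32 values, but the arithmetic only becomes cheaper when the runtime actually executes in fixed point rather than dequantizing back to floats, which is why quantization does not automatically shrink the stored model or speed it up.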
