Intel® Neural Compressor



Intel® Neural Compressor supports an automatic quantization tuning flow that converts quantizable layers to INT8, lets users control the trade-off between model accuracy and performance, and implements the latest quantization algorithms from the research community.
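The same tuning flow is also available programmatically. Below is a minimal sketch following the 2.x-style Python API (naming varies across releases); `fp32_model`, `calib_dataloader`, and `eval_func` are placeholders for your own model, calibration data, and accuracy-evaluation function, and the 1% tolerable accuracy loss is an illustrative assumption, not a default recommendation:

```python
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig, AccuracyCriterion

# Illustrative accuracy goal: tolerate up to 1% relative accuracy drop
# while tuning which layers to convert to INT8.
conf = PostTrainingQuantConfig(
    accuracy_criterion=AccuracyCriterion(tolerable_loss=0.01)
)

# fp32_model, calib_dataloader, and eval_func are placeholders:
# supply your framework model, a calibration dataloader, and an
# evaluation function that returns the model's accuracy as a float.
q_model = quantization.fit(
    model=fp32_model,
    conf=conf,
    calib_dataloader=calib_dataloader,
    eval_func=eval_func,
)
q_model.save("./quantized_model")
```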

Visit the Intel® Neural Compressor online documentation at intel.github.io/neural-compressor.

Run experiments with quantized models in three steps:

1. Create a project and choose an input model.

2. Add an optimization to get optimized models.

3. Benchmark and profile the optimized models (see the sketch after this list).
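Benchmarking is likewise exposed in the Python API. A minimal sketch, assuming a quantized model previously saved to `./quantized_model`; the path, dataloader, and instance settings are illustrative assumptions:

```python
from neural_compressor import benchmark
from neural_compressor.config import BenchmarkConfig

# Illustrative settings: one instance pinned to 4 CPU cores,
# 100 timed iterations after 10 warm-up iterations.
conf = BenchmarkConfig(
    warmup=10,
    iteration=100,
    cores_per_instance=4,
    num_of_instance=1,
)

# "./quantized_model" is a placeholder path to a saved model;
# b_dataloader (user-supplied) provides the input batches used for timing.
benchmark.fit(model="./quantized_model", conf=conf, b_dataloader=b_dataloader)
```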

Create new project

Click to create a new project and start adding optimizations.