# 8-bit Quantization Usage Guide

# Introduction

Weixin Mini Program AI General Interface is a general-purpose AI model inference solution provided by the platform, with support for Int8 quantized model inference. Quantization can significantly improve inference performance while reducing model storage and compute costs.
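As background, Int8 quantization maps floating-point values onto 8-bit integers through a scale factor. The following is a minimal, self-contained sketch of a symmetric per-tensor scheme for illustration only, not necessarily the exact method the platform uses:

```python
# Illustrative symmetric Int8 quantization: floats are mapped to int8 with a
# per-tensor scale, then mapped back with a small rounding error.

def quantize_int8(values):
    """Quantize a list of floats to int8 using a symmetric per-tensor scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.004, 1.0]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
# Each recovered value is within one quantization step of the original.
assert all(abs(a - b) <= scale for a, b in zip(weights, recovered))
```

Because weights and activations become 8-bit integers, storage shrinks roughly 4x versus float32 and integer arithmetic units can be used, which is where the speedup in this guide comes from.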

This guide shows how to optimize the floating-point classification demo with Int8 quantization.

# 1. Preparation

Clone the repository and install its dependencies:

```shell
git clone https://github.com/wechat-miniprogram/xnet-miniprogram.git && cd xnet-miniprogram/nncs && pip install -r requirements.txt
```

Prepare the ImageNet dataset in the following layout:

```
ImageNet
|---train
|     |---n01440764
|     |---n01443537
|     |---...
|     |---n15075141
|---val
|     |---n01440764
|     |---n01443537
|     |---...
|     |---n15075141
```

The nncs directory is organized as follows:

```
nncs
|---nncs
|---demo
|     |---imagenet_classification
|---requirements.txt
|---README.md
```

You will also need the pretrained floating-point model mobilenet-v2-71dot82.onnx.
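Before launching training, it can save time to verify the dataset layout. The helper below is my own sketch (not part of the repo), checking only that the train/val split shown above exists with matching class folders:

```python
# Hypothetical sanity check for the ImageNet train/val layout shown above.
from pathlib import Path

def check_imagenet_layout(root):
    """Return True if root contains train/ and val/ with matching class folders."""
    root = Path(root)
    train, val = root / "train", root / "val"
    if not (train.is_dir() and val.is_dir()):
        return False
    train_classes = {p.name for p in train.iterdir() if p.is_dir()}
    val_classes = {p.name for p in val.iterdir() if p.is_dir()}
    # Every validation class should also appear in the training split.
    return bool(train_classes) and val_classes <= train_classes
```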

# 2. Quantization Training Example

  • Reference code: demo/imagenet_classification/train_imagenet_onnx.py
  • Modify the data source and ONNX model paths:

```python
...
args.train_data = "/data/yangkang/datasets/ImageNet"
args.val_data = "/data/yangkang/datasets/ImageNet"
...
model = "mobilenet-v2-71dot82.onnx"
```

  • Run quantization training:

```shell
cd demo/imagenet_classification && python train_imagenet_onnx.py
```

  • Example log: demo/imagenet_classification/nncs_onnx_lr1e-5.logfile. The floating-point model reaches 71.82 accuracy; after QAT fine-tuning, accuracy is 71.52.
  • Export the quantized model mobilenetv2_qat.onnx:

```shell
python deploy.py
```

  • Supported quantization schemes: Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ).
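The two schemes differ in when quantization parameters are determined: QAT simulates quantization during fine-tuning, while PTQ derives parameters from calibration data with no retraining. A toy sketch of the PTQ calibration step (my own illustration, not the nncs implementation) computes an asymmetric uint8 scale and zero point from observed activation ranges:

```python
# Illustrative PTQ-style calibration: derive an asymmetric uint8 scale and
# zero point from the observed range of calibration activations.

def calibrate_asymmetric(samples, qmin=0, qmax=255):
    """Compute scale and zero point covering the observed activation range."""
    lo, hi = min(samples), max(samples)
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must include real zero
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

acts = [-0.5, 0.0, 0.75, 2.0, 1.25]
scale, zp = calibrate_asymmetric(acts)
q = [max(qm, min(255, round(v / scale) + zp)) for qm, v in zip([0] * 5, acts)]
# q now holds uint8 codes; zp is the code that represents real 0.0 exactly.
```

QAT typically recovers more accuracy (as in the 71.82 → 71.52 log above) at the cost of a fine-tuning pass, whereas PTQ only needs a calibration set.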

# 3. Weixin Mini Program Demo

The quantized classification demo is adapted from the floating-point classification demo. The difference to note is:

```javascript
this.session = wx.createInferenceSession({
    model: modelPath,
    precisionLevel: 0,
    allowNPU: false,
    allowQuantize: true, // must be set to true to enable quantized inference
});
```

# 4. Running Results

Scan the QR code below, then tap "General AI Inference Capabilities" > "mobileNetInt8" in the interface to view the running results.





Run the demo: while the camera is capturing, classification results are written in real time to the bottom of the page.





For the complete demo, see the official mini program examples on GitHub.

# 5. Enabling the Latency Test

```javascript
data: {
  predClass: "None",
  classifier: null,
  enableSpeedTest: true, // set to true to enable the latency test
  avgTime: 110.0,
  minTime: 110.0
},
```

On an iPhone 13 Pro Max, the floating-point classification demo takes about 10 ms per inference, while the quantized classification demo takes about 5 ms.
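The avgTime/minTime bookkeeping shown in the data object above can be sketched in a few lines. This is a hypothetical helper written in Python for clarity; the demo's actual JavaScript logic may differ:

```python
# Illustrative running-average and minimum latency tracker, mirroring the
# avgTime/minTime fields in the demo's data object.

class LatencyTracker:
    """Track the running average and minimum of per-inference latencies (ms)."""
    def __init__(self):
        self.count = 0
        self.avg_time = 0.0
        self.min_time = float("inf")

    def record(self, elapsed_ms):
        self.count += 1
        # Incremental running mean avoids storing every sample.
        self.avg_time += (elapsed_ms - self.avg_time) / self.count
        self.min_time = min(self.min_time, elapsed_ms)

t = LatencyTracker()
for ms in (12.0, 10.0, 8.0):
    t.record(ms)
assert abs(t.avg_time - 10.0) < 1e-9 and t.min_time == 8.0
```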