
Model Compression & Adaptation Support Grid

The tables below summarize which compression and adaptation techniques are supported for each model. Other, similar models may also be supported but have not been tested.

Vision Compression

In the table below, CPU and GPU indicate the target deployment device. PTQ stands for Post-Training Quantization and QAT for Quantization-Aware Training.

Model CPU PTQ - Torch CPU PTQ - OpenVino CPU PTQ - ONNX GPU PTQ - TensorRT CPU QAT - Torch Knowledge Distillation Structured Pruning CPU QAT - OpenVino
Resnet (timm)
Convnextv2 (huggingface) - -
Mobilenetv3 (timm)
DeiT (huggingface)
VanillaNet (timm) - - -
Swin (huggingface)
YoloX (mmyolo/mmdet) - - -
RTMDet (mmyolo/mmdet) - - - -
Yolov8 (mmyolo) - - - - - -
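The framework-free sketch below illustrates the PTQ/QAT distinction using symmetric int8 quantization of a single tensor. It assumes one per-tensor scale, calibrated from the largest absolute value. The helper names are hypothetical, and the code does not reflect how any of the listed backends (Torch, OpenVino, ONNX, TensorRT) actually works. With PTQ, an already-trained model is calibrated and quantized once. With QAT, the quantize-then-dequantize ("fake quantization") step is placed in the forward pass during training so the weights learn to tolerate the rounding error.

```python
def compute_scale(values, num_bits=8):
    """Calibrate a symmetric per-tensor scale from observed values (assumed scheme)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Map floats to integers in [qmin, qmax] by rounding and clamping."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def dequantize(qvalues, scale):
    """Map integers back to approximate floats."""
    return [q * scale for q in qvalues]


def fake_quantize(values, scale, num_bits=8):
    """Quantize, then immediately dequantize. QAT places this op in the forward pass so
    training sees the rounding error. Frameworks pass gradients through it with a
    straight-through estimator, which this plain-Python sketch does not model."""
    return dequantize(quantize(values, scale, num_bits), scale)


# PTQ: calibrate once on already-trained weights, then quantize. No retraining is involved.
weights = [0.5, -1.27, 0.03, 1.0]
scale = compute_scale(weights)
q = quantize(weights, scale)
recon = dequantize(q, scale)

# QAT would instead run training with fake_quantize(weights, scale) inside the loop.
fq = fake_quantize(weights, scale)
```

Each reconstructed value differs from the original by at most half a quantization step (`scale / 2`). Real toolchains usually also offer per-channel and asymmetric schemes, and they calibrate activations on sample data.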

LLM Compression

Model LLM Quantization LLM Engine TensorRT LLM Engine Exllama LLM Engine MLC-LLM LLM Structured Pruning
LLaMA
LLaMA-2
Vicuna
Mistral -
Mixtral -
Gemma - - -
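Both tables above include a Structured Pruning column. Structured pruning removes whole units, such as channels, neurons, or attention heads, rather than individual weights, so the pruned layer remains dense and simply becomes smaller. The sketch below applies a simple L2-norm importance criterion to the output channels of one linear layer. The criterion and the helper names are illustrative assumptions and are not the tool's method. The sketch also shows that removing an output channel means dropping the matching input column of the next layer.

```python
import math


def l2_norm(row):
    return math.sqrt(sum(w * w for w in row))


def prune_output_channels(weight, bias, next_weight, keep_ratio):
    """Drop whole output channels (rows) of `weight` with the smallest L2 norm.

    weight:      [out_features][in_features]
    bias:        [out_features]
    next_weight: following layer, [next_out][out_features]; its matching input
                 columns are removed so the shapes stay consistent.
    """
    n_out = len(weight)
    n_keep = max(1, round(n_out * keep_ratio))
    # Rank channels by importance, keep the top n_keep, and restore the original order.
    ranked = sorted(range(n_out), key=lambda i: l2_norm(weight[i]), reverse=True)
    keep = sorted(ranked[:n_keep])
    new_weight = [weight[i] for i in keep]
    new_bias = [bias[i] for i in keep]
    new_next = [[row[i] for i in keep] for row in next_weight]
    return new_weight, new_bias, new_next, keep


# Toy layer with 4 output channels feeding a layer with 1 output.
weight = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.05], [1.0, -1.0]]
bias = [0.1, 0.2, 0.3, 0.4]
next_weight = [[1.0, 2.0, 3.0, 4.0]]
new_w, new_b, new_next, keep = prune_output_channels(weight, bias, next_weight, keep_ratio=0.5)
```

With `keep_ratio=0.5`, channels 0 and 2 are dropped because they have the smallest norms, and the next layer shrinks from four inputs to two. For LLMs, the pruned units would typically be attention heads or MLP neurons rather than convolution channels.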

Adapt

LLM Tasks

  • Text Generation
  • Summarization
  • Question Answering
  • Text Classification
  • Translation

All of the major huggingface models are supported for these tasks.

Image Classification

All major image models on huggingface and timm are supported in Adapt.

Object Detection

LoRA SSF DoRA Full Fine Tuning
YoloX
RTMDet

Instance Segmentation

LoRA SSF DoRA Full Fine Tuning
SegNeXT

Pose Detection

LoRA SSF DoRA Full Fine Tuning
RTMO

Note that quantized adaptation of vision models (QLoRA/QSSF) is not currently supported.
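The Adapt tables above list LoRA, SSF, and DoRA alongside full fine-tuning. The sketch below is a rough, framework-free illustration of how each parameter-efficient method changes a single linear layer, following the formulations in the original papers. It is not Adapt's implementation. The nested-list weight layout (`[d_out][d_in]`), the axis used for DoRA's norm, and all helper names are assumptions made for illustration.

```python
import math


def matmul(a, b):
    """Nested-list matrix product: a is [n][k], b is [k][m]."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_weight(w0, b, a, alpha):
    """LoRA: W' = W0 + (alpha / r) * B @ A. W0 stays frozen; only B and A are trained.
    Assumed layout: w0 is [d_out][d_in], b is [d_out][r], a is [r][d_in]."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [[w0[i][j] + scale * delta[i][j] for j in range(len(w0[0]))]
            for i in range(len(w0))]


def row_norms(w):
    return [math.sqrt(sum(x * x for x in row)) for row in w]


def dora_weight(w0, b, a, alpha, magnitude):
    """DoRA: W' = m * V / ||V||, where V is the LoRA-updated weight (the direction).
    The learnable magnitude m is usually initialised to ||W0||. The norm is taken
    per output row here; which axis to normalise is a layout assumption."""
    v = lora_weight(w0, b, a, alpha)
    return [[m * x / n for x in row] for row, m, n in zip(v, magnitude, row_norms(v))]


def linear(w, bias, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, bias)]


def ssf(features, gamma, beta):
    """SSF: learnable per-channel scale and shift, y = gamma * x + beta."""
    return [g * f + s for f, g, s in zip(features, gamma, beta)]


def merge_ssf_into_linear(w, bias, gamma, beta):
    """SSF is affine, so after training it can be folded into the preceding layer:
    W' = diag(gamma) W and b' = gamma * b + beta, adding no inference cost."""
    new_w = [[g * wi for wi in row] for row, g in zip(w, gamma)]
    new_b = [g * bi + s for bi, g, s in zip(bias, gamma, beta)]
    return new_w, new_b


# Usage on a toy 2x2 layer with rank r = 1.
w0 = [[1.0, 2.0], [3.0, 4.0]]
a = [[0.5, -0.5]]
b_zero = [[0.0], [0.0]]     # LoRA initialises B to zero, so training starts at W0
b_trained = [[1.0], [2.0]]  # hypothetical values after some training

lora_init = lora_weight(w0, b_zero, a, alpha=1.0)
lora_trained = lora_weight(w0, b_trained, a, alpha=2.0)
dora_init = dora_weight(w0, b_zero, a, alpha=1.0, magnitude=row_norms(w0))

x = [1.0, -2.0]
bias, gamma, beta = [0.5, -0.5], [2.0, 0.5], [0.1, 0.2]
unmerged = ssf(linear(w0, bias, x), gamma, beta)
merged_w, merged_b = merge_ssf_into_linear(w0, bias, gamma, beta)
merged = linear(merged_w, merged_b, x)
```

At initialisation, LoRA and DoRA both reproduce the frozen weight. After training, SSF can be merged away so that the merged and unmerged layers give the same outputs.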