bug: MiniMax-M2.5 NotImplementedError: No FP8 MoE backend supports the deployment configuration. #317

@chris-aug

Description

Bug Description

Issue Summary

Deploying MiniMax-M2.5 with vLLM fails with NotImplementedError: No FP8 MoE backend supports the deployment configuration. The error persists even when --dtype float16 is specified explicitly in the launch command.

Steps to Reproduce

  1. Download the MiniMax-M2.5 model
  2. Inspect the model's config.json, which contains the following FP8 quantization configuration:
    {
      "quantization_config": {
        "activation_scheme": "dynamic",
        "fmt": "float8_e4m3fn",
        "quant_method": "fp8",
        "weight_block_size": [128, 128],
        "modules_to_not_convert": ["gate", "e_score_correction_bias", "lm_head"]
      }
    }
    
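The quantization settings above can be inspected programmatically. Below is a minimal stdlib-only sketch; the relevant portion of config.json is inlined here so the snippet runs standalone, but in practice you would read it from models/MiniMax-M2.5/config.json:

```python
import json

# Inlined copy of the quantization section of config.json; in a real check you
# would load the file, e.g. json.load(open("models/MiniMax-M2.5/config.json")).
config_text = """
{
  "quantization_config": {
    "activation_scheme": "dynamic",
    "fmt": "float8_e4m3fn",
    "quant_method": "fp8",
    "weight_block_size": [128, 128],
    "modules_to_not_convert": ["gate", "e_score_correction_bias", "lm_head"]
  }
}
"""

quant = json.loads(config_text)["quantization_config"]
print(quant["quant_method"])       # fp8
print(quant["weight_block_size"])  # [128, 128]
```

Because quant_method is "fp8", vLLM treats the checkpoint as FP8-quantized regardless of the --dtype flag.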

  3. Execute the following launch command:

python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --port 8356 \
    --model models/MiniMax-M2.5 \
    --gpu-memory-utilization 0.95 \
    --trust-remote-code \
    --tensor-parallel-size 8 \
    --dtype float16 \
    --no-enable-prefix-caching \
    --no-enable-chunked-prefill \
    --distributed-executor-backend mp \
    --served-model-name MiniMax-M2.5 \
    --enable-auto-tool-choice \
    --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think \
    --enforce-eager
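Note that --dtype float16 only controls the dtype of unquantized tensors; vLLM still reads quant_method: fp8 from config.json and then tries to select an FP8 MoE backend for the current hardware, raising NotImplementedError when none qualifies. The sketch below is a hypothetical, simplified illustration of that selection step; the function name, backend names, and capability thresholds are invented for this example and do not match vLLM's internal API:

```python
# Hypothetical illustration of FP8 MoE backend selection. Backend names and
# compute-capability thresholds are invented; vLLM's real logic also depends
# on the platform build, kernel availability, and the block quantization shape.
def select_fp8_moe_backend(compute_capability, weight_block_size):
    candidates = [
        # (backend name, whether this deployment supports it)
        ("cutlass_fp8", compute_capability >= (9, 0)),
        ("triton_fp8", compute_capability >= (8, 9)),
    ]
    for name, supported in candidates:
        if supported:
            return name
    # This mirrors the error reported in this issue: every candidate was
    # rejected, so no backend can serve the FP8-quantized MoE layers.
    raise NotImplementedError(
        "No FP8 MoE backend supports the deployment configuration."
    )

print(select_fp8_moe_backend((9, 0), [128, 128]))  # cutlass_fp8
```

On hardware (or a platform build) where no candidate passes, the same NotImplementedError surfaces at startup, which would explain why changing --dtype has no effect.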

Installation Documentation
https://vllm-kunlun.readthedocs.io/en/v0.15.1/installation.html

Labels: bug