# Qwen3.5 Best Practices

ms-swift supports training [Qwen3.5](https://github.com/QwenLM/Qwen3.5) Dense/MoE models with either the transformers or the Megatron backend. Qwen3.5 is a hybrid-thinking multimodal model that combines linear attention with full attention. This document describes how to run inference, instruction fine-tuning, and reinforcement learning on Qwen3.5 Dense/MoE models.

## Environment Setup

```shell
pip install -U ms-swift
# "transformers==5.2.*" has compatibility issues with vllm; see this issue: https://github.com/modelscope/ms-swift/issues/8254
# "transformers==5.3.*" has video-training issues; see this issue: https://github.com/modelscope/ms-swift/issues/8362
pip install -U "transformers==5.2.*" "qwen_vl_utils>=0.0.14" peft liger-kernel

# flash-linear-attention
# If training is slow, see: https://github.com/fla-org/flash-linear-attention/issues/758
pip install -U "flash-linear-attention>=0.4.2" --no-build-isolation

# causal_conv1d
pip install -U git+https://github.com/Dao-AILab/causal-conv1d --no-build-isolation

# flash-attention
pip install "flash-attn==2.8.3" --no-build-isolation

# DeepSpeed training
pip install deepspeed

# vllm (torch2.10) for inference/deployment/RL
pip install -U "vllm>=0.17.0"
# For reinforcement learning (RL) training, reinstall transformers to override the version vLLM installs by default
pip install -U "transformers==5.2.*"
```

- Qwen3.5 video-data training hangs: reading videos with the decord backend can cause training to hang; see [this issue](https://github.com/dmlc/decord/issues/269). You can switch to the torchcodec backend instead. For details, see the [qwen_vl_utils](https://github.com/QwenLM/Qwen3-VL/blob/50068df2334f309979ff05d75f1078c8309c63ed/qwen-vl-utils/src/qwen_vl_utils/vision_process.py#L390-L400) library.

## Inference

Run inference with ms-swift's `TransformersEngine`:

- Model-specific parameters, such as the `VIDEO_MAX_TOKEN_NUM` environment variable, have the same meaning as for Qwen3-VL; see the [command-line parameters documentation](../Instruction/Command-line-parameters.md#qwen3_vl,qwen3_5).

```python
import os
# os.environ['SWIFT_DEBUG'] = '1'
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
# Visual token budgets and video frame sampling (same semantics as Qwen3-VL)
os.environ['IMAGE_MAX_TOKEN_NUM'] = '1024'
os.environ['VIDEO_MAX_TOKEN_NUM'] = '128'
os.environ['FPS_MAX_FRAMES'] = '16'

from swift import get_model_processor, get_template
from swift.infer_engine import TransformersEngine, InferRequest, RequestConfig

model, processor = get_model_processor('Qwen/Qwen3.5-4B')  # attn_impl='flash_attention_2'
# enable_thinking=False: respond without the thinking phase
template = get_template(processor, enable_thinking=False)
engine = TransformersEngine(model, template=template)
infer_request = InferRequest(messages=[{
    "role": "user",
    "content": '