Using vLLM on the MetaX Cluster


## 1. Download the files in advance

Download site: https://pub-docstore.metax-tech.com:7001/

  • Account: wuluo

Files can also be shared and accessed through a share link, e.g.: https://pub-docstore.metax-tech.com:7001/sharing/PsE45bzdD

I wrote a scraper script for this (not working yet; the browser keeps crashing):

```python
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.common.by import By
from pyvirtualdisplay import Display
import os
import time

download_dir = os.path.join(os.getcwd(), "firefox_downloads")
driver_path = "/root/yjk/softwares/geckodriver"
url = "https://pub-docstore.metax-tech.com:7001/sharing/PsE45bzdD"

def start_download():
    # Create the download directory if it does not exist
    if not os.path.exists(download_dir):
        os.makedirs(download_dir)

    # Start a virtual display
    display = Display(visible=0, size=(1280, 768))
    display.start()

    # Configure Firefox options
    firefox_options = Options()
    # Download-related preferences
    firefox_options.set_preference("browser.download.folderList", 2)  # 0 = desktop; 1 = default "Downloads"; 2 = custom folder
    firefox_options.set_preference("browser.download.dir", download_dir)  # custom download path
    firefox_options.set_preference("browser.download.useDownloadDir", True)
    # Disable the save dialog for these MIME types;
    # "application/octet-stream" is the generic binary type
    firefox_options.set_preference("browser.helperApps.neverAsk.saveToDisk",
        "application/zip, application/octet-stream, application/x-zip-compressed, application/x-tar, application/pdf")
    firefox_options.headless = True  # run in headless mode

    # Create the Firefox driver
    service = Service(driver_path)
    browser = webdriver.Firefox(options=firefox_options, service=service)
    browser.get(url)

    # --- Wait for the page to load ---
    print("Waiting for the page to load...")
    browser.implicitly_wait(10)  # poll up to 10 seconds when locating elements

    button = browser.find_element(By.XPATH, '//*[@id="ext-gen43"]')
    button.click()

    # --- Wait for the download to finish (browser-independent logic) ---
    print("Waiting for the download to finish...")
    timeout = 60 * 30  # 30-minute timeout
    start_time = time.time()
    download_complete = False
    new_file_path = None

    files_before = set(os.listdir(download_dir))
    while time.time() - start_time < timeout:
        files_after = set(os.listdir(download_dir))
        new_files = files_after - files_before

        if time.time() - start_time > 10:
            if not browser.service.process:
                print('Browser has quit unexpectedly')
            browser.save_screenshot('1.png')

        if new_files:
            filename = new_files.pop()
            # Firefox writes a .part file while downloading; the .part suffix
            # disappears once the download completes
            if not filename.endswith('.part'):
                new_file_path = os.path.join(download_dir, filename)
                print(f"✅ Download complete! File at: {new_file_path}")
                download_complete = True
                break

        time.sleep(0.5)

    if not download_complete:
        print(f"❌ Download timed out: not finished within {timeout} seconds.")

    # --- Use the downloaded file ---
    if new_file_path:
        file_size = os.path.getsize(new_file_path)
        print(f"File size: {file_size / 1024 / 1024:.2f} MB")

    browser.quit()
    display.stop()

if __name__ == "__main__":
    start_download()
```
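
Since the crash seems to happen around page load and the click, one thing worth trying is an explicit wait, so the click only fires once the element is actually clickable. A minimal sketch against the same `browser` and XPath as above:

```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Block for up to 30 seconds until the download button is clickable,
# instead of relying on the global implicit wait.
wait = WebDriverWait(browser, 30)
button = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="ext-gen43"]')))
button.click()
```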

For now the files are instead made available through an NFS mount: `sudo mount -t nfs 10.118.14.133:/zion0/nfsdir /nfsdir0`

## 2. Load and run the image

The images live under /nfsdir0/WT-X201/2.32.0.x/images. The vLLM image can be imported from /nfsdir0/WT-X201/2.32.0.x/images/AI/py310/vllm_2.32.0.11-torch2.4-py310-ubuntu20.04-amd64.tar, and the PyTorch image from /nfsdir0/WT-X201/2.32.0.x/images/pytorch/hpcc-x201-pytorch_2.32.0.3-torch2.4-py310-ubuntu20.04-amd64.tar.

Import the vLLM image with:

```shell
docker load < /nfsdir0/WT-X201/2.32.0.x/images/AI/py310/vllm_2.32.0.11-torch2.4-py310-ubuntu20.04-amd64.tar
```

Running with `set -x` reveals that the `docker` command here is actually a wrapper for `nerdctl -n k8s.io --address /data/containerd/run/containerd.sock`.

The image can be run with the following script:

```bash
#!/bin/bash

# Default Docker image
DEFAULT_IMAGE="vllm:2.32.0.11-torch2.4-py310-ubuntu20.04-amd64"

# Use the image name passed as the first argument, if any
if [ -n "$1" ]; then
    DOCKER_IMAGE="$1"
else
    DOCKER_IMAGE="$DEFAULT_IMAGE"
fi

# Start building the docker run command
DOCKER_COMMAND="nerdctl -n k8s.io --address /data/containerd/run/containerd.sock run"

# Add the always-included device if it exists
if [ -e "/dev/htcd" ]; then
    DOCKER_COMMAND+=" --device=/dev/htcd"
else
    echo "Warning: /dev/htcd not found on the host."
fi

# Add every character device under /dev/dri to the command
if [ -d "/dev/dri" ]; then
    for device in /dev/dri/*; do
        if [ -c "$device" ]; then  # character device?
            DOCKER_COMMAND+=" --device=$device"
        fi
    done
else
    echo "Error: /dev/dri directory not found on the host. Graphics devices might not be available."
    exit 1
fi

# Finish the command with the chosen image
DOCKER_COMMAND+=" -it ${DOCKER_IMAGE} /bin/bash"

# Print the generated command
echo "Generated Docker command:"
echo "$DOCKER_COMMAND"

# Execute the command
eval "$DOCKER_COMMAND"
```
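
For reference, the device-discovery half of this script can be expressed in a few lines of Python, which is handy when building the same command from another tool. This is a sketch with a hypothetical helper, not part of the cluster tooling:

```python
import glob
import os
import stat

def device_flags():
    """Build --device flags for /dev/htcd plus every character device under /dev/dri."""
    flags = []
    if os.path.exists("/dev/htcd"):
        flags.append("--device=/dev/htcd")
    for path in sorted(glob.glob("/dev/dri/*")):
        if stat.S_ISCHR(os.stat(path).st_mode):  # keep character devices only
            flags.append(f"--device={path}")
    return flags

print(" ".join(device_flags()))
```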

Script version 2:

```bash
#!/bin/bash

# Enable shell tracing for debugging (optional, can be removed once confident)
# set -x

# --- IMPORTANT: Find your DOCKER/NERDCTL executable path ---
# 1. Deactivate conda: `conda deactivate`
# 2. Run: `which docker`
# 3. If it points to a script/symlink that calls nerdctl, use that script's path.
#    Otherwise, use the direct binary path shown by `which docker`.
#    Example: DOCKER_BIN="/usr/bin/docker"
# If `which docker` gives no output outside conda, find it manually:
#    `find / -name docker -type f 2>/dev/null | grep -E "(bin/docker|sbin/docker)"`
# Once found, replace the placeholder below with the actual path.
DOCKER_BIN="nerdctl -n k8s.io --address /data/containerd/run/containerd.sock"  # <--- REPLACE WITH YOUR ACTUAL DOCKER/NERDCTL PATH

# Default Docker image (used if not provided as the second argument)
DEFAULT_IMAGE="vllm:2.32.0.11-torch2.4-py310-ubuntu20.04-amd64"

# --- Argument Parsing ---
# Require at least one argument (the directory mapping)
if [ -z "$1" ]; then
    echo "Usage: $0 <host_dir_to_map> [docker_image_name]"
    echo "  <host_dir_to_map>:   The host directory to mount into the container (e.g., /home/user/data:/app/data)"
    echo "  [docker_image_name]: Optional. The Docker image to use. Defaults to ${DEFAULT_IMAGE}"
    exit 1
fi

MAPPED_DIR="$1"  # first argument is the directory mapping

# Second argument (Docker image name), if any
if [ -n "$2" ]; then
    DOCKER_IMAGE="$2"
else
    DOCKER_IMAGE="$DEFAULT_IMAGE"
fi

# Start building the docker run command
DOCKER_COMMAND="${DOCKER_BIN} run"

# Add the directory mapping
DOCKER_COMMAND+=" -v ${MAPPED_DIR}"

# Add the always-included device if it exists
if [ -e "/dev/htcd" ]; then
    DOCKER_COMMAND+=" --device=/dev/htcd"
else
    echo "Warning: /dev/htcd not found on the host."
fi

# Add every character device under /dev/dri to the command
if [ -d "/dev/dri" ]; then
    for device in /dev/dri/*; do
        if [ -c "$device" ]; then  # character device?
            DOCKER_COMMAND+=" --device=$device"
        fi
    done
else
    echo "Error: /dev/dri directory not found on the host. Graphics devices might not be available."
    exit 1
fi

# Finish the command with the chosen image
DOCKER_COMMAND+=" -it ${DOCKER_IMAGE} /bin/bash"

# Print the generated command before execution
echo "-------------------------------------"
echo "Generated command string (before eval):"
echo "$DOCKER_COMMAND"
echo "-------------------------------------"

# Execute the command
eval "$DOCKER_COMMAND"

# Disable shell tracing (if enabled)
# set +x
```

Script version 3:

```bash
#!/bin/bash

# Enable shell tracing for debugging (optional, can be removed once confident)
# set -x

# --- IMPORTANT: Find your DOCKER/NERDCTL executable path ---
# (same discovery steps as in version 2)
DOCKER_BIN="nerdctl -n k8s.io --address /data/containerd/run/containerd.sock"  # <--- REPLACE WITH YOUR ACTUAL DOCKER/NERDCTL PATH

# Default Docker image (used if DOCKER_IMAGE is not set in the environment)
DEFAULT_IMAGE="vllm:2.32.0.11-torch2.4-py310-ubuntu20.04-amd64"

DOCKER_IMAGE="${DOCKER_IMAGE:-$DEFAULT_IMAGE}"

# Require at least one volume-mapping argument.
# "$#" holds the total number of command-line arguments.
if [ "$#" -eq 0 ]; then
    echo "Error: At least one volume mapping is required."
    echo
    echo "Usage: [DOCKER_IMAGE=<image_name>] $0 <host_dir1:container_dir1> [<host_dir2:container_dir2> ...]"
    echo
    echo "Description:"
    echo "  This script starts a Docker container and maps one or more host directories"
    echo "  into the container."
    echo
    echo "Arguments:"
    echo "  <host_dir:container_dir>  A directory to mount, with host and container paths separated by a colon."
    echo "                            Multiple mapping arguments can be provided."
    echo
    echo "Environment Variable:"
    echo "  DOCKER_IMAGE  The Docker image to use. Defaults to: '${DEFAULT_IMAGE}'"
    echo
    echo "Examples:"
    echo "  # Map a single directory using the default image (${DEFAULT_IMAGE})"
    echo "  $0 /home/user/data:/app/data"
    echo
    echo "  # Map multiple directories with a specific image"
    echo "  DOCKER_IMAGE=python:3.9-slim $0 /home/user/project:/app /home/user/logs:/logs"
    exit 1
fi

# Start building the docker run command
DOCKER_COMMAND="${DOCKER_BIN} run -it --rm"

# Iterate over all command-line arguments.
# "$@" treats each command-line argument as a separate, quoted string.
for mapping in "$@"; do
    # For each argument, append a "-v" flag and the mapping to the command string.
    DOCKER_COMMAND+=" -v ${mapping}"
done

# Add the always-included device if it exists
if [ -e "/dev/htcd" ]; then
    DOCKER_COMMAND+=" --device=/dev/htcd"
else
    echo "Warning: /dev/htcd not found on the host."
fi

# Add every character device under /dev/dri to the command
if [ -d "/dev/dri" ]; then
    for device in /dev/dri/*; do
        if [ -c "$device" ]; then  # character device?
            DOCKER_COMMAND+=" --device=$device"
        fi
    done
else
    echo "Error: /dev/dri directory not found on the host. Graphics devices might not be available."
    exit 1
fi

# Finish the command with the chosen image
DOCKER_COMMAND+=" ${DOCKER_IMAGE} /bin/bash"

# Print the generated command before execution
echo "-------------------------------------"
echo "Generated command string (before eval):"
echo "$DOCKER_COMMAND"
echo "-------------------------------------"

# Execute the command
eval "$DOCKER_COMMAND"

# Disable shell tracing (if enabled)
# set +x
```

Smoke tests inside the container: run `ht-smi`, then in Python run `import vllm`, `import torch`, and `print(torch.cuda.is_available())`.
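
These checks can be kept as a small script. A sketch; it assumes the MetaX torch build exposes the standard `torch.cuda` API, which the `is_available()` check above implies:

```python
# sanity_check.py -- quick smoke test inside the container
import torch
import vllm

print("vllm version:", vllm.__version__)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device count:", torch.cuda.device_count())
    print("device 0:", torch.cuda.get_device_name(0))
```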

Running vLLM:

```python
# vllm_model.py
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
import os

# Download models from ModelScope automatically; without this setting,
# vLLM downloads from Hugging Face instead
os.environ['VLLM_USE_MODELSCOPE'] = 'True'

def get_completion(prompts, model, tokenizer=None, max_tokens=512, temperature=0.8, top_p=0.95, max_model_len=2048):
    stop_token_ids = [151329, 151336, 151338]
    # Sampling parameters: temperature controls diversity, top_p controls nucleus sampling
    sampling_params = SamplingParams(temperature=temperature, top_p=top_p, max_tokens=max_tokens, stop_token_ids=stop_token_ids)
    # Initialize the vLLM inference engine
    llm = LLM(model=model, tokenizer=tokenizer, max_model_len=max_model_len, trust_remote_code=True)
    outputs = llm.generate(prompts, sampling_params)
    return outputs


if __name__ == "__main__":
    # Initialize the vLLM inference engine
    model = '/root/autodl-tmp/qwen/Qwen2-7B-Instruct'  # local model path
    # model = "qwen/Qwen2-7B-Instruct"  # model name; downloads automatically
    tokenizer = None
    # Optionally load a tokenizer and pass it to vLLM (not required)
    # tokenizer = AutoTokenizer.from_pretrained(model, use_fast=False)

    text = ["你好,帮我介绍一下什么是大语言模型。",
            "可以给我讲一个有趣的童话故事吗?"]
    # messages = [
    #     {"role": "system", "content": "你是一个有用的助手。"},
    #     {"role": "user", "content": prompt}
    # ]
    # Optionally render the messages with the chat template (not required)
    # text = tokenizer.apply_chat_template(
    #     messages,
    #     tokenize=False,
    #     add_generation_prompt=True
    # )

    outputs = get_completion(text, model, tokenizer=tokenizer, max_tokens=512, temperature=1, top_p=1, max_model_len=2048)

    # The output is a list of RequestOutput objects containing the prompt,
    # the generated text, and other information.
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
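
To use the commented-out chat-template path above, the tokenizer renders the message list into a plain prompt string before it is handed to `llm.generate`. A minimal sketch, assuming the same Qwen2 model path:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('/root/autodl-tmp/qwen/Qwen2-7B-Instruct')
messages = [
    {"role": "system", "content": "你是一个有用的助手。"},
    {"role": "user", "content": "你好,帮我介绍一下什么是大语言模型。"},
]
# Render the chat messages into a single prompt string, ending with the
# assistant turn so the model knows to generate a reply.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```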

Using `vllm serve`:

In my tests, only the Qwen2.5-1.5B-Instruct model could be served:

```shell
pip install modelscope
modelscope download --model Qwen/Qwen2.5-1.5B-Instruct
cp -r ~/.cache/modelscope/hub/models/Qwen/ ./model/
vllm serve /root/python_venv/llm/model/Qwen/Qwen2.5-1.5B-Instruct
```
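
For reference, the download step can also be done from Python. A sketch using modelscope's `snapshot_download`; the `cache_dir` value is just an example:

```python
from modelscope import snapshot_download

# Download Qwen2.5-1.5B-Instruct into ./model instead of the default cache
model_dir = snapshot_download('Qwen/Qwen2.5-1.5B-Instruct', cache_dir='./model')
print(model_dir)
```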

The server then listens on port 8000:

```shell
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
  "model": "/root/python_venv/llm/model/Qwen/Qwen2.5-1.5B-Instruct",
  "prompt": ["<|begin▁of▁sentence|>你好,DeepSeek!<|end▁of▁sentence|>"],
  "max_tokens": 100,
  "temperature": 0.6
}'
```
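
Because `vllm serve` exposes an OpenAI-compatible API, the same endpoint can be called with the `openai` Python client. A minimal sketch; a default vLLM server accepts any placeholder API key:

```python
from openai import OpenAI

# vLLM's server speaks the OpenAI API; point the client at it.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="/root/python_venv/llm/model/Qwen/Qwen2.5-1.5B-Instruct",
    prompt="你好!",
    max_tokens=100,
    temperature=0.6,
)
print(completion.choices[0].text)
```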

lmcache cannot be installed in this environment yet because `nvcc` is missing.
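
A quick way to confirm the missing dependency before attempting the install; a trivial sketch:

```python
import shutil

# lmcache's build needs the CUDA compiler; None here means nvcc is not on PATH
print("nvcc:", shutil.which("nvcc"))
```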


## 3. Test results

```shell
Average latency: 0.22ms
P50 latency: 0.15ms
P90 latency: 0.16ms
P99 latency: 1.97ms
Average throughput: 84.45 GB/s
```
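
For context, the P50/P90/P99 figures are percentiles over the per-request latency samples. A sketch of how such a summary is computed; the sample list below is a placeholder, not the real measurements:

```python
import numpy as np

# latencies_ms would be collected per request during the benchmark
latencies_ms = [0.15, 0.16, 0.14, 0.15, 1.97, 0.15]  # placeholder samples

print(f"Average latency: {np.mean(latencies_ms):.2f}ms")
for p in (50, 90, 99):
    print(f"P{p} latency: {np.percentile(latencies_ms, p):.2f}ms")
```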

