llm · 2025-03-04

Setting up ollama and deepseek with Docker

Tools

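  1. ollama: downloads and manages models
  2. DeepSeek-R1: the LLM used here
  3. Nomic-Embed-Text embedding model: splits the text corpus into chunks, encodes them, and stores the result in a vector store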

I. Start ollama

1. Pull the image

docker pull ollama/ollama:0.5.13-rc6

2. Configure Docker to use the GPU

1) Install nvidia-container-toolkit

To run on an NVIDIA GPU, install nvidia-container-toolkit first.

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
    | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
    | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
    | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

2) Configure Docker to use the NVIDIA runtime

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

3. Start the container

1) Start with docker run

CPU only:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama_1 ollama/ollama:0.5.13-rc6

With a GPU:

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama_1 ollama/ollama:0.5.13-rc6

2) Start with docker compose

docker-compose.yml

version: "3"

services:
  ollama1:
    image: ollama/ollama:0.5.13-rc6
    container_name: ollama_1
    restart: "no"
    ports:
      - 11434:11434
    volumes:
      - "./.ollama:/root/.ollama"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
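
Once the container is started by either method, it can help to wait until the API answers before pulling models. The following is a minimal sketch, assuming the 11434:11434 port mapping used above and that the running ollama version serves GET /api/version; the helper names probe_ollama and wait_until_ready are hypothetical.

```python
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: matches the 11434:11434 port mapping


def probe_ollama(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if the server answers GET /api/version with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, refused connections and timeouts are all OSError
        return False


def wait_until_ready(probe=probe_ollama, attempts: int = 30, delay: float = 1.0) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for i in range(attempts):
        if probe():
            return True
        if i < attempts - 1:
            time.sleep(delay)
    return False
```

Calling wait_until_ready() right after docker run or docker compose up -d returns True as soon as the server responds, and False if it stays unreachable for the whole retry window.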

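II. Install deepseek-r1

The ollama commands in this section are run inside the container, for example after docker exec -it ollama_1 bash.

1. Install deepseek-r1:7b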

ollama run deepseek-r1:7b

2. Install nomic-embed-text

nomic-embed-text is a text embedding model: it converts text into vectors.

ollama pull nomic-embed-text

3. List the models

root@509d39be4053:/# ollama list
NAME                       ID              SIZE      MODIFIED     
nomic-embed-text:latest    0a109f422b47    274 MB    26 hours ago    
deepseek-r1:7b             0a8c26691023    4.7 GB    26 hours ago    
deepseek-r1:1.5b           a42b25d8c10a    1.1 GB    26 hours ago
root@509d39be4053:/# ollama ps
NAME              ID              SIZE      PROCESSOR          UNTIL              
deepseek-r1:7b    0a8c26691023    6.1 GB    58%/42% CPU/GPU    4 minutes from now

4. Model information

1) deepseek-r1:7b does not support tools (tool calling)
2) qwen3:4b supports tools

root@01c01ee81203:~# ollama show deepseek-r1:7b
  Model
    architecture        qwen2     
    parameters          7.6B      
    context length      131072    
    embedding length    3584      
    quantization        Q4_K_M    

  Capabilities
    completion    
    thinking      

  Parameters
    stop    "<|begin▁of▁sentence|>"    
    stop    "<|end▁of▁sentence|>"      
    stop    "<|User|>"                 
    stop    "<|Assistant|>"            

  License
    MIT License                    
    Copyright (c) 2023 DeepSeek    
    ...                            
root@01c01ee81203:~# ollama show qwen3:4b       
  Model
    architecture        qwen3     
    parameters          4.0B      
    context length      262144    
    embedding length    2560      
    quantization        Q4_K_M    

  Capabilities
    completion    
    tools         
    thinking      

  Parameters
    top_k             20                
    top_p             0.95              
    repeat_penalty    1                 
    stop              "<|im_start|>"    
    stop              "<|im_end|>"      
    temperature       0.6               

  License
    Apache License               
    Version 2.0, January 2004    
    ...                          

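5. Build a local knowledge base

When I first started using LLMs (large language models), I found them powerful, but sometimes the AI confidently makes things up. These hallucinations are seriously misleading in everyday use, and in a production environment the lack of accuracy makes the model unusable. Why does this happen? Put simply, the model is designed for general use and lacks the relevant domain knowledge, so its answers can be fabricated. In terms of Shannon's information theory, reducing information entropy requires bringing in more information.

Seen this way, there are two approaches. The first is to retrain the model on the relevant domain knowledge, or to fine-tune it. Training is enormously expensive, and fine-tuning also needs freshly labeled data and a lot of compute, so neither is realistic for an individual. The second is to supply background knowledge along with the question, so that the model can answer based on it. The second approach is the principle behind a knowledge base.

The dedicated term is RAG (Retrieval-Augmented Generation), a framework that combines retrieval with a generative model to improve the accuracy and relevance of generated content. Its core idea: before generating an answer, retrieve relevant information from an external knowledge base, then combine the retrieved results with the user's input to guide the model toward a more reliable answer.

In short, existing documents and internal knowledge are turned into a vector knowledge base, and at question time the relevant content from that store is passed to the model together with the question so that it answers more accurately. RAG combines information retrieval with large-model technology.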
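
The flow described above (split documents, embed them, retrieve the closest chunks, add them to the prompt) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the vectors below are toy three-dimensional values standing in for real nomic-embed-text embeddings, and chunk_text, cosine, top_k and build_prompt are hypothetical helper names.

```python
import math


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of `size` characters; neighbours share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, contexts: list[str]) -> str:
    """Prepend the retrieved chunks to the question as background knowledge."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Toy index: in real use each vector would come from the embedding model.
    index = [
        ("ollama listens on port 11434.", [1.0, 0.0, 0.0]),
        ("deepseek-r1:7b is quantized as Q4_K_M.", [0.0, 1.0, 0.0]),
        ("The GTX 1060 used here has 3 GB of memory.", [0.0, 0.0, 1.0]),
    ]
    query_vec = [0.9, 0.1, 0.0]  # stands in for the embedded question
    print(build_prompt("Which port does ollama use?", top_k(query_vec, index, k=1)))
```

A real knowledge base would keep the vectors in a vector database instead of a Python list, but the ranking and prompt-assembly steps stay the same in spirit.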

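What are the benefits?

  1. Day-to-day business knowledge is stored locally, which reduces the risk of information leaks.
  2. Questions are combined with business knowledge, which reduces hallucinations, that is, the model making things up.
  3. Answers draw on business knowledge and up-to-date information, so they can be more current.
  4. The model does not need to be retrained or fine-tuned, which lowers the cost.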

III. A GUI for ollama

1. page-assist

Page Assist can be installed as a Google Chrome extension. Download it from https://github.com/n4ze3m/page-assist/releases

Ollama settings:
[screenshot: Ollama settings]

RAG settings:
[screenshot: RAG settings]

Usage:
[screenshot: usage]

IV. GPU usage

Check GPU usage with nvidia-smi:

zxm@zxm-pc:~$ nvidia-smi
Tue Mar  4 23:29:00 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1060 3GB    Off | 00000000:01:00.0  On |                  N/A |
| 35%   35C    P0              29W / 120W |    266MiB /  3072MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1781      G   /usr/lib/xorg/Xorg                          149MiB |
|    0   N/A  N/A      1927      G   /usr/bin/gnome-shell                         30MiB |
|    0   N/A  N/A     73461      G   ...seed-version=20250228-151446.092000       82MiB |
+---------------------------------------------------------------------------------------+

While ollama is using the GPU:

zxm@zxm-pc:~$ nvidia-smi
Tue Mar  4 23:31:21 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1060 3GB    Off | 00000000:01:00.0  On |                  N/A |
| 35%   43C    P2              78W / 120W |   2339MiB /  3072MiB |     18%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1781      G   /usr/lib/xorg/Xorg                          149MiB |
|    0   N/A  N/A      1927      G   /usr/bin/gnome-shell                         31MiB |
|    0   N/A  N/A     73461      G   ...seed-version=20250228-151446.092000       78MiB |
|    0   N/A  N/A     75704      C   /usr/bin/ollama                            2074MiB |
+---------------------------------------------------------------------------------------+
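
Between the two listings, memory use rises from 266MiB to 2339MiB out of 3072MiB, with 2074MiB attributed to /usr/bin/ollama. To read the same numbers from a script, a minimal sketch follows. It assumes nvidia-smi's CSV query mode (--query-gpu with --format=csv,noheader,nounits); parse_memory_line, used_fraction and query_gpu_memory are hypothetical helper names.

```python
import subprocess

# Assumption: nvidia-smi is on PATH and supports the CSV query mode.
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_memory_line(line: str) -> tuple[int, int]:
    """Parse one CSV line such as '2339, 3072' into (used_mib, total_mib)."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def used_fraction(used_mib: int, total_mib: int) -> float:
    """Fraction of GPU memory in use."""
    return used_mib / total_mib


def query_gpu_memory() -> list[tuple[int, int]]:
    """Run nvidia-smi and return one (used_mib, total_mib) pair per GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_memory_line(line) for line in out.strip().splitlines()]


if __name__ == "__main__":
    # Values copied from the two listings above: idle, then with a model loaded.
    for label, line in (("idle", "266, 3072"), ("model loaded", "2339, 3072")):
        used, total = parse_memory_line(line)
        print(f"{label}: {used} / {total} MiB ({used_fraction(used, total):.0%})")
```

On a machine with an NVIDIA driver installed, query_gpu_memory() returns the live values instead of the copied ones.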

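V. OpenAI compatibility

Of the endpoints below, /v1/chat/completions is the OpenAI-compatible one; /api/tags, /api/show and /api/embeddings belong to ollama's native API.

1. List local models

Request: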

curl http://localhost:11434/api/tags

Response:

{
    "models": [
        {
            "name": "qwen3:14b",
            "model": "qwen3:14b",
            "modified_at": "2025-10-24T15:34:59.157623395Z",
            "size": 9276198565,
            "digest": "bdbd181c33f2ed1b31c972991882db3cf4d192569092138a7d29e973cd9debe8",
            "details": {
                "parent_model": "",
                "format": "gguf",
                "family": "qwen3",
                "families": [
                    "qwen3"
                ],
                "parameter_size": "14.8B",
                "quantization_level": "Q4_K_M"
            }
        },
        {
            "name": "nomic-embed-text:latest",
            "model": "nomic-embed-text:latest",
            "modified_at": "2025-10-23T15:44:00.04103599Z",
            "size": 274302450,
            "digest": "0a109f422b47e3a30ba2b10eca18548e944e8a23073ee3f3e947efcf3c45e59f",
            "details": {
                "parent_model": "",
                "format": "gguf",
                "family": "nomic-bert",
                "families": [
                    "nomic-bert"
                ],
                "parameter_size": "137M",
                "quantization_level": "F16"
            }
        },
        {
            "name": "deepseek-r1:7b",
            "model": "deepseek-r1:7b",
            "modified_at": "2025-10-23T15:41:22.653128255Z",
            "size": 4683075440,
            "digest": "755ced02ce7befdb13b7ca74e1e4d08cddba4986afdb63a480f2c93d3140383f",
            "details": {
                "parent_model": "",
                "format": "gguf",
                "family": "qwen2",
                "families": [
                    "qwen2"
                ],
                "parameter_size": "7.6B",
                "quantization_level": "Q4_K_M"
            }
        }
    ]
}
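
The size field in this response is a byte count. The sketch below turns a response of this shape into one summary line per model. It is only an illustration: human_size, summarize and fetch_tags are hypothetical helpers, and the decimal units (1 GB = 10^9 bytes) are an assumption that appears to match what ollama list prints for these models.

```python
import json
import urllib.request


def human_size(num_bytes: int) -> str:
    """Format a byte count with decimal units (assumption: 1 GB = 10**9 bytes)."""
    if num_bytes >= 10**9:
        return f"{num_bytes / 10**9:.1f} GB"
    return f"{num_bytes / 10**6:.0f} MB"


def summarize(tags: dict) -> list[str]:
    """Build one 'name  parameters  quantization  size' line per model."""
    lines = []
    for model in tags.get("models", []):
        details = model.get("details", {})
        lines.append(
            f'{model["name"]}  {details.get("parameter_size", "?")}  '
            f'{details.get("quantization_level", "?")}  {human_size(model["size"])}'
        )
    return lines


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """GET /api/tags from a running ollama server and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Trimmed copy of the response shown above, so the demo runs without a server.
    sample = {"models": [
        {"name": "nomic-embed-text:latest", "size": 274302450,
         "details": {"parameter_size": "137M", "quantization_level": "F16"}},
        {"name": "deepseek-r1:7b", "size": 4683075440,
         "details": {"parameter_size": "7.6B", "quantization_level": "Q4_K_M"}},
    ]}
    print("\n".join(summarize(sample)))
```

With the server running, summarize(fetch_tags()) produces the same kind of lines from live data.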

2. Show model information

curl http://localhost:11434/api/show -d '{
  "model": "deepseek-r1:7b"
}'
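
The same endpoint can be used to check from code whether a model supports tools. This sketch assumes the JSON returned by /api/show carries a capabilities array that mirrors the Capabilities section printed by ollama show; older ollama versions may omit the field, in which case the check returns False. The names supports_tools and show_model are hypothetical.

```python
import json
import urllib.request


def supports_tools(show_response: dict) -> bool:
    """True if the response lists 'tools' among its capabilities (assumed field name)."""
    return "tools" in show_response.get("capabilities", [])


def show_model(model: str, base_url: str = "http://localhost:11434") -> dict:
    """POST {"model": ...} to /api/show and return the decoded JSON."""
    req = urllib.request.Request(
        f"{base_url}/api/show",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Capabilities copied from the `ollama show` output earlier in this post.
    samples = {
        "deepseek-r1:7b": {"capabilities": ["completion", "thinking"]},
        "qwen3:4b": {"capabilities": ["completion", "tools", "thinking"]},
    }
    for name, response in samples.items():
        print(name, "supports tools:", supports_tools(response))
```

Against a live server, supports_tools(show_model("qwen3:4b")) performs the same check on the real response.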

3. Call the embedding model

Request:

curl http://localhost:11434/api/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text:latest",
    "prompt": "你好"
  }'

Response:

{
    "embedding": [
        -0.12307237833738327,
        0.29820987582206726,
        -3.833275556564331,
        0.06487877666950226,
        1.4995296001434326,
        0.23798543214797974,
        -0.7658764123916626,
        -0.44865143299102783,
        -0.40256860852241516,
        -1.2598717212677002,
        -0.8907259702682495,
        1.7141786813735962,
        0.1831144392490387,
        0.16616633534431458,
        0.13582943379878998,
        -1.0568212270736694,
        0.05641965940594673,
        -1.422386884689331,
        -0.9263020753860474,
        1.2330042123794556,
        -0.8702852725982666,
        0.8141596913337708,
        -0.19736900925636292,
        -0.8921308517456055,
        4.122570514678955,
        -0.3852195739746094,
        0.8616183400154114,
        1.2724435329437256,
        -0.07922960817813873,
        0.4311417043209076,
        0.24930191040039062,
        -0.8231167793273926,
        -0.39267492294311523,
        0.3824201822280884,
        -2.01654052734375
    ]
}
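
The same call can be made from Python with only the standard library. A minimal sketch, assuming the server is reachable at http://localhost:11434 as in the curl example; build_embedding_request and embed are hypothetical helper names.

```python
import json
import urllib.request


def build_embedding_request(prompt: str,
                            model: str = "nomic-embed-text:latest",
                            base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build the POST request that the curl example above sends."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(prompt: str, **kwargs) -> list[float]:
    """Send the request and return the 'embedding' array from the response."""
    with urllib.request.urlopen(build_embedding_request(prompt, **kwargs)) as resp:
        return json.load(resp)["embedding"]


if __name__ == "__main__":
    # Only builds and prints the request, so this runs without a server.
    req = build_embedding_request("你好")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

With ollama running, embed("你好") returns the vector as a Python list, and len() of that list gives the embedding dimension.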

4. Call the chat API

1) ollama (OpenAI-compatible endpoint)

Request:

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "deepseek-r1:7b",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "你是谁?"
            }
        ]
    }'

Response:

{
    "id": "chatcmpl-141",
    "object": "chat.completion",
    "created": 1761238914,
    "model": "deepseek-r1:7b",
    "system_fingerprint": "fp_ollama",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\u003cthink\u003e\n我是DeepSeek-R1,一个由深度求索公司开发的智能助手,我会尽我所能为您提供帮助。\n\u003c/think\u003e\n\n我是DeepSeek-R1,一个由深度求索公司开发的智能助手,我会尽我所能为您提供帮助。"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 12,
        "completion_tokens": 53,
        "total_tokens": 65
    }
}
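
In this response the model's reasoning is embedded in content between <think> and </think> (the \u003c and \u003e escapes decode to < and >). If only the final answer is needed, the two parts can be separated after decoding the JSON. A minimal sketch; split_reasoning is a hypothetical helper and assumes at most one think block per message.

```python
import re

# Non-greedy match across newlines, assuming a single <think>...</think> block.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content: str) -> tuple[str, str]:
    """Return (reasoning, answer) for a message that may embed <think>...</think>."""
    match = THINK_RE.search(content)
    if match is None:
        return "", content.strip()
    reasoning = match.group(1).strip()
    answer = (content[:match.start()] + content[match.end():]).strip()
    return reasoning, answer


if __name__ == "__main__":
    content = "<think>\nreasoning goes here\n</think>\n\nfinal answer"
    reasoning, answer = split_reasoning(content)
    print("reasoning:", reasoning)
    print("answer:", answer)
```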

2) qwen

Request:

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user", 
            "content": "你是谁?"
        }
    ]
}'

Response:

{
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "我是通义千问,阿里巴巴集团旗下的超大规模语言模型。我能够回答问题、创作文字,如写故事、公文、邮件、剧本等,还能进行逻辑推理、编程,表达观点,玩游戏等。我支持多种语言,包括但不限于中文、英文、德语、法语、西班牙语等。如果你有任何问题或需要帮助,欢迎随时告诉我!"
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 22,
        "completion_tokens": 79,
        "total_tokens": 101,
        "prompt_tokens_details": {
            "cached_tokens": 0
        }
    },
    "created": 1761236409,
    "system_fingerprint": null,
    "model": "qwen-plus",
    "id": "chatcmpl-c6ba1546-40ee-4978-914c-62b7dbe23efd"
}