Spring Boot + Locally Deployed LLM: Building a RAG Knowledge Base
- 1. Deploying the Local LLM on Linux
- 1.1 Install Ollama
- 1.2 Start Ollama
- 1.3 Download the DeepSeek Model
- 2. Calling the Local Model from Spring Boot for Basic Q&A
- 3. Integrating a Vector Database
- 4. Feeding Data into the Knowledge Base
- 5. Putting It Together: the RAG Knowledge Base
1. Deploying the Local LLM on Linux
1.1 Install Ollama
# wget https://ollama.com/download/ollama-linux-amd64.tgz
# tar -C /usr/local -zxvf ollama-linux-amd64.tgz
1.2 Start Ollama
# ollama serve
Note: if you want other clients to be able to call the model remotely, start it with the following command instead:
OLLAMA_HOST=0.0.0.0 OLLAMA_ORIGINS=* ollama serve
or
OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_ORIGINS=* ollama serve > ollama.log 2>&1
I hit an error when starting it (the root cause was an old libstdc++ on the server):
ollama: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.25' not found (required by ollama)
The fix I followed is described here:
https://blog.csdn.net/u011250186/article/details/147144845
Note that step 3 in that guide (configuring and building) takes quite a long time.
1.3 Download the DeepSeek Model
ollama run deepseek-r1:1.5b
You can then send questions to the model directly from the server to check that it responds.
2. Calling the Local Model from Spring Boot for Basic Q&A
For the front end, I built a simple Q&A page for the knowledge base. It calls a backend endpoint, and the backend sends an HTTP request to the local model's generate endpoint to produce the reply.
@GetMapping(value = "/getArtificialIntelligence")
public ResponseEntity<String> getArtificialIntelligence(@RequestParam(name = "message") String message) throws PromptException {
    return ResponseEntity.ok(aiService.getArtificialIntelligence(message));
}
@Value("${ollama.url}")private String OLLAMA_API_URL;@Value("${ollama.model}")private String OLLAMA_MODEL;@Overridepublic String getArtificialIntelligence(String message) throws PromptException {try {// 1. 构建请求体OllamaRequest request = new OllamaRequest(OLLAMA_MODEL, message,true);// 2. 发送请求HttpHeaders headers = new HttpHeaders();headers.setContentType(MediaType.APPLICATION_JSON);HttpEntity<OllamaRequest> entity = new HttpEntity<>(request, headers);ResponseEntity<OllamaResponse> response = restTemplate.exchange(OLLAMA_API_URL,HttpMethod.POST,entity,OllamaResponse.class);System.out.println(response);System.out.println(response.getBody().getResponse());// 3. 解析响应// generate接口if (response.getStatusCode().is2xxSuccessful() && response.hasBody()) {return response.getBody().getResponse();}throw new PromptException("API响应异常:" + response.getStatusCode());} catch (HttpStatusCodeException e) {// 精准处理HTTP状态码switch (e.getStatusCode().value()) {case 400:throw new PromptException("请求参数错误:" + e.getResponseBodyAsString());case 404:throw new PromptException("模型未找到,请检查配置");case 500:throw new PromptException("模型服务内部错误");default:throw new PromptException("API请求失败:" + e.getStatusCode());}} catch (Exception e) {throw new PromptException("系统异常:" + e.getMessage());}}
3. Integrating a Vector Database
Here I chose the PostgreSQL vector extension (pgvector).
Official documentation:
https://pgxn.org/dist/vector/0.7.4/README.html#Windows
First make sure the C++ workload of Visual Studio is installed, then run:
& "D:\SoftWare\visual Studio\VC\Auxiliary\Build\vcvars64.bat"cd D:\SoftWare\pgvector\pgvector-master
set "PGROOT=D:\SoftWare\PgSQL"
nmake /F Makefile.win
nmake /F Makefile.win install# 设置 Visual Studio 环境变量
& "D:\SoftWare\Visual Studio\VC\Auxiliary\Build\vcvars64.bat"# 进入源码目录
cd D:\SoftWare\pgvector\pgvector-master# 执行编译
& "D:\SoftWare\Visual Studio\VC\Tools\MSVC\14.43.34808\bin\Hostx64\x64\nmake.exe" /F Makefile.win PGROOT="D:\SoftWare\PgSQL" PG_CONFIG="D:\SoftWare\PgSQL\bin\pg_config.exe"
Then you can create the database and table following the hints in the pgvector extension docs.
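The post does not include the actual DDL. As a rough sketch (the table and column names, and the 768-dimension vector size produced by nomic-embed-text, are assumptions on my part), the extension and chunk table could also be created from Spring at startup:

// Rough sketch, not from the original post: enable pgvector and create the chunk table.
// Table/column names and the 768-dim vector size (nomic-embed-text) are assumptions.
import org.springframework.boot.CommandLineRunner;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

@Component
public class PgVectorSchemaInitializer implements CommandLineRunner {

    private final JdbcTemplate jdbcTemplate;

    public PgVectorSchemaInitializer(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public void run(String... args) {
        // Enable the extension (requires sufficient privileges on the database)
        jdbcTemplate.execute("CREATE EXTENSION IF NOT EXISTS vector");
        // One row per text chunk; the embedding is stored in a pgvector column
        jdbcTemplate.execute(
                "CREATE TABLE IF NOT EXISTS document_chunk (" +
                "  id BIGSERIAL PRIMARY KEY," +
                "  filename TEXT," +
                "  content TEXT," +
                "  embedding vector(768)" +
                ")");
    }
}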
4. Feeding Data into the Knowledge Base
On the front end I added a file-upload feature; the backend parses the uploaded file into vectors and stores them in the vector database.
This uses a second model served through Ollama, nomic-embed-text (an embedding model, not part of DeepSeek), which converts the text extracted from the file into vectors before they are stored in the database.
The backend code is as follows:
@PostMapping("/upload")public ResponseEntity<String> uploadFile(@RequestParam("multipartFiles") MultipartFile file) {try {documentService.processUploadedFile(file);return ResponseEntity.ok("文件已成功解析并存入数据库");} catch (Exception e) {return ResponseEntity.status(500).body("文件处理失败:" + e.getMessage());}}
private final Tika tika = new Tika();

@Resource
private EmbeddingService embeddingService;

@Resource
private DocumentChunkRepository documentChunkRepository;

/**
 * Process an uploaded file.
 * @param file the uploaded file
 */
@Override
public void processUploadedFile(MultipartFile file) throws IOException, TikaException {
    String fileName = file.getOriginalFilename();
    if (fileName == null || fileName.isEmpty()) throw new IllegalArgumentException("文件名为空");

    // Parse the file into plain text with Tika
    String textContent = tika.parseToString(file.getInputStream());

    // Split the text into chunks - improved chunking strategy with overlap
    // List<String> chunks = splitTextIntoChunks(textContent, 512);
    List<String> chunks = splitTextIntoChunks(textContent, 512, 100);

    // Build the DocumentChunk entities in a batch
    List<DocumentChunk> documentChunks = new ArrayList<>();
    for (String chunk : chunks) {
        try {
            float[] embedding = embeddingService.getEmbedding(chunk);
            DocumentChunk chunkEntity = new DocumentChunk();
            chunkEntity.setFilename(fileName);
            chunkEntity.setContent(chunk);
            chunkEntity.setEmbedding(embedding);
            documentChunks.add(chunkEntity);
        } catch (Exception e) {
            logger.error("处理文本块时出错: {}", e.getMessage(), e);
        }
    }

    // Save the whole batch to the database
    if (!documentChunks.isEmpty()) {
        documentChunkRepository.saveAll(documentChunks);
    }
}

/**
 * Split text into chunks of a bounded size while trying to keep sentences or paragraphs intact.
 *
 * @param text         the text content
 * @param maxChunkSize maximum number of characters per chunk
 * @param overlap      number of characters carried over between consecutive chunks
 * @return the list of text chunks
 */
private List<String> splitTextIntoChunks(String text, int maxChunkSize, int overlap) {
    List<String> chunks = new ArrayList<>();
    StringBuilder currentChunk = new StringBuilder(maxChunkSize);
    String[] sentences = text.split("。|?|!|\\n"); // split on sentence endings / newlines
    for (String sentence : sentences) {
        if (sentence.trim().isEmpty()) continue;
        if (currentChunk.length() + sentence.length() > maxChunkSize) {
            chunks.add(currentChunk.toString());
            // Carry the overlap from the previous chunk into the next one
            if (overlap > 0 && !chunks.isEmpty()) {
                String lastPart = getLastNChars(chunks.get(chunks.size() - 1), overlap);
                currentChunk = new StringBuilder(lastPart).append(sentence);
            } else {
                currentChunk = new StringBuilder(sentence);
            }
        } else {
            currentChunk.append(sentence).append(" ");
        }
    }
    if (currentChunk.length() > 0) {
        chunks.add(currentChunk.toString());
    }
    return chunks;
}

// Helper: take the last n characters of a string
private String getLastNChars(String str, int n) {
    return str.length() > n ? str.substring(str.length() - n) : str;
}
@Service
public class EmbeddingServiceImpl implements EmbeddingService {

    private final RestTemplate restTemplate = new RestTemplate();
    private final ObjectMapper objectMapper = new ObjectMapper();

    @Override
    public float[] getEmbedding(String text) {
        String url = "http://192.168.2.45:11434/api/embeddings";
        try {
            Map<String, Object> requestBody = new HashMap<>();
            requestBody.put("model", "nomic-embed-text");
            requestBody.put("prompt", text);
            String requestBodyJson = objectMapper.writeValueAsString(requestBody);

            // Send the POST request and read the raw JSON response
            ResponseEntity<String> responseEntity = restTemplate.postForEntity(url, requestBodyJson, String.class);
            if (responseEntity.getStatusCode() != HttpStatus.OK) {
                throw new RuntimeException("HTTP 错误状态码: " + responseEntity.getStatusCodeValue());
            }

            Map<String, Object> map = objectMapper.readValue(responseEntity.getBody(), Map.class);
            Object embeddingObj = map.get("embedding");
            if (embeddingObj instanceof float[]) {
                return (float[]) embeddingObj;
            } else if (embeddingObj instanceof List<?>) {
                @SuppressWarnings("unchecked")
                List<Double> list = (List<Double>) embeddingObj;
                float[] arr = new float[list.size()];
                for (int i = 0; i < arr.length; i++) {
                    arr[i] = list.get(i).floatValue();
                }
                return arr;
            } else {
                throw new RuntimeException("Unexpected type for embedding: "
                        + (embeddingObj != null ? embeddingObj.getClass().getName() : "null"));
            }
        } catch (Exception e) {
            throw new RuntimeException("Failed to get embedding", e);
        }
    }
}
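The EmbeddingService interface implemented above is not shown in the post either; a minimal sketch of its likely shape (a single method, matching how it is used in the upload service):

// Minimal sketch of the interface implemented above (not shown in the original post).
public interface EmbeddingService {

    /**
     * Returns the embedding vector for the given text, as produced by
     * Ollama's /api/embeddings endpoint ({"embedding": [ ... ]}).
     */
    float[] getEmbedding(String text);
}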
5. Putting It Together: the RAG Knowledge Base
Finally, when a user asks a question, we first search the database for similar content chunks. If matches are found, we take the original text of the matched chunks and include it as context in the prompt sent to the model, so that the model can ground its answer in that content. This is retrieval-augmented generation (RAG). Example code below.
Here the model's chat endpoint is called and returns a streaming response, so I also return a Flux to the front end; the reply renders incrementally and the user does not have to wait for the whole answer, which makes for a better experience.
/**
 * Streaming reply.
 * @param message the user question
 * @return a stream of partial responses
 */
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<StreamResponse> streamResponse(@RequestParam String message) {
    return aiService.getStreamResponse(message)
            .timeout(Duration.ofMinutes(10))
            .doFinally((SignalType signal) -> {
                if (signal == SignalType.CANCEL) {
                    logger.info("客户端中断了流式连接");
                }
            });
}
public Flux<StreamResponse> getStreamResponse(String message) {return Flux.defer(() -> {try {// 1. 检索相关上下文List<DocumentChunk> chunks = retrievalService.retrieveRelevantChunks(message, 3);StringBuilder contextBuilder = new StringBuilder();for (DocumentChunk chunk : chunks) {contextBuilder.append(chunk.getContent()).append("\n\n");}// 2. 构建带上下文的 PromptString prompt = String.format("请基于以下上下文回答问题:\n\n%s\n\n问题:%s", contextBuilder.toString(), message);// 3. 构建请求体Map<String, Object> requestBody = new HashMap<>();requestBody.put("model", OLLAMA_MODEL);requestBody.put("messages", Collections.singletonList(new Message("user", prompt)));requestBody.put("stream", true);requestBody.put("options", new Options());// 4. 发送流式请求return webClient.post().uri(OLLAMA_API_URL).contentType(MediaType.APPLICATION_JSON).bodyValue(requestBody).retrieve().bodyToFlux(String.class).map(this::parseChunk).doOnSubscribe(sub -> log.debug("建立连接成功")).doOnNext(response -> log.trace("收到分块数据:{}", response)).doOnError(e -> log.error("流式处理异常:", e)).onErrorResume(e -> {log.error("流式请求失败", e);return Flux.error(new PromptException("AI服务暂时不可用"));});} catch (Exception e) {return Flux.error(new PromptException("文档检索失败: " + e.getMessage()));}})
}// 处理每部分分块数据private StreamResponse parseChunk(String chunk) {try {JsonNode node = new ObjectMapper().readTree(chunk);StreamResponse response = new StreamResponse();response.setResponse( node.path("message").path("content").asText());response.setDone(node.path("done").asBoolean());return response;} catch (Exception e) {return new StreamResponse("解析错误", true);}}
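The Message, Options, and StreamResponse classes used above are not included in the post. Here is a minimal sketch of them: Message mirrors the role/content shape of Ollama's /api/chat messages, Options is just an empty placeholder for generation parameters, and StreamResponse is the shape the controller streams back to the front end.

// Minimal DTO sketches (not in the original post).
// Message follows the {"role": "...", "content": "..."} shape of Ollama's /api/chat messages.
public class Message {
    private String role;
    private String content;

    public Message(String role, String content) {
        this.role = role;
        this.content = content;
    }

    public String getRole() { return role; }
    public String getContent() { return content; }
}

// Placeholder for Ollama generation options (temperature, num_ctx, ...); empty by default.
public class Options {
}

// What the controller streams back to the front end.
public class StreamResponse {
    private String response; // partial text of the answer
    private boolean done;    // true on the final chunk

    public StreamResponse() {}
    public StreamResponse(String response, boolean done) {
        this.response = response;
        this.done = done;
    }

    public String getResponse() { return response; }
    public void setResponse(String response) { this.response = response; }
    public boolean isDone() { return done; }
    public void setDone(boolean done) { this.done = done; }
}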
@Service
public class RetrievalServiceImpl implements RetrievalService {

    @Resource
    private EmbeddingService embeddingService;

    @Resource
    private DocumentChunkRepository documentChunkRepository;

    /**
     * Retrieve the most relevant document chunks.
     * @param query the search text
     * @param topK  how many of the most similar chunks to return
     * @return the matching chunks
     */
    @Override
    public List<DocumentChunk> retrieveRelevantChunks(String query, int topK) {
        // 1. Embed the query
        float[] queryVector = embeddingService.getEmbedding(query);

        // 2. Convert float[] into the string format PostgreSQL's vector type understands: "[v1,v2,v3]"
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < queryVector.length; i++) {
            sb.append(queryVector[i]);
            if (i < queryVector.length - 1) {
                sb.append(",");
            }
        }
        sb.append("]");
        String vectorAsString = sb.toString();

        // 3. Pass the vector as a string to avoid Hibernate converting it to bytea
        return documentChunkRepository.findSimilarChunks(vectorAsString, topK);
    }
}
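The findSimilarChunks query itself is not shown in the post. A sketch of what it might look like as a Spring Data native query using pgvector's cosine-distance operator <=>; the table and column names follow the DDL sketch assumed earlier, and casting the string parameter to vector matches the string-based workaround described above.

// Sketch of the repository used above (not from the original post).
// Assumes the document_chunk table from the earlier DDL sketch; <=> is pgvector's cosine distance.
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import java.util.List;

public interface DocumentChunkRepository extends JpaRepository<DocumentChunk, Long> {

    @Query(value = "SELECT * FROM document_chunk " +
                   "ORDER BY embedding <=> CAST(:queryVector AS vector) " +
                   "LIMIT :topK",
           nativeQuery = true)
    List<DocumentChunk> findSimilarChunks(@Param("queryVector") String queryVector,
                                          @Param("topK") int topK);
}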
I wrote this demo quite a while ago, so some details escaped me while writing this up; I will keep refining and extending the post later!