用大模型（qwen）提取知识三元组并构建可视化知识图谱：从文本到图谱的完整实现

引言

知识图谱作为一种结构化的知识表示方式，在智能问答、推荐系统、数据分析等领域有着广泛应用。在信息爆炸的时代，如何从非结构化文本中提取有价值的知识并进行结构化展示，是NLP领域的重要任务。知识三元组（Subject-Relation-Object）是知识图谱的基本组成单元，通过大模型强大的语义理解能力，我们可以自动化提取这些三元组，并构建可交互的知识图谱可视化界面。本文将介绍一个基于大模型的知识图谱构建工具，它能从文本中自动提取知识三元组（主体-关系-客体），并通过可视化工具生成交互式知识图谱。

这是运行结果获得，如下图示：

在这里插入图片描述

一、核心依赖库

在开始之前，确保已安装以下依赖库：

pip install networkx pyvis  # 知识图谱构建与可视化
# 其他基础库：json, re, os（通常Python环境自带）

而至于大模型环境，我们不在给出。

二、代码整体结构解析

整个项目代码主要包含四个核心模块，形成"文本输入→三元组提取→图谱构建→可视化输出"的完整流程：

# 核心模块关系
文本输入 → extract_triples() → 知识三元组 → build_knowledge_graph() → 图谱数据 → visualize_knowledge_graph() → 可视化HTML

下面我们逐个解析关键模块的实现逻辑。

1. 大模型调用与三元组提取（extract_triples函数）

该函数是整个流程的核心，负责调用大模型从文本中提取知识三元组。其关键实现思路如下：

大模型提示词设计

为了让大模型精准输出符合要求的三元组，我们设计了严格的系统提示词（System Prompt）：

system_prompt = """你是专业知识三元组提取器，严格按以下规则输出：
1. 仅从文本提取(主体, 关系, 客体)三元组，忽略无关信息。
2. 必须用JSON数组格式返回，每个元素含"subject"、"relation"、"object"字段。
3. 输出仅保留JSON数组，不要任何解释、说明、代码块标记。
4. 确保JSON格式正确：引号用双引号，逗号分隔，无多余逗号。
"""

提示词明确了输出格式要求，这是后续解析三元组的基础。

流式响应处理

大模型通常采用流式输出方式返回结果，我们需要持续接收并拼接响应内容：

full_response = ""
for chunk in stream_invoke(ll_model, messages):full_response += str(chunk)print(f"\r已接收 {len(full_response)} 字符...", end="")

这种处理方式能实时反馈进度，提升用户体验。

格式修复机制

大模型输出可能存在格式问题（如引号不规范、多余逗号等），因此需要异常处理和格式修复：

try:return json.loads(full_response)
except json.JSONDecodeError:# 尝试提取JSON结构并修复json_match = re.search(r'\[.*\]', full_response, re.DOTALL)if json_match:cleaned_response = json_match.group()cleaned_response = cleaned_response.replace("'", '"')  # 单引号转双引号cleaned_response = re.sub(r',\s*]', ']', cleaned_response)  # 移除末尾多余逗号try:return json.loads(cleaned_response)except json.JSONDecodeError as e:print(f"修复后仍解析失败：{e}")return []

这一机制大幅提升了代码的健壮性，即使大模型输出格式略有瑕疵也能尝试修复。

2. 知识图谱构建（build_knowledge_graph函数）

提取三元组后，需要将其转换为结构化的知识图谱数据结构：

def build_knowledge_graph(triples):if not triples:return None  # 处理空三元组情况entities = set()# 收集所有实体（主体和客体都是实体）for triple in triples:entities.add(triple["subject"])entities.add(triple["object"])# 构建实体属性字典entity_attributes = {entity: {"name": entity} for entity in entities}# 构建关系列表relations = [{"source": triple["subject"],"target": triple["object"],"type": triple["relation"]} for triple in triples]return {"entities": [{"id": entity, **attrs} for entity, attrs in entity_attributes.items()],"relations": relations}

这个函数的核心逻辑是：

从三元组中提取所有唯一实体（去重）
为每个实体创建基础属性（目前包含名称）
将三元组转换为"源节点-目标节点-关系类型"的边结构
最终返回包含实体和关系的图谱字典

3. 知识图谱可视化（visualize_knowledge_graph函数）

可视化是知识图谱的重要展示方式，本项目使用pyvis库生成交互式HTML图谱：

可视化配置与节点边添加

# 初始化有向图
net = Network(directed=True, height="700px", width="100%", bgcolor="#f5f5f5", font_color="black",notebook=False  # 关键配置：非Notebook环境
)# 添加节点
for entity in graph["entities"]:net.add_node(entity["id"],label=entity["name"],title=f"实体: {entity['name']}",color="#4CAF50"  # 绿色节点)# 添加边（关系）
for relation in graph["relations"]:net.add_edge(relation["source"],relation["target"],label=relation["type"],title=relation["type"],color="#FF9800"  # 橙色边)

这里的关键配置是notebook=False，解决了非Jupyter环境下的模板渲染错误问题。

布局与交互配置

通过JSON配置定义图谱的视觉样式和交互行为：

net.set_options("""
{"nodes": {"size": 30,"font": {"size": 14}},"edges": {"font": {"size": 12},"length": 200},"interaction": {"dragNodes": true,  # 允许拖拽节点"zoomView": true,   # 允许缩放"dragView": true    # 允许拖拽视图}
}
""")

这些配置确保生成的图谱具有良好的可读性和交互性。

容错机制与备选方案

为应对HTML生成失败的情况，代码设计了备选可视化方案：

try:net.write_html(output_file, open_browser=False)
except Exception as e:# 备选方案：使用matplotlib生成静态PNGimport matplotlib.pyplot as pltplt.figure(figsize=(12, 8))pos = nx.spring_layout(nx.DiGraph([(r["source"], r["target"]) for r in graph["relations"]]))nx.draw_networkx_nodes(pos, node_size=3000, node_color="#4CAF50")nx.draw_networkx_labels(pos, labels={e["id"]: e["name"] for e in graph["entities"]})nx.draw_networkx_edges(pos, edgelist=[(r["source"], r["target"]) for r in graph["relations"]], arrowstyle="->")nx.draw_networkx_edge_labels(pos, edge_labels={(r["source"], r["target"]): r["type"] for r in graph["relations"]})plt.savefig(output_file.replace(".html", ".png"))

这种双重保障机制确保即使pyvis出现问题，也能获得基础的可视化结果。

4. 主流程控制（process_text_to_graph函数）

该函数整合了前面的所有模块，形成完整的"文本→三元组→图谱→可视化"流程：

def process_text_to_graph(text):print("正在从文本中提取知识三元组...")triples = extract_triples(text)if not triples:print("未能提取到任何知识三元组")return Noneprint(f"成功提取 {len(triples)} 个知识三元组：")for i, triple in enumerate(triples, 1):print(f"{i}. ({triple['subject']}, {triple['relation']}, {triple['object']})")print("\n正在构建知识图谱...")graph = build_knowledge_graph(triples)if not graph:print("构建知识图谱失败")return Noneprint("\n正在生成知识图谱可视化...")output_file = visualize_knowledge_graph(graph)return output_file

流程清晰，包含了必要的日志输出和异常判断，方便用户跟踪进度和排查问题。

三、使用方法与示例

运行示例

if __name__ == "__main__":sample_text = """爱因斯坦是一位著名的物理学家，他出生于德国。1905年，爱因斯坦提出了相对论。相对论彻底改变了人们对时间和空间的理解。爱因斯坦因光电效应获得了1921年诺贝尔物理学奖。他后来移居美国，并在普林斯顿大学工作。爱因斯坦与玻尔就量子力学的解释有过著名的争论。"""process_text_to_graph(sample_text)

输出结果

运行后会得到以下输出：

正在从文本中提取知识三元组...
正在接收大模型流式响应...
已接收 236 字符...
流式响应接收完成，开始解析...
成功提取 6 个知识三元组：
1. (爱因斯坦, 是, 物理学家)
2. (爱因斯坦, 出生于, 德国)
3. (爱因斯坦, 提出, 相对论)
4. (相对论, 改变, 人们对时间和空间的理解)
5. (爱因斯坦, 获得, 1921年诺贝尔物理学奖)
6. (爱因斯坦, 工作于, 普林斯顿大学)正在构建知识图谱...
正在生成知识图谱可视化...
知识图谱已保存至 /path/to/knowledge_graph.html

打开生成的knowledge_graph.html文件，可看到交互式知识图谱，支持节点拖拽、缩放和平移操作。

代码运行图示：
在这里插入图片描述

四、完整代码

运行知识图谱完整代码，该代码需要调用大模型构建的代码。我只是作为列子给出知识图谱的prompt方法。你可以根据graphrag等方式来提取知识图谱或更专业的方式来提取。

大模型调用完整代码

from langchain_openai import ChatOpenAI
import sys
import os
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))# 给出大语言模型默认参数字典的导入内容
llm_config = {"deepseek_1.5b": {"model_name": "deepseek-r1:1.5b","api_url": "http://162.130.245.26:9542/v1","api_key": "sk-RJaJE4fXaktHAI2MB295F6Ad58004feBcE25B83CdD6F0","embedding_ctx_length": 8191,"chunk_size": 1000,"max_retries": 2,"timeout": None,  # 请求超时时间，默认为 None"default_headers": None,  # 默认请求头"default_query": None,  # 默认查询参数"retry_min_seconds": 4,"retry_max_seconds": 20,},"deepseek_14b": {"model_name": "deepseek-r1:14b","api_url": "http://162.130.245.26:9542/v1","api_key": "sk-RJaJE4fXaktHAI2MB295F6Ad58004feBcE25B83CdD6F0","embedding_ctx_length": 8191,"chunk_size": 1000,"max_retries": 2,"timeout": None,  # 请求超时时间，默认为 None"default_headers": None,  # 默认请求头"default_query": None,  # 默认查询参数"retry_min_seconds": 4,"retry_max_seconds": 20,},"deepseek_32b": {"model_name": "deepseek-r1:32b","api_url": "http://162.130.245.26:9542/v1","api_key": "sk-RJaJE4fXaktHAI2MB295F6Ad58004feBcE25B83CdD6F0","embedding_ctx_length": 8191,"chunk_size": 1000,"max_retries": 2,"timeout": None,  # 请求超时时间，默认为 None"default_headers": None,  # 默认请求头"default_query": None,  # 默认查询参数"retry_min_seconds": 4,"retry_max_seconds": 20,},"qwen3_14b": {"model_name": "qwen3:14b","api_url": "http://192.145.216.20:7542/v1","api_key": "sk-RJaJE4fXaktHAI2M295F6Ad58004f7eBcE25B863CdD6F0","embedding_ctx_length": 8191,"chunk_size": 1000,"max_retries": 2,"timeout": None,  # 请求超时时间，默认为 None"default_headers": None,  # 默认请求头"default_query": None,  # 默认查询参数"retry_min_seconds": 4,"retry_max_seconds": 20,},"qwen3_32b": {"model_name": "qwen3:32b","api_url": "http://192.145.216.20:7542/v1","api_key": "sk-RJaJE4fXaktHAI2MB295F6d58004f7eBcE255B863CdD6F0","embedding_ctx_length": 8191,"chunk_size": 1000,"max_retries": 2,"timeout": 60,  # 请求超时时间，默认为 None"default_headers": None,  # 默认请求头"default_query": None,  # 默认查询参数"retry_min_seconds": 4,"retry_max_seconds": 20,},}def stream_invoke(llm_model,prompt):"""prompt可以做成2种方式，方式一：from langchain.schema import HumanMessagemessages = [HumanMessage(content=prompt)]方式二：{"role": "user", "content": question}"""full_response = ""results = llm_model.stream(prompt)for chunk in results:print(chunk.content, end="", flush=True)  # 逐块输出full_response += chunk.contentreturn full_responsedef invoke( llm_model,prompt):"""调用模型生成响应。:param prompt: 输入的提示文本:return: 模型生成的响应内容"""response = llm_model.invoke(prompt)print(response)return response.content
def build_model(mode="deepseek_32b"):config = llm_config[mode]model_name = config["model_name"]api_key = config["api_key"]    api_url = config["api_url"]LLM = ChatOpenAI(model=model_name,openai_api_key=api_key,openai_api_base=api_url)return LLMdef remove_think(answer, split_token='</think>'):"""处理模型响应，分离 think 内容和实际回答。:param answer: 模型的完整响应:param split_token: 分隔符，默认为 </think>:return: 实际回答和 think 内容"""parts = answer.split(split_token)content = parts[-1].lstrip("\n")think_content = None if len(parts) <= 1 else parts[0]return contentif __name__ == "__main__":llm_model = build_model(mode="qwen3_14b")# print(llm_model)stream_invoke(llm_model,"解释大语言模型LLM")

知识图谱提取完整代码

from Models.LLM_Models import build_model, stream_invoke
import networkx as nx
from pyvis.network import Network
import json
import re
import os  # 新增：用于处理文件路径# 初始化大模型
ll_model = build_model()def extract_triples(text):"""使用stream_invoke从文本中提取知识三元组"""system_prompt = """你是专业知识三元组提取器，严格按以下规则输出：1. 仅从文本提取(主体, 关系, 客体)三元组，忽略无关信息。2. 必须用JSON数组格式返回，每个元素含"subject"、"relation"、"object"字段。3. 输出仅保留JSON数组，** 不要任何解释、说明、代码块标记（如```json）**。4. 确保JSON格式正确：引号用双引号，逗号分隔，无多余逗号。示例输出：[{"subject":"爱因斯坦","relation":"是","object":"物理学家"},{"subject":"爱因斯坦","relation":"提出","object":"相对论"}]"""user_input = f"从以下文本提取三元组，严格按示例格式输出：\n{text}"messages = [{"role": "system", "content": system_prompt},{"role": "user", "content": user_input}]# 接收流式响应print("正在接收大模型流式响应...")full_response = ""for chunk in stream_invoke(ll_model, messages):# 根据实际返回格式调整，有些stream_invoke可能需要chunk["content"]full_response += str(chunk)print(f"\r已接收 {len(full_response)} 字符...", end="")print("\n流式响应接收完成，开始解析...")full_response = full_response.strip()# 格式修复try:return json.loads(full_response)except json.JSONDecodeError:print("首次解析失败，尝试修复格式...")json_match = re.search(r'\[.*\]', full_response, re.DOTALL)if json_match:cleaned_response = json_match.group()cleaned_response = cleaned_response.replace("'", '"')cleaned_response = re.sub(r',\s*]', ']', cleaned_response)try:return json.loads(cleaned_response)except json.JSONDecodeError as e:print(f"修复后仍解析失败：{e}")return []else:print("未找到有效JSON结构")return []def build_knowledge_graph(triples):"""构建知识图谱数据结构"""if not triples:return None  # 新增：处理空三元组情况entities = set()for triple in triples:entities.add(triple["subject"])entities.add(triple["object"])entity_attributes = {entity: {"name": entity} for entity in entities}relations = [{"source": triple["subject"],"target": triple["object"],"type": triple["relation"]} for triple in triples]return {"entities": [{"id": entity, **attrs} for entity, attrs in entity_attributes.items()],"relations": relations}def visualize_knowledge_graph(graph, output_file="knowledge_graph.html"):"""修复可视化函数，解决模板渲染错误"""if not graph:print("无法可视化空图谱")return None# 确保输出目录存在output_dir = os.path.dirname(output_file)if output_dir and not os.path.exists(output_dir):os.makedirs(output_dir, exist_ok=True)# 初始化图时指定notebook=False（关键修复）net = Network(directed=True, height="700px", width="100%", bgcolor="#f5f5f5", font_color="black",notebook=False  # 新增：明确指定非 notebook 环境)# 添加节点和边for entity in graph["entities"]:net.add_node(entity["id"],label=entity["name"],title=f"实体: {entity['name']}",color="#4CAF50")for relation in graph["relations"]:net.add_edge(relation["source"],relation["target"],label=relation["type"],title=relation["type"],color="#FF9800")# 简化配置选项，避免复杂JSON解析问题net.set_options("""{"nodes": {"size": 30,"font": {"size": 14}},"edges": {"font": {"size": 12},"length": 200},"interaction": {"dragNodes": true,"zoomView": true,"dragView": true}}""")# 直接使用write_html方法，避免show()的复杂逻辑try:net.write_html(output_file, open_browser=False)print(f"知识图谱已保存至 {os.path.abspath(output_file)}")return output_fileexcept Exception as e:print(f"生成HTML时出错: {e}")# 尝试备选方案：使用networkx的基本可视化import matplotlib.pyplot as pltplt.figure(figsize=(12, 8))pos = nx.spring_layout(nx.DiGraph([(r["source"], r["target"]) for r in graph["relations"]]))nx.draw_networkx_nodes(pos, node_size=3000, node_color="#4CAF50")nx.draw_networkx_labels(pos, labels={e["id"]: e["name"] for e in graph["entities"]})nx.draw_networkx_edges(pos, edgelist=[(r["source"], r["target"]) for r in graph["relations"]], arrowstyle="->")nx.draw_networkx_edge_labels(pos, edge_labels={(r["source"], r["target"]): r["type"] for r in graph["relations"]})plt.savefig(output_file.replace(".html", ".png"))print(f"已生成PNG备选可视化: {output_file.replace('.html', '.png')}")return output_file.replace(".html", ".png")def process_text_to_graph(text):"""端到端处理流程"""print("正在从文本中提取知识三元组...")triples = extract_triples(text)if not triples:print("未能提取到任何知识三元组")return Noneprint(f"成功提取 {len(triples)} 个知识三元组：")for i, triple in enumerate(triples, 1):print(f"{i}. ({triple['subject']}, {triple['relation']}, {triple['object']})")print("\n正在构建知识图谱...")graph = build_knowledge_graph(triples)if not graph:print("构建知识图谱失败")return Noneprint("\n正在生成知识图谱可视化...")output_file = visualize_knowledge_graph(graph)return output_file# 示例用法
if __name__ == "__main__":sample_text = """爱因斯坦是一位著名的物理学家，他出生于德国。1905年，爱因斯坦提出了相对论。相对论彻底改变了人们对时间和空间的理解。爱因斯坦因光电效应获得了1921年诺贝尔物理学奖。他后来移居美国，并在普林斯顿大学工作。爱因斯坦与玻尔就量子力学的解释有过著名的争论。"""process_text_to_graph(sample_text)