A Web PDF Reader with OCR and AI Explanation: Making Large Documents Readable

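    • I. Background: Why Is This Tool Needed?
      • Problem Scenario
      • Solution
    • II. How It Works: Implementing These Features
      • 1. Core Components
      • 2. Workflow
      • 3. Key Points
    • III. Step-by-Step Guide
      • 1. Environment Setup
      • 2. Creating the HTML Page
      • 3. Web Server
      • 4. Starting the Server
    • IV. Results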

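I. Background: Why Is This Tool Needed?

Problem Scenario

When you read a scanned PDF on a phone, especially a very long one such as a 2,000-page book, you may have run into these problems:

  1. Laggy page turns: the further into the document you go, the slower pages load
  2. Text recognition failures: when you try to copy text, OCR often fails or takes a long time
  3. Hard-to-understand content: technical terms and dense passages send you off to look things up

Technical explanation: a scanned PDF is essentially a collection of images, and a phone's built-in OCR copes poorly with long documents, in particular:

  • Memory limits make large documents hard to process
  • Background processes get killed by the system
  • There is no mechanism optimized for sustained processing of large documents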

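Solution

To address this I built a web-based PDF reader. Its core features are:

  • Region-select recognition: draw a box around any area of the page to run OCR on it
  • Instant text editing: correct the recognized text directly
  • AI explanation: get a plain-language explanation of difficult content with one click
  • Cross-platform: runs smoothly in both desktop and mobile browsers

Design idea: move the OCR and AI work to the server to get around the performance limits of mobile devices, and use web technology so that nothing has to be installed.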


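II. How It Works: Implementing These Features

1. Core Components

Component               Function                            Technology
Front-end UI            PDF rendering / user interaction    PDF.js + HTML5 Canvas
OCR engine              Image to text                       Baidu OCR (text recognition) API
AI explanation engine   Explaining text content             DeepSeek LLM
Server                  Dispatching requests                Python Flask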

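2. Workflow

  1. The user selects a PDF
  2. The front end renders the page
  3. The user draws a box around a region
  4. The region is sent to the server
  5. The server runs OCR
  6. The recognized text is returned
  7. The user edits the text
  8. The user requests an AI explanation
  9. The explanation is returned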
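
The OCR part of this flow (sending the region, then getting text back) is a JSON exchange: the page sends the selected region as a base64 data URL. The sketch below is a Python model of that round trip, not the reader's own code. `build_ocr_payload` and `decode_ocr_payload` are hypothetical helper names; the `image` field and the `base64,` prefix handling follow the page script and main.py shown in the guide.

```python
import base64
import json


def build_ocr_payload(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Wrap raw image bytes in the JSON body sent to the /ocr endpoint."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": f"data:{mime};base64,{encoded}"})


def decode_ocr_payload(body: str) -> bytes:
    """Recover the raw image bytes on the server side."""
    image_data = json.loads(body).get("image", "")
    # Drop the "data:image/jpeg;base64," prefix if it is present
    if "base64," in image_data:
        image_data = image_data.split("base64,", 1)[1]
    return base64.b64decode(image_data)


if __name__ == "__main__":
    sample_region = b"sample bytes standing in for a cropped JPEG"
    body = build_ocr_payload(sample_region)
    # The server must get back exactly what the browser sent
    assert decode_ocr_payload(body) == sample_region
```

Sending a data URL keeps the request plain JSON, at the cost of roughly one third more bytes than the raw image because of base64 encoding.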

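3. Key Points

  1. Smart region selection

    • Adapts automatically to devices with different resolutions
    • Supports touch-screen gestures
    • Shows the selection box in real time
  2. Reading memory

    • Automatically records the last reading position
    • Stores reading progress locally
    • Shows page progress visually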
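
Adapting the selection to different resolutions is a coordinate conversion: the box is drawn in CSS pixels, while the canvas backing store is larger by the device pixel ratio. Here is a minimal Python sketch of that mapping. `PixelRect` and `selection_to_pixels` are illustrative names, not taken from the page script.

```python
from typing import NamedTuple, Tuple


class PixelRect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def selection_to_pixels(start: Tuple[float, float], end: Tuple[float, float],
                        display_size: Tuple[float, float],
                        canvas_size: Tuple[int, int]) -> PixelRect:
    """Map a drag from `start` to `end` (CSS pixels relative to the page
    container) onto the canvas backing store (device pixels)."""
    scale_x = canvas_size[0] / display_size[0]
    scale_y = canvas_size[1] / display_size[1]
    # Normalize so the rectangle is valid whichever direction the user dragged
    left, top = min(start[0], end[0]), min(start[1], end[1])
    width, height = abs(end[0] - start[0]), abs(end[1] - start[1])
    return PixelRect(round(left * scale_x), round(top * scale_y),
                     round(width * scale_x), round(height * scale_y))


if __name__ == "__main__":
    # A 400x600 CSS-pixel page on a device with devicePixelRatio 2
    print(selection_to_pixels((300, 200), (100, 50), (400, 600), (800, 1200)))
```

Taking the minimum of the start and end coordinates matters: a drag from bottom-right to top-left would otherwise crop the wrong region.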

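III. Step-by-Step Guide

1. Environment Setup

Install the Python packages that main.py imports, then write the API credentials to a .env file:

pip install flask baidu-aip python-dotenv openai pillow
cat > .env <<-'EOF'
APP_ID = 'your Baidu APP_ID'
API_KEY = 'your Baidu API_KEY'
SECRET_KEY = 'your Baidu SECRET_KEY'
OPENAI_API_KEY = "your DeepSeek API key"
OPENAI_BASE_URL = "https://api.deepseek.com"
EOF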

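Notes

  1. The Baidu OCR service must be applied for on the Baidu AI Open Platform
  2. A DeepSeek API key can be obtained from the official DeepSeek website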
2. Creating the HTML Page

mkdir templates
cd templates
cat > index.html <<-'EOF'
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Local PDF Reader - OCR and Text Explanation</title>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">
<style>
* {margin: 0;padding: 0;box-sizing: border-box;font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;touch-action: manipulation;}
body {background: linear-gradient(135deg, #1a2a6c, #2a5298);min-height: 100vh;padding: 15px;color: #333;display: flex;flex-direction: column;align-items: center;overflow-x: hidden;}
.container {width: 100%;max-width: 100%;background: white;border-radius: 12px;box-shadow: 0 10px 25px rgba(0, 0, 0, 0.35);overflow: hidden;display: flex;flex-direction: column;height: calc(100vh - 30px);}
header {background: linear-gradient(to right, #2c3e50, #4a6491);color: white;padding: 15px 25px;display: flex;align-items: center;justify-content: space-between;}
.logo {display: flex;align-items: center;gap: 12px;}
.logo i {font-size: 30px;color: #4dabf7;animation: pulse 2s infinite;}
@keyframes pulse {0%, 100% { transform: scale(1); }50% { transform: scale(1.1); }}
.logo h1 {font-size: 24px;font-weight: 600;text-shadow: 1px 1px 3px rgba(0,0,0,0.3);}
/* Control bar uses a flexible layout instead of fixed widths */
.controls {display: flex;padding: 12px 15px;background: #f1f3f5;gap: 12px;border-bottom: 1px solid #dee2e6;align-items: center;width: 100%;overflow-x: auto;overflow-y: hidden;flex-wrap: nowrap;}
.file-controls, .progress-container {display: flex;align-items: center;gap: 10px;flex-shrink: 0;}
.file-controls {flex: 1;min-width: 300px;}
.progress-container {flex: 2;min-width: 400px;}
button {padding: 9px 16px;border: none;border-radius: 6px;cursor: pointer;font-weight: 500;transition: all 0.2s ease;display: flex;align-items: center;gap: 6px;background: #339af0;color: white;box-shadow: 0 3px 5px rgba(0,0,0,0.1);flex-shrink: 0;}
button:hover {background: #228be6;transform: translateY(-2px);box-shadow: 0 5px 10px rgba(0,0,0,0.15);}
button:active {transform: translateY(1px);}
button:disabled {background: #adb5bd;cursor: not-allowed;transform: none;box-shadow: none;}
button i {font-size: 15px;}
.page-info {font-weight: 500;background: #fff;padding: 7px 12px;border-radius: 6px;box-shadow: 0 2px 4px rgba(0,0,0,0.08);min-width: 110px;text-align: center;flex-shrink: 0;}
.progress-bar {flex: 1;height: 8px;background: #e9ecef;border-radius: 4px;position: relative;overflow: hidden;box-shadow: inset 0 1px 2px rgba(0,0,0,0.1);}
.progress-fill {height: 100%;background: linear-gradient(90deg, #4dabf7, #40c057);border-radius: 4px;width: 0%;transition: width 0.3s ease;}
input[type="range"] {width: 100%;height: 8px;-webkit-appearance: none;background: transparent;flex: 1;}
input[type="range"]::-webkit-slider-thumb {-webkit-appearance: none;width: 18px;height: 18px;border-radius: 50%;background: #339af0;cursor: pointer;box-shadow: 0 2px 6px rgba(0,0,0,0.25);border: 2px solid white;}
.viewer-container {position: relative;flex: 1;background: #2c3e50;overflow: hidden;display: flex;justify-content: center;align-items: center;}
#pdf-viewer {width: 100%;height: 100%;display: flex;justify-content: center;align-items: center;padding: 8px;overflow: auto;}
.canvas-container {position: relative;display: flex;justify-content: center;align-items: center;margin: 0;box-shadow: 0 6px 15px rgba(0, 0, 0, 0.45);border: 1px solid #dee2e6;transition: transform 0.3s ease;max-width: 100%;max-height: 100%;overflow: hidden;}
.canvas-container canvas {display: block;cursor: pointer;max-width: 100%;max-height: 100%;touch-action: none;}
#selection-overlay {position: absolute;top: 0;left: 0;cursor: crosshair;border: 2px dashed rgba(77, 171, 247, 0.9);background: rgba(77, 171, 247, 0.2);pointer-events: none;z-index: 10;}
.status-bar {background: #3d5a80;color: white;padding: 8px 15px;display: flex;justify-content: space-between;font-size: 13px;font-weight: 300;}
.loading-overlay {position: absolute;top: 0;left: 0;width: 100%;height: 100%;background: rgba(0, 0, 0, 0.85);display: flex;flex-direction: column;justify-content: center;align-items: center;color: white;z-index: 100;}
.spinner {width: 50px;height: 50px;border: 4px solid rgba(255, 255, 255, 0.3);border-radius: 50%;border-top: 4px solid #4dabf7;animation: spin 1s linear infinite;margin-bottom: 15px;}
@keyframes spin {0% { transform: rotate(0deg); }100% { transform: rotate(360deg); }}
.modal {position: fixed;top: 0;left: 0;width: 100%;height: 100%;background: rgba(0, 0, 0, 0.7);display: flex;justify-content: center;align-items: center;z-index: 1000;opacity: 0;visibility: hidden;transition: all 0.3s ease;}
.modal.active {opacity: 1;visibility: visible;}
.modal-content {background: white;border-radius: 10px;width: 85%;max-width: 550px;max-height: 85vh;overflow: hidden;box-shadow: 0 12px 35px rgba(0, 0, 0, 0.4);transform: translateY(-15px);transition: transform 0.3s ease;}
.modal.active .modal-content {transform: translateY(0);}
.modal-header {padding: 16px;background: linear-gradient(to right, #3d5a80, #4dabf7);color: white;display: flex;justify-content: space-between;align-items: center;}
.modal-header h3 {font-size: 20px;font-weight: 600;}
.close-btn {background: none;border: none;color: white;font-size: 22px;cursor: pointer;width: 32px;height: 32px;border-radius: 50%;display: flex;align-items: center;justify-content: center;transition: all 0.3s ease;}
.close-btn:hover {background: rgba(255,255,255,0.2);}
.modal-body {padding: 20px;overflow-y: auto;max-height: 55vh;}
.modal-footer {padding: 16px;display: flex;justify-content: flex-end;gap: 12px;background: #f8f9fa;border-top: 1px solid #e9ecef;}
.btn-secondary {background: #adb5bd;color: white;}
.btn-primary {background: #339af0;color: white;}
#ocr-text {width: 100%;min-height: 130px;padding: 12px;border: 1px solid #dee2e6;border-radius: 6px;font-size: 15px;line-height: 1.5;resize: vertical;margin-bottom: 15px;background: #f8f9fa;transition: border-color 0.3s;}
#ocr-text:focus {border-color: #4dabf7;outline: none;box-shadow: 0 0 0 3px rgba(77, 171, 247, 0.2);}
#deepseek-response {background: #f1f3f5;border-radius: 6px;border: 1px solid #e9ecef;padding: 16px;font-size: 14px;line-height: 1.5;max-height: 180px;overflow-y: auto;transition: all 0.3s ease;}
.hidden {display: none;}
.api-response {padding: 12px;background: #e7f5ff;border-left: 4px solid #4dabf7;border-radius: 4px;margin: 12px 0;animation: fadeIn 0.4s ease;}
@keyframes fadeIn {from { opacity: 0; transform: translateY(8px); }to { opacity: 1; transform: translateY(0); }}
.ocr-hint {text-align: center;color: #5c7cfa;font-style: italic;margin-top: 8px;padding: 8px;background: #f1f3f5;border-radius: 6px;margin-bottom: 12px;}
.error-message {background: #ffe3e3;border: 1px solid #ff6b6b;border-radius: 8px;padding: 12px;margin: 0 auto 15px;text-align: center;max-width: 600px;display: none;}
.api-status {display: flex;align-items: center;gap: 6px;margin-top: 8px;font-size: 13px;color: #495057;}
.response-header {display: flex;justify-content: space-between;align-items: center;margin-bottom: 8px;}
.api-tag {background: #4dabf7;color: white;padding: 3px 8px;border-radius: 4px;font-size: 11px;font-weight: bold;}
.api-time {color: #868e96;font-size: 11px;}
@media (max-width: 1024px) {.file-controls {min-width: 250px;}.progress-container {min-width: 350px;}}
@media (max-width: 900px) {.controls {flex-wrap: wrap;padding: 10px;}.file-controls, .progress-container {min-width: 100%;}.progress-container {margin-top: 10px;}}
@media (max-width: 768px) {body {padding: 10px;}.container {height: calc(100vh - 20px);}.logo h1 {font-size: 18px;}.status-bar {flex-direction: column;gap: 6px;text-align: center;}.modal-content {width: 95%;}button {padding: 10px;font-size: 14px;}.modal-footer {flex-wrap: wrap;justify-content: center;}.modal-footer button {flex: 1;min-width: 45%;margin-bottom: 8px;}.file-controls {gap: 6px;min-width: 100%;}.file-controls button {flex: 1;}}
@media (max-width: 480px) {.page-info {min-width: auto;padding: 5px 8px;}.file-controls button span {display: none;}.file-controls button i {margin-right: 0;}}
</style>
</head>
<body>
<div class="error-message" id="error-message"><i class="fas fa-exclamation-triangle"></i><span id="error-text">An error occurred; see the console for details</span></div>
<div class="container">
  <div class="controls">
    <div class="file-controls">
      <button id="open-file"><i class="fas fa-folder-open"></i> Open PDF</button>
      <button id="prev-page"><i class="fas fa-arrow-left"></i> Previous</button>
      <button id="next-page"><i class="fas fa-arrow-right"></i> Next</button>
    </div>
    <div class="progress-container">
      <div class="page-info">Page: <span id="current-page">1</span> / <span id="total-pages">1</span></div>
      <div class="progress-bar"><div class="progress-fill"></div></div>
      <input type="range" id="page-slider" min="1" max="1" value="1">
    </div>
  </div>
  <div class="viewer-container">
    <div id="pdf-viewer"></div>
    <div id="selection-overlay" class="hidden"></div>
    <div id="loading-overlay" class="loading-overlay hidden"><div class="spinner"></div><p id="loading-text">Loading...</p></div>
  </div>
  <div class="status-bar"><div>Status: <span id="ocr-status">Ready</span></div></div>
</div>
<!-- OCR modal -->
<div class="modal" id="ocr-modal">
  <div class="modal-content">
    <div class="modal-header"><h3><i class="fas fa-font"></i> OCR Result</h3><button class="close-btn" id="close-ocr-modal">&times;</button></div>
    <div class="modal-body">
      <div class="ocr-hint"><i class="fas fa-lightbulb"></i> You selected the following content (editable):</div>
      <textarea id="ocr-text" placeholder="Recognized text will appear here..."></textarea>
      <div id="api-response-section" class="hidden">
        <div class="response-header"><p><strong><i class="fas fa-robot"></i> AI response:</strong></p><div class="api-time" id="api-time"></div></div>
        <div id="deepseek-response">Waiting for the AI reply...</div>
      </div>
    </div>
    <div class="modal-footer">
      <button class="btn-secondary" id="copy-text"><i class="fas fa-copy"></i> Copy</button>
      <button class="btn-primary" id="explain-text"><i class="fas fa-robot"></i> Explain</button>
    </div>
  </div>
</div>
<!-- PDF.js is loaded from the cdnjs CDN -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.16.105/pdf.min.js"></script>
<script>
// Configure the PDF.js worker
pdfjsLib.GlobalWorkerOptions.workerSrc = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.16.105/pdf.worker.min.js';

// Constants
const STORAGE_PREFIX = 'pdfReader_';

// DOM elements
const viewer = document.getElementById('pdf-viewer');
const fileInput = document.createElement('input');
fileInput.type = 'file';
fileInput.accept = '.pdf';
const openFileButton = document.getElementById('open-file');
const prevPageButton = document.getElementById('prev-page');
const nextPageButton = document.getElementById('next-page');
const currentPageElement = document.getElementById('current-page');
const totalPagesElement = document.getElementById('total-pages');
const pageSlider = document.getElementById('page-slider');
const progressFill = document.querySelector('.progress-fill');
const loadingOverlay = document.getElementById('loading-overlay');
const loadingText = document.getElementById('loading-text');
const ocrStatus = document.getElementById('ocr-status');
const ocrModal = document.getElementById('ocr-modal');
const closeOcrModal = document.getElementById('close-ocr-modal');
const ocrText = document.getElementById('ocr-text');
const copyTextButton = document.getElementById('copy-text');
const explainTextButton = document.getElementById('explain-text');
const apiResponseSection = document.getElementById('api-response-section');
const deepseekResponse = document.getElementById('deepseek-response');
const selectionOverlay = document.getElementById('selection-overlay');
const errorMessage = document.getElementById('error-message');
const errorText = document.getElementById('error-text');
const apiTimeElement = document.getElementById('api-time');

// Global state
let pdfDoc = null;
let currentPage = 1;
let currentScale = 1;
let pageRendering = false;
let pageNumPending = null;
let fileName = null;
let fileKey = null;
let canvasMap = new Map();
let selection = {};
let currentCanvas = null;
let currentCanvasRect = null;
let dpr = window.devicePixelRatio || 1;
let isMobile = /Mobi|Android/i.test(navigator.userAgent);
let viewerContainer = document.querySelector('.viewer-container');

// Initialization
openFileButton.addEventListener('click', () => fileInput.click());
fileInput.addEventListener('change', loadPDF);
prevPageButton.addEventListener('click', () => gotoPage(currentPage - 1));
nextPageButton.addEventListener('click', () => gotoPage(currentPage + 1));
pageSlider.addEventListener('input', () => gotoPage(parseInt(pageSlider.value, 10)));
closeOcrModal.addEventListener('click', closeOCRModal);
copyTextButton.addEventListener('click', copyOCRText);
explainTextButton.addEventListener('click', explainTextWithAI);

// Show an error message
function showError(message) {
  errorText.textContent = message;
  errorMessage.style.display = 'block';
  console.error(message);
}

// Hide the error message
function hideError() {
  errorMessage.style.display = 'none';
}

// Load a PDF file
function loadPDF(e) {
  const file = e.target.files[0];
  if (!file) return;
  if (file.type !== 'application/pdf') {
    alert('Please choose a PDF file');
    return;
  }
  fileName = file.name;
  fileKey = STORAGE_PREFIX + fileName;
  showLoading('Loading PDF...');
  hideError();
  const fileReader = new FileReader();
  fileReader.onload = function() {
    const typedArray = new Uint8Array(this.result);
    try {
      // Load the PDF document
      pdfjsLib.getDocument(typedArray).promise.then(function(pdf) {
        pdfDoc = pdf;
        const numPages = pdf.numPages;
        // Show the total page count
        totalPagesElement.textContent = numPages;
        pageSlider.max = numPages;
        // Restore the last reading position, clamped to this document
        const lastPage = parseInt(localStorage.getItem(fileKey + '_page'), 10);
        const initPage = Math.min(Math.max(lastPage || 1, 1), numPages);
        // Clear the canvas map and the loading state, then render
        canvasMap.clear();
        hideLoading();
        gotoPage(initPage);
      }).catch(function(error) {
        hideLoading();
        showError('Failed to load PDF: ' + error.message);
      });
    } catch (error) {
      hideLoading();
      showError('PDF.js initialization failed: ' + error.message);
    }
  };
  fileReader.onerror = function() {
    hideLoading();
    showError('Failed to read file');
  };
  fileReader.readAsArrayBuffer(file);
}

// Render the given page
function renderPage(num) {
  if (!pdfDoc) return;
  pageRendering = true;
  showLoading(`Rendering page ${num}...`);
  ocrStatus.textContent = 'Rendering page...';
  hideError();
  try {
    pdfDoc.getPage(num).then(function(page) {
      const container = document.createElement('div');
      container.className = 'canvas-container';
      // Create the canvas
      const canvas = document.createElement('canvas');
      const ctx = canvas.getContext('2d', { willReadFrequently: true });
      // Original page size
      const viewport = page.getViewport({ scale: 1 });
      const originalWidth = viewport.width;
      const originalHeight = viewport.height;
      // Available space, minus padding
      const viewerWidth = viewer.clientWidth - 20;
      const viewerHeight = viewer.clientHeight - 20;
      // Scale the page to fit the viewer
      const widthScale = viewerWidth / originalWidth;
      const heightScale = viewerHeight / originalHeight;
      const scale = Math.min(widthScale, heightScale) * currentScale;
      const scaledViewport = page.getViewport({ scale: scale });
      // Size the canvas, taking the device pixel ratio into account
      const displayWidth = scaledViewport.width;
      const displayHeight = scaledViewport.height;
      const pixelWidth = Math.floor(displayWidth * dpr);
      const pixelHeight = Math.floor(displayHeight * dpr);
      canvas.width = pixelWidth;
      canvas.height = pixelHeight;
      canvas.style.width = displayWidth + 'px';
      canvas.style.height = displayHeight + 'px';
      // Scale the context to match the device pixel ratio
      ctx.scale(dpr, dpr);
      container.appendChild(canvas);
      // Replace the viewer content with the new container
      viewer.innerHTML = '';
      viewer.appendChild(container);
      // Remember the canvas for this page
      canvasMap.set(num, {
        canvas: canvas,
        rect: container.getBoundingClientRect(),
        viewport: scaledViewport,
        dpr: dpr
      });
      // Attach the listeners used for OCR selection
      setupSelectionEvents(container);
      // Render the PDF page onto the canvas
      const renderContext = { canvasContext: ctx, viewport: scaledViewport };
      const renderTask = page.render(renderContext);
      renderTask.promise.then(function() {
        // Clear the flag first so that a queued page is not dropped
        pageRendering = false;
        hideLoading();
        updateStatus(`Rendered page ${num}`);
        updateFileInfo();
        if (pageNumPending !== null) {
          const pending = pageNumPending;
          pageNumPending = null;
          gotoPage(pending);
        }
      }).catch(function(error) {
        pageRendering = false;
        hideLoading();
        showError('Failed to render page: ' + error.message);
      });
    }).catch(function(error) {
      pageRendering = false;
      hideLoading();
      showError('Failed to get PDF page: ' + error.message);
    });
  } catch (error) {
    pageRendering = false;
    hideLoading();
    showError('Error while rendering page: ' + error.message);
  }
}

// Attach selection events (mouse and touch)
function setupSelectionEvents(container) {
  container.addEventListener('mousedown', startSelection);
  container.addEventListener('touchstart', handleTouchStart, { passive: false });
}

// Touch start
function handleTouchStart(e) {
  if (e.touches.length === 1) {
    // Single finger: start a selection. A Touch object has no preventDefault()
    // or currentTarget, so both are handled here on the TouchEvent.
    e.preventDefault();
    startSelection(e.touches[0], e.currentTarget);
  }
}

// Touch move
function handleTouchMove(e) {
  if (e.touches.length === 1) {
    // Single finger: resize the selection
    resizeSelection(e.touches[0]);
  }
}

// Touch end
function handleTouchEnd(e) {
  if (e.touches.length === 0) {
    // All fingers lifted: finish the selection
    finishSelection();
  }
}

// Go to the given page
function gotoPage(num) {
  if (!pdfDoc) return;
  if (pageRendering) {
    pageNumPending = num;
    return;
  }
  if (num < 1 || num > pdfDoc.numPages) return;
  currentPage = num;
  currentPageElement.textContent = num;
  pageSlider.value = num;
  // Update the progress bar
  const percent = Math.round((num / pdfDoc.numPages) * 100);
  progressFill.style.width = percent + '%';
  // Save the current page to local storage
  if (fileKey) {
    localStorage.setItem(fileKey + '_page', num);
  }
  // Clear the viewer
  viewer.innerHTML = '';
  selectionOverlay.classList.add('hidden');
  // Render the page
  renderPage(num);
  updateFileInfo();
}

// Update the bottom status bar (left empty in the original)
function updateFileInfo() {}

// Update the OCR status
function updateStatus(message) {
  ocrStatus.textContent = message;
}

// Show the loading overlay
function showLoading(message) {
  loadingText.textContent = message;
  loadingOverlay.classList.remove('hidden');
}

// Hide the loading overlay
function hideLoading() {
  loadingOverlay.classList.add('hidden');
}

// Start an OCR region selection.
// `e` is a MouseEvent or a Touch; `touchContainer` is only passed for touch input.
function startSelection(e, touchContainer) {
  if (typeof e.preventDefault === 'function') e.preventDefault();
  const container = touchContainer || e.currentTarget;
  if (!container) return;
  const canvas = container.querySelector('canvas');
  if (!canvas) return;
  // Remember the current canvas and its bounds
  currentCanvas = canvas;
  currentCanvasRect = container.getBoundingClientRect();
  // Event coordinates
  const clientX = e.clientX || e.pageX;
  const clientY = e.clientY || e.pageY;
  const viewerRect = viewer.getBoundingClientRect();
  const containerRect = container.getBoundingClientRect();
  // Position of the container inside the viewer (including scroll)
  const containerXInViewer = containerRect.left - viewerRect.left + viewer.scrollLeft;
  const containerYInViewer = containerRect.top - viewerRect.top + viewer.scrollTop;
  // Event position inside the container
  const x = clientX - containerRect.left;
  const y = clientY - containerRect.top;
  // Initialize the selection box
  selectionOverlay.style.width = '0';
  selectionOverlay.style.height = '0';
  selectionOverlay.style.left = (containerXInViewer + x) + 'px';
  selectionOverlay.style.top = (containerYInViewer + y) + 'px';
  selectionOverlay.classList.remove('hidden');
  // Store the starting position (relative to the container)
  selection = { startX: x, startY: y, endX: x, endY: y };
  // Attach move/end listeners
  if (isMobile) {
    document.addEventListener('touchmove', handleTouchMove, { passive: false });
    document.addEventListener('touchend', handleTouchEnd);
  } else {
    document.addEventListener('mousemove', resizeSelection);
    document.addEventListener('mouseup', finishSelection);
  }
}

// Resize the selection box
function resizeSelection(e) {
  const container = document.querySelector('.canvas-container');
  if (!container) return;
  const clientX = e.clientX || e.pageX;
  const clientY = e.clientY || e.pageY;
  const viewerRect = viewer.getBoundingClientRect();
  const containerRect = container.getBoundingClientRect();
  // Position of the container inside the viewer (including scroll)
  const containerXInViewer = containerRect.left - viewerRect.left + viewer.scrollLeft;
  const containerYInViewer = containerRect.top - viewerRect.top + viewer.scrollTop;
  // Event position inside the container, clamped to the displayed canvas
  const x = clientX - containerRect.left;
  const y = clientY - containerRect.top;
  const clampedX = Math.max(0, Math.min(x, containerRect.width));
  const clampedY = Math.max(0, Math.min(y, containerRect.height));
  // Update the selection box
  const left = Math.min(selection.startX, clampedX);
  const top = Math.min(selection.startY, clampedY);
  const width = Math.abs(clampedX - selection.startX);
  const height = Math.abs(clampedY - selection.startY);
  selectionOverlay.style.left = (containerXInViewer + left) + 'px';
  selectionOverlay.style.top = (containerYInViewer + top) + 'px';
  selectionOverlay.style.width = width + 'px';
  selectionOverlay.style.height = height + 'px';
  // Update the end position
  selection.endX = clampedX;
  selection.endY = clampedY;
}

// Finish the selection and run OCR
function finishSelection() {
  // Remove move/end listeners
  if (isMobile) {
    document.removeEventListener('touchmove', handleTouchMove);
    document.removeEventListener('touchend', handleTouchEnd);
  } else {
    document.removeEventListener('mousemove', resizeSelection);
    document.removeEventListener('mouseup', finishSelection);
  }
  // Ignore selections that are too small
  const minArea = 20;
  const width = Math.abs(selection.endX - selection.startX);
  const height = Math.abs(selection.endY - selection.startY);
  if (width < minArea || height < minArea) {
    selectionOverlay.classList.add('hidden');
    return;
  }
  const container = document.querySelector('.canvas-container');
  if (!container || !currentCanvas) return;
  const canvas = currentCanvas;
  const ctx = canvas.getContext('2d');
  // Ratio between canvas pixels and displayed size
  const scaleX = canvas.width / currentCanvasRect.width;
  const scaleY = canvas.height / currentCanvasRect.height;
  // Use the top-left corner so that drags in any direction work
  const left = Math.min(selection.startX, selection.endX);
  const top = Math.min(selection.startY, selection.endY);
  const pixelX = left * scaleX;
  const pixelY = top * scaleY;
  const pixelW = width * scaleX;
  const pixelH = height * scaleY;
  try {
    // Read the selected pixels
    const imageData = ctx.getImageData(Math.round(pixelX), Math.round(pixelY), Math.round(pixelW), Math.round(pixelH));
    // Copy them onto a temporary canvas
    const tempCanvas = document.createElement('canvas');
    tempCanvas.width = Math.round(pixelW);
    tempCanvas.height = Math.round(pixelH);
    const tempCtx = tempCanvas.getContext('2d');
    tempCtx.putImageData(imageData, 0, 0);
    // Show the OCR modal
    ocrModal.classList.add('active');
    ocrText.value = '';
    apiResponseSection.classList.add('hidden');
    deepseekResponse.innerHTML = 'Waiting for the AI reply...';
    updateStatus('Preparing OCR...');
    // Convert the image to a data URL
    const imageDataURL = tempCanvas.toDataURL('image/jpeg');
    // Send it to the Flask server for OCR
    fetch('/ocr', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ image: imageDataURL })
    }).then(response => response.json()).then(data => {
      if (data.success) {
        ocrText.value = data.text.trim() || 'No text was recognized';
        updateStatus('OCR complete');
      } else {
        throw new Error(data.error || 'OCR failed');
      }
    }).catch(err => {
      ocrText.value = 'OCR error: ' + err.message;
      updateStatus('OCR failed');
      showError('OCR failed: ' + err.message);
    }).finally(() => {
      selectionOverlay.classList.add('hidden');
    });
  } catch (error) {
    showError('Failed to read image data: ' + error.message);
    selectionOverlay.classList.add('hidden');
    updateStatus('Invalid selection');
  }
}

// Close the OCR modal
function closeOCRModal() {
  ocrModal.classList.remove('active');
}

// Copy the recognized text
function copyOCRText() {
  ocrText.select();
  document.execCommand('copy');
  alert('Text copied to the clipboard');
}

// Explain the text with AI by calling the Flask service
function explainTextWithAI() {
  const text = ocrText.value.trim();
  if (!text) {
    alert('Please recognize some text first');
    return;
  }
  apiResponseSection.classList.remove('hidden');
  updateStatus('Explaining text with AI...');
  deepseekResponse.innerHTML = '<div class="api-response">Analyzing the text...</div>';
  const startTime = new Date();
  // Call the /explain endpoint of the Flask service
  fetch('/explain', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: text })
  }).then(response => {
    if (!response.ok) {
      throw new Error('Server error: ' + response.status);
    }
    return response.json();
  }).then(data => {
    const endTime = new Date();
    const timeTaken = (endTime - startTime) / 1000;
    deepseekResponse.innerHTML = `<div class="api-response"><div class="api-tag">Explanation</div><p>${data.explanation || 'No explanation was returned'}</p><div class="api-status"><i class="fas fa-clock"></i> Analysis took ${timeTaken.toFixed(2)} s</div></div>`;
    updateStatus('AI explanation complete');
    apiTimeElement.textContent = `Processing time: ${timeTaken.toFixed(2)} s`;
  }).catch(err => {
    deepseekResponse.innerHTML = `<div class="api-response" style="background:#ffecec;border-left-color:#ff6b6b;"><p>Error: ${err.message}</p><p>Please check that the service is running</p></div>`;
    updateStatus('AI explanation failed');
    showError('Failed to call the explanation service: ' + err.message);
  });
}

// Initial status once the page has loaded
window.addEventListener('load', function() {
  updateStatus('Ready | Please open a PDF file');
});
</script>
</body>
</html>
EOF
cd -
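
The reading memory in the page script comes down to one `localStorage` entry per file name plus a percentage for the progress bar. The following is a Python model of that logic for illustration only. The key format mirrors `STORAGE_PREFIX + fileName + '_page'` from the script; the helper names, and the clamping of a stored page to the document length, are assumptions added here.

```python
STORAGE_PREFIX = "pdfReader_"


def page_key(file_name: str) -> str:
    """Storage key under which the last page of a file is saved."""
    return f"{STORAGE_PREFIX}{file_name}_page"


def restore_page(saved, num_pages: int) -> int:
    """Turn a stored value into a valid page number, defaulting to page 1."""
    try:
        page = int(saved)
    except (TypeError, ValueError):
        return 1
    # Clamp in case a different file with the same name has fewer pages
    return min(max(page, 1), num_pages)


def progress_percent(page: int, num_pages: int) -> int:
    """Progress-bar width in whole percent, rounding halves up as
    JavaScript's Math.round does for positive numbers."""
    return int(page / num_pages * 100 + 0.5)


if __name__ == "__main__":
    store = {page_key("book.pdf"): "42"}  # stands in for localStorage
    page = restore_page(store.get(page_key("book.pdf")), 2000)
    print(page, progress_percent(page, 2000))
```

Because the key is built from the file name alone, two different PDFs with the same name share one saved position.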

3. Web Server

cat > main.py <<-'EOF'
import os
import base64
import binascii
import io
import re
import logging
from logging.handlers import RotatingFileHandler
from flask import Flask, render_template, jsonify, request, send_from_directory
from PIL import Image
from aip import AipOcr
from dotenv import load_dotenv
import openai

# Load environment variables from .env
load_dotenv()

app = Flask(__name__)


# Configure logging
def configure_logging():
    # Create the log directory
    log_dir = "logs"
    if not os.path.exists(log_dir):
        os.makedirs(log_dir)

    # Log format
    log_format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
    formatter = logging.Formatter(log_format)

    # Rotating file handler (10 MB per file, 3 backups)
    file_handler = RotatingFileHandler(
        os.path.join(log_dir, 'app.log'),
        maxBytes=10*1024*1024,
        backupCount=3
    )
    file_handler.setFormatter(formatter)
    file_handler.setLevel(logging.DEBUG)

    # Console handler
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(formatter)
    console_handler.setLevel(logging.DEBUG)

    # Attach the handlers to the application logger
    app.logger.setLevel(logging.DEBUG)
    app.logger.addHandler(file_handler)
    app.logger.addHandler(console_handler)

    # Quiet werkzeug's default request logging
    werkzeug_logger = logging.getLogger('werkzeug')
    werkzeug_logger.setLevel(logging.ERROR)
    werkzeug_logger.addHandler(file_handler)


configure_logging()


class OpenAILLM:
    """Wrapper around an OpenAI-compatible chat model."""

    def __init__(self, model_name: str = "deepseek-chat"):
        self.model_name = model_name
        # Reads OPENAI_API_KEY and OPENAI_BASE_URL from the environment
        self.client = openai.OpenAI()
        app.logger.info(f"Initialized OpenAI model: {model_name}")

    def predict(self, query: str) -> str:
        """Generate an explanation with the LLM."""
        try:
            app.logger.debug(f"LLM query started: {query[:100]}... (length: {len(query)})")
            response = self.client.chat.completions.create(
                model=self.model_name,
                messages=[
                    {"role": "system", "content": "Explain the following passage concisely and in plain language:"},
                    {"role": "user", "content": query}
                ],
                temperature=0.7,
            )
            result = response.choices[0].message.content.strip()
            cleaned_result = re.sub(r'<think>.*?</think>', '', result, flags=re.DOTALL)
            app.logger.debug(f"LLM raw response: {result[:200]}...")
            app.logger.debug(f"LLM cleaned result: {cleaned_result[:200]}...")
            return cleaned_result
        # The specific errors are subclasses of openai.APIError, so they must come first
        except openai.RateLimitError as limit_err:
            app.logger.error(f"OpenAI rate limit error: {str(limit_err)}", exc_info=True)
            return "Too many requests, please try again later"
        except openai.APIConnectionError as conn_err:
            app.logger.error(f"OpenAI connection error: {str(conn_err)}", exc_info=True)
            return "Network connection error, please check the network"
        except openai.APIError as api_err:
            app.logger.error(f"OpenAI API error: {str(api_err)}", exc_info=True)
            return "API service error, please try again later"
        except Exception as e:
            app.logger.exception("Unknown error in LLM processing")
            return "Failed to generate an explanation, please try again later"


# Global model instance
llm = OpenAILLM()


@app.route('/')
def index():
    """Main page."""
    app.logger.info("Home page requested")
    return render_template('index.html')


@app.route('/ocr', methods=['POST'])
def ocr_processing():
    """OCR endpoint."""
    try:
        app.logger.info("OCR request received")
        data = request.get_json(silent=True) or {}
        image_data = data.get('image', '')
        if not image_data:
            app.logger.warning("OCR request has no image data")
            return jsonify(success=False, error="Missing image data"), 400

        # Log basic information about the image data
        app.logger.debug(f"Image data received: length={len(image_data)} characters, type={type(image_data)}")

        # Strip the data URL prefix
        if 'base64,' in image_data:
            image_data = image_data.split('base64,', 1)[1]
            app.logger.debug("Base64 prefix stripped")

        # Decode the image
        img_bytes = base64.b64decode(image_data)
        app.logger.debug(f"Image decoded: {len(img_bytes)} bytes")

        # Call the Baidu OCR API
        client = AipOcr(os.getenv('APP_ID'), os.getenv('API_KEY'), os.getenv('SECRET_KEY'))
        app.logger.info("Calling the Baidu OCR API...")
        result = client.basicAccurate(img_bytes)

        # Check the OCR result
        if 'words_result' not in result:
            app.logger.warning(f"Unexpected OCR result: {result}")
            return jsonify(success=False, error="OCR failed"), 500

        text = ' '.join(item['words'] for item in result.get('words_result', []))
        app.logger.info(f"OCR succeeded: {len(result['words_result'])} text blocks recognized")
        app.logger.debug(f"OCR result: {text[:200]}...")
        return jsonify(success=True, text=text)
    except binascii.Error as e:
        app.logger.error(f"Base64 decoding failed: {str(e)}", exc_info=True)
        return jsonify(success=False, error="Invalid image data"), 400
    except KeyError as e:
        app.logger.error(f"Request is missing a required field: {str(e)}", exc_info=True)
        return jsonify(success=False, error="Incomplete request data"), 400
    except Exception as e:
        app.logger.exception("Unknown error in OCR processing")
        return jsonify(success=False, error="Internal server error"), 500


@app.route('/explain', methods=['POST'])
def text_explanation():
    """Text explanation endpoint."""
    try:
        app.logger.info("Explanation request received")
        data = request.get_json(silent=True) or {}
        text = data.get('text', '')
        if not text:
            app.logger.warning("Explanation request has no text")
            return jsonify(success=False, error='Missing text data'), 400

        app.logger.debug(f"Text to explain: {text[:200]}... (length: {len(text)})")
        explanation = llm.predict(text)
        app.logger.info("Explanation generated")
        app.logger.debug(f"Full explanation: {explanation}")
        return jsonify(success=True, explanation=explanation)
    except KeyError as e:
        app.logger.error(f"Request is missing a required field: {str(e)}", exc_info=True)
        return jsonify(success=False, error="Incomplete request data"), 400
    except Exception as e:
        app.logger.exception("Unknown error while generating the explanation")
        return jsonify(success=False, error="Internal server error"), 500


if __name__ == '__main__':
    app.run(debug=os.getenv('DEBUG_MODE', 'False').lower() == 'true')
EOF
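
The server does two small text transformations: it joins the `words` entries of the OCR result into one string, and it strips `<think>...</think>` sections from the model output. The sketch below isolates both so they can be tried without Baidu or DeepSeek credentials. The function names and sample dictionaries are made up for illustration; only the `words_result` / `words` layout and the regular expression come from main.py.

```python
import re


def join_ocr_words(result: dict) -> str:
    """Join the `words` of each entry in a Baidu-style OCR result."""
    return " ".join(item["words"] for item in result.get("words_result", []))


def strip_think_blocks(text: str) -> str:
    """Remove <think>...</think> sections, including multi-line ones."""
    # re.DOTALL lets "." match newlines; ".*?" keeps each match as short as possible
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


if __name__ == "__main__":
    sample_result = {"words_result": [{"words": "Hello"}, {"words": "world"}]}
    print(join_ocr_words(sample_result))
    print(strip_think_blocks("<think>step 1\nstep 2</think>\nA short answer."))
```

Joining with a single space loses line structure. For Chinese text, where words are not space-separated, joining with an empty string or a newline may read better.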

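4. Starting the Server

python main.py

Flask's built-in server listens on http://127.0.0.1:5000 by default, so open that address in a browser on the same machine. Because main.py calls app.run() without a host argument, reaching the page from a phone requires binding to a network-visible address (for example host='0.0.0.0') or putting a reverse proxy in front of it.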
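
Once the server is running, the `/explain` endpoint can be exercised without the browser. This minimal sketch uses only the standard library and assumes Flask's default address `http://127.0.0.1:5000`. `build_explain_request` and `send` are hypothetical helpers, and `send` is deliberately not called here because it needs the live server and a valid DeepSeek key.

```python
import json
import urllib.request


def build_explain_request(text: str,
                          base_url: str = "http://127.0.0.1:5000") -> urllib.request.Request:
    """Build (but do not send) a POST request for the /explain endpoint."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/explain",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON reply (requires the running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


# Example, with the server started as shown above:
#   reply = send(build_explain_request("A scanned PDF is a collection of images."))
#   print(reply["explanation"])
```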

IV. Results

[The original post shows five images here; they are not reproduced.]
