Initial commit.

Scraping is handled by tophub_scraper.py, database ingestion by tophub_add_data_to_db.py, and browsing the current data by db_viewer.py.
2025-11-09 17:20:44 +08:00
commit 25da264413
29 changed files with 28508 additions and 0 deletions
--- a/2025年11月9日131545.txt
+++ b/2025年11月9日131545.txt
--- a/README.md
+++ b/README.md
@@ -0,0 +1,149 @@
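+# TopHub Data Processing System
+
+This project processes the temporary files produced by scraping the TopHub website, classifies the entries, and stores them in a SQLite database.
+
+## Features
+
+1. **File parsing**: reads temporary files (named "date+time.txt") and treats every 5 lines as one data unit
+2. **Data extraction**: extracts the title and link from each data unit
+3. **Automatic classification**: calls a local API (Ollama) to classify each title
+4. **Deduplication**: checks whether the title + date pair already exists in the database to avoid duplicate entries
+5. **Progress display**: shows processing progress with a progress bar
+6. **Category standardization**: merges similar categories into standard categories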
+
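+## Files
+
+### Core scripts
+
+1. **process_temp_files.py** - main processing script
+   - Parses the temporary files
+   - Calls the API to classify titles
+   - Stores the results in the database
+
+2. **cleanup_categories.py** - category cleanup script
+   - Strips special characters from category names
+   - Normalizes the category format
+
+3. **standardize_categories.py** - category standardization script
+   - Merges similar categories into standard categories
+   - Defines the category mapping rules
+
+### Helper scripts
+
+1. **check_db.py** - database structure check script
+2. **test_api.py** - API test script
+3. **view_categories.py** - script for viewing sample categories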
+
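+## Usage
+
+### 1. Process the temporary files
+
+```bash
+python process_temp_files.py
+```
+
+The script will:
+- Scan the current directory for all temporary files (named "date+time.txt")
+- Parse each file and extract titles and links
+- Call the local API to classify each title
+- Check for and skip duplicate entries
+- Store the results in the tophub_data.db database
+
+### 2. Clean up and standardize categories
+
+```bash
+# Strip special characters from category names
+python cleanup_categories.py
+
+# Standardize categories
+python standardize_categories.py
+```
+
+### 3. Inspect the data
+
+```bash
+# View sample categories
+python view_categories.py
+
+# Check the database structure
+python check_db.py
+```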
+
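+## Database structure
+
+The database file is `tophub_data.db` and contains the following tables:
+
+1. **tophub_entries** - main data table
+   - id: primary key
+   - text_content: title text (NOT NULL)
+   - link: link
+   - category: category
+   - scrape_time: scrape time
+
+2. **classification_progress** - classification progress table
+   - id: primary key
+   - total_count: total number of entries
+   - processed_count: number of entries processed
+   - last_updated: time of the last update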
+
+## API configuration
+
+The scripts use the local Ollama API for classification:
+- API endpoint: http://localhost:11434/api/generate
+- Model: gemma3:4b
+- Request format: JSON
+
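+## Standard categories
+
+The system supports the following standard categories:
+
+1. 科技 (technology) - emerging technology, the internet, etc.
+2. 社会 (society) - social news, everyday services, etc.
+3. 体育 (sports) - sports news, football, etc.
+4. 历史 (history) - historical events, historical figures, etc.
+5. 安全 (security) - security vulnerabilities, security technology, etc.
+6. 军事 (military) - military news, national defense, etc.
+7. 金融 (finance) - financial news, market analysis, etc.
+8. 购物 (shopping) - e-commerce, shopping, etc.
+9. 游戏 (games) - gaming news, etc.
+10. 娱乐 (entertainment) - celebrity gossip, music, etc.
+11. 健康 (health) - healthcare, healthy living, etc.
+12. 其他 (other) - anything not covered by the categories above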
+
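+## Notes
+
+1. Make sure the local Ollama service is running and reachable
+2. Temporary files must be named "date+time.txt"
+3. Each data unit has 5 lines: node ID, category, title, link, and a separator line
+4. The database file is created automatically; there is no need to create it by hand
+
+## Log files
+
+The system writes the following log files:
+- process_temp_files.log - main processing log
+- cleanup_categories.log - category cleanup log
+- standardize_categories.log - category standardization log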
+
+## Examples
+
+### Sample temporary file format
+
+```
+节点ID: 102
+分类: 宽带山
+标题: 女机器人
+链接: http://club.kdslife.com/t_11502693.html
+--------------------------------------------------
+节点ID: 103
+分类: 宽带山
+标题: 这个应该属于底盘不行吗
+链接: http://club.kdslife.com/t_11502686.html
+--------------------------------------------------
+```
+
+### Sample processing output
+
+```
+标题 '女机器人' 分类为: 科技
+标题 '这个应该属于底盘不行吗' 分类为: 其他
+```
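The five-line unit format described in the README could be parsed roughly as follows. This is a minimal sketch, not the code in process_temp_files.py (that file's body is not shown in this part of the commit); `parse_units` is a hypothetical helper, and it assumes that every field line has the form `key: value` and that every unit ends with a line made only of dashes, as in the sample file above.

```python
from typing import Dict, List

SEPARATOR_CHAR = "-"  # assumption: a unit ends with a line consisting only of dashes


def parse_units(text: str) -> List[Dict[str, str]]:
    """Split scraped text into units and read each 'key: value' line into a dict."""
    units: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if set(line) == {SEPARATOR_CHAR}:
            # Separator line: close the unit collected so far
            if current:
                units.append(current)
            current = {}
            continue
        # Split at the first colon only, so URLs such as "http://..." stay intact
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        # Keep a trailing unit even if its separator line is missing
        units.append(current)
    return units


if __name__ == "__main__":
    sample = (
        "节点ID: 102\n分类: 宽带山\n标题: 女机器人\n"
        "链接: http://club.kdslife.com/t_11502693.html\n" + "-" * 50 + "\n"
    )
    for unit in parse_units(sample):
        print(unit["标题"], unit["链接"])
```

Keying the result by the field labels found in the file avoids depending on the exact line order inside a unit.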
--- a/pycache/db_viewer.cpython-38.pyc
+++ b/pycache/db_viewer.cpython-38.pyc
--- a/pycache/tophub_add_data_to_db.cpython-38.pyc
+++ b/pycache/tophub_add_data_to_db.cpython-38.pyc
--- a/pycache/tophub_scraper.cpython-38.pyc
+++ b/pycache/tophub_scraper.cpython-38.pyc
--- a/add_interested_field.py
+++ b/add_interested_field.py
@@ -0,0 +1,72 @@
+#!/usr/bin/env python3
+"""
+Script that adds the "interested" flag column.
+Adds an is_interested column, defaulting to 0, to the articles table.
+"""
+
+import sqlite3
+import os
+from loguru import logger
+
+def add_interested_field():
+    """Add the is_interested column to the articles table."""
+    # Path of the database file in the directory containing this script
+    script_dir = os.path.dirname(os.path.abspath(__file__))
+    db_path = os.path.join(script_dir, "tophub_data.db")
+    
+    # Check that the database file exists
+    if not os.path.exists(db_path):
+        logger.error(f"数据库文件不存在: {db_path}")
+        return False
+    
+    try:
+        # Connect to the database
+        conn = sqlite3.connect(db_path)
+        cursor = conn.cursor()
+        
+        # Check whether the is_interested column already exists
+        cursor.execute("PRAGMA table_info(articles)")
+        columns = cursor.fetchall()
+        column_names = [column[1] for column in columns]
+        
+        if "is_interested" in column_names:
+            logger.info("is_interested字段已存在，无需添加")
+            conn.close()
+            return True
+        
+        # Add the is_interested column with a default value of 0
+        logger.info("正在添加is_interested字段...")
+        cursor.execute("ALTER TABLE articles ADD COLUMN is_interested INTEGER DEFAULT 0")
+        
+        # Commit the change
+        conn.commit()
+        logger.info("成功添加is_interested字段")
+        
+        # Verify that the column was added
+        cursor.execute("PRAGMA table_info(articles)")
+        columns = cursor.fetchall()
+        column_names = [column[1] for column in columns]
+        
+        if "is_interested" in column_names:
+            logger.info("验证成功：is_interested字段已添加到articles表")
+        else:
+            logger.error("验证失败：is_interested字段未成功添加")
+            conn.close()
+            return False
+            
+        conn.close()
+        return True
+        
+    except sqlite3.Error as e:
+        logger.error(f"数据库操作出错: {str(e)}")
+        return False
+    except Exception as e:
+        logger.error(f"添加字段时出错: {str(e)}")
+        return False
+
+if __name__ == "__main__":
+    logger.add("db_modify.log", rotation="10 MB", level="INFO")
+    if add_interested_field():
+        logger.info("数据库修改完成")
+    else:
+        logger.error("数据库修改失败")
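The check-then-alter pattern used by add_interested_field.py can be factored into a small reusable helper. The sketch below is illustrative only: `ensure_column` is a hypothetical name, and the example runs against an in-memory database rather than tophub_data.db.

```python
import sqlite3
from contextlib import closing


def ensure_column(conn: sqlite3.Connection, table: str, column: str, ddl: str) -> bool:
    """Add `column` to `table` unless it already exists; return True if it was added.

    `table`, `column`, and `ddl` are interpolated into SQL, so they must come
    from trusted code, never from user input.
    """
    existing = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")
    conn.commit()
    return True


if __name__ == "__main__":
    # In-memory database, so the example never touches tophub_data.db
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
        print(ensure_column(conn, "articles", "is_interested", "INTEGER DEFAULT 0"))
        print(ensure_column(conn, "articles", "is_interested", "INTEGER DEFAULT 0"))
```

Wrapping the connection in `contextlib.closing` releases it even when a statement raises, which the script above only does on its success paths.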
--- a/check_db.py
+++ b/check_db.py
@@ -0,0 +1,23 @@
+#!/usr/bin/env python3
+import sqlite3
+
+# Connect to the database
+conn = sqlite3.connect('tophub_data.db')
+cursor = conn.cursor()
+
+# List all tables
+cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
+tables = cursor.fetchall()
+print('Tables:', tables)
+
+# Show each table's structure
+for table in tables:
+    table_name = table[0]
+    print(f'\nTable {table_name}:')
+    cursor.execute(f'PRAGMA table_info({table_name});')
+    columns = cursor.fetchall()
+    for col in columns:
+        print(col)
+
+# Close the connection
+conn.close()
--- a/check_db_structure.py
+++ b/check_db_structure.py
@@ -0,0 +1,25 @@
+#!/usr/bin/env python3
+import sqlite3
+
+# Connect to the database
+conn = sqlite3.connect('tophub_data.db')
+cursor = conn.cursor()
+
+# Fetch all table names
+cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
+tables = cursor.fetchall()
+
+print("数据库中的表:")
+for table in tables:
+    print(f"  - {table[0]}")
+
+# Fetch the structure of each table
+for table in tables:
+    table_name = table[0]
+    print(f"\n表 '{table_name}' 的结构:")
+    cursor.execute(f"PRAGMA table_info({table_name});")
+    columns = cursor.fetchall()
+    for column in columns:
+        print(f"  {column[1]} ({column[2]})")
+
+conn.close()
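check_db.py and check_db_structure.py perform the same inspection with different print formats. One way to share the logic is a function that returns the schema as data; the sketch below assumes nothing beyond the standard `sqlite3` module, and `describe_schema` is a hypothetical name.

```python
import sqlite3
from typing import Dict, List, Tuple


def describe_schema(conn: sqlite3.Connection) -> Dict[str, List[Tuple[str, str]]]:
    """Map each table name to its (column name, declared type) pairs."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        )
    ]
    schema: Dict[str, List[Tuple[str, str]]] = {}
    for table in tables:
        # Quote the name as an identifier so unusual table names still work
        quoted = '"' + table.replace('"', '""') + '"'
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        # table_info rows are (cid, name, type, notnull, dflt_value, pk)
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
    for name, columns in describe_schema(conn).items():
        print(name, columns)
    conn.close()
```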
--- a/check_interested_values.py
+++ b/check_interested_values.py
@@ -0,0 +1,50 @@
+#!/usr/bin/env python3
+import sqlite3
+from loguru import logger
+
+def check_interested_values():
+    """Check the value range of the is_interested column."""
+    try:
+        # Connect to the database
+        conn = sqlite3.connect('tophub_data.db')
+        cursor = conn.cursor()
+        
+        # Query the minimum, maximum, and average of is_interested
+        cursor.execute("SELECT MIN(is_interested), MAX(is_interested), AVG(is_interested) FROM articles")
+        result = cursor.fetchone()
+        min_val, max_val, avg_val = result
+        
+        logger.info(f"is_interested字段统计:")
+        logger.info(f"  最小值: {min_val}")
+        logger.info(f"  最大值: {max_val}")
+        logger.info(f"  平均值: {avg_val:.2f}")
+        
+        # Query the distribution of distinct values
+        cursor.execute("SELECT is_interested, COUNT(*) FROM articles GROUP BY is_interested ORDER BY is_interested")
+        distribution = cursor.fetchall()
+        
+        logger.info("\nis_interested值分布:")
+        for value, count in distribution:
+            logger.info(f"  {value}: {count} 条记录")
+        
+        # Query a few sample records
+        cursor.execute("SELECT id, title, is_interested FROM articles ORDER BY is_interested DESC LIMIT 5")
+        examples = cursor.fetchall()
+        
+        logger.info("\n示例记录:")
+        for example in examples:
+            logger.info(f"  ID: {example[0]}, 标题: {example[1][:30]}..., is_interested: {example[2]}")
+        
+        conn.close()
+        return True
+        
+    except sqlite3.Error as e:
+        logger.error(f"数据库操作出错: {str(e)}")
+        return False
+    except Exception as e:
+        logger.error(f"查询数据时出错: {str(e)}")
+        return False
+
+if __name__ == "__main__":
+    logger.add("check_interested_values.log", rotation="10 MB", level="INFO")
+    check_interested_values()
--- a/db_modify.log
+++ b/db_modify.log
--- a/db_modify.py
+++ b/db_modify.py
@@ -0,0 +1,220 @@
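+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Open the tophub_data.db database, read the articles table, and collect every distinct category.
+Call the local Ollama API to shorten each category name to 3-6 Chinese characters, stripping spaces, special characters, and the like.
+
+"""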
+
+import requests
+import sqlite3
+import re
+import time
+from loguru import logger
+
+# Configure logging
+logger.add("db_modify.log", rotation="10 MB", level="INFO")
+
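+class CategoryModifier:
+    """Category modifier that tidies up the category names stored in the database."""
+    
+    def __init__(self, db_path="tophub_data.db"):
+        """
+        Initialize the category modifier.
+        
+        Args:
+            db_path (str): path to the database file
+        """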
+        self.db_path = db_path
+        self.ollama_url = "http://localhost:11434/api/generate"
+        self.model = "qwen3:8b"
+        
+    def get_all_categories(self):
+        """
+        Fetch all distinct categories from the database.
+        
+        Returns:
+            list: all distinct, non-empty categories
+        """
+        try:
+            conn = sqlite3.connect(self.db_path)
+            cursor = conn.cursor()
+            
+            cursor.execute("SELECT DISTINCT category FROM articles")
+            categories = [row[0] for row in cursor.fetchall() if row[0]]
+            
+            conn.close()
+            logger.info(f"成功获取 {len(categories)} 个唯一类别")
+            return categories
+        except Exception as e:
+            logger.error(f"获取类别时出错: {e}")
+            return []
+            
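+    def clean_category_name(self, category):
+        """
+        Clean a category name by removing special characters and whitespace.
+        
+        Args:
+            category (str): original category name
+            
+        Returns:
+            str: cleaned category name
+        """
+        # Keep only Chinese characters, ASCII letters, and digits
+        cleaned = re.sub(r'[^\u4e00-\u9fa5a-zA-Z0-9]', '', category)
+        # Redundant safeguard: the pattern above has already removed all whitespace
+        cleaned = re.sub(r'\s+', '', cleaned)
+        return cleaned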
+        
+    def optimize_category_with_ollama(self, category):
+        """
+        Optimize a category name with the Ollama API.
+        
+        Args:
+            category (str): original category name
+            
+        Returns:
+            str: optimized category name
+        """
+        try:
+            # Build the prompt
+            prompt = f"请将以下类别名称简化为3-6个汉字，去除空格和特殊符号，更容易理解，并保持原意：'{category}'。" + \
+                     "例子一：'新科科技'，优化为'新质生产力'。例子二：'产设'，优化为'产品设计'。例子三：'史人'，优化为'历史人物'。"
+            
+            # Prepare the request payload
+            data = {
+                "model": self.model,
+                "prompt": prompt,
+                "stream": False
+            }
+            
+            # Send the request to the Ollama API
+            response = requests.post(self.ollama_url, json=data, timeout=30)
+            response.raise_for_status()
+            
+            # Parse the response
+            result = response.json()
+            optimized = result.get("response", "").strip()
+            
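+            # Clean the optimized name; fall back to the cleaned original if nothing usable remains
+            optimized = self.clean_category_name(optimized) or self.clean_category_name(category)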
+            
+            logger.info(f"类别 '{category}' 优化为 '{optimized}'")
+            return optimized
+        except Exception as e:
+            logger.error(f"优化类别 '{category}' 时出错: {e}")
+            # If the API call fails, return the cleaned original name
+            return self.clean_category_name(category)
+            
+    def update_category_in_db(self, old_category, new_category):
+        """
+        Update a category name in the database.
+        
+        Args:
+            old_category (str): original category name
+            new_category (str): new category name
+        """
+        try:
+            conn = sqlite3.connect(self.db_path)
+            cursor = conn.cursor()
+            
+            cursor.execute(
+                "UPDATE articles SET category = ? WHERE category = ?",
+                (new_category, old_category)
+            )
+            
+            count = cursor.rowcount
+            conn.commit()
+            conn.close()
+            
+            logger.info(f"成功更新类别 '{old_category}' 为 '{new_category}'，影响 {count} 条记录")
+        except Exception as e:
+            logger.error(f"更新类别 '{old_category}' 时出错: {e}")
+            
+    def process_all_categories(self):
+        """
+        Process every category.
+        """
+        logger.info("开始处理所有类别...")
+        
+        # Fetch all categories
+        categories = self.get_all_categories()
+        
+        if not categories:
+            logger.warning("未找到任何类别")
+            return
+            
+        # Initialize the progress counters
+        total_categories = len(categories)
+        processed_count = 0
+        unchanged_count = 0
+        updated_count = 0
+        start_time = time.time()
+        
+        logger.info(f"总共需要处理 {total_categories} 个类别")
+        
+        # Process each category
+        for i, category in enumerate(categories, 1):
+            category_start_time = time.time()
+            logger.info(f"处理进度: {i}/{total_categories} ({i/total_categories*100:.1f}%) - 类别: {category}")
+            
+            # Optimize the category name with the Ollama API
+            optimized_category = self.optimize_category_with_ollama(category)
+            
+            # Update the database only if the optimized name differs from the original
+            if optimized_category != category:
+                self.update_category_in_db(category, optimized_category)
+                updated_count += 1
+                logger.info(f"类别 '{category}' 已更新为 '{optimized_category}'")
+            else:
+                unchanged_count += 1
+                logger.info(f"类别 '{category}' 无需更改")
+                
+            processed_count += 1
+            category_end_time = time.time()
+            category_duration = category_end_time - category_start_time
+            
+            # Report this category's processing time and the running average
+            elapsed_time = time.time() - start_time
+            avg_time_per_category = elapsed_time / processed_count
+            estimated_remaining = avg_time_per_category * (total_categories - processed_count)
+            
+            logger.info(f"类别 '{category}' 处理完成，耗时: {category_duration:.2f}秒")
+            logger.info(f"累计处理: {processed_count}/{total_categories} | "
+                       f"已更新: {updated_count} | 未更改: {unchanged_count} | "
+                       f"平均耗时: {avg_time_per_category:.2f}秒/类别 | "
+                       f"预计剩余时间: {estimated_remaining:.2f}秒")
+                
+        # Report the overall statistics
+        total_duration = time.time() - start_time
+        logger.info("="*60)
+        logger.info("所有类别处理完成!")
+        logger.info(f"总计处理类别数: {total_categories}")
+        logger.info(f"更新类别数: {updated_count}")
+        logger.info(f"未更改类别数: {unchanged_count}")
+        logger.info(f"总耗时: {total_duration:.2f}秒")
+        logger.info(f"平均每类别处理时间: {total_duration/total_categories:.2f}秒")
+        logger.info("="*60)
+
+def main():
+    """Entry point."""
+    modifier = CategoryModifier()
+    
+    # Check that the Ollama service is reachable
+    try:
+        response = requests.get("http://localhost:11434/api/tags", timeout=5)
+        if response.status_code == 200:
+            logger.info("Ollama服务可用")
+        else:
+            logger.warning("Ollama服务不可用，请确保服务已启动")
+            return
+    except Exception as e:
+        logger.warning(f"无法连接到Ollama服务: {e}")
+        logger.info("请确保Ollama服务已在本地运行")
+        return
+    
+    # Process all categories
+    modifier.process_all_categories()
+
+if __name__ == "__main__":
+    main()
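The request and response handling in `optimize_category_with_ollama` can be separated from the network call, which makes it testable offline. The sketch below is a hedged illustration, not the script's actual code: `build_payload` and `extract_category` are hypothetical helpers, the 6-character limit mirrors the prompt's "3-6 characters" wording, and the `<think>...</think>` stripping is an assumption about how some reasoning models format their replies.

```python
import re
from typing import Any, Dict

# Same character class as clean_category_name in db_modify.py
NON_NAME_CHARS = re.compile(r"[^\u4e00-\u9fa5a-zA-Z0-9]")
# Assumption: the model may wrap its reasoning in <think>...</think>
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def build_payload(model: str, prompt: str) -> Dict[str, Any]:
    """Request body for a non-streaming call to Ollama's /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}


def extract_category(body: Dict[str, Any], fallback: str, max_len: int = 6) -> str:
    """Pull a short category name out of a decoded /api/generate response.

    Returns `fallback` when the reply is empty or longer than `max_len`
    characters, so a rambling answer never overwrites a category.
    """
    text = THINK_BLOCK.sub("", body.get("response", ""))
    cleaned = NON_NAME_CHARS.sub("", text)
    if not cleaned or len(cleaned) > max_len:
        return fallback
    return cleaned


if __name__ == "__main__":
    reply = {"response": "<think>先想一想</think>\n历史人物\n"}
    print(build_payload("qwen3:8b", "示例提示词"))
    print(extract_category(reply, fallback="史人"))
```

Rejecting over-long replies is a design choice: without it, an explanatory sentence from the model would be stripped of punctuation and stored as a category.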
--- a/db_modify_score.log
+++ b/db_modify_score.log
@@ -0,0 +1,6 @@
+2025-11-07 23:49:35.277 | INFO     | __main__:modify_database_structure:44 - 正在添加score字段...
+2025-11-07 23:49:35.281 | INFO     | __main__:modify_database_structure:48 - 正在转换is_interested数据到score字段...
+2025-11-07 23:49:35.288 | INFO     | __main__:modify_database_structure:63 - 成功添加score字段并转换数据
+2025-11-07 23:49:35.289 | INFO     | __main__:modify_database_structure:71 - 验证成功：score字段已添加到articles表
+2025-11-07 23:49:35.289 | INFO     | __main__:modify_database_structure:84 - 数据转换结果: score=7的记录数: 1, score=5的记录数: 1196
+2025-11-07 23:49:35.290 | INFO     | __main__:<module>:99 - 数据库结构修改完成
--- a/db_modify_zhipu.log
+++ b/db_modify_zhipu.log
--- a/db_modify_zhipu.py
+++ b/db_modify_zhipu.py
@@ -0,0 +1,101 @@
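+# Reclassify every entry via the Zhipu API: read each title from the database, send it to the API, and write the returned category back.
+
+import os
+import sqlite3
+import time
+from loguru import logger
+from zhipuai import ZhipuAI
+
+# Configure logging
+logger.add("db_modify_zhipu.log", rotation="10 MB", level="INFO")
+
+# Initialize the client; the key is read from the environment instead of being committed (the variable name ZHIPUAI_API_KEY is an assumption)
+client = ZhipuAI(api_key=os.environ["ZHIPUAI_API_KEY"])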
+
+def get_simplified_category(title):
+    """
+    Call the Zhipu API to get a simplified category name.
+    """
+    try:
+        # Create the chat completion request
+        response = client.chat.completions.create(
+            model="glm-4-flash",
+            messages=[
+                {
+                    "role": "system",
+                    "content": "你是一个专业的分类助手。请根据文章标题，提供一个3-6个汉字的简化分类名称，去除空格和特殊符号，更容易理解，并保持原意。"
+                },
+                {
+                    "role": "user",
+                    "content": f"对以下文字内容进行分类，返回结果为类别，如\"社会新闻\"，\"机器人\"，\"金融\"，\"历史\"，\"购物\"，\"新质生产力\"等等。目的：只返回2-6个汉字，不返回其它内容。内容：'{title}'"
+                }
+            ],
+            temperature=0.7
+        )
+        
+        # Extract the reply text
+        category = response.choices[0].message.content.strip()
+        logger.info(f"标题: {title[:30]}... -> 分类: {category}")
+        return category
+        
+    except Exception as e:
+        logger.error(f"获取分类失败: {str(e)}")
+        return None
+
+def update_database_categories():
+    """
+    Update the category of the records in the database.
+    """
+    # Connect to the database
+    conn = sqlite3.connect('tophub_data.db')
+    cursor = conn.cursor()
+    
+    try:
+        # Fetch all records
+        cursor.execute("SELECT id, title, category FROM articles")
+        records = cursor.fetchall()
+        
+        logger.info(f"共找到 {len(records)} 条记录需要处理")
+        
+        updated_count = 0
+        failed_count = 0
+        
+        # Process each record
+        for record in records:
+            record_id, title, current_category = record
+            
+            # Skip categories that are already simplified (length <= 6 and no spaces or punctuation)
+            if current_category and len(current_category) <= 6 and not any(c in current_category for c in " ,.!?;:，。！？；："):
+                logger.info(f"跳过记录 {record_id}，分类已简化: {current_category}")
+                continue
+                
+            logger.info(f"处理记录 {record_id}: {title[:30]}...")
+            
+            # Get the new category
+            new_category = get_simplified_category(title)
+            
+            if new_category:
+                # Update the database
+                cursor.execute("UPDATE articles SET category = ? WHERE id = ?", (new_category, record_id))
+                conn.commit()
+                updated_count += 1
+                logger.info(f"已更新记录 {record_id} 的分类为: {new_category}")
+            else:
+                failed_count += 1
+                logger.error(f"无法获取记录 {record_id} 的新分类")
+            
+            # Pause between calls to avoid hitting the API too frequently
+            time.sleep(1)
+        
+        logger.info(f"处理完成! 成功更新 {updated_count} 条记录，失败 {failed_count} 条记录")
+        
+    except Exception as e:
+        logger.error(f"更新数据库时出错: {str(e)}")
+        conn.rollback()
+    finally:
+        conn.close()
+
+if __name__ == "__main__":
+    logger.info("开始更新数据库分类...")
+    update_database_categories()
+    logger.info("程序执行完成")
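Two pieces of db_modify_zhipu.py lend themselves to small, testable helpers: the "already simplified" skip rule, and loading the API key from the environment so that it is not stored in the source file. Both functions below are hypothetical sketches; in particular the environment variable name `ZHIPUAI_API_KEY` is an assumption, not something the script defines.

```python
import os
from typing import Mapping, Optional

# Same separator set the script checks before skipping a record
SEPARATORS = " ,.!?;:，。！？；："


def is_simplified(category: Optional[str], max_len: int = 6) -> bool:
    """True if the category is non-empty, short, and free of spaces and punctuation."""
    return (
        bool(category)
        and len(category) <= max_len
        and not any(ch in SEPARATORS for ch in category)
    )


def load_api_key(env: Mapping[str, str] = os.environ, name: str = "ZHIPUAI_API_KEY") -> str:
    """Read the API key from the environment (variable name is an assumption)."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running")
    return key


if __name__ == "__main__":
    for candidate in ["社会新闻", "社会 新闻", None]:
        print(repr(candidate), is_simplified(candidate))
```

Failing fast on a missing key gives a clearer error than an authentication failure on the first API call.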
--- a/db_viewer.py
+++ b/db_viewer.py
@@ -0,0 +1,835 @@
+#!/usr/bin/env python3
+"""
+TopHub data viewer - a PySide6 GUI application
+that displays the TopHub scrape data stored in the SQLite database
+"""
+
+import sys
+import os
+import sqlite3
+import webbrowser
+from datetime import datetime
+from loguru import logger
+from PySide6.QtWidgets import (
+    QApplication, QMainWindow, QTableWidget, QTableWidgetItem, QVBoxLayout, 
+    QHBoxLayout, QWidget, QLabel, QLineEdit, QPushButton, QComboBox, 
+    QGroupBox, QStatusBar, QMenuBar, QMenu, QMessageBox, QHeaderView,
+    QAbstractItemView, QDialog, QFormLayout, QTextEdit, QInputDialog
+)
+from PySide6.QtCore import Qt, QUrl, QTimer, QEvent
+from PySide6.QtGui import QAction, QFont, QIcon, QDesktopServices, QClipboard
+
+
+class DatabaseViewer(QMainWindow):
+    """Main window that displays the database contents."""
+    
+    def __init__(self):
+        super().__init__()
+        # Path of the database file in the directory containing this script
+        script_dir = os.path.dirname(os.path.abspath(__file__))
+        self.db_path = os.path.join(script_dir, "tophub_data.db")
+        
+        # Check that the database file exists
+        if not os.path.exists(self.db_path):
+            QMessageBox.critical(self, "错误", f"数据库文件不存在: {self.db_path}")
+            sys.exit(1)
+            
+        self.init_ui()
+        self.load_data()
+        
+    def init_ui(self):
+        """Initialize the user interface."""
+        # Set the window properties
+        self.setWindowTitle("TopHub数据查看器")
+        self.setGeometry(100, 100, 1200, 800)
+        
+        # Create the central widget
+        central_widget = QWidget()
+        self.setCentralWidget(central_widget)
+        
+        # Create the main layout
+        main_layout = QVBoxLayout(central_widget)
+        
+        # Create the search and filter area
+        filter_group = QGroupBox("搜索和筛选")
+        filter_layout = QHBoxLayout(filter_group)
+        
+        # Search box
+        self.search_edit = QLineEdit()
+        self.search_edit.setPlaceholderText("输入搜索关键词...")
+        self.search_edit.textChanged.connect(self.filter_data)
+        filter_layout.addWidget(QLabel("搜索:"))
+        filter_layout.addWidget(self.search_edit)
+        
+        # Category filter
+        self.category_combo = QComboBox()
+        self.category_combo.addItem("全部分类")
+        self.category_combo.currentTextChanged.connect(self.filter_data)
+        filter_layout.addWidget(QLabel("分类:"))
+        filter_layout.addWidget(self.category_combo)
+        
+        # Refresh button
+        self.refresh_button = QPushButton("刷新数据")
+        self.refresh_button.clicked.connect(self.load_data)
+        filter_layout.addWidget(self.refresh_button)
+        
+        # Controls for batch deletion
+        self.select_by_keyword_button = QPushButton("按关键字选中")
+        self.select_by_keyword_button.clicked.connect(self.select_by_keyword)
+        filter_layout.addWidget(self.select_by_keyword_button)
+        
+        self.delete_selected_button = QPushButton("删除选中项")
+        self.delete_selected_button.clicked.connect(self.delete_selected_items)
+        filter_layout.addWidget(self.delete_selected_button)
+        
+        # "Mark as interested" button
+        self.mark_interested_button = QPushButton("标记为感兴趣")
+        self.mark_interested_button.clicked.connect(self.mark_as_interested)
+        filter_layout.addWidget(self.mark_interested_button)
+        
+        # Add the filter area to the main layout
+        main_layout.addWidget(filter_group)
+        
+        # Create the category statistics area
+        self.category_stats_group = QGroupBox("分类统计")
+        self.category_stats_layout = QHBoxLayout(self.category_stats_group)
+        self.category_stats_label = QLabel("暂无数据")
+        self.category_stats_layout.addWidget(self.category_stats_label)
+        main_layout.addWidget(self.category_stats_group)
+        
+        # Create the table
+        self.table = QTableWidget()
+        self.table.setColumnCount(6)  # six columns; the last one shows the score
+        self.table.setHorizontalHeaderLabels(["ID", "标题", "链接", "分类", "来源日期", "评分"])
+        
+        # Set the table properties
+        self.table.setAlternatingRowColors(True)
+        self.table.setSelectionBehavior(QAbstractItemView.SelectRows)
+        self.table.setEditTriggers(QAbstractItemView.NoEditTriggers)
+        self.table.setSortingEnabled(True)
+        
+        # Set the table selection mode
+        self.table.setSelectionMode(QAbstractItemView.SingleSelection)
+        
+        # Set the column resize modes
+        header = self.table.horizontalHeader()
+        header.setSectionResizeMode(0, QHeaderView.ResizeToContents)  # ID column
+        header.setSectionResizeMode(1, QHeaderView.Stretch)  # title column
+        header.setSectionResizeMode(2, QHeaderView.ResizeToContents)  # link column
+        header.setSectionResizeMode(3, QHeaderView.ResizeToContents)  # category column
+        header.setSectionResizeMode(4, QHeaderView.ResizeToContents)  # source date column
+        header.setSectionResizeMode(5, QHeaderView.ResizeToContents)  # score column
+        
+        # Enable link clicks
+        self.table.cellClicked.connect(self.on_cell_clicked)
+        
+        # Install an event filter to handle link clicks
+        self.table.viewport().installEventFilter(self)
+        
+        # Enable the context menu
+        self.table.setContextMenuPolicy(Qt.CustomContextMenu)
+        self.table.customContextMenuRequested.connect(self.show_context_menu)
+        
+        # Add the table to the main layout
+        main_layout.addWidget(self.table)
+        
+        # Create the status bar
+        self.status_bar = QStatusBar()
+        self.setStatusBar(self.status_bar)
+        
+        # Create the menu bar
+        self.create_menu_bar()
+        
+    def create_menu_bar(self):
+        """Create the menu bar."""
+        menubar = self.menuBar()
+        
+        # File menu
+        file_menu = menubar.addMenu("文件")
+        
+        # Refresh action
+        refresh_action = QAction("刷新数据", self)
+        refresh_action.setShortcut("F5")
+        refresh_action.triggered.connect(self.load_data)
+        file_menu.addAction(refresh_action)
+        
+        # Exit action
+        exit_action = QAction("退出", self)
+        exit_action.setShortcut("Ctrl+Q")
+        exit_action.triggered.connect(self.close)
+        file_menu.addAction(exit_action)
+        
+        # Help menu
+        help_menu = menubar.addMenu("帮助")
+        
+        # About action
+        about_action = QAction("关于", self)
+        about_action.triggered.connect(self.show_about)
+        help_menu.addAction(about_action)
+        
+    def load_data(self):
+        """Load data from the database."""
+        try:
+            # Connect to the database
+            conn = sqlite3.connect(self.db_path)
+            cursor = conn.cursor()
+            
+            # Check that the table exists
+            cursor.execute("SELECT name FROM sqlite_master WHERE type='table' AND name='articles'")
+            if not cursor.fetchone():
+                QMessageBox.critical(self, "错误", "数据库中不存在articles表")
+                conn.close()
+                return
+            
+            # Query the data (reads the score column rather than is_interested)
+            cursor.execute('''
+            SELECT id, title, url, category, source_date, score 
+            FROM articles 
+            ORDER BY id DESC
+            ''')
+            
+            rows = cursor.fetchall()
+            conn.close()
+            
+            # Refill the table; sorting is suspended while rows are written because
+            # QTableWidget moves a row as soon as an item lands in the sorted column
+            self.table.setSortingEnabled(False)
+            self.table.setRowCount(len(rows))
+            
+            # Collect all categories and their counts
+            categories = set()
+            category_counts = {}  # number of records per category
+            
+            for row_idx, row in enumerate(rows):
+                id_val, title, url, category, source_date, score = row
+                
+                # Add to the category set and the count dictionary
+                if category:
+                    categories.add(category)
+                    category_counts[category] = category_counts.get(category, 0) + 1
+                else:
+                    # Records with an empty category are counted as "未分类" (uncategorized)
+                    category_counts["未分类"] = category_counts.get("未分类", 0) + 1
+                
+                # Set the table items
+                self.table.setItem(row_idx, 0, QTableWidgetItem(str(id_val)))
+                self.table.setItem(row_idx, 1, QTableWidgetItem(title))
+                
+                # Link item: shown in blue and bold
+                link_item = QTableWidgetItem(url if url else "")
+                if url:
+                    link_item.setForeground(Qt.blue)
+                    link_item.setFont(QFont("", -1, QFont.Bold))
+                self.table.setItem(row_idx, 2, link_item)
+                
+                self.table.setItem(row_idx, 3, QTableWidgetItem(category if category else "未分类"))
+                self.table.setItem(row_idx, 4, QTableWidgetItem(source_date))
+                
+                # Score item
+                score_item = QTableWidgetItem(str(score))
+                # Color the score according to its value
+                if score >= 8:
+                    score_item.setForeground(Qt.green)
+                    score_item.setFont(QFont("", -1, QFont.Bold))
+                elif score >= 6:
+                    score_item.setForeground(Qt.blue)
+                elif score <= 3:
+                    score_item.setForeground(Qt.red)
+                self.table.setItem(row_idx, 5, score_item)
+            
+            # Re-enable sorting now that every row is filled
+            self.table.setSortingEnabled(True)
+            
+            # Refresh the category drop-down
+            current_category = self.category_combo.currentText()
+            self.category_combo.clear()
+            self.category_combo.addItem("全部分类")
+            for cat in sorted(categories):
+                self.category_combo.addItem(cat)
+            
+            # Restore the previously selected category
+            index = self.category_combo.findText(current_category)
+            if index >= 0:
+                self.category_combo.setCurrentIndex(index)
+            
+            # Update the category statistics display
+            self.update_category_stats(category_counts)
+            
+            # Update the status bar
+            self.status_bar.showMessage(f"已加载 {len(rows)} 条记录")
+            
+        except sqlite3.Error as e:
+            logger.error(f"数据库操作出错: {str(e)}")
+            QMessageBox.critical(self, "数据库错误", f"数据库操作出错: {str(e)}")
+            self.status_bar.showMessage("加载数据失败")
+        except Exception as e:
+            logger.error(f"加载数据时出错: {str(e)}")
+            QMessageBox.critical(self, "错误", f"加载数据时出错: {str(e)}")
+            self.status_bar.showMessage("加载数据失败")
+            
+    def update_category_stats(self, category_counts):
+        """Update the category statistics display."""
+        if not category_counts:
+            self.category_stats_label.setText("暂无数据")
+            return
+        
+        # Sort the categories by count, descending
+        sorted_categories = sorted(category_counts.items(), key=lambda x: x[1], reverse=True)
+        
+        # Build the statistics text
+        stats_text = " | ".join([f"{category}: {count}" for category, count in sorted_categories])
+        
+        # Truncate overly long text and append a hint
+        if len(stats_text) > 200:
+            stats_text = stats_text[:200] + "... (更多分类请查看完整数据)"
+        
+        self.category_stats_label.setText(stats_text)
+        self.category_stats_label.setToolTip(" | ".join([f"{category}: {count}" for category, count in sorted_categories]))
+            
+    def update_category_stats_after_filter(self):
+        """Update the category statistics display after filtering."""
+        # Count the categories of the visible rows
+        category_counts = {}
+        
+        for row in range(self.table.rowCount()):
+            # Skip hidden rows
+            if self.table.isRowHidden(row):
+                continue
+                
+            # Get the category item
+            category_item = self.table.item(row, 3)
+            if category_item:
+                category = category_item.text()
+                category_counts[category] = category_counts.get(category, 0) + 1
+            else:
+                category_counts["未分类"] = category_counts.get("未分类", 0) + 1
+        
+        # Update the category statistics display
+        self.update_category_stats(category_counts)
+            
+    def filter_data(self):
+        """Filter the rows by the search text and the selected category."""
+        search_text = self.search_edit.text().lower()
+        selected_category = self.category_combo.currentText()
+        
+        # Iterate over all rows
+        for row in range(self.table.rowCount()):
+            show_row = True
+            
+            # Check the search condition
+            if search_text:
+                text_match = False
+                for col in range(1, 6):  # check the title, link, category, date, and score columns
+                    item = self.table.item(row, col)
+                    if item and search_text in item.text().lower():
+                        text_match = True
+                        break
+                show_row = show_row and text_match
+            
+            # Check the category condition
+            if selected_category != "全部分类":
+                category_item = self.table.item(row, 3)
+                category_match = category_item and category_item.text() == selected_category
+                show_row = show_row and category_match
+            
+            # Show or hide the row
+            self.table.setRowHidden(row, not show_row)
+        
+        # Count the visible rows
+        visible_count = sum(1 for row in range(self.table.rowCount()) 
+                           if not self.table.isRowHidden(row))
+        self.status_bar.showMessage(f"显示 {visible_count}/{self.table.rowCount()} 条记录")
+        
+        # Recompute and display the category statistics
+        self.update_category_stats_after_filter()
+        
+    def eventFilter(self, obj, event):
+        """事件过滤器，用于处理链接点击而不触发行选择"""
+        if obj == self.table.viewport() and event.type() == QEvent.MouseButtonPress:
+            # 获取点击位置
+            pos = event.position()
+            # 获取点击位置的行和列
+            row = self.table.rowAt(int(pos.y()))
+            column = self.table.columnAt(int(pos.x()))
+            
+            # If the click landed on the link column (third column, index 2)
+            if column == 2 and row >= 0:
+                item = self.table.item(row, column)
+                if item and item.text() and item.text().startswith("http"):
+                    # 直接打开链接
+                    webbrowser.open(item.text())
+                    # 返回True表示事件已处理，不再传递给原始处理器
+                    # 这样就不会触发行选择，避免鼠标跳动
+                    return True
+        
+        # 其他事件交给原始处理器处理
+        return super().eventFilter(obj, event)
+        
+    def on_cell_clicked(self, row, column):
+        """处理单元格点击事件"""
+        # 链接列的点击已经由eventFilter处理，这里不再处理
+        # 只处理非链接列的点击，保持原有选择行为
+        if column != 2:
+            # 可以在这里添加其他列的点击处理逻辑
+            pass
+                    
+    def show_context_menu(self, position):
+        """显示右键菜单"""
+        # 获取点击位置的行
+        row = self.table.rowAt(position.y())
+        if row < 0:
+            return
+            
+        # 选中该行
+        self.table.selectRow(row)
+        
+        # 创建右键菜单
+        menu = QMenu(self)
+        
+        # 添加"增加评分(+1)"动作
+        increase_score_action = QAction("增加评分(+1)", self)
+        increase_score_action.triggered.connect(self.increase_score)
+        menu.addAction(increase_score_action)
+        
+        # 添加"减少评分(-1)"动作
+        decrease_score_action = QAction("减少评分(-1)", self)
+        decrease_score_action.triggered.connect(self.decrease_score)
+        menu.addAction(decrease_score_action)
+        
+        # 添加分隔线
+        menu.addSeparator()
+        
+        # 添加"复制信息"动作
+        copy_info_action = QAction("复制信息", self)
+        copy_info_action.triggered.connect(self.copy_info)
+        menu.addAction(copy_info_action)
+        
+        # 添加分隔线
+        menu.addSeparator()
+        
+        # 添加"删除"动作
+        delete_action = QAction("删除选中项", self)
+        delete_action.triggered.connect(self.delete_selected_items)
+        menu.addAction(delete_action)
+        
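+        # Show the menu. exec_() is deprecated in PySide6, and the requested
+        # position is in viewport coordinates, so map it through the viewport.
+        menu.exec(self.table.viewport().mapToGlobal(position))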
+        
+    def copy_info(self):
+        """复制选中行的标题、链接、日期等信息"""
+        # 获取选中的行
+        selected_rows = set()
+        for item in self.table.selectedItems():
+            selected_rows.add(item.row())
+            
+        # 如果没有选中的行，直接返回
+        if not selected_rows:
+            QMessageBox.information(self, "提示", "请先选中要复制信息的行")
+            return
+            
+        # 收集所有选中行的信息
+        all_info = []
+        for row in sorted(selected_rows):
+            # 获取标题、链接、日期
+            title_item = self.table.item(row, 1)
+            url_item = self.table.item(row, 2)
+            date_item = self.table.item(row, 4)
+            
+            title = title_item.text() if title_item else ""
+            url = url_item.text() if url_item else ""
+            date = date_item.text() if date_item else ""
+            
+            # 用空格组合信息
+            info = f"{title} {url} {date}".strip()
+            all_info.append(info)
+            
+        # 将所有信息用换行符连接
+        clipboard_text = "\n".join(all_info)
+        
+        # 复制到剪贴板
+        clipboard = QApplication.clipboard()
+        clipboard.setText(clipboard_text)
+        
+        # 更新状态栏
+        self.status_bar.showMessage(f"已复制 {len(selected_rows)} 行信息到剪贴板")
+        
+    def show_about(self):
+        """显示关于对话框"""
+        about_text = """
+        <h3>TopHub数据查看器</h3>
+        <p>版本: 1.0</p>
+        <p>用于查看TopHub网站抓取数据的PySide6应用程序</p>
+        <p>功能特性:</p>
+        <ul>
+            <li>显示SQLite数据库中的抓取数据</li>
+            <li>支持点击链接在浏览器中打开</li>
+            <li>支持搜索和分类筛选</li>
+            <li>支持排序功能</li>
+            <li>支持标记感兴趣的项目</li>
+        </ul>
+        """
+        QMessageBox.about(self, "关于", about_text)
+        
+    def select_by_keyword(self):
+        """按关键字选中行"""
+        # 弹出输入对话框获取关键字
+        keyword, ok = QInputDialog.getText(self, "按关键字选中", "请输入关键字:")
+        
+        if not ok or not keyword:
+            return
+            
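+        keyword = keyword.lower()
+        selected_count = 0
+        
+        # Local import, in case QItemSelectionModel is not imported at module level
+        from PySide6.QtCore import QItemSelectionModel
+        
+        # Depending on the selection mode, selectRow() can replace the previous
+        # selection on every call, so build the selection explicitly instead
+        selection_model = self.table.selectionModel()
+        selection_model.clearSelection()
+        select_row = QItemSelectionModel.Select | QItemSelectionModel.Rows
+        
+        # Iterate over all visible rows
+        for row in range(self.table.rowCount()):
+            # Skip hidden rows
+            if self.table.isRowHidden(row):
+                continue
+                
+            # Check whether any cell in this row contains the keyword
+            match = False
+            for col in range(self.table.columnCount()):
+                item = self.table.item(row, col)
+                if item and keyword in item.text().lower():
+                    match = True
+                    break
+            
+            # Add matching rows to the selection
+            if match:
+                selection_model.select(self.table.model().index(row, 0), select_row)
+                selected_count += 1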
+                
+        # 更新状态栏
+        self.status_bar.showMessage(f"已选中 {selected_count} 行")
+        
+    def delete_selected_items(self):
+        """删除选中的项目"""
+        # 获取选中的行
+        selected_rows = set()
+        for item in self.table.selectedItems():
+            selected_rows.add(item.row())
+            
+        # 如果没有选中的行，直接返回
+        if not selected_rows:
+            QMessageBox.information(self, "提示", "请先选中要删除的行")
+            return
+            
+        # 弹出确认对话框
+        reply = QMessageBox.question(
+            self, 
+            "确认删除", 
+            f"确定要删除选中的 {len(selected_rows)} 行数据吗？此操作不可撤销！",
+            QMessageBox.Yes | QMessageBox.No,
+            QMessageBox.No
+        )
+        
+        if reply == QMessageBox.No:
+            return
+            
+        try:
+            # 连接数据库
+            conn = sqlite3.connect(self.db_path)
+            cursor = conn.cursor()
+            
+            # 删除选中的行
+            deleted_count = 0
+            for row in sorted(selected_rows, reverse=True):  # 从后往前删除，避免索引变化
+                # 获取ID
+                id_item = self.table.item(row, 0)
+                if id_item:
+                    article_id = id_item.text()
+                    # 从数据库中删除
+                    cursor.execute("DELETE FROM articles WHERE id = ?", (article_id,))
+                    # 从表格中移除行
+                    self.table.removeRow(row)
+                    deleted_count += 1
+                    
+            # 提交更改
+            conn.commit()
+            conn.close()
+            
+            # 更新状态栏
+            self.status_bar.showMessage(f"已删除 {deleted_count} 行数据")
+            
+            # 重新加载数据以更新分类统计
+            self.load_data()
+            
+        except sqlite3.Error as e:
+            logger.error(f"删除数据时出错: {str(e)}")
+            QMessageBox.critical(self, "数据库错误", f"删除数据时出错: {str(e)}")
+            self.status_bar.showMessage("删除失败")
+        except Exception as e:
+            logger.error(f"删除数据时出错: {str(e)}")
+            QMessageBox.critical(self, "错误", f"删除数据时出错: {str(e)}")
+            self.status_bar.showMessage("删除失败")
+            
+    def mark_as_interested(self):
+        """将选中的项目标记为感兴趣"""
+        # 获取选中的行
+        selected_rows = set()
+        for item in self.table.selectedItems():
+            selected_rows.add(item.row())
+            
+        # 如果没有选中的行，直接返回
+        if not selected_rows:
+            QMessageBox.information(self, "提示", "请先选中要标记的行")
+            return
+            
+        # 弹出确认对话框
+        reply = QMessageBox.question(
+            self, 
+            "确认标记", 
+            f"确定要将选中的 {len(selected_rows)} 行标记为感兴趣吗？",
+            QMessageBox.Yes | QMessageBox.No,
+            QMessageBox.Yes
+        )
+        
+        if reply == QMessageBox.No:
+            return
+            
+        try:
+            # 连接数据库
+            conn = sqlite3.connect(self.db_path)
+            cursor = conn.cursor()
+            
+            # 更新选中的行
+            updated_count = 0
+            for row in selected_rows:
+                # 获取ID
+                id_item = self.table.item(row, 0)
+                if id_item:
+                    article_id = id_item.text()
+                    # 更新数据库中的is_interested字段
+                    cursor.execute("UPDATE articles SET is_interested = 1 WHERE id = ?", (article_id,))
+                    
+                    # 更新表格中的显示
+                    interested_item = QTableWidgetItem("是")
+                    interested_item.setForeground(Qt.green)
+                    interested_item.setFont(QFont("", -1, QFont.Bold))
+                    self.table.setItem(row, 5, interested_item)
+                    
+                    updated_count += 1
+                    
+            # 提交更改
+            conn.commit()
+            conn.close()
+            
+            # 更新状态栏
+            self.status_bar.showMessage(f"已标记 {updated_count} 行为感兴趣")
+            
+        except sqlite3.Error as e:
+            logger.error(f"标记数据时出错: {str(e)}")
+            QMessageBox.critical(self, "数据库错误", f"标记数据时出错: {str(e)}")
+            self.status_bar.showMessage("标记失败")
+        except Exception as e:
+            logger.error(f"标记数据时出错: {str(e)}")
+            QMessageBox.critical(self, "错误", f"标记数据时出错: {str(e)}")
+            self.status_bar.showMessage("标记失败")
+
+
+    def mark_as_not_interested(self):
+        """将选中的项目标记为不感兴趣"""
+        # 获取选中的行
+        selected_rows = set()
+        for item in self.table.selectedItems():
+            selected_rows.add(item.row())
+            
+        # 如果没有选中的行，直接返回
+        if not selected_rows:
+            QMessageBox.information(self, "提示", "请先选中要标记的行")
+            return
+            
+        # 弹出确认对话框
+        reply = QMessageBox.question(
+            self, 
+            "确认标记", 
+            f"确定要将选中的 {len(selected_rows)} 行标记为不感兴趣吗？",
+            QMessageBox.Yes | QMessageBox.No,
+            QMessageBox.Yes
+        )
+        
+        if reply == QMessageBox.No:
+            return
+            
+        try:
+            # 连接数据库
+            conn = sqlite3.connect(self.db_path)
+            cursor = conn.cursor()
+            
+            # 更新选中的行
+            updated_count = 0
+            for row in selected_rows:
+                # 获取ID
+                id_item = self.table.item(row, 0)
+                if id_item:
+                    article_id = id_item.text()
+                    # 更新数据库中的is_interested字段
+                    cursor.execute("UPDATE articles SET is_interested = 0 WHERE id = ?", (article_id,))
+                    
+                    # 更新表格中的显示
+                    interested_item = QTableWidgetItem("否")
+                    # 不感兴趣项使用普通字体和颜色
+                    self.table.setItem(row, 5, interested_item)
+                    
+                    updated_count += 1
+                    
+            # 提交更改
+            conn.commit()
+            conn.close()
+            
+            # 更新状态栏
+            self.status_bar.showMessage(f"已标记 {updated_count} 行为不感兴趣")
+            
+        except sqlite3.Error as e:
+            logger.error(f"标记数据时出错: {str(e)}")
+            QMessageBox.critical(self, "数据库错误", f"标记数据时出错: {str(e)}")
+            self.status_bar.showMessage("标记失败")
+        except Exception as e:
+            logger.error(f"标记数据时出错: {str(e)}")
+            QMessageBox.critical(self, "错误", f"标记数据时出错: {str(e)}")
+            self.status_bar.showMessage("标记失败")
+
+
+    def increase_score(self):
+        """增加选中项目的评分(+1)"""
+        # 获取选中的行
+        selected_rows = set()
+        for item in self.table.selectedItems():
+            selected_rows.add(item.row())
+            
+        # 如果没有选中的行，直接返回
+        if not selected_rows:
+            QMessageBox.information(self, "提示", "请先选中要增加评分的行")
+            return
+            
+        try:
+            # 连接数据库
+            conn = sqlite3.connect(self.db_path)
+            cursor = conn.cursor()
+            
+            # 更新选中的行
+            updated_count = 0
+            for row in selected_rows:
+                # 获取ID
+                id_item = self.table.item(row, 0)
+                if id_item:
+                    article_id = id_item.text()
+                    # 获取当前分数
+                    cursor.execute("SELECT score FROM articles WHERE id = ?", (article_id,))
+                    result = cursor.fetchone()
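+                    if result:
+                        # Treat a NULL score as 5 (the column default) to avoid a TypeError
+                        current_score = result[0] if result[0] is not None else 5
+                        # Increase the score, capped at 10
+                        new_score = min(current_score + 1, 10)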
+                        # 更新数据库中的score字段
+                        cursor.execute("UPDATE articles SET score = ? WHERE id = ?", (new_score, article_id))
+                        
+                        # 更新表格中的显示
+                        score_item = QTableWidgetItem(str(new_score))
+                        # 根据分数设置颜色
+                        if new_score >= 8:
+                            score_item.setForeground(Qt.green)
+                            score_item.setFont(QFont("", -1, QFont.Bold))
+                        elif new_score >= 6:
+                            score_item.setForeground(Qt.blue)
+                        elif new_score <= 3:
+                            score_item.setForeground(Qt.red)
+                        self.table.setItem(row, 5, score_item)
+                        
+                        updated_count += 1
+                    
+            # 提交更改
+            conn.commit()
+            conn.close()
+            
+            # 更新状态栏
+            self.status_bar.showMessage(f"已增加 {updated_count} 行的评分")
+            
+        except sqlite3.Error as e:
+            logger.error(f"增加评分时出错: {str(e)}")
+            QMessageBox.critical(self, "数据库错误", f"增加评分时出错: {str(e)}")
+            self.status_bar.showMessage("增加评分失败")
+        except Exception as e:
+            logger.error(f"增加评分时出错: {str(e)}")
+            QMessageBox.critical(self, "错误", f"增加评分时出错: {str(e)}")
+            self.status_bar.showMessage("增加评分失败")
+
+    def decrease_score(self):
+        """减少选中项目的评分(-1)"""
+        # 获取选中的行
+        selected_rows = set()
+        for item in self.table.selectedItems():
+            selected_rows.add(item.row())
+            
+        # 如果没有选中的行，直接返回
+        if not selected_rows:
+            QMessageBox.information(self, "提示", "请先选中要减少评分的行")
+            return
+            
+        try:
+            # 连接数据库
+            conn = sqlite3.connect(self.db_path)
+            cursor = conn.cursor()
+            
+            # 更新选中的行
+            updated_count = 0
+            for row in selected_rows:
+                # 获取ID
+                id_item = self.table.item(row, 0)
+                if id_item:
+                    article_id = id_item.text()
+                    # 获取当前分数
+                    cursor.execute("SELECT score FROM articles WHERE id = ?", (article_id,))
+                    result = cursor.fetchone()
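+                    if result:
+                        # Treat a NULL score as 5 (the column default) to avoid a TypeError
+                        current_score = result[0] if result[0] is not None else 5
+                        # Decrease the score, but never below 0
+                        new_score = max(current_score - 1, 0)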
+                        # 更新数据库中的score字段
+                        cursor.execute("UPDATE articles SET score = ? WHERE id = ?", (new_score, article_id))
+                        
+                        # 更新表格中的显示
+                        score_item = QTableWidgetItem(str(new_score))
+                        # 根据分数设置颜色
+                        if new_score >= 8:
+                            score_item.setForeground(Qt.green)
+                            score_item.setFont(QFont("", -1, QFont.Bold))
+                        elif new_score >= 6:
+                            score_item.setForeground(Qt.blue)
+                        elif new_score <= 3:
+                            score_item.setForeground(Qt.red)
+                        self.table.setItem(row, 5, score_item)
+                        
+                        updated_count += 1
+                    
+            # 提交更改
+            conn.commit()
+            conn.close()
+            
+            # 更新状态栏
+            self.status_bar.showMessage(f"已减少 {updated_count} 行的评分")
+            
+        except sqlite3.Error as e:
+            logger.error(f"减少评分时出错: {str(e)}")
+            QMessageBox.critical(self, "数据库错误", f"减少评分时出错: {str(e)}")
+            self.status_bar.showMessage("减少评分失败")
+        except Exception as e:
+            logger.error(f"减少评分时出错: {str(e)}")
+            QMessageBox.critical(self, "错误", f"减少评分时出错: {str(e)}")
+            self.status_bar.showMessage("减少评分失败")
+
+
+def main():
+    """主函数"""
+    app = QApplication(sys.argv)
+    
+    # 设置应用程序属性
+    app.setApplicationName("TopHub数据查看器")
+    app.setOrganizationName("TopHub")
+    
+    # 创建并显示主窗口
+    viewer = DatabaseViewer()
+    viewer.show()
+    
+    # 运行应用程序
+    sys.exit(app.exec())
+
+
+if __name__ == "__main__":
+    main()
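
The `increase_score` and `decrease_score` methods above read the score, clamp it in Python, and write it back. As a minimal sketch, assuming the same `articles` table and the 0 to 10 range, the clamp can also be done in one SQL statement. The helper name `adjust_score` and its `lo`, `hi`, and `default` parameters are hypothetical and not part of the commit; the default of 5 is taken from the migration script's column default.

```python
import sqlite3


def adjust_score(conn, article_id, delta, lo=0, hi=10, default=5):
    """Add delta to an article's score, clamped to [lo, hi].

    Returns the new score, or None if no row has that id.
    (Hypothetical helper; not part of db_viewer.py.)
    """
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "UPDATE articles "
            "SET score = MAX(?, MIN(?, COALESCE(score, ?) + ?)) "
            "WHERE id = ?",
            (lo, hi, default, delta, article_id),
        )
        if cur.rowcount == 0:
            return None
    row = conn.execute(
        "SELECT score FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    return row[0]
```

With a helper like this, both menu actions could share one code path (`delta=+1` and `delta=-1`), and a NULL score would not raise a `TypeError`.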
--- a/fix_db_viewer.py
+++ b/fix_db_viewer.py
@@ -0,0 +1,55 @@
+#!/usr/bin/env python3
+"""
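+Fix the method placement problem in db_viewer.py.
+Moves the increase_score and decrease_score methods from the end of the file
+to just after mark_as_not_interested inside the DatabaseViewer class.
+The method text is moved verbatim (indentation is not changed), and
+db_viewer.py is overwritten in place without a backup.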
+"""
+
+import re
+
+def fix_db_viewer():
+    """修复db_viewer.py文件"""
+    try:
+        # 读取原始文件
+        with open('db_viewer.py', 'r', encoding='utf-8') as f:
+            content = f.read()
+        
+        # 找到increase_score和decrease_score方法
+        increase_score_match = re.search(r'\n\s*def increase_score\(self\):.*?(?=\n\s*def|\n\nclass|\n\ndef|\n\nif __name__|\Z)', content, re.DOTALL)
+        decrease_score_match = re.search(r'\n\s*def decrease_score\(self\):.*?(?=\n\s*def|\n\nclass|\n\ndef|\n\nif __name__|\Z)', content, re.DOTALL)
+        
+        if not increase_score_match or not decrease_score_match:
+            print("未找到increase_score或decrease_score方法")
+            return False
+        
+        # 提取方法内容
+        increase_score_method = increase_score_match.group(0)
+        decrease_score_method = decrease_score_match.group(0)
+        
+        # 从文件末尾移除这两个方法
+        content = re.sub(r'\n\s*def increase_score\(self\):.*?(?=\n\s*def|\n\nclass|\n\ndef|\n\nif __name__|\Z)', '', content, flags=re.DOTALL)
+        content = re.sub(r'\n\s*def decrease_score\(self\):.*?(?=\n\s*def|\n\nclass|\n\ndef|\n\nif __name__|\Z)', '', content, flags=re.DOTALL)
+        
+        # 找到mark_as_not_interested方法的结束位置，在其后插入新方法
+        mark_as_not_interested_match = re.search(r'(\n\s*def mark_as_not_interested\(self\):.*?(?=\n\s*def|\n\nclass|\n\ndef|\n\nif __name__|\Z))', content, re.DOTALL)
+        
+        if not mark_as_not_interested_match:
+            print("未找到mark_as_not_interested方法")
+            return False
+        
+        # 在mark_as_not_interested方法后插入新方法
+        insertion_point = mark_as_not_interested_match.end(1)
+        new_content = content[:insertion_point] + increase_score_method + decrease_score_method + content[insertion_point:]
+        
+        # 写入修复后的文件
+        with open('db_viewer.py', 'w', encoding='utf-8') as f:
+            f.write(new_content)
+        
+        print("成功修复db_viewer.py文件")
+        return True
+        
+    except Exception as e:
+        print(f"修复文件时出错: {str(e)}")
+        return False
+
+if __name__ == "__main__":
+    fix_db_viewer()
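
The regex-based move above does not verify its own result. One way to check it, sketched here under the assumption that the file parses as valid Python, is to list the methods that the standard `ast` module finds inside the class. The function name `methods_of_class` is hypothetical and not part of the commit.

```python
import ast


def methods_of_class(source, class_name):
    """Return the names of functions defined directly in a top-level class.

    (Hypothetical verification helper; not part of fix_db_viewer.py.)
    """
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return {
                child.name
                for child in node.body
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef))
            }
    return set()
```

After running the fix, reading `db_viewer.py` and checking that `{"increase_score", "decrease_score"}` is a subset of `methods_of_class(source, "DatabaseViewer")` would confirm that both methods ended up inside the class.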
--- a/gui_test.log
+++ b/gui_test.log
@@ -0,0 +1,2 @@
+2025-11-07 23:39:42.157 | INFO     | __main__:<module>:42 - 开始GUI测试
+2025-11-07 23:39:47.875 | INFO     | __main__:close_app:30 - 测试完成，关闭应用程序
--- a/modify_db_to_score.py
+++ b/modify_db_to_score.py
@@ -0,0 +1,101 @@
+#!/usr/bin/env python3
+"""
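+Database schema migration script.
+Adds a score column (10-point rating scale) to the articles table and fills it
+from is_interested. The is_interested column itself is kept, not dropped.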
+"""
+
+import sqlite3
+import os
+from loguru import logger
+
+def modify_database_structure():
+    """Add a score column to articles and populate it from is_interested."""
+    # 获取当前脚本所在目录的数据库文件路径
+    script_dir = os.path.dirname(os.path.abspath(__file__))
+    db_path = os.path.join(script_dir, "tophub_data.db")
+    
+    # 检查数据库文件是否存在
+    if not os.path.exists(db_path):
+        logger.error(f"数据库文件不存在: {db_path}")
+        return False
+    
+    try:
+        # 连接数据库
+        conn = sqlite3.connect(db_path)
+        cursor = conn.cursor()
+        
+        # 检查is_interested字段是否存在
+        cursor.execute("PRAGMA table_info(articles)")
+        columns = cursor.fetchall()
+        column_names = [column[1] for column in columns]
+        
+        if "is_interested" not in column_names:
+            logger.info("is_interested字段不存在，无需修改")
+            conn.close()
+            return True
+            
+        # 检查score字段是否已存在
+        if "score" in column_names:
+            logger.info("score字段已存在，无需添加")
+            conn.close()
+            return True
+        
+        # 添加score字段，默认值为5
+        logger.info("正在添加score字段...")
+        cursor.execute("ALTER TABLE articles ADD COLUMN score INTEGER DEFAULT 5")
+        
+        # 将is_interested的值转换为score
+        logger.info("正在转换is_interested数据到score字段...")
+        
+        # 获取所有记录
+        cursor.execute("SELECT id, is_interested FROM articles")
+        records = cursor.fetchall()
+        
+        # 转换数据
+        for record in records:
+            article_id, is_interested = record
+            # 转换逻辑：is_interested=1转为score=7，is_interested=0转为score=5
+            score = 7 if is_interested == 1 else 5
+            cursor.execute("UPDATE articles SET score = ? WHERE id = ?", (score, article_id))
+        
+        # 提交更改
+        conn.commit()
+        logger.info("成功添加score字段并转换数据")
+        
+        # 验证字段是否添加成功
+        cursor.execute("PRAGMA table_info(articles)")
+        columns = cursor.fetchall()
+        column_names = [column[1] for column in columns]
+        
+        if "score" in column_names:
+            logger.info("验证成功：score字段已添加到articles表")
+        else:
+            logger.error("验证失败：score字段未成功添加")
+            conn.close()
+            return False
+            
+        # 检查数据转换结果
+        cursor.execute("SELECT COUNT(*) FROM articles WHERE score = 7")
+        count_7 = cursor.fetchone()[0]
+        
+        cursor.execute("SELECT COUNT(*) FROM articles WHERE score = 5")
+        count_5 = cursor.fetchone()[0]
+        
+        logger.info(f"数据转换结果: score=7的记录数: {count_7}, score=5的记录数: {count_5}")
+            
+        conn.close()
+        return True
+        
+    except sqlite3.Error as e:
+        logger.error(f"数据库操作出错: {str(e)}")
+        return False
+    except Exception as e:
+        logger.error(f"修改数据库结构时出错: {str(e)}")
+        return False
+
+if __name__ == "__main__":
+    logger.add("db_modify_score.log", rotation="10 MB", level="INFO")
+    if modify_database_structure():
+        logger.info("数据库结构修改完成")
+    else:
+        logger.error("数据库结构修改失败")
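
The migration above converts rows one at a time in a Python loop. A minimal sketch of the same mapping (`is_interested = 1` becomes 7, anything else becomes 5) as a single `UPDATE ... CASE` statement follows. The function name `add_score_column` is hypothetical, and the caller is assumed to supply the connection.

```python
import sqlite3


def add_score_column(conn):
    """Add articles.score derived from is_interested.

    Returns True if the migration ran, False if there was nothing to do.
    (Hypothetical helper; mirrors the early exits of the script above.)
    """
    columns = [row[1] for row in conn.execute("PRAGMA table_info(articles)")]
    if "score" in columns or "is_interested" not in columns:
        return False
    with conn:  # commits on success, rolls back on an exception
        conn.execute("ALTER TABLE articles ADD COLUMN score INTEGER DEFAULT 5")
        conn.execute(
            "UPDATE articles "
            "SET score = CASE WHEN is_interested = 1 THEN 7 ELSE 5 END"
        )
    return True
```

Because the function returns False when `score` already exists, calling it twice is harmless.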
--- a/ollama_model_viewer.py
+++ b/ollama_model_viewer.py
@@ -0,0 +1,79 @@
+import sys
+import requests
+import json
+from PySide6.QtWidgets import QApplication, QMainWindow, QListWidget, QVBoxLayout, QWidget, QLabel, QPushButton
+from PySide6.QtCore import Qt
+from loguru import logger
+
+class OllamaModelViewer(QMainWindow):
+    def __init__(self):
+        super().__init__()
+        self.setWindowTitle("Ollama 模型查看器")
+        self.setGeometry(100, 100, 600, 400)
+        
+        # 创建主窗口部件
+        self.central_widget = QWidget()
+        self.setCentralWidget(self.central_widget)
+        
+        # 创建布局
+        self.layout = QVBoxLayout()
+        self.central_widget.setLayout(self.layout)
+        
+        # 创建标题标签
+        self.title_label = QLabel("当前安装的Ollama模型:")
+        self.title_label.setStyleSheet("font-weight: bold; font-size: 14px;")
+        self.layout.addWidget(self.title_label)
+        
+        # 创建列表部件
+        self.model_list = QListWidget()
+        self.model_list.setStyleSheet("font-family: monospace;")
+        self.layout.addWidget(self.model_list)
+        
+        # 创建刷新按钮
+        self.refresh_button = QPushButton("刷新模型列表")
+        self.refresh_button.clicked.connect(self.fetch_models)
+        self.layout.addWidget(self.refresh_button)
+        
+        # 初始加载模型
+        self.fetch_models()
+    
+    def fetch_models(self):
+        """从Ollama API获取模型列表"""
+        self.model_list.clear()
+        
+        try:
+            logger.info("正在获取Ollama模型列表...")
+            response = requests.get("http://localhost:11434/api/tags", timeout=5)
+            
+            if response.status_code == 200:
+                data = response.json()
+                models = data.get("models", [])
+                
+                if models:
+                    for model in models:
+                        model_name = model.get("model", "")
+                        if model_name:
+                            self.model_list.addItem(model_name)
+                            logger.info(f"找到模型: {model_name}")
+                else:
+                    self.model_list.addItem("未找到任何模型")
+                    logger.info("未找到任何模型")
+            else:
+                self.model_list.addItem(f"API请求失败，状态码: {response.status_code}")
+                logger.error(f"API请求失败，状态码: {response.status_code}")
+                
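+        except json.JSONDecodeError as e:
+            # Checked first: in recent requests versions, response.json() raises
+            # requests.exceptions.JSONDecodeError, which is also a RequestException
+            self.model_list.addItem("API响应格式错误")
+            logger.error(f"API响应格式错误: {str(e)}")
+        except requests.exceptions.RequestException as e:
+            self.model_list.addItem("无法连接到Ollama API")
+            logger.error(f"无法连接到Ollama API: {str(e)}")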
+        except Exception as e:
+            self.model_list.addItem(f"发生错误: {str(e)}")
+            logger.error(f"发生未知错误: {str(e)}")
+
+if __name__ == "__main__":
+    app = QApplication(sys.argv)
+    window = OllamaModelViewer()
+    window.show()
+    sys.exit(app.exec())
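
`fetch_models` above mixes the HTTP request, the payload parsing, and the widget updates. As a sketch, the parsing step can be split into a pure function that is testable without a running Ollama server. The name `extract_model_names` is hypothetical; the `"models"` and `"model"` keys are the ones the viewer code already reads.

```python
def extract_model_names(payload):
    """Return the non-empty "model" values from an /api/tags style payload.

    (Hypothetical helper; not part of ollama_model_viewer.py.)
    """
    if not isinstance(payload, dict):
        return []
    models = payload.get("models") or []
    return [
        entry["model"]
        for entry in models
        if isinstance(entry, dict) and entry.get("model")
    ]
```

The GUI method could then call `extract_model_names(response.json())` and add one list item per returned name.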
--- a/requirements.txt
+++ b/requirements.txt
@@ -0,0 +1,6 @@
+requests>=2.25.1
+lxml>=4.6.3
+tqdm>=4.61.2
+loguru>=0.5.3
+zhipuai>=2.1.0
+PySide6>=6.0.0
--- a/tophub_add_data_to_db.log
+++ b/tophub_add_data_to_db.log
--- a/tophub_add_data_to_db.py
+++ b/tophub_add_data_to_db.py
@@ -0,0 +1,213 @@
+#!/usr/bin/env python3
+"""
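+Process temporary files and write their contents to the database.
+Reads temporary files in the expected format, extracts titles and links,
+calls the API to classify each title, then writes the results to SQLite.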
+"""
+
+import sqlite3
+import requests
+import os
+import re
+from datetime import datetime
+from tqdm import tqdm
+from loguru import logger
+import glob
+
+# 配置日志
+logger.add("tophub_add_data_to_db.log", rotation="10 MB", level="INFO")
+
+# API配置
+API_URL = "http://localhost:11434/api/generate"
+API_MODEL = "gemma3:4b"
+
+def init_database():
+    """初始化数据库，创建表结构"""
+    conn = sqlite3.connect('tophub_data.db')
+    cursor = conn.cursor()
+    
+    cursor.execute('''
+        CREATE TABLE IF NOT EXISTS articles (
+            id INTEGER PRIMARY KEY AUTOINCREMENT,
+            title TEXT NOT NULL,
+            url TEXT NOT NULL,
+            category TEXT,
+            source_date TEXT NOT NULL,
+            created_at TEXT NOT NULL,
+            UNIQUE(title, source_date)
+        )
+    ''')
+    
+    conn.commit()
+    conn.close()
+    logger.info("数据库初始化完成")
+
+def find_temp_files():
+    """查找符合格式的临时文件"""
+    pattern = "*年*月*日*.txt"
+    files = glob.glob(pattern)
+    logger.info(f"找到 {len(files)} 个临时文件: {files}")
+    return files
+
+def parse_file_content(file_path):
+    """解析文件内容，按5行一个循环提取数据"""
+    articles = []
+    
+    try:
+        with open(file_path, 'r', encoding='utf-8') as f:
+            lines = f.readlines()
+        
+        # Parse in groups of 5 lines: node id, category, title, link, separator
+        for i in range(0, len(lines), 5):
+            # The trailing separator is optional, so a final record needs only 4 lines
+            if i + 3 < len(lines):
+                category = lines[i+1].strip()
+                title = lines[i+2].strip()
+                url = lines[i+3].strip()
+                
+                # Extract the key fields
+                title_match = re.search(r'标题: (.+)', title)
+                url_match = re.search(r'链接: (.+)', url)
+                
+                if title_match and url_match:
+                    articles.append({
+                        'title': title_match.group(1),
+                        'url': url_match.group(1),
+                        'category': category.split(': ', 1)[1] if ': ' in category else '未知'
+                    })
+        
+        logger.info(f"从文件 {file_path} 解析出 {len(articles)} 条数据")
+        return articles
+        
+    except Exception as e:
+        logger.error(f"解析文件 {file_path} 失败: {e}")
+        return []
+
+def check_duplicate(title, date_str):
+    """检查标题+日期是否已存在"""
+    conn = sqlite3.connect('tophub_data.db')
+    cursor = conn.cursor()
+    
+    cursor.execute('''
+        SELECT COUNT(*) FROM articles 
+        WHERE title = ? AND source_date = ?
+    ''', (title, date_str))
+    
+    count = cursor.fetchone()[0]
+    conn.close()
+    
+    return count > 0
+
+def classify_title(title):
+    """调用API对标题进行分类"""
+    try:
+        prompt = f"目标：对以下文字内容进行分类，返回结果为类别，如\"社会新闻\"，\"金融\"，\"历史\"，\"购物\"，\"新质科技\"等等。目的：只返回2-4个字，不返回其它内容。内容：{title}"
+        
+        data = {
+            "model": API_MODEL,
+            "prompt": prompt,
+            "stream": False
+        }
+        
+        response = requests.post(API_URL, json=data, timeout=30)
+        response.raise_for_status()
+        
+        result = response.json()
+        category = result.get('response', '').strip()
+        
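+        # Validate the result length: the prompt asks for 2-4 characters,
+        # but anything from 2 to 8 characters is accepted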
+        if len(category) < 2 or len(category) > 8:
+            category = '其他'
+            
+        logger.info(f"标题 '{title}' 分类为: {category}")
+        return category
+        
+    except Exception as e:
+        logger.error(f"API调用失败，标题 '{title}': {e}")
+        return '其他'
+
+def insert_article(title, url, category, source_date):
+    """插入文章到数据库"""
+    conn = sqlite3.connect('tophub_data.db')
+    cursor = conn.cursor()
+    
+    try:
+        created_at = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
+        cursor.execute('''
+            INSERT INTO articles (title, url, category, source_date, created_at)
+            VALUES (?, ?, ?, ?, ?)
+        ''', (title, url, category, source_date, created_at))
+        
+        conn.commit()
+        logger.info(f"成功插入文章: {title}")
+        return True
+        
+    except sqlite3.IntegrityError:
+        logger.warning(f"文章已存在，跳过: {title}")
+        return False
+        
+    except Exception as e:
+        logger.error(f"插入文章失败: {e}")
+        return False
+        
+    finally:
+        conn.close()
+
+def process_temp_files():
+    """主处理函数"""
+    logger.info("开始处理临时文件...")
+    
+    # 初始化数据库
+    init_database()
+    
+    # 查找临时文件
+    temp_files = find_temp_files()
+    
+    if not temp_files:
+        logger.warning("未找到临时文件")
+        return
+    
+    total_processed = 0
+    total_inserted = 0
+    
+    # 处理每个文件
+    for file_path in temp_files:
+        logger.info(f"处理文件: {file_path}")
+        
+        # 从文件名提取日期
+        date_match = re.search(r'(\d{4})年(\d{1,2})月(\d{1,2})日', file_path)
+        if date_match:
+            source_date = f"{date_match.group(1)}-{int(date_match.group(2)):02d}-{int(date_match.group(3)):02d}"
+        else:
+            source_date = datetime.now().strftime('%Y-%m-%d')
+        
+        # 解析文件内容
+        articles = parse_file_content(file_path)
+        
+        if not articles:
+            continue
+        
+        # 处理每篇文章
+        for article in tqdm(articles, desc=f"处理 {file_path}"):
+            total_processed += 1
+            
+            # 检查重复
+            if check_duplicate(article['title'], source_date):
+                logger.info(f"跳过重复文章: {article['title']}")
+                continue
+            
+            # 分类标题
+            category = classify_title(article['title'])
+            
+            # 插入数据库
+            if insert_article(article['title'], article['url'], category, source_date):
+                total_inserted += 1
+    
+    logger.info(f"处理完成! 总计处理: {total_processed}, 成功插入: {total_inserted}")
+
+if __name__ == "__main__":
+    try:
+        process_temp_files()
+    except Exception as e:
+        logger.error(f"程序执行失败: {e}")
+        raise
--- a/tophub_ban_column.txt
+++ b/tophub_ban_column.txt
@@ -0,0 +1,14 @@
+淘宝
+音乐
+电影
+猫眼
+IMDB
+视频
+七猫
+读书
+TapTap
+Music
+即刻
+站酷
+App
+彩票
--- a/tophub_data.db
+++ b/tophub_data.db
--- a/tophub_scraper.log
+++ b/tophub_scraper.log
--- a/tophub_scraper.py
+++ b/tophub_scraper.py
@@ -0,0 +1,209 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+TopHub scraper script.
+Fetches data from tophub.today, filters it by the configured rules, and saves it.
+"""
+
+import requests
+from lxml import html
+import json
+import time
+import os
+import re
+from datetime import datetime
+from loguru import logger
+
+# Configure logging
+logger.add("tophub_scraper.log", rotation="10 MB", level="INFO")
+
+class TopHubScraper:
+    """Scraper for the TopHub website."""
+    
+    def __init__(self):
+        """
+        Initialize the scraper.
+        """
+        self.base_url = "https://tophub.today/"
+        self.ban_list_file = "tophub_ban_column.txt"
+        self.session = requests.Session()
+        self.session.headers.update({
+            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
+        })
+        self.ban_list = self.load_ban_list()
+        
+    def load_ban_list(self):
+        """
+        Load the list of columns to filter out.
+        
+        Returns:
+            set: Names of the columns to filter out.
+        """
+        ban_list = set()
+        try:
+            if os.path.exists(self.ban_list_file):
+                with open(self.ban_list_file, 'r', encoding='utf-8') as f:
+                    for line in f:
+                        line = line.strip()
+                        if line:
+                            ban_list.add(line)
+                logger.info(f"已加载 {len(ban_list)} 个需要过滤的栏目")
+            else:
+                logger.warning(f"过滤文件 {self.ban_list_file} 不存在，将不过滤任何栏目")
+        except Exception as e:
+            logger.error(f"加载过滤文件失败: {e}")
+        return ban_list
+        
+    def fetch_webpage(self):
+        """
+        Fetch the page content.
+        
+        Returns:
+            str: The HTML content of the page.
+        """
+        logger.info(f"正在获取网页内容: {self.base_url}")
+        try:
+            response = self.session.get(self.base_url, timeout=10)
+            response.raise_for_status()
+            logger.info("网页内容获取成功")
+            return response.text
+        except requests.RequestException as e:
+            logger.error(f"获取网页内容失败: {e}")
+            raise
+            
+    def scrape_by_node_ids(self):
+        """
+        Scrape data by iterating over a range of node IDs.
+        
+        Returns:
+            list: The scraped records.
+        """
+        try:
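+            # 1. Fetch the page
+            html_content = self.fetch_webpage()
+            tree = html.fromstring(html_content)
+            
+            # 2. Build the output file name from the current date and time.
+            #    The time is zero-padded so that e.g. 1:23:04 and 12:03:04
+            #    cannot produce the same name.
+            now = datetime.now()
+            output_file = f"{now.year}年{now.month}月{now.day}日{now:%H%M%S}.txt"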
+            
+            scraped_data = []
+            
+            # 3. Iterate over the node ID range
+            for node_id in range(1, 1000):  # 1 through 999
+                xpath = f'//*[@id="node-{node_id}"]'
+                logger.info(f"正在查找节点: {xpath}")
+                
+                # Look up the node
+                nodes = tree.xpath(xpath)
+                if not nodes:
+                    continue  # No such node; move on to the next ID
+                
+                node = nodes[0]
+                
+                # Find the <span> tags
+                spans = node.xpath('.//span')
+                if not spans:
+                    logger.info(f"节点 {node_id} 中未找到span标签，跳过")
+                    continue
+                
+                # Get the text of the first <span> (used as the category)
+                span_text = spans[0].text_content().strip()
+                if not span_text:
+                    logger.info(f"节点 {node_id} 的span标签为空，跳过")
+                    continue
+                
+                # Check against the ban list (substring match)
+                should_skip = False
+                for ban_word in self.ban_list:
+                    if ban_word in span_text:
+                        logger.info(f"节点 {node_id} 的内容 '{span_text}' 包含过滤词 '{ban_word}'，跳过")
+                        should_skip = True
+                        break
+                
+                if should_skip:
+                    continue
+                
+                logger.info(f"节点 {node_id} 的内容 '{span_text}' 通过过滤，继续处理")
+                
+                # Find the <a> elements
+                links = node.xpath('.//a')
+                if not links:
+                    logger.info(f"节点 {node_id} 中未找到a元素，跳过")
+                    continue
+                
+                # Extract every link and its text
+                for link in links:
+                    link_text = link.text_content().strip()
+                    href = link.get('href', '')
+                    
+                    if link_text and href:
+                        # Resolve relative links against the base URL; urljoin also
+                        # handles hrefs that lack a leading slash
+                        if not href.startswith('http'):
+                            href = requests.compat.urljoin(self.base_url, href)
+                        
+                        # Skip entries whose text equals the category name
+                        if span_text == link_text:
+                            logger.info(f"节点 {node_id} 的分类和标题相同 ({span_text})，跳过")
+                            continue
+                        
+                        scraped_data.append({
+                            'node_id': node_id,
+                            'category': span_text,
+                            'text': link_text,
+                            'link': href
+                        })
+            
+            # 4. Save the data to a file
+            if scraped_data:
+                self.save_to_file(scraped_data, output_file)
+                logger.info(f"成功抓取 {len(scraped_data)} 条数据，保存到 {output_file}")
+            else:
+                logger.warning("未抓取到任何数据")
+            
+            return scraped_data
+            
+        except Exception as e:
+            logger.error(f"抓取数据时出错: {e}")
+            raise
+    
+    def save_to_file(self, data, filename):
+        """
+        Save the data to a file.
+        
+        Args:
+            data (list): The records to save.
+            filename (str): The output file name.
+        """
+        try:
+            with open(filename, 'w', encoding='utf-8') as f:
+                for item in data:
+                    f.write(f"节点ID: {item['node_id']}\n")
+                    f.write(f"分类: {item['category']}\n")
+                    
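+                    # Clean the title: strip the leading rank number and extra whitespace
+                    title_text = item['text']
+                    # For multi-line link text, pick out the actual title
+                    lines = title_text.strip().split('\n')
+                    if len(lines) >= 2:
+                        # The second line is usually the actual title
+                        cleaned_title = lines[1].strip()
+                    else:
+                        # Single line: try to strip a leading number with a regex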
+                        match = re.match(r'^\d+\s+(.+)$', title_text.strip(), re.DOTALL)
+                        if match:
+                            cleaned_title = match.group(1).strip()
+                        else:
+                            cleaned_title = title_text.strip()
+                    
+                    f.write(f"标题: {cleaned_title}\n")
+                    f.write(f"链接: {item['link']}\n")
+                    f.write("-" * 50 + "\n")
+            logger.info(f"数据已保存到 {filename}")
+        except Exception as e:
+            logger.error(f"保存文件失败: {e}")
+            raise
+            
+if __name__ == "__main__":
+    scraper = TopHubScraper()
+    scraper.scrape_by_node_ids()
--- a/右键菜单功能说明.md
+++ b/右键菜单功能说明.md
@@ -0,0 +1,155 @@
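+# Context Menu Feature Guide
+
+## Overview
+
+The context menu in the TopHub data viewer lets users right-click an item in the table to run common operations quickly.
+
+## New Features
+
+### 1. Mark as interested
+- **Description**: marks the selected items as interested
+- **Database operation**: sets the `is_interested` field of the matching records to 1
+- **Display**: the "感兴趣" (Interested) column shows "是" (Yes) in bold green
+
+### 2. Mark as not interested
+- **Description**: marks the selected items as not interested
+- **Database operation**: sets the `is_interested` field of the matching records to 0
+- **Display**: the "感兴趣" (Interested) column shows "否" (No) in the default font and color
+
+### 3. Delete selected items
+- **Description**: deletes the selected items
+- **Database operation**: deletes the matching records from the database
+- **Display**: removes the matching rows from the table
+
+## Usage
+
+1. Open the TopHub data viewer
+2. Right-click any item in the table
+3. Choose an action from the context menu:
+   - "标记为感兴趣" (Mark as interested) marks the item as interested
+   - "标记为不感兴趣" (Mark as not interested) marks the item as not interested
+   - "删除选中项" (Delete selected items) deletes the selected items
+
+## Technical Implementation
+
+### Context menu implementation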
+```python
+# Enable the custom context menu
+self.table.setContextMenuPolicy(Qt.CustomContextMenu)
+self.table.customContextMenuRequested.connect(self.show_context_menu)
+
+def show_context_menu(self, position):
+    """Show the context menu."""
+    # Find the row under the cursor
+    row = self.table.rowAt(position.y())
+    if row < 0:
+        return
+        
+    # Select that row
+    self.table.selectRow(row)
+    
+    # Create the context menu
+    menu = QMenu(self)
+    
+    # Add the "mark as interested" action
+    mark_action = QAction("标记为感兴趣", self)
+    mark_action.triggered.connect(self.mark_as_interested)
+    menu.addAction(mark_action)
+    
+    # Add the "mark as not interested" action
+    unmark_action = QAction("标记为不感兴趣", self)
+    unmark_action.triggered.connect(self.mark_as_not_interested)
+    menu.addAction(unmark_action)
+    
+    # Add a separator
+    menu.addSeparator()
+    
+    # Add the "delete" action
+    delete_action = QAction("删除选中项", self)
+    delete_action.triggered.connect(self.delete_selected_items)
+    menu.addAction(delete_action)
+    
+    # Show the menu at the cursor position
+    menu.exec_(self.table.mapToGlobal(position))
+```
+
+### Implementation of the mark-as-not-interested method
+```python
+def mark_as_not_interested(self):
+    """Mark the selected items as not interested."""
+    # Collect the selected rows
+    selected_rows = set()
+    for item in self.table.selectedItems():
+        selected_rows.add(item.row())
+        
+    # Nothing selected: tell the user and return
+    if not selected_rows:
+        QMessageBox.information(self, "提示", "请先选中要标记的行")
+        return
+        
+    # Ask for confirmation
+    reply = QMessageBox.question(
+        self, 
+        "确认标记", 
+        f"确定要将选中的 {len(selected_rows)} 行标记为不感兴趣吗？",
+        QMessageBox.Yes | QMessageBox.No,
+        QMessageBox.Yes
+    )
+    
+    if reply == QMessageBox.No:
+        return
+        
+    conn = None
+    try:
+        # Connect to the database
+        conn = sqlite3.connect(self.db_path)
+        cursor = conn.cursor()
+        
+        # Update the selected rows
+        updated_count = 0
+        for row in selected_rows:
+            # Read the record ID from column 0
+            id_item = self.table.item(row, 0)
+            if id_item:
+                article_id = id_item.text()
+                # Set is_interested to 0 in the database
+                cursor.execute("UPDATE articles SET is_interested = 0 WHERE id = ?", (article_id,))
+                
+                # Update the table display
+                interested_item = QTableWidgetItem("否")
+                # Not-interested items use the default font and color
+                self.table.setItem(row, 5, interested_item)
+                
+                updated_count += 1
+                
+        # Commit the changes
+        conn.commit()
+        
+        # Update the status bar
+        self.status_bar.showMessage(f"已标记 {updated_count} 行为不感兴趣")
+        
+    except sqlite3.Error as e:
+        logger.error(f"标记数据时出错: {str(e)}")
+        QMessageBox.critical(self, "数据库错误", f"标记数据时出错: {str(e)}")
+        self.status_bar.showMessage("标记失败")
+    except Exception as e:
+        logger.error(f"标记数据时出错: {str(e)}")
+        QMessageBox.critical(self, "错误", f"标记数据时出错: {str(e)}")
+        self.status_bar.showMessage("标记失败")
+    finally:
+        # Close the connection even when an update fails
+        if conn is not None:
+            conn.close()
+```
+
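+## Testing
+
+The test script `test_mark_not_interested.py` verifies the "标记为不感兴趣" (Mark as not interested) feature. The tests pass: items are marked as not interested, and both the database and the UI are updated.
+
+## Notes
+
+1. Select the items to operate on before using the context menu
+2. Deletion cannot be undone; use it with care
+3. Marking updates the database directly, so confirm the selection first
+4. In batch operations, all selected items are processed together
+
+## Changelog
+
+- 2023-11-07: added "标记为不感兴趣" (Mark as not interested) to the context menu
+- 2023-11-07: completed feature testing and documentation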
--- a/数据库字段添加总结.md
+++ b/数据库字段添加总结.md
@@ -0,0 +1,44 @@
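+# Summary: Adding a Database Field
+
+## Task Overview
+Add an "interested" field to the TopHub database viewer so that users can mark the articles they are interested in.
+
+## Implementation Steps
+
+### 1. Database schema change
+- Created the `add_interested_field.py` script, which adds the `is_interested` field to the `articles` table
+- Field type: INTEGER, default value: 0
+- The script checks whether the field already exists, adds it, and verifies the result
+
+### 2. Database verification
+- Created the `check_db_structure.py` script to inspect the database schema
+- Created the `test_interested_field.py` script to verify that the field works
+- Created the `show_data_with_interested.py` script to display records together with their interested status
+
+### 3. GUI changes
+- Modified `db_viewer.py` to add the following:
+  - A "感兴趣" (Interested) column in the table that shows the `is_interested` value
+  - A "标记为感兴趣" (Mark as interested) button that marks the selected articles as interested
+  - Updated queries that include the `is_interested` field
+  - Updated filtering that covers the interested column
+
+## Test Results
+- The field was added to the database with a default value of 0
+- Records can be marked as interested (value 1)
+- The GUI application displays and edits the interested field correctly
+- The statistics feature works and shows the counts of interested and not-interested records
+
+## Usage
+1. Run `python db_viewer.py` to start the application
+2. Select a record in the table
+3. Click the "标记为感兴趣" (Mark as interested) button to mark the record as interested
+4. Use the filter to view the interested records
+5. The statistics panel shows the counts of interested and not-interested records
+
+## File List
+- `add_interested_field.py` - script that adds the database field
+- `check_db_structure.py` - script that inspects the database schema
+- `test_interested_field.py` - script that tests the field
+- `show_data_with_interested.py` - command-line tool that displays records
+- `test_gui.py` - GUI test script
+- `db_viewer.py` - the modified main application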
--- a/评分系统使用说明.md
+++ b/评分系统使用说明.md
@@ -0,0 +1,70 @@
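+# TopHub Data Viewer - Scoring System Guide
+
+## Overview
+
+The TopHub data viewer has been upgraded from a simple interested/not-interested flag to a score from 0 to 10. The new system allows finer-grained ratings, so you can mark and manage scraped content more precisely.
+
+## Scoring System
+
+### Score range
+- **Minimum**: 0 (not interested at all)
+- **Default**: 5 (neutral)
+- **Maximum**: 10 (very interested)
+
+### Color coding
+To make content quality easy to spot, the score is displayed in a color that depends on its value:
+- **Bold green**: 8 and above (high-value content)
+- **Blue**: 6-7 (medium-value content)
+- **Default color**: 4-5 (ordinary content)
+- **Red**: 3 and below (low-value content)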
+
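+## Usage
+
+### Increasing a score
+1. Select one or more rows in the table
+2. Right-click the selected rows
+3. Choose "增加评分(+1)" (Increase score) from the menu
+4. The score of each selected item increases by 1, up to a maximum of 10
+
+### Decreasing a score
+1. Select one or more rows in the table
+2. Right-click the selected rows
+3. Choose "减少评分(-1)" (Decrease score) from the menu
+4. The score of each selected item decreases by 1, down to a minimum of 0
+
+### Batch operations
+- Multiple rows can be selected and adjusted at once
+- The "按关键字选中" (Select by keyword) feature quickly selects the rows that contain a given keyword
+- The context menu then adjusts the scores of all selected rows
+
+## Data Migration
+
+Existing interested/not-interested data was converted to the new scoring system automatically:
+- Items previously marked as interested were converted to a score of 7
+- Items previously marked as not interested were converted to a score of 5 (the default)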
+
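+## Technical Details
+
+### Database schema
+- A new `score` field (INTEGER) replaces the former `is_interested` field
+- `score` defaults to 5 and is limited to the range 0-10
+
+### UI changes
+- The "感兴趣" (Interested) column in the table is now the "评分" (Score) column and shows the numeric score
+- The context menu now offers "增加评分(+1)" (Increase score) and "减少评分(-1)" (Decrease score)
+- Color coding is applied automatically according to the score
+
+## FAQ
+
+**Q: Why is the default score 5 rather than 0?**
+A: A score of 5 represents a neutral stance, which matches everyday rating habits. A score of 0 is usually reserved for content that is completely irrelevant or of very poor quality.
+
+**Q: How can I find high-scoring content quickly?**
+A: High-scoring content (8 and above) is shown in bold green and stands out. You can also sort the table by the score column.
+
+**Q: Can I set an arbitrary score directly?**
+A: The current version only supports adjusting the score in steps of +1/-1, which keeps scoring consistent and traceable.
+
+---
+
+For other questions or suggestions, feedback is welcome.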