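Chapter 17: Advanced Practice

Project 5: A Simple Web Crawler

17.1 Requirements Analysis

Build a simple web crawler with the following features:

Use the requests module to fetch simple web pages (such as novel chapters or news headlines)
Save the fetched content to a text file
Handle network request exceptions

17.2 Core Implementation

requests module: use the requests module to send HTTP requests
File operations: use the open() function and the with statement to write files
String processing: use string methods to extract and process content
Exception handling: catch network request exceptions, file operation exceptions, and so on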
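The string-processing step listed above can be sketched in a slightly more tolerant form. Searching for the literal text `<title>` misses tags written in upper case or carrying attributes, so this sketch swaps in the standard library's `html.parser` module instead of plain string methods. The names `TitleParser` and `extract_title` and the fallback text are illustrative assumptions, not part of the chapter's listing.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE lang="en"> also matches.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip()


def extract_title(html, default="Web page content"):
    """Return the page title, or `default` when no title is found."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title or default
```

In a crawler such as the one in this project, `extract_title(response.text)` could stand in for the `find()`-based slicing.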

17.3 Hands-On

python

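"""
Simple web crawler
Purpose: fetch a web page with the requests module and save it to a text file
"""

import requests

def crawl_website(url, output_file):
    """Fetch a web page and save its content to a file."""
    try:
        # Send the HTTP request
        print(f"Crawling: {url}")
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Raise HTTPError for 4xx/5xx responses

        # Extract the content (a plain HTML page here; adjust to the structure of the page you crawl)
        content = response.text

        # Simple processing: use the content of the <title> tag as the title
        title_start = content.find('<title>')
        title_end = content.find('</title>', title_start)  # search after the opening tag
        if title_start != -1 and title_end != -1:
            title = content[title_start + len('<title>'):title_end].strip()
        else:
            title = "Web page content"

        # Build the text to save
        save_content = f"# {title}\n\n"
        # The Date response header is the time reported by the server
        save_content += f"Crawl time: {response.headers.get('Date', 'unknown')}\n"
        save_content += f"Status code: {response.status_code}\n\n"
        save_content += "## Page content\n\n"
        save_content += content[:5000]  # Keep only the first 5000 characters so the file stays small

        # Write the file
        with open(output_file, "w", encoding="utf-8") as f:
            f.write(save_content)

        print(f"Done! Content saved to {output_file}")
        print(f"Status code: {response.status_code}")
        print(f"Title: {title}")

    except requests.exceptions.RequestException as e:
        # Covers timeouts, connection failures, and HTTP error statuses
        print(f"Network request error: {e}")
    except Exception as e:
        # Anything else, for example a failure while writing the file
        print(f"Error: {e}")

# Test
if __name__ == "__main__":
    # Replace with the URL of the page you want to crawl
    url = "https://www.example.com"
    output_file = "crawled_content.txt"
    crawl_website(url, output_file)

Output

Crawling: https://www.example.com
Done! Content saved to crawled_content.txt
Status code: 200
Title: Example Domain

Contents of crawled_content.txt:

# Example Domain

Crawl time: Fri, 03 Apr 2026 03:12:34 GMT
Status code: 200

## Page content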

<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
        
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>    
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>
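The saved file above still contains raw HTML, including the whole `<style>` block. When the goal is readable text such as a novel chapter or a list of headlines, one option is to strip the markup before saving. The following is a minimal sketch using the standard library's `html.parser`; the names `TextExtractor` and `html_to_text` are illustrative assumptions, and real pages usually need rules tailored to their structure.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Collapse runs of whitespace, including line breaks inside a paragraph.
            text = " ".join(data.split())
            if text:
                self._chunks.append(text)

    def get_text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML document, one chunk per line."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.get_text()
```

With this helper, a crawler could write `html_to_text(content)` to the file instead of `content[:5000]`. Note that the page title is part of the extracted text, because `<title>` is not in the skipped set.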

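Project 6: Office Automation

17.4 Requirements Analysis

Build an office automation program with the following features:

Use the pandas module to read an Excel file
Perform simple data cleaning (remove missing values and duplicate rows)
Save the processed data as a new Excel file

17.5 Core Implementation

pandas module: use the pandas module to read and process Excel data
Data processing: use pandas methods to clean the data
File operations: use the pandas to_excel method to save the data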
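The two cleaning steps listed above can be tried without any Excel file. The following is a minimal sketch on a small in-memory table; the column names and rows are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one row has a missing age, and one row is an exact duplicate.
df = pd.DataFrame(
    {
        "Name": ["Zhang San", "Li Si", "Wang Wu", "Zhang San", "Zhao Liu"],
        "Age": [18, 19, None, 18, 20],
        "Score": [95, 88, 92, 95, 85],
    }
)

# Step 1: drop every row that contains at least one missing value.
without_nulls = df.dropna()

# Step 2: drop rows that repeat an earlier row in every column,
# then renumber the index so it runs 0, 1, 2 again.
cleaned = without_nulls.drop_duplicates().reset_index(drop=True)

print(cleaned)
```

Both `dropna()` and `drop_duplicates()` return a new DataFrame and leave the original untouched, which is why the results are assigned to new names here.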

17.6 Hands-On

python

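"""
Office automation: Excel data processing
Purpose: read an Excel file, clean the data, and save it as a new file
"""

import pandas as pd  # reading and writing .xlsx files also requires the openpyxl package

def process_excel(input_file, output_file):
    """Clean the data in an Excel file and save the result."""
    try:
        # Read the Excel file
        print(f"Reading: {input_file}")
        df = pd.read_excel(input_file)

        # Show information about the original data
        print(f"Original data shape: {df.shape}")
        print("First 5 rows of the original data:")
        print(df.head())

        # Data cleaning
        print("\nCleaning data...")

        # Drop rows that contain missing values
        before_dropna = df.shape[0]
        df = df.dropna()
        after_dropna = df.shape[0]
        print(f"Drop missing values: {before_dropna} → {after_dropna} (rows removed: {before_dropna-after_dropna})")

        # Drop duplicate rows
        before_dropduplicates = df.shape[0]
        df = df.drop_duplicates()
        after_dropduplicates = df.shape[0]
        print(f"Drop duplicates: {before_dropduplicates} → {after_dropduplicates} (rows removed: {before_dropduplicates-after_dropduplicates})")

        # Save the processed data
        df.to_excel(output_file, index=False)
        print(f"\nDone! Data saved to {output_file}")
        print(f"Processed data shape: {df.shape}")
        print("First 5 rows of the processed data:")
        print(df.head())

    except FileNotFoundError:
        print(f"Error: file {input_file} does not exist")
    except Exception as e:
        print(f"Error: {e}")

# Test
if __name__ == "__main__":
    # Replace with the actual Excel file paths
    input_file = "data.xlsx"  # Excel file containing the original data
    output_file = "processed_data.xlsx"  # File that receives the processed data
    process_excel(input_file, output_file)

Output

Suppose data.xlsx contains the following data:

Name	Age	Score
Zhang San	18	95
Li Si	19	88
Wang Wu		92
Zhang San	18	95
Zhao Liu	20	85

Running the program prints:

Reading: data.xlsx
Original data shape: (5, 3)
First 5 rows of the original data:
        Name   Age  Score
0  Zhang San  18.0     95
1      Li Si  19.0     88
2    Wang Wu   NaN     92
3  Zhang San  18.0     95
4   Zhao Liu  20.0     85

Cleaning data...
Drop missing values: 5 → 4 (rows removed: 1)
Drop duplicates: 4 → 3 (rows removed: 1)

Done! Data saved to processed_data.xlsx
Processed data shape: (3, 3)
First 5 rows of the processed data:
        Name   Age  Score
0  Zhang San  18.0     95
1      Li Si  19.0     88
4   Zhao Liu  20.0     85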
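In the output above, the age column prints as 18.0 rather than 18. pandas represents the empty cell as NaN, which forces the whole column to a floating-point dtype when the file is read, and dropping the row afterwards does not change the dtype back. The following is a minimal sketch of restoring integers once the missing values are gone; the one-column table is made up for illustration.

```python
import pandas as pd

# Hypothetical column with one missing value, as if read from a spreadsheet.
df = pd.DataFrame({"Age": [18, 19, None, 20]})
print(df["Age"].dtype)  # float64, because NaN cannot be stored in an integer column

# After the rows with missing values are removed, the column can be cast safely.
cleaned = df.dropna().copy()
cleaned["Age"] = cleaned["Age"].astype(int)
print(cleaned["Age"].tolist())
```

Calling `astype(int)` before `dropna()` would raise an error, since NaN has no integer equivalent, so the order of the two steps matters.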

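Project 7: A Simple GUI

17.7 Requirements Analysis

Build a simple GUI login window with the following features:

Use the tkinter module (the built-in GUI module)
Include username and password entry boxes
Include a Login button and a Reset button
Implement login validation

17.8 Core Implementation

tkinter module: use the tkinter module to create the GUI
Widget layout: use the pack() or grid() method to lay out widgets
Button event binding: use the command parameter to bind a button click handler
Conditional logic: check whether the username and password are correct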
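The credential check listed above does not need any widgets, so it can live in a plain function that is easy to test on its own. The following is a minimal sketch; the function name `check_login`, the `DEMO_ACCOUNTS` table, and the message texts are illustrative assumptions, and a real application would verify against stored password hashes rather than a plain-text dictionary.

```python
# Hypothetical in-memory account table: username -> password (demo only).
DEMO_ACCOUNTS = {"admin": "123456"}


def check_login(username, password, accounts=DEMO_ACCOUNTS):
    """Return (ok, message) for the given credentials."""
    if not username or not password:
        return False, "Username and password are required"
    # .get() returns None for an unknown username, which never equals
    # a non-empty password, so unknown users fail here as well.
    if accounts.get(username) != password:
        return False, "Incorrect username or password"
    return True, f"Welcome back, {username}!"


print(check_login("admin", "123456"))
print(check_login("admin", "wrong"))
```

A button callback could then read the two entry boxes, call `check_login`, and pass the returned message to `messagebox.showinfo` or `messagebox.showerror`.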

17.9 Hands-On

python

"""
Simple GUI login window
Purpose: create a login window with tkinter and validate the login
"""

import tkinter as tk
from tkinter import messagebox

def login():
    """Validate the login."""
    username = entry_username.get()
    password = entry_password.get()

    # Simple check (a real application should verify against a database or another service)
    if username == "admin" and password == "123456":
        messagebox.showinfo("Login Successful", "Welcome back, administrator!")
    else:
        messagebox.showerror("Login Failed", "Incorrect username or password")

def reset():
    """Clear both entry boxes."""
    entry_username.delete(0, tk.END)
    entry_password.delete(0, tk.END)
    entry_username.focus_set()

def create_login_window():
    """Create the login window."""
    # The entry boxes are module-level names so that login() and reset() can reach them
    global entry_username, entry_password

    # Create the main window
    window = tk.Tk()
    window.title("Login")
    window.geometry("400x300")
    window.resizable(False, False)

    # Create the title
    title_label = tk.Label(window, text="User Login", font=("Arial", 20))
    title_label.pack(pady=30)

    # Create the username label and entry box
    frame_username = tk.Frame(window)
    frame_username.pack(pady=10)
    label_username = tk.Label(frame_username, text="Username:", font=("Arial", 12))
    label_username.pack(side=tk.LEFT, padx=10)
    entry_username = tk.Entry(frame_username, font=("Arial", 12), width=20)
    entry_username.pack(side=tk.LEFT)

    # Create the password label and entry box
    frame_password = tk.Frame(window)
    frame_password.pack(pady=10)
    label_password = tk.Label(frame_password, text="Password:", font=("Arial", 12))
    label_password.pack(side=tk.LEFT, padx=10)
    entry_password = tk.Entry(frame_password, font=("Arial", 12), width=20, show="*")
    entry_password.pack(side=tk.LEFT)

    # Create the button frame
    frame_buttons = tk.Frame(window)
    frame_buttons.pack(pady=30)

    # Create the Login button
    button_login = tk.Button(frame_buttons, text="Login", font=("Arial", 12), width=10, command=login)
    button_login.pack(side=tk.LEFT, padx=10)

    # Create the Reset button
    button_reset = tk.Button(frame_buttons, text="Reset", font=("Arial", 12), width=10, command=reset)
    button_reset.pack(side=tk.LEFT, padx=10)

    # Give the username entry box the initial focus
    entry_username.focus_set()

    # Run the main loop
    window.mainloop()

# Run the login window
if __name__ == "__main__":
    create_login_window()

Output

Running the program opens a login window that contains:

The title "User Login"
A username entry box
A password entry box (input is shown as *)
A Login button
A Reset button

Entering the correct username (admin) and password (123456) opens a "Login Successful" message box. Any other input opens a "Login Failed" message box. Clicking the Reset button clears both entry boxes.
