Python Notes

Basics

Created: 2025-02-26 21:16

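Fixing garbled text in scraped content
Matching multi-line content
Switching between multiple tabs in Selenium
Setting a page load timeout with set_page_load_timeout
Clearing a log with f.truncate(0)
Joining paths with os.path.join
Counting occurrences in a list with Counter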

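Fixing garbled text in scraped content

# response is an HTTP response object (e.g. from requests);
# decode the raw bytes explicitly instead of relying on a guessed encoding
html = response.content.decode('utf-8')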
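Hard-coding utf-8 only works when the page really is UTF-8. As a minimal sketch, assuming some target sites serve GBK or GB18030 instead, a fallback decoder might look like the following. The helper name `decode_body` and the list of candidate encodings are illustrative choices, not part of any library.

```python
def decode_body(raw: bytes, encodings=("utf-8", "gb18030")) -> str:
    """Try each candidate encoding in order and return the first clean decode."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: decode with the first encoding and mark bad bytes with U+FFFD
    return raw.decode(encodings[0], errors="replace")


# Simulated response bodies (stand-ins for response.content)
utf8_bytes = "中文内容".encode("utf-8")
gbk_bytes = "中文内容".encode("gbk")

print(decode_body(utf8_bytes))
print(decode_body(gbk_bytes))
```

With a real response, the call would be `decode_body(response.content)`. If the response comes from requests, setting `response.encoding = response.apparent_encoding` before reading `response.text` is another option, at the cost of relying on encoding detection.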

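Matching multi-line content

import re

result = """<body>
  <div class="content">
      <div>
           <h1>xxxxxx</h1>
           <h1>xxxxxx</h1>
      </div>
  </div>
</body>"""

# By default . does not match a newline, so <body>.*</body> cannot match
# content that spans several lines.
# The re.DOTALL flag makes . match newlines as well.
content = re.findall('<body>.*</body>', result, re.DOTALL)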
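To make the effect of the flag visible, here is a small self-contained comparison. The `html` string is made-up sample data, and the non-greedy capture group is an optional variation for extracting only the text between the tags.

```python
import re

html = "<body>\n  <h1>first</h1>\n  <h1>second</h1>\n</body>"

# Without re.DOTALL, "." stops at the first newline, so nothing matches
without_flag = re.findall(r"<body>.*</body>", html)

# With re.DOTALL, "." also matches newlines and the whole block is returned
with_flag = re.findall(r"<body>.*</body>", html, re.DOTALL)

# A non-greedy capture group returns only the text between the tags
inner = re.findall(r"<body>(.*?)</body>", html, re.DOTALL)

print(without_flag)    # []
print(len(with_flag))  # 1
print(inner)
```

For anything beyond quick extraction, an HTML parser is usually more robust than a regular expression.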

Switching between multiple tabs in Selenium

from selenium import webdriver

# Initialize the WebDriver
driver = webdriver.Chrome()

# Open a page in the first tab
driver.get("http://www.example1.com")

# Open a second, empty tab
driver.execute_script("window.open('');")
driver.switch_to.window(driver.window_handles[-1])  # switch to the newly opened tab
driver.get("http://www.example2.com")

# Close the browser (optional)
# driver.quit()
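Switching back to an earlier tab uses the same two pieces: `driver.window_handles` and `driver.switch_to.window`. A minimal sketch of a reusable helper follows; the name `switch_to_tab` is hypothetical, and it assumes the handles are listed in the order the tabs were opened, which is usual in practice but not something WebDriver guarantees.

```python
def switch_to_tab(driver, index):
    """Switch the driver to the tab at position `index` and return its handle."""
    handles = driver.window_handles  # handles of all currently open tabs
    if not -len(handles) <= index < len(handles):
        raise IndexError(f"no tab at index {index}; {len(handles)} tab(s) open")
    handle = handles[index]
    driver.switch_to.window(handle)
    return handle


# Usage with a real driver that already has two tabs open:
# switch_to_tab(driver, 0)   # back to the first tab
# switch_to_tab(driver, -1)  # to the most recently opened tab
```

When the order of handles matters, storing `driver.current_window_handle` right after opening each tab is a safer way to find it again later.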

Setting a page load timeout with set_page_load_timeout

Selenium provides the set_page_load_timeout method, which sets the maximum time a page is allowed to take to load. If the page has not finished loading within that time, a TimeoutException is raised.

Example code:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

# Initialize the browser
driver = webdriver.Chrome()

# Set the page load timeout to 10 seconds
driver.set_page_load_timeout(10)

try:
    # Open the page
    driver.get("https://example.com")
    print("Page loaded successfully!")
except TimeoutException:
    print("Page load timed out, moving on to the next step...")

# Close the browser
driver.quit()

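Clearing a log with f.truncate(0)

Clear the contents of an nginx log:
# Note: opening with mode 'w+' already truncates the file,
# so truncate(0) here is redundant but harmless
with open('access.log', 'w+') as f:
    f.truncate(0)

This empties the file, so its size becomes 0.
All content is removed, but the file itself is kept.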
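The call to truncate(0) does the real work when the file is opened in a mode that does not clear it on open, such as 'r+'. The sketch below demonstrates this on a temporary file that stands in for access.log; the file contents are made up for illustration.

```python
import os
import tempfile

# Create a throwaway file that stands in for a log such as access.log
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("GET /index.html 200\n" * 3)
    log_path = tmp.name

size_before = os.path.getsize(log_path)

# "r+" opens for reading and writing WITHOUT truncating,
# so truncate(0) is what actually empties the file here
with open(log_path, "r+") as f:
    f.truncate(0)

size_after = os.path.getsize(log_path)
print(size_before, size_after)

# Clean up the temporary file
os.remove(log_path)
```

Unlike 'w+', mode 'r+' raises FileNotFoundError when the file does not exist, which may be preferable if a missing log should not be silently created.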

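Joining paths with os.path.join

# Use os.path.join to build a path
import os

file_path = os.path.join("data", "test", "a.txt")
print(file_path)
# On Windows: data\test\a.txt
# On Linux/macOS: data/test/a.txt

# Joining a list of strings with str.join (not recommended for building paths,
# because it does not handle operating system differences)
words = ['Hello', 'world', '!']
sentence = ' '.join(words)
print(sentence)  # Output: Hello world !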
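To see the operating system difference without switching machines, the pure path classes in pathlib can render the same components for either platform. This is a small illustrative sketch, not a replacement for os.path.join.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath

# os.path.join uses the separator of the OS the script runs on
joined = os.path.join("data", "test", "a.txt")
print(joined == os.sep.join(["data", "test", "a.txt"]))  # True

# Pure path classes show what each platform would produce, on any OS
posix_path = PurePosixPath("data", "test", "a.txt")
windows_path = PureWindowsPath("data", "test", "a.txt")
print(posix_path)    # data/test/a.txt
print(windows_path)  # data\test\a.txt

# Gotcha: an absolute component discards everything before it
absolute = PurePosixPath("data", "/etc", "a.txt")
print(absolute)      # /etc/a.txt
```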

Counting occurrences in a list with Counter

# Count how many times each value appears in a list
from collections import Counter

list1 = [1, 2, 1, 1]
print(Counter(list1))  # Counter({1: 3, 2: 1})
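Counter returns the counts for every value at once. To read the count of one specific number, index the Counter like a dictionary. The short sketch below also shows `most_common` and, for comparison, the built-in `list.count`; the variable names are illustrative.

```python
from collections import Counter

numbers = [1, 2, 1, 1]
counts = Counter(numbers)

print(counts[1])              # 3
print(counts[5])              # 0 (missing keys return 0 instead of raising KeyError)
print(counts.most_common(1))  # [(1, 3)]

# For a single value, list.count avoids building the whole Counter
print(numbers.count(1))       # 3
```

Counter is the better fit when several values need counting, since it scans the list only once.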

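Please credit the source when reposting. Readers are welcome to verify the sources cited in this article and to point out anything that is wrong or unclear.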