Using ChatGPT to Help Process HTML Data

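To program with ChatGPT's help, first describe your requirements clearly, then debug and refine the generated code repeatedly until the task is done. For programmers, having it is both a good thing and a bad thing.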

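I. Confirm the requirements

Please write a program that does the following:

1. All HTML files are stored in the all folder.

2. Process every HTML file in the all folder and extract the data from its table.

3. Extract the table header automatically, with column names separated by commas.

4. Write the processed data to a file named out plus a timestamp, so that files produced at different times can be told apart.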
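Requirements 3 and 4 are small enough to sketch on their own. The snippet below is only an illustration: the helper names `make_output_name` and `header_line` are hypothetical, and the timestamp layout is an assumption, since the requirement only says "out plus time".

```python
from datetime import datetime


def make_output_name(now=None, prefix='out', ext='.csv'):
    """Build a timestamped file name from a datetime (defaults to the current time)."""
    now = now or datetime.now()
    return f"{prefix}_{now.strftime('%Y-%m-%d_%H%M%S')}{ext}"


def header_line(column_names):
    """Join column names with commas after trimming surrounding whitespace."""
    return ','.join(name.strip() for name in column_names)


# Passing an explicit datetime makes the result reproducible.
print(make_output_name(datetime(2023, 3, 1, 9, 5, 7)))
print(header_line([' Name', 'Score ']))
```

Because the name goes down to the second, two runs started in the same second would produce the same file name; that is acceptable for a manual script but worth knowing.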

II. Programming with ChatGPT

The source code is as follows:

import os
import csv
from datetime import datetime
from bs4 import BeautifulSoup

def extract_data(file):
    # Parse one HTML file and collect the rows of its first table
    with open(file, encoding='utf-8') as f:
        soup = BeautifulSoup(f.read(), 'html.parser')
    table = soup.find('table')
    if table is None:
        return  # skip files that contain no table
    if not headers:
        # Take the header cells from the first table encountered
        ths = table.find_all('th')
        headers.extend([th.text.strip().replace('\n', '') for th in ths])
    for tr in table.find_all('tr')[1:]:
        tds = tr.find_all(['th', 'td'])
        row = []
        for td in tds:
            # Drop line breaks and replace commas inside cell text with semicolons
            text = td.text.strip().replace('\n', '').replace(',', ';')
            row.append(text)
        # Keep only rows whose cell count matches the header
        if len(row) == len(headers):
            rows.append(row)

def write_data(output_file):
    # Write the header followed by all collected rows
    with open(output_file, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)

file_dir = 'all'
html_files = [os.path.join(file_dir, f) for f in os.listdir(file_dir) if f.endswith('.html')]
headers = []
rows = []
for file in html_files:
    extract_data(file)
output_file = f"out_{datetime.now().strftime('%Y-%m-%d_%H%M%S')}.csv"
write_data(output_file)
print(f'Data has been saved to {output_file}')

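ChatGPT's explanation:

Specifically, we use the BeautifulSoup library to parse the HTML files and extract the data by locating the <table> tag and the <th> and <td> tags inside it. A headers list holds the table header, and a rows list holds each row of data.

When processing each HTML file, we first look for the table header and add it to the headers list. We then walk through the table row by row and add each row's data to the rows list. For every row, we check whether its length equals the length of the headers list; if it does not, the row is treated as malformed and is not saved. Finally, we write all the data to a single CSV file named out plus the current time, with a .csv extension, so that files processed at different times can be told apart.

I believe this code will meet your needs. If you have any questions, feel free to contact me.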
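To see the header and row-length logic in isolation, here is a minimal sketch that runs on an in-memory HTML string. It deliberately swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` so that it has no third-party dependency; the `TableParser` class and the sample HTML are hypothetical and are not part of the code ChatGPT produced. Unlike the script above, it treats every <th> as a header cell.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect <th> cells as headers and <td> cells as data rows."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.rows = []
        self._row = None         # data cells of the <tr> being read
        self._cell = None        # text fragments of the current cell
        self._is_header = False  # True while inside a <th>

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('th', 'td') and self._row is not None:
            self._cell = []
            self._is_header = (tag == 'th')

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('th', 'td') and self._cell is not None:
            text = ''.join(self._cell).strip().replace('\n', '')
            if self._is_header:
                self.headers.append(text)
            else:
                self._row.append(text)
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            # Keep a row only if its cell count matches the header
            if self._row and len(self._row) == len(self.headers):
                self.rows.append(self._row)
            self._row = None


sample = (
    "<table>"
    "<tr><th>Name</th><th>Score</th></tr>"
    "<tr><td>Ann</td><td>90</td></tr>"
    "<tr><td>incomplete row</td></tr>"
    "</table>"
)
parser = TableParser()
parser.feed(sample)
parser.close()
print(parser.headers)
print(parser.rows)
```

The third row has one cell against a two-column header, so it is dropped, which is the same rule the explanation describes.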

III. Further optimization

1. Improve efficiency

2. Show a progress bar

3. Save data promptly to guard against unexpected failures
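One caveat when combining the first and third goals: if several worker threads append to a shared list and also flush it in batches, the append, the size check, and the clear have to happen as one step, or rows can be lost or saved twice. The sketch below shows that pattern with only the standard library. The names `process`, `BATCH`, and `batches` are hypothetical, and `process` merely stands in for parsing one file.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

BATCH = 25                 # flush after this many rows (assumed value)
rows = []                  # rows waiting to be flushed
batches = []               # stands in for files written to disk
lock = threading.Lock()    # guards rows and batches


def process(item):
    # Stand-in for parsing one HTML file into a row
    parsed = [item, item * 2]
    with lock:
        # Append, check, and clear as one atomic step
        rows.append(parsed)
        if len(rows) >= BATCH:
            batches.append(list(rows))
            rows.clear()


items = list(range(100))
done = 0
with ThreadPoolExecutor(max_workers=4) as executor:
    for _ in executor.map(process, items):
        done += 1          # a progress bar such as tqdm could wrap this loop

print(f'{done} items processed, {len(batches)} batches, {len(rows)} rows left over')
```

Without the lock, two threads could both see the list reach the batch size, or one could clear it while another is appending.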

The final code is as follows:

import os
import csv
import threading
from datetime import datetime
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

def extract_data(file):
    # Parsing touches no shared state, so it runs outside the lock
    with open(file, encoding='utf-8') as f:
        soup = BeautifulSoup(f.read(), 'html.parser')
    table = soup.find('table')
    if table is None:
        return  # skip files that contain no table
    # headers, rows and files are shared by all worker threads
    with lock:
        if not headers:
            ths = table.find_all('th')
            headers.extend([th.text.strip().replace('\n', '') for th in ths])
        for tr in table.find_all('tr')[1:]:
            tds = tr.find_all(['th', 'td'])
            row = []
            for td in tds:
                text = td.text.strip().replace('\n', '').replace(',', ';')
                row.append(text)
            if len(row) == len(headers):
                rows.append(row)
                # Save the current data after every 1000 rows
                if len(rows) % 1000 == 0:
                    output_file = f"out_{datetime.now().strftime('%Y-%m-%d_%H%M%S')}_{len(files)}.csv"
                    write_data(output_file)
                    files.append(output_file)
                    rows.clear()

# Merge all CSV files into a single file
def combine_files(output_file):
    with open(output_file, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        for file in files:
            with open(file, 'r', encoding='utf-8', newline='') as f2:
                reader = csv.reader(f2)
                next(reader)  # skip the header row
                for row in reader:
                    writer.writerow(row)
            os.remove(file)  # delete the temporary file
        print('All files have been combined into one CSV file.')

def write_data(output_file):
    # Write the header followed by the rows collected so far
    with open(output_file, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)

file_dir = 'all'
html_files = [os.path.join(file_dir, f) for f in os.listdir(file_dir) if f.endswith('.html')]
headers = []
rows = []
files = []
lock = threading.Lock()  # protects headers, rows and files
with ThreadPoolExecutor(max_workers=4) as executor:
    # Wrap the whole map in tqdm to show overall progress across files
    for _ in tqdm(executor.map(extract_data, html_files), total=len(html_files), desc='Extracting data'):
        pass
# Save any remaining rows to a CSV file
if rows:
    output_file = f"out_{datetime.now().strftime('%Y-%m-%d_%H%M%S')}_{len(files)}.csv"
    write_data(output_file)
    files.append(output_file)
# Merge all CSV files into a single file
output_file = f"out_{datetime.now().strftime('%Y-%m-%d_%H%M%S')}.csv"
combine_files(output_file)
print(f'Data has been saved to {output_file}')
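The temporary-file-then-merge approach works, but the "save promptly" goal can also be met by appending each batch to one output file. The following is a minimal sketch of that alternative, not what ChatGPT generated; `append_rows` and the demo path are hypothetical.

```python
import csv
import os
import tempfile


def append_rows(path, headers, rows):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(headers)
        writer.writerows(rows)


# Demo in a throwaway directory: two batches end up in one file with one header
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, 'out_demo.csv')
append_rows(path, ['name', 'score'], [['Ann', '90']])
append_rows(path, ['name', 'score'], [['Bob', '85']])

with open(path, encoding='utf-8', newline='') as f:
    result = list(csv.reader(f))
print(result)
```

Each call closes the file, so rows already written survive a later crash and no merge step is needed. If this were called from several threads, the call would still need to be guarded by a lock.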

Result of running the script:
