Friday, March 12, 2021

A Simple Scraper

assets.txt
<div class="table">
	<div class="trt">
		<div class="td td0 titles">INC</div>
		<div class="td td1 titles">FA NO</div>
		<div class="td td2 titles">DESCRP</div>
		<div class="td td3 titles">SPEC</div>
		<div class="td td4 titles pc">AMT</div>
		<div class="td td5 titles pc">ACCEPT DT</div>
		<div class="td td6 titles pc">TYPE</div>
		<div class="td td7 titles pc">ITEM</div>
		<div class="td td8 titles pc">MANUF</div>
		<div class="td td9 titles pc">MANUF PN</div>
		<div class="td td10 titles pc">MANUF SPEC</div>
	</div>
<div class="tr tr1"><div class="td td0 top">KPTW</div><div class="td td1 top">2142000000D</div><div class="td td2 top">伺服器設備</div><div class="td td3 top">INTEL XEON E5 8 CORE 8G RAM * 4 300G HDD</div><div class="td td4 pc top">460000</div><div class="td td5 pc top">2016-03-29</div><div class="td td6 pc top">SERVER COMPUTER</div><div class="td td7 pc top">2142 RGEH14A0002</div><div class="td td8 pc top">GENESIS TECH 84981323</div><div class="td td9 pc top">HPML350-ROB01</div><div class="td td10 pc top">INTEL XEON E5 8 CORE 8G RAM * 4 300G HDD</div></div><div class="tr"><div class="td td0">RBTW</div><div class="td td1">21420000001</div><div class="td td2">伺服器設備</div><div class="td td3">INTEL XEON E3 QUAD CORE 8G RAM * 2 2TB HDD#(KPTW/2142000000B)</div><div class="td td4 pc">1</div><div class="td td5 pc">2017-10-12</div><div class="td td6 pc">SERVER COMPUTER</div><div class="td td7 pc">2142 RDEL13A0007</div><div class="td td8 pc">DELL 27580987</div><div class="td td9 pc">POWEREDGE T110 II</div><div class="td td10 pc">INTEL XEON E3 QUAD CORE 8G RAM * 2 2TB HDD</div></div><div class="tr"><div class="td td0">RBTW</div><div class="td td1">2G4B100003B</div><div class="td td2">一般家電:電話</div><div class="td td3">CABLE IP PHONE SPEAKERPHONE YEALINK</div><div class="td td4 pc">1900</div><div class="td td5 pc">2019-07-31</div><div class="td td6 pc">TELEPHONE</div><div class="td td7 pc">2G4B1 G4BTB000053</div><div class="td td8 pc">YEALINK 16740369</div><div class="td td9 pc">SIP-T19P E2</div><div class="td td10 pc">CABLE IP PHONE SPEAKERPHONE YEALINK</div></div><div class="tr"><div class="td td0">RBTW</div><div class="td td1">2G1EA000003</div><div class="td td2">周邊設備:電腦主板</div><div class="td td3">30.5CM*21.8CM INTEL CORE I7/I5/I3 INTEL SOCKET 1150 32GB IN#(KPTW/2G1EA000005)</div><div class="td td4 pc">1</div><div class="td td5 pc">2017-10-16</div><div class="td td6 pc">COMPUTER MOTHERBOARD</div><div class="td td7 pc">2G1EA G1EYL000000</div><div class="td td8 pc">ANONYMOUS 
27904640</div><div class="td td9 pc">MODEL : Z87-A</div><div class="td td10 pc">30.5CM*21.8CM INTEL CORE I7/I5/I3 INTEL SOCKET 1150 32GB INTEL Z87</div></div><div class="tr"><div class="td td0">RBTW</div><div class="td td1">2G4B100001C</div><div class="td td2">一般家電:電話</div><div class="td td3">CABLE GENERAL NO SPEAKERPHONE SWEETONE#(KPTW/2G4B100007N)</div><div class="td td4 pc">1</div><div class="td td5 pc">2017-10-16</div><div class="td td6 pc">TELEPHONE</div><div class="td td7 pc">2G4B1 G4B10000000</div><div class="td td8 pc">SWEETONE 27580987</div><div class="td td9 pc">RS-802HF</div><div class="td td10 pc">CABLE GENERAL NO SPEAKERPHONE SWEETONE</div></div><div class="tr"><div class="td td0">RBTW</div><div class="td td1">2G4B100001J</div><div class="td td2">一般家電:電話</div><div class="td td3">CABLE GENERAL NO SPEAKERPHONE SWEETONE#(KPTW/2G4B100008G)</div><div class="td td4 pc">1</div><div class="td td5 pc">2017-11-27</div><div class="td td6 pc">TELEPHONE</div><div class="td td7 pc">2G4B1 G4B10000000</div><div class="td td8 pc">SWEETONE 27580987</div><div class="td td9 pc">RS-802HF</div><div class="td td10 pc">CABLE GENERAL NO SPEAKERPHONE SWEETONE</div></div><div class="tr"><div class="td td0">RBTW</div><div class="td td1">52355000001</div><div class="td td2">自製設備材料:燒錄器轉接插座</div><div class="td td3">LABTOOL-48UXP#(KPTW/52355000029)</div><div class="td td4 pc">1</div><div class="td td5 pc">2017-10-16</div><div class="td td6 pc">-</div><div class="td td7 pc">AG7EB </div><div class="td td8 pc"> 27580987</div><div class="td td9 pc">-</div><div class="td td10 pc">-</div></div><div class="tr"><div class="td td0">RBTW</div><div class="td td1">2141000001C</div><div class="td td2">個人電腦組合設備</div><div class="td td3">INTEL DUAL CORE 4G RAM 500G HDD#(KPTW/2141000008V)</div><div class="td td4 pc">1</div><div class="td td5 pc">2017-10-16</div><div class="td td6 pc">PERSONAL COMPUTER</div><div class="td td7 pc">2141 PHPU1100014</div><div class="td td8 pc">HP 27580987</div><div 
class="td td9 pc">PHPU1100014</div><div class="td td10 pc">INTEL DUAL CORE I3 4G RAM 500G HDD SDC EN2101G../HP PRO 3330MT (WIN7 PRO)</div></div><div class="tr"><div class="td td0">RBTW</div><div class="td td1">21419000007</div><div class="td td2">平板電腦</div><div class="td td3">OTHER QUAD CORE*2 3G RAM 16G ROM#(KPTW/21419000023)</div><div class="td td4 pc">1</div><div class="td td5 pc">2017-10-16</div><div class="td td6 pc">TPC</div><div class="td td7 pc">21419 TSAM64B0001</div><div class="td td8 pc">SAMSUNG 27580987</div><div class="td td9 pc">SAMSUNG GALAXY TAB S 10.5 4G LTE</div><div class="td td10 pc">OTHER QUAD CORE*2 3G RAM 16G ROM</div></div><div class="tr"><div class="td td0">RBTW</div><div class="td td1">25214000001</div><div class="td td2">儲存:編程\調試器</div><div class="td td3">EPI MAJICⅢ-LT DEBUGGER  除錯器#(KPTW/25214000011)</div><div class="td td4 pc">1</div><div class="td td5 pc">2017-10-16</div><div class="td td6 pc">-</div><div class="td td7 pc">2G1B9 </div><div class="td td8 pc"> 27580987</div><div class="td td9 pc">-</div><div class="td td10 pc">-</div></div><div class="tr"><div class="td td0">RBTW</div><div class="td td1">2AG1D800001</div><div class="td td2">周邊配件:介面卡</div><div class="td td3">HD CAPTURE CARD SDK II(C729) 1080P#(KPTW/2AG1D800001)</div><div class="td td4 pc">1</div><div class="td td5 pc">2017-10-16</div><div class="td td6 pc">INTERFACE CARD</div><div class="td td7 pc">AG1D8 G1DT0000087</div><div class="td td8 pc">AVERMEDIA 27580987</div><div class="td td9 pc">*</div><div class="td td10 pc">HD CAPTURE CARD SDK II(C729) 1080P</div></div><div class="tr"><div class="td td0">RBTW</div><div class="td td1">2G1A100001G</div><div class="td td2">螢幕:液晶顯示器</div><div class="td td3">24IN 1920*1080 250CD/M2 100000000:1(ACM)ASCR 4MS#(KPTW/2G1A100005I)</div><div class="td td4 pc">1</div><div class="td td5 pc">2017-10-16</div><div class="td td6 pc">LCD</div><div class="td td7 pc">2G1A1 G1ATA000014</div><div class="td td8 pc">ACER 13130792</div><div class="td 
td9 pc">G247HYL</div><div class="td td10 pc">24IN 1920*1080 250CD/M2 100000000:1(ACM)ASCR 4MS</div></div><div class="tr"><div class="td td0">RBTW</div><div class="td td1">2G1B1000001</div><div class="td td2">儲存:硬碟</div><div class="td td3">INTERNAL STORAGE 3.5INCH 1TB 7200RPM#(KPTW/2G1B1000010)</div><div class="td td4 pc">1</div><div class="td td5 pc">2017-10-16</div><div class="td td6 pc">HDD</div><div class="td td7 pc">2G1B1 G1BT0000092</div><div class="td td8 pc">WESTERN 13130792</div><div class="td td9 pc">WD10EFRX</div><div class="td td10 pc">INTERNAL STORAGE 3.5INCH 1TB 7200RPM</div></div><div class="tr"><div class="td td0">RBTW</div><div class="td td1">2G1B1000003</div><div class="td td2">儲存:硬碟</div><div class="td td3">INTERNAL STORAGE 3.5INCH 1TB 7200RPM#(KPTW/2G1B1000011)</div><div class="td td4 pc">1</div><div class="td td5 pc">2017-10-16</div><div class="td td6 pc">HDD</div><div class="td td7 pc">2G1B1 G1BT0000092</div><div class="td td8 pc">WESTERN 13130792</div><div class="td td9 pc">WD10EFRX</div><div class="td td10 pc">INTERNAL STORAGE 3.5INCH 1TB 7200RPM</div></div><div class="tr"><div class="td td0">RBTW</div><div class="td td1">2G1B1000004</div><div class="td td2">儲存:硬碟</div><div class="td td3">INTERNAL STORAGE 3.5INCH 1TB 7200RPM#(KPTW/2G1B1000012)</div><div class="td td4 pc">1</div><div class="td td5 pc">2017-10-16</div><div class="td td6 pc">HDD</div><div class="td td7 pc">2G1B1 G1BT0000092</div><div class="td td8 pc">WESTERN 13130792</div><div class="td td9 pc">WD10EFRX</div><div class="td td10 pc">INTERNAL STORAGE 3.5INCH 1TB 7200RPM</div></div><div class="tr"><div class="td td0">RBTW</div><div class="td td1">2G1B1000005</div><div class="td td2">儲存:硬碟</div><div class="td td3">INTERNAL STORAGE 3.5INCH 1TB 7200RPM#(KPTW/2G1B1000013)</div><div class="td td4 pc">1</div><div class="td td5 pc">2017-10-16</div><div class="td td6 pc">HDD</div><div class="td td7 pc">2G1B1 G1BT0000092</div><div class="td td8 pc">WESTERN 13130792</div><div class="td td9 
pc">WD10EFRX</div><div class="td td10 pc">INTERNAL STORAGE 3.5INCH 1TB 7200RPM</div></div></div>
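Looking closely at the data, every row is made up of cells with the classes td0 through td10, so I wrote a quick scraper to pull them out.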
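To illustrate that structure, here is a minimal sketch on a shortened two-row sample in the same shape as assets.txt. The names `SAMPLE` and `cells_by_class` are made up for this example and are not part of the original script. It relies on Beautiful Soup matching `class_` against each individual class of an element, so `td1` picks up `td td1 titles` but not `td td10 titles`.

```python
from bs4 import BeautifulSoup

# Shortened sample shaped like assets.txt: a header row plus one data row.
# Only three of the eleven columns are kept (hypothetical, for illustration).
SAMPLE = """
<div class="table">
  <div class="trt">
    <div class="td td0 titles">INC</div>
    <div class="td td1 titles">FA NO</div>
    <div class="td td10 titles pc">MANUF SPEC</div>
  </div>
  <div class="tr tr1"><div class="td td0 top">KPTW</div><div class="td td1 top">2142000000D</div><div class="td td10 pc top">-</div></div>
</div>
"""


def cells_by_class(html, class_name):
    """Return the text of every <div> whose class list contains class_name."""
    soup = BeautifulSoup(html, "html.parser")
    return [div.get_text(strip=True)
            for div in soup.find_all("div", class_=class_name)]


if __name__ == "__main__":
    for name in ("td0", "td1", "td10"):
        print(name, cells_by_class(SAMPLE, name))
```

Each call returns one column, header cell first, which is why the script can treat the result for `td0` as the row count.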

soup.py
from bs4 import BeautifulSoup

def main():
    # Cell classes, one per column (td0 = INC ... td10 = MANUF SPEC).
    target = ["td0", "td1", "td2", "td3", "td4",
              "td5", "td6", "td7", "td8", "td9",
              "td10"]

    with open("/home/ubuntu/assets.txt", "r") as txt_file:
        soup = BeautifulSoup(txt_file.read(), 'html.parser')  # or lxml

    # 2D array: datas[i] holds every cell of column i, header cell first.
    # Each column is looked up once rather than on every loop iteration.
    datas = [soup.find_all('div', class_=name) for name in target]
    j_len = len(datas[0])  # number of rows, including the header row

    with open("/home/ubuntu/assets.csv", 'w', encoding='big5') as fp:
        for j in range(j_len):
            for i in range(len(target)):
                # remove leading/trailing \t, \r, \n
                tmp = datas[i][j].get_text().strip('\t\r\n')
                fp.write(tmp + ",")
            fp.write("\n")

if __name__ == '__main__':
    main()
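One caveat with writing the commas by hand: `strip` only trims the ends of a cell, so a cell with a line break in the middle (as the MANUF cell of one row in the listing above appears to have) or with a comma in it would break the CSV layout. A minimal sketch of one way around this, assuming the rows have already been collected as lists of strings, is to collapse whitespace and let the standard `csv` module handle quoting. The names `normalize_cell` and `rows_to_csv` are hypothetical helpers, not part of the original script.

```python
import csv
import io


def normalize_cell(text):
    """Collapse any run of whitespace, including embedded newlines, to one space."""
    return " ".join(text.split())


def rows_to_csv(rows):
    """Render rows (lists of strings) as CSV text, quoting cells where needed."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow([normalize_cell(cell) for cell in row])
    return buf.getvalue()


if __name__ == "__main__":
    # Sample rows; the second one is made up to show comma quoting.
    rows = [
        ["RBTW", "2G1EA000003", "ANONYMOUS \n27904640"],
        ["RBTW", "EXAMPLE", "a,b"],
    ]
    print(rows_to_csv(rows), end="")
```

To write a file instead, the same `csv.writer` can wrap a handle opened with `open(path, "w", encoding="big5", newline="")`, which keeps the Big5 output of the original script.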
