Text extraction with line splitting in Python

I have the following code:
f = open('./dat.txt', 'r')
array = []
for line in f:
    # if "1\t\"Overall evaluation" in line:
    #     words = line.split("1\t\"Overall evaluation")
    #     print(words[0])
    number = int(line.split(':')[1].strip('"\n'))
    print(number)
This manages to grab the trailing int on each line of my data, which looks like this:
299 1 "Overall evaluation: 3
Invite to interview: 3
Strength or novelty of the idea (1): 4
Strength or novelty of the idea (2): 3
Strength or novelty of the idea (3): 3
Use or provision of open data (1): 4
Use or provision of open data (2): 3
""Open by default"" (1): 2
""Open by default"" (2): 3
Value proposition and potential scale (1): 4
Value proposition and potential scale (2): 2
Market opportunity and timing (1): 4
Market opportunity and timing (2): 4
Triple bottom line impact (1): 4
Triple bottom line impact (2): 2
Triple bottom line impact (3): 2
Knowledge and skills of the team (1): 3
Knowledge and skills of the team (2): 4
Capacity to realise the idea (1): 4
Capacity to realise the idea (2): 3
Capacity to realise the idea (3): 4
Appropriateness of the budget to realise the idea: 3"
299 2 "Overall evaluation: 3
Invite to interview: 3
Strength or novelty of the idea (1): 3
Strength or novelty of the idea (2): 2
Strength or novelty of the idea (3): 4
Use or provision of open data (1): 4
Use or provision of open data (2): 3
""Open by default"" (1): 3
""Open by default"" (2): 2
Value proposition and potential scale (1): 4
Value proposition and potential scale (2): 3
Market opportunity and timing (1): 4
Market opportunity and timing (2): 3
Triple bottom line impact (1): 3
Triple bottom line impact (2): 2
Triple bottom line impact (3): 1
Knowledge and skills of the team (1): 4
Knowledge and skills of the team (2): 4
Capacity to realise the idea (1): 4
Capacity to realise the idea (2): 4
Capacity to realise the idea (3): 4
Appropriateness of the budget to realise the idea: 2"
364 1 "Overall evaluation: 3
Invite to interview: 3
...
I also need to grab the "record identifier", which in the example above is 299 for the first two instances and then 364 for the next instance.
If I remove the last two lines and instead use the commented-out code above, like this:
f = open('./dat.txt', 'r')
array = []
for line in f:
    if "1\t\"Overall evaluation" in line:
        words = line.split("1\t\"Overall evaluation")
        print(words[0])
    # number = int(line.split(':')[1].strip('"\n'))
    # print(number)
I can grab the record identifiers.
But I am struggling to put the two together.
Ideally, what I want is something like the following:
368
=2+3+3+3+4+3+2+3+2+3+2+3+2+3+2+3+2+4+3+2+3+2
=2+3+3+3+4+3+2+3+2+3+2+3+2+3+2+3+2+4+3+2+3+2
and so on for all records.
How can I combine the two script components above to achieve this?
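One possible way to combine the two pieces is to treat any line that starts with two numbers followed by "Overall evaluation as the start of a new review, and to collect the trailing integer of every line into that review. The sketch below is not from the original post: the names HEADER, collect_scores, format_records and SAMPLE are made up for illustration, the header pattern is an assumption based on the sample data shown above, and it reads the text after the last colon on each line rather than the first.

```python
import re
from collections import OrderedDict

# Assumed header shape: <record id> <review number> "Overall evaluation: <score>
HEADER = re.compile(r'^(\d+)\s+(\d+)\s+"Overall evaluation')


def collect_scores(lines):
    """Group the trailing integer of every line by record identifier."""
    records = OrderedDict()  # record id -> list of reviews, each a list of ints
    current = None
    for line in lines:
        match = HEADER.match(line)
        if match:
            # A header line starts a new review under its record identifier.
            current = []
            records.setdefault(match.group(1), []).append(current)
        if current is None or ':' not in line:
            continue  # nothing to read before the first header or without a colon
        tail = line.rsplit(':', 1)[1].strip().strip('"')
        if tail.isdigit():
            current.append(int(tail))
    return records


def format_records(records):
    """Render each record id followed by one '=a+b+c' line per review."""
    out = []
    for record_id, reviews in records.items():
        out.append(record_id)
        for scores in reviews:
            out.append('=' + '+'.join(str(s) for s in scores))
    return '\n'.join(out)


# Hypothetical, shortened sample in the same shape as the data in the question.
SAMPLE = [
    '299\t1\t"Overall evaluation: 3\n',
    'Invite to interview: 3\n',
    '""Open by default"" (1): 2\n',
    'Appropriateness of the budget to realise the idea: 3"\n',
    '299\t2\t"Overall evaluation: 3\n',
    'Invite to interview: 3"\n',
    '364\t1\t"Overall evaluation: 3\n',
    'Invite to interview: 3"\n',
]

print(format_records(collect_scores(SAMPLE)))
# 299
# =3+3+2+3
# =3+3
# 364
# =3+3
```

To run this on the real file, pass the open file object instead of SAMPLE, for example collect_scores(open('./dat.txt')). Lines without a colon, such as a stray "..." line, are skipped rather than raising an error.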
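You look like an experienced user, so you should know that _that_ is not the way to handle data in Python. Instead, I suggest you work with dictionaries.
Looks can be deceiving. What do you mean?
I mean that the dat.txt file is not structured in a way that makes it easy to parse. You should try to get it properly structured (at the source, wherever it comes from), for example as a dictionary, so that the only thing you need to do is pass in the key you want (the record identifier, as you call it).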
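As a rough illustration of the dictionary idea in the comment above: if dat.txt really is a tab-delimited export in which all the scores of one review sit in a single quoted, multi-line field, then Python's csv module can read each review as one row. That layout is an assumption; only the tab before the opening quote is visible in the question's code. The names load_reviews and SAMPLE are hypothetical, and the sketch targets Python 3.

```python
import csv
import io


def load_reviews(handle):
    """Return {record_id: {review_no: {criterion: score}}}.

    Assumes three tab-separated columns: record id, review number,
    and a quoted multi-line field of 'criterion: score' lines.
    """
    data = {}
    for row in csv.reader(handle, delimiter='\t'):
        if len(row) != 3:
            continue  # skip blank or truncated rows
        record_id, review_no, body = row
        scores = {}
        for line in body.splitlines():
            # Split on the last colon so criteria containing ':' stay intact.
            criterion, sep, value = line.rpartition(':')
            if sep and value.strip().isdigit():
                scores[criterion.strip()] = int(value)
        data.setdefault(record_id, {})[review_no] = scores
    return data


# Hypothetical, shortened sample in the assumed tab-delimited layout.
SAMPLE = (
    '299\t1\t"Overall evaluation: 3\n'
    '""Open by default"" (1): 2\n'
    'Capacity to realise the idea (1): 4"\n'
    '299\t2\t"Overall evaluation: 3\n'
    'Invite to interview: 3"\n'
)

reviews = load_reviews(io.StringIO(SAMPLE))
print(reviews['299']['1']['"Open by default" (1)'])  # 2
print(sum(reviews['299']['2'].values()))             # 6
```

With the data in this shape, looking up a record is just a matter of passing its identifier as the key. For the real file, the handle would come from open('./dat.txt', newline='') so that the csv module sees the embedded line breaks unchanged.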