Filling a Python matrix

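I am splitting a text file into words in Python. I obtain the number of lines (c) and a dictionary (word_positions) that maps each word to an index. Then I create a zero matrix of shape (c, index). Here is the code: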

import re
import numpy as np

path = '/Users/Half_Pint_Boy/Desktop/sentenses.txt'

# Count the lines in the file
with open(path, 'r') as f:
    c = sum(1 for line in f)

# Map each unique word to a column index, in order of first appearance
word_positions = {}

with open(path, 'r') as f:
    index = 0
    for word in re.findall(r'[a-z]+', f.read().lower()):
        if word not in word_positions:
            word_positions[word] = index
            index += 1
print(word_positions)

# np.zeros takes the shape as a single tuple
matrix = np.zeros((c, index))

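My question: how can I fill the matrix so that matrix[c, index] = count, where c is the line number, index is the word's index position, and count is the number of times that word occurs in the line?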

Source

2016-07-25 HalfPintBoy

It is not clear what you are trying to do. Could you add more explanation or a simple example? – Amoss

If you have a line (as a string) named 'lines', you can get its word count with 'len(lines.split())' (the length of the list obtained by splitting the string on whitespace). – HolyDanna

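I have 22 lines and 254 unique words in the text. So that will be the size of my matrix, and then I just need to count, for each line, how many times each unique word occurs, stored at that word's index. Is it clearer now? – HalfPintBoy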

Try the following:

import re
import numpy as np
from itertools import chain

with open('/Users/Half_Pint_Boy/Desktop/sentenses.txt') as text:
    text_list = text.readlines()

c = len(text_list)  # number of lines

text_niz = []

for i in range(len(text_list)):
    text_niz.append(text_list[i].lower())  # convert to lower case

slovo = []

for j in range(len(text_niz)):
    slovo.append(re.split('[^a-z]', text_niz[j]))  # tokenization

for e in range(len(slovo)):
    while slovo[e].count('') != 0:
        slovo[e].remove('')  # remove empty words

slovo_list = list(chain(*slovo))
print(slovo_list)  # list of all words

slovo_list = list(set(slovo_list))  # remove duplicates
x = len(slovo_list)

s = []

for i in range(len(slovo)):
    for j in range(len(slovo_list)):
        s.append(slovo[i].count(slovo_list[j]))  # count each word in each sentence

matr = np.array(s)  # flat array of word occurrences per sentence
d = matr.reshape((c, x))  # reshape into a 22*254 matrix
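
For comparison, here is a shorter sketch of the same counting step that fills the matrix directly with matrix[row, index] += 1, reusing the word_positions idea from the question. The function name count_matrix and the two sample sentences are made up for illustration; it takes any iterable of lines, so an open file object should work as well. Columns follow the order in which words first appear, rather than the arbitrary order of a set.

```python
import re
import numpy as np

def count_matrix(lines):
    """Return (matrix, word_positions) for an iterable of text lines."""
    # Tokenize every line into lower-case words
    tokenized = [re.findall(r'[a-z]+', line.lower()) for line in lines]

    # Give each unique word a column index, in order of first appearance
    word_positions = {}
    for words in tokenized:
        for word in words:
            if word not in word_positions:
                word_positions[word] = len(word_positions)

    # One row per line, one column per unique word
    matrix = np.zeros((len(tokenized), len(word_positions)), dtype=int)
    for row, words in enumerate(tokenized):
        for word in words:
            matrix[row, word_positions[word]] += 1
    return matrix, word_positions

# Hypothetical sample data standing in for the text file
sample = ["The cat sat", "the dog saw the cat"]
matrix, word_positions = count_matrix(sample)
print(word_positions)
print(matrix)
```

With a real file, something like `with open(path) as f: matrix, word_positions = count_matrix(f)` should give one row per line of the file.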

Source

2016-07-25 16:41:04 user3882036

It looks like you are trying to create something like an n-dimensional list. These are implemented as lists nested inside one another, like this:

two_d_list = [[0, 1], [1, 2], ["example", "blah", "blah blah"]]
words = two_d_list[2] 
single_word = two_d_list[2][1] # Notice the second index operator

This concept is very flexible in Python, and the nesting also works with dictionaries, which does what you want:

two_d_list = [{"word":1}, {"example":1, "blah":3}] 
words = two_d_list[1] # type(words) == dict 
single_word = two_d_list[1]["example"] # Similar index operator, but for the dictionary

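This gives you the functionality you want, but not the syntax matrix[c, index]; that syntax does not work for indexing nested lists in plain Python (NumPy arrays do support it). Inside square brackets, commas normally separate the elements of a list literal. Instead, you can use matrix[c][index] = count to access the element in the row's dictionary.

You can overload the index operator to get the syntax you want. Here is a question about implementing the syntax you want. In summary: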

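Overload the __getitem__(self, index) function in a wrapper around the list class and have it accept a tuple (for assignment, overload __setitem__ as well). Tuples can be created without parentheses, which gives you the syntax matrix[c, index] = count.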
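
A minimal sketch of such a wrapper, assuming a fixed-size list of lists underneath. The class name Matrix2D and its constructor arguments are invented for this example and do not come from any library.

```python
class Matrix2D:
    """Hypothetical wrapper around a list of lists that accepts m[row, col]."""

    def __init__(self, rows, cols, fill=0):
        # Build each row separately so rows do not share one list object
        self._data = [[fill] * cols for _ in range(rows)]

    def __getitem__(self, key):
        # m[r, c] passes the tuple (r, c) as key
        row, col = key
        return self._data[row][col]

    def __setitem__(self, key, value):
        # m[r, c] = value also passes the tuple (r, c) as key
        row, col = key
        self._data[row][col] = value

m = Matrix2D(2, 3)
m[1, 2] = 5
print(m[1, 2])
print(m[0, 0])
```

If NumPy is already in use, as in the question, an ndarray supports this indexing out of the box, so a wrapper like this is mainly useful when sticking to plain lists.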

Source

2016-07-25 13:12:20 rtmh
