Simple numpy array referencing

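I'm having trouble understanding how X and y reference the training data. I have a simple CSV file with 5 numeric columns that I'm loading into a numpy array as follows: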

import numpy as np
from urllib.request import urlopen

url = "http://www.xyz/shortDataFinal.data"
# download the file
raw_data = urlopen(url)
# load the CSV file as a 2-D numpy array
dataset = np.loadtxt(raw_data, delimiter=",")
print(dataset.shape)
# separate the data from the target attributes

X = dataset[:,0:3] #Does this mean columns 1-4?
y = dataset[:,4] #Is this the 5th column?

I think I'm referencing my X values incorrectly. Here is what I need:

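X should reference columns 1-4, and y should be the last column, which is the 5th. If I understand correctly, I should do what I did above: use array indices 0:3 for the X values and index 4 for y. But the values are wrong. In other words, the values the array returns don't match the values in the data; they are off by one column (index).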
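A self-contained way to reproduce the symptom, swapping the download for a small in-memory CSV. The `io.StringIO` stand-in and the values in it are hypothetical, not the asker's real shortDataFinal.data:

```python
import io

import numpy as np

# Hypothetical 5-column numeric CSV standing in for shortDataFinal.data:
# four feature columns followed by one target column.
csv_text = """1,2,3,4,0
5,6,7,8,1
9,10,11,12,0
"""

dataset = np.loadtxt(io.StringIO(csv_text), delimiter=",")
print(dataset.shape)  # (3, 5)

X = dataset[:, 0:3]  # the stop index 3 is excluded: columns 0, 1, 2 only
y = dataset[:, 4]    # column 4 is the fifth column
print(X.shape)  # (3, 3): one feature column is missing
print(y)        # [0. 1. 0.]
```

In this sketch X simply stops one column early: the fourth column (4, 8, 12) is never selected.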


2016-05-16 DataGuy

You want '0:4' (to get 4 columns). – hpaulj
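A minimal sketch of the fix suggested in the comment, again on hypothetical in-memory data. The shape checks are one inexpensive way to catch this kind of off-by-one early:

```python
import io

import numpy as np

# Hypothetical data: 4 feature columns + 1 target column.
csv_text = """1,2,3,4,0
5,6,7,8,1
"""
dataset = np.loadtxt(io.StringIO(csv_text), delimiter=",")

X = dataset[:, 0:4]  # columns 0, 1, 2, 3 -> the first four columns
y = dataset[:, 4]    # column 4 -> the fifth (last) column

# Sanity checks: the features plus the target account for every column.
assert X.shape[1] + 1 == dataset.shape[1]
assert y.shape == (dataset.shape[0],)
print(X)
```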

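Your reading of the row, column layout is correct. In this case dataset is a 2-D array, so the numpy indexing operator ([]) uses the traditional row, column format.

X = dataset[:,0:3] is interpreted as "all rows, columns 0 up to but not including 3" (that is, columns 0, 1 and 2, only three columns), and y = dataset[:,4] is interpreted as "all rows, column 4" (the fifth column).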
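Python's built-in `slice.indices` method makes it easy to list exactly which column positions a slice covers for an array of a given width. The helper name `selected_columns` is hypothetical:

```python
def selected_columns(s, n_cols):
    """Return the column indices that slice `s` selects in an array with n_cols columns."""
    # slice.indices normalizes None and negative values into (start, stop, step)
    return list(range(*s.indices(n_cols)))


for s in (slice(0, 3), slice(0, 4), slice(None, -1), slice(4, 5)):
    print(s, "->", selected_columns(s, 5))
# slice(0, 3, None) -> [0, 1, 2]
# slice(0, 4, None) -> [0, 1, 2, 3]
# slice(None, -1, None) -> [0, 1, 2, 3]
# slice(4, 5, None) -> [4]
```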


2016-05-16 03:44:35

Using a multi-line string as a stand-in for the CSV file:

In [332]: txt=b"""0, 1, 2, 4, 5 
    .....: 6, 7, 8, 9, 10 
    .....: """ 

In [333]: data=np.loadtxt(txt.splitlines(), delimiter=',') 

In [334]: data 
Out[334]: 
array([[ 0., 1., 2., 4., 5.], 
     [ 6., 7., 8., 9., 10.]]) 

In [335]: data.shape 
Out[335]: (2, 5) 

In [336]: data[:,0:4] 
Out[336]: 
array([[ 0., 1., 2., 4.], 
     [ 6., 7., 8., 9.]]) 

In [337]: data[:,4] 
Out[337]: array([ 5., 10.])

numpy indexing starts at 0; [0:4] is (more or less) the same as the list of numbers starting at 0 and going up to, but not including, 4:

In [339]: np.arange(0,4) 
Out[339]: array([0, 1, 2, 3])
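Because the interval is half-open, a slice start:stop contains stop - start elements, and two slices that share a boundary (:k and k:) split an array with no overlap and no gap. A short sketch on the same 2×5 data as the session above; the name `k` is illustrative:

```python
import numpy as np

data = np.array([[0., 1., 2., 4., 5.],
                 [6., 7., 8., 9., 10.]])

k = 4  # boundary between the feature columns and the target column
left, right = data[:, :k], data[:, k:]
print(left.shape, right.shape)  # (2, 4) (2, 1)

# Each piece is (stop - start) columns wide, and together they rebuild the original.
assert left.shape[1] == k - 0
assert right.shape[1] == data.shape[1] - k
assert np.array_equal(np.hstack([left, right]), data)
```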

Another way to get all but the last column is to use a -1 index:

In [352]: data[:,:-1] 
Out[352]: 
array([[ 0., 1., 2., 4.], 
     [ 6., 7., 8., 9.]])
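The negative-index form keeps working if the file gains or loses feature columns. One detail worth knowing: an integer index drops the column axis, while a length-1 slice keeps it, which can matter when some code expects a 2-D column rather than a 1-D vector:

```python
import numpy as np

data = np.array([[0., 1., 2., 4., 5.],
                 [6., 7., 8., 9., 10.]])

X = data[:, :-1]      # every column except the last -> shape (2, 4)
y = data[:, -1]       # integer index: last column as a 1-D array -> shape (2,)
y_col = data[:, -1:]  # slice: last column kept 2-D -> shape (2, 1)

print(X.shape, y.shape, y_col.shape)  # (2, 4) (2,) (2, 1)
print(y)  # [ 5. 10.]
```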

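Often, CSV files contain a mix of numeric and string values. The documentation for the loadtxt dtype parameter has a brief explanation of how to load such data and access it as a structured array. genfromtxt is easier to use (though it has its own confusing points).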
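A minimal sketch of the genfromtxt route for mixed data, assuming a hypothetical file with a header row and one string column. Here dtype=None asks genfromtxt to infer a type per column, names=True reads the field names from the header, and the result is a 1-D structured array indexed by field name rather than by column number:

```python
import io

import numpy as np

# Hypothetical mixed CSV: two numeric columns and one string column.
csv_text = """x1,x2,label
1.5,2.0,cat
3.0,4.5,dog
"""

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                     dtype=None, names=True, encoding="utf-8")

print(data.dtype.names)  # ('x1', 'x2', 'label')
print(data["label"])     # ['cat' 'dog']

# Numeric fields can be stacked back into a plain 2-D feature array.
X = np.column_stack([data["x1"], data["x2"]])
print(X.shape)  # (2, 2)
```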


2016-05-16 06:19:01 hpaulj
