My question follows from this answer by Phil. How can the code produce `<class 'numpy.str'>` instead of `<class 'numpy.object_'>`?
```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 31, 2.5, 1260759144], [1, 1029, 3, 1260759179],
                   [1, 1061, 3, 1260759182], [1, 1129, 2, 1260759185],
                   [1, 1172, 4, 1260759205], [2, 31, 3, 1260759134],
                   [2, 1111, 4.5, 1260759256]],
                  index=['a', 'c', 'h', 'g', 'e', 'b', 'f'],
                  columns=['userId', 'movieId', 'rating', 'timestamp'])
df.index.names = ['ID No.']
df.columns.names = ['Information']


def df_to_sarray(df):
    """
    Convert a pandas DataFrame object to a numpy structured array.
    This is functionally equivalent to but more efficient than
    np.array(df.to_records(index=False))

    :param df: the data frame to convert
    :return: a numpy structured array representation of df
    """
    v = df.values
    cols = df.columns
    # df[k].dtype.type is <class 'numpy.object_'>; I want to convert it to numpy.str
    types = [(cols[i], df[k].dtype.type) for (i, k) in enumerate(cols)]
    dtype = np.dtype(types)
    z = np.zeros(v.shape[0], dtype)
    for (i, k) in enumerate(z.dtype.names):
        z[k] = v[:, i]
    return z


sa = df_to_sarray(df.reset_index())
print(sa)
```
Phil's answer works well. If I run

```python
sa = df_to_sarray(df.reset_index())
```

I get the following result:

```
array([('a', 1, 31, 2.5, 1260759144), ('c', 1, 1029, 3.0, 1260759179),
       ('h', 1, 1061, 3.0, 1260759182), ('g', 1, 1129, 2.0, 1260759185),
       ('e', 1, 1172, 4.0, 1260759205), ('b', 2, 31, 3.0, 1260759134),
       ('f', 2, 1111, 4.5, 1260759256)],
      dtype=[('ID No.', 'O'), ('userId', '<i8'), ('movieId', '<i8'), ('rating', '<f8'), ('timestamp', '<i8')])
```
I would like to get the following dtype instead:

```
dtype=[('ID No.', 'S'), ('userId', '<i8'), ('movieId', '<i8'), ('rating', '<f8'), ('timestamp', '<i8')]
```

That is, a string type rather than an object type.
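For reference, NumPy's fixed-width string dtypes carry a width: `'S'` stores bytes and `'U'` stores Unicode text. When the width is left off in `np.array(..., dtype="S")`, NumPy infers it from the longest element. A small check using the ID labels from the question:

```python
import numpy as np

ids = ["a", "c", "h", "g", "e", "b", "f"]

# 'S' = fixed-width bytes; the width (1 here) is inferred from the data.
as_bytes = np.array(ids, dtype="S")
# 'U' = fixed-width Unicode text; elements come back as Python str.
as_text = np.array(ids, dtype="U")

print(as_bytes.dtype)  # |S1
print(as_text.dtype)   # <U1
print(as_bytes[0])     # b'a'
```

So the `'S'` in the desired dtype would end up as something like `'S1'` for these labels, and the values would be bytes (`b'a'`) rather than `str`.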
I checked `df[k].dtype.type` and found that it is `<class 'numpy.object_'>`. I want to convert it to `numpy.str`. How can I do that?
Have you tried `df[col].astype(str)`? –
`types` is a list, so you should be able to change the first tuple, which is probably `('ID No.', 'O')`. – hpaulj
I only want to convert the `object` columns to strings; the other columns, which are `int`, should stay `int`. – Renke
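Following the comments above, one possible sketch (not Phil's original code) builds each column as a NumPy array first. It converts only object or string columns to byte strings with `np.array(..., dtype="S")`, so NumPy chooses the width, and passes every other column through with its own dtype. This assumes the string values are ASCII (encoding to `'S'` fails otherwise; `dtype="U"` would be the alternative for non-ASCII text). The name `df_to_sarray_str` is hypothetical.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def df_to_sarray_str(df):
    """Hypothetical variant of df_to_sarray: object/string columns become
    fixed-width byte strings ('S<n>'); all other columns keep their dtype."""
    arrays = []
    for col in df.columns:
        s = df[col]
        if is_object_dtype(s) or is_string_dtype(s):
            # Assumes ASCII text; NumPy infers the width from the longest value.
            arr = np.array(s.astype(str).tolist(), dtype="S")
        else:
            # int64 / float64 columns pass through unchanged.
            arr = s.to_numpy()
        arrays.append((str(col), arr))
    # Build the structured dtype from the per-column array dtypes.
    z = np.zeros(len(df), dtype=[(name, arr.dtype) for name, arr in arrays])
    for name, arr in arrays:
        z[name] = arr
    return z


df = pd.DataFrame([[1, 31, 2.5, 1260759144], [1, 1029, 3, 1260759179],
                   [1, 1061, 3, 1260759182], [1, 1129, 2, 1260759185],
                   [1, 1172, 4, 1260759205], [2, 31, 3, 1260759134],
                   [2, 1111, 4.5, 1260759256]],
                  index=['a', 'c', 'h', 'g', 'e', 'b', 'f'],
                  columns=['userId', 'movieId', 'rating', 'timestamp'])
df.index.names = ['ID No.']

sa = df_to_sarray_str(df.reset_index())
print(sa.dtype)
```

Deriving the field dtype from the already-converted array, rather than from `df[k].dtype.type`, is what lets the string width come out right without computing it by hand.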