How can I group data into variables based on a column value in Python?
I have data of the following form:
p q r y_1 y_2 y_3 y_4 y_5 y_6 y_7
2 8 14 0748 0748 0748 0790 0804 0818 0832
2 9 22 1262 1262 1262 1328 1350 1372 1394
1 5 19 0512 0512 0512 0569 0588 0607 0626
2 7 19 0748 0748 0748 0805 0824 0843 0862
3 11 13 1608 1608 1608 1647 1660 1673 1686
2 7 20 0788 0788 0788 0848 0868 0888 0908
1 4 15 0310 0310 0310 0355 0370 0385 0400
3 12 17 2130 2130 2130 2181 2198 2215 2232
1 4 14 0280 0280 0280 0322 0336 0350 0364
1 5 20 0552 0552 0552 0612 0632 0652 0672
2 7 17 0674 0674 0674 0725 0742 0759 0776
3 10 13 1276 1276 1276 1315 1328 1341 1354
3 11 20 1846 1846 1846 1906 1926 1946 1966
3 11 14 1636 1636 1636 1678 1692 1706 1720
1 6 18 0566 0566 0566 0620 0638 0656 0674
3 12 16 2096 2096 2096 2144 2160 2176 2192
2 9 21 1218 1218 1218 1281 1302 1323 1344
3 10 19 1474 1474 1474 1531 1550 1569 1588
2 8 13 0720 0720 0720 0759 0772 0785 0798
1 6 22 0730 0730 0730 0796 0818 0840 0862
1 4 13 0252 0252 0252 0291 0304 0317 0330
2 8 15 0778 0778 0778 0823 0838 0853 0868
3 12 15 2064 2064 2064 2109 2124 2139 2154
3 10 16 1366 1366 1366 1414 1430 1446 1462
2 9 16 1028 1028 1028 1076 1092 1108 1124
1 5 16 0404 0404 0404 0452 0468 0484 0500
1 6 21 0686 0686 0686 0749 0770 0791 0812
I want to sort these arrays with NumPy and group the data into variables based on identical values of q, as shown below:
1 4 13 0252 0252 0252 0291 0304 0317 0330
q_a 1 4 14 0280 0280 0280 0322 0336 0350 0364
1 4 15 0310 0310 0310 0355 0370 0385 0400
--------------------------------------------------------------------
1 5 16 0404 0404 0404 0452 0468 0484 0500
q_b 1 5 19 0512 0512 0512 0569 0588 0607 0626
1 5 20 0552 0552 0552 0612 0632 0652 0672
--------------------------------------------------------------------
1 6 18 0566 0566 0566 0620 0638 0656 0674
q_c 1 6 21 0686 0686 0686 0749 0770 0791 0812
1 6 22 0730 0730 0730 0796 0818 0840 0862
--------------------------------------------------------------------
2 7 17 0674 0674 0674 0725 0742 0759 0776
q_d 2 7 19 0748 0748 0748 0805 0824 0843 0862
2 7 20 0788 0788 0788 0848 0868 0888 0908
--------------------------------------------------------------------
2 8 13 0720 0720 0720 0759 0772 0785 0798
q_e 2 8 14 0748 0748 0748 0790 0804 0818 0832
2 8 15 0778 0778 0778 0823 0838 0853 0868
--------------------------------------------------------------------
2 9 16 1028 1028 1028 1076 1092 1108 1124
q_f 2 9 21 1218 1218 1218 1281 1302 1323 1344
2 9 22 1262 1262 1262 1328 1350 1372 1394
--------------------------------------------------------------------
3 10 13 1276 1276 1276 1315 1328 1341 1354
q_g 3 10 16 1366 1366 1366 1414 1430 1446 1462
3 10 19 1474 1474 1474 1531 1550 1569 1588
--------------------------------------------------------------------
3 11 13 1608 1608 1608 1647 1660 1673 1686
q_h 3 11 14 1636 1636 1636 1678 1692 1706 1720
3 11 20 1846 1846 1846 1906 1926 1946 1966
--------------------------------------------------------------------
3 12 15 2064 2064 2064 2109 2124 2139 2154
q_i 3 12 16 2096 2096 2096 2144 2160 2176 2192
3 12 17 2130 2130 2130 2181 2198 2215 2232
I am still struggling to group the data by the value of q. My attempt so far only manages to sort the data:
import numpy as np
data = open('data.dat', "r")
line = data.readline()
while line.startswith('#'):
    line = data.readline()
data_header = line.split("\t")
data_header[-1] = data_header[-1].strip()
# Read the remaining rows into a 1-D structured array whose fields are named after the header.
_data_ = np.genfromtxt(data, comments='#', delimiter='\t', names=data_header, dtype=None)
sorted_index = np.lexsort((_data_['r'], _data_['q'], _data_['p']))
_data_ = _data_[sorted_index]
# Indices at which each column changes value between consecutive rows.
p_ind = np.nonzero(np.diff(_data_['p']))[0]
q_ind = np.nonzero(np.diff(_data_['q']))[0]
r_ind = np.nonzero(np.diff(_data_['r']))[0]
# Number of runs of equal consecutive values in each column.
n_p = len(p_ind) + 1
n_q = len(q_ind) + 1
n_r = len(r_ind) + 1
Is there any function in NumPy/SciPy that can group rows by value?
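One possible direction, sketched under assumptions rather than offered as a definitive answer: once the rows are sorted, the positions where q changes can be passed to `np.split` to cut the array into one block per q value. The small `rows` sample and the names `boundaries`, `groups`, and `by_q` are hypothetical and exist only for illustration; the sample reproduces a few rows and just the first y column of the data above.

```python
import numpy as np

# Hypothetical stand-in for the structured array read from data.dat;
# only a few rows and the y_1 column are reproduced here.
rows = [(2, 8, 14, 748), (1, 4, 15, 310), (2, 8, 13, 720),
        (1, 4, 13, 252), (1, 5, 19, 512), (1, 5, 16, 404)]
data = np.array(rows, dtype=[('p', int), ('q', int), ('r', int), ('y_1', int)])

# Sort by p, then q, then r (lexsort treats the last key as the primary one).
data = data[np.lexsort((data['r'], data['q'], data['p']))]

# Positions where q changes mark the first row of each new group.
boundaries = np.nonzero(np.diff(data['q']))[0] + 1

# Cut the sorted array into one sub-array per run of equal q values.
groups = np.split(data, boundaries)

# Map each q value to its block of rows instead of creating separate variables.
by_q = {int(g['q'][0]): g for g in groups}

for q, block in by_q.items():
    print(q, block['r'].tolist())
```

A dictionary keyed by q is used here in place of separately named variables such as q_a and q_b, since the number of groups depends on the data. This sketch assumes each q value occurs under a single p, as in the data shown; if the same q could appear under different p values, the dictionary key would need to be the pair (p, q).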