2017-10-15 122 views

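I want to plot "MJD" vs. "MJD_DUPLICATE" for the (13 MB) dataset "DR14Q_pruned_repeats.csv", found here: https://www.dropbox.com/s/1dyong27bre3p9j/DR14Q_pruned_repeats.csv?dl=0 The underlying problem is converting a pandas Series of strings to numbers.

Here is my code: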

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
from astropy.table import Table 
from astropy.io import ascii 
from astropy.io import fits 

filename = 'DR14Q_pruned_repeats.csv' 
df = pd.read_csv(filename) 

multiples = df[df["N_SPEC"] >2] 

multiples.plot.scatter(x='MJD', y='N_SPEC') 
plt.show() 

multiples.plot.scatter(x='MJD', y='MJD_DUPLICATE') 
plt.show() 

The line that plots MJD vs. MJD_DUPLICATE returns an error:

ValueError: scatter requires y column to be numeric 

and the pd.to_numeric line returns only NaNs.
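
A likely explanation, judging by the column printed in the answer below: each MJD_DUPLICATE entry is a string that looks like a tuple, so pd.to_numeric cannot read it as a single number and, with errors='coerce', falls back to NaN. A minimal sketch on made-up values (the two sample strings are illustrative, not copied from the file):

```python
import pandas as pd

# Illustrative values only: strings shaped like the tuple text in the column
s = pd.Series(["(0, 56279, 0, 56539)", "(0, 56243, 0, 56543)"])

# Each element is a Python str, not a number or a tuple
print(type(s.iloc[0]))

# errors='coerce' replaces anything that cannot be parsed as one number with NaN
converted = pd.to_numeric(s, errors="coerce")
print(converted.isna().all())
```

So the strings need to be parsed into tuples before any numeric conversion can succeed.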

Answers

You need to convert the values from strings first:

import ast 

doubles = df[df["N_SPEC"] ==2].copy() 
multiples = df[df["N_SPEC"] >2].copy() 
repeats = df[df["N_SPEC"] >1].copy() 

multiples.plot.scatter(x='MJD', y='N_SPEC') 
plt.show() 

Convert the MJD_DUPLICATE column to tuples, then select a value by position, e.g. str[1] for the second value of each tuple:

print (multiples['MJD_DUPLICATE'].head(10)) 
5  (0, 56279, 0, 56539, 0, 56957, -1, -1, -1, -1,... 
85  (0, 56243, 0, 56543, 0, 57328, -1, -1, -1, -1,... 
170 (0, 52262, 0, 55447, 0, 57011, -1, -1, -1, -1,... 
200 (0, 52262, 0, 55443, 0, 57006, -1, -1, -1, -1,... 
262 (0, 52525, 0, 55443, 0, 57011, -1, -1, -1, -1,... 
277 (0, 51793, 0, 55531, 0, 57006, -1, -1, -1, -1,... 
287 (0, 55182, 0, 55184, 0, 55443, -1, -1, -1, -1,... 
313 (0, 56248, 0, 56245, 0, 56572, -1, -1, -1, -1,... 
314 (0, 55182, 0, 55184, 0, 55444, -1, -1, -1, -1,... 
324 (0, 52261, 0, 55184, 0, 55444, -1, -1, -1, -1,... 
Name: MJD_DUPLICATE, dtype: object 

ser = multiples['MJD_DUPLICATE'].apply(ast.literal_eval).str[1] 
multiples['MJD_DUPLICATE'] = pd.to_numeric(ser, errors='coerce') 

print (multiples['MJD_DUPLICATE'].head(10)) 
5  56279 
85  56243 
170 52262 
200 52262 
262 52525 
277 51793 
287 55182 
313 56248 
314 55182 
324 52261 
Name: MJD_DUPLICATE, dtype: int64 

multiples.plot.scatter(x='MJD', y='MJD_DUPLICATE') 
plt.show() 
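
The two conversion steps above can be tried in isolation on a toy Series. This is only a sketch: the strings and the index labels are made up to mimic the printed output, and the variable names raw, tuples and second are not from the original code.

```python
import ast
import pandas as pd

# Illustrative strings in the same shape as the printed column
raw = pd.Series(
    ["(0, 56279, 0, 56539, -1)", "(0, 56243, 0, 56543, -1)"],
    index=[5, 85],
)

# Step 1: parse each string into a real Python tuple
tuples = raw.apply(ast.literal_eval)

# Step 2: .str[i] indexes into each element, so .str[1] picks the second value
second = pd.to_numeric(tuples.str[1], errors="coerce")
print(second.tolist())  # [56279, 56243]
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for text read from a file.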

This works, but it doesn't do what I'm after. I need to keep all the numeric data in MJD_DUPLICATE, not just the second column. – npross

Yes, then create a new column under a new name, multiples['MJD_DUPLICATE_NEW'] = pd.to_numeric(ser, errors='coerce'), and plot it with multiples.plot.scatter(x='MJD', y='MJD_DUPLICATE_NEW') – jezrael

You simply cannot plot tuples; scalars are needed. – jezrael
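
One possible way to keep every value while still giving the plot scalars is to reshape the data to long form, one row per tuple element. The sketch below is not from the thread. It assumes that 0 and -1 in the tuples are placeholders rather than real MJDs, uses a made-up toy frame, and relies on DataFrame.explode, which needs pandas 0.25 or later. The helper name explode_duplicates is hypothetical.

```python
import ast
import pandas as pd

# Toy frame shaped like the printed data; the values are illustrative
df_toy = pd.DataFrame({
    "MJD": [56279, 56243],
    "MJD_DUPLICATE": [
        "(0, 56279, 0, 56539, -1, -1)",
        "(0, 56243, 0, 56543, 0, 57328)",
    ],
})

def explode_duplicates(frame, col="MJD_DUPLICATE"):
    """Return one row per positive value found in the tuple column."""
    out = frame.copy()
    out[col] = out[col].apply(ast.literal_eval)   # strings -> tuples
    out = out.explode(col)                        # one row per tuple element
    out[col] = pd.to_numeric(out[col], errors="coerce")
    # Assumption: 0 and -1 are placeholders, so keep only positive values
    return out[out[col] > 0].reset_index(drop=True)

long_df = explode_duplicates(df_toy)
print(long_df)
```

Each row of long_df now pairs an MJD with a single duplicate MJD, so long_df.plot.scatter(x='MJD', y='MJD_DUPLICATE') should accept it. If 0 or -1 turn out to carry meaning in the real catalogue, the filter would need to change.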