2015-02-10 405 views

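How to remove lines starting with a special character in R

I have a data frame and I want to remove all the rows that start with #. Can anyone tell me how to do this? Thanks in advance.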

#ID_REF = The name of the probe set, blank for control probes   
    #VALUE = The signal value calculated by MAS5, normalized    
    #ABS_CALL = The detection value calculated by the MAS5   
    #DETECTION P-VALUE = The detection p-value calculated by the MAS5   
    *ID_REF** VALUE** ABS_CALL** DETECTION P-VALUE* 
    AFFX-BioB-5_at 757.7 P 0.00039 
    AFFX-BioB-M_at 933.7 P 0.000095 
    AFFX-BioB-3_at 525.6 P 0.000095 
    AFFX-BioC-5_at 1999.5 P 0.000044 
    AFFX-BioC-3_at 2339.5 P 0.000044 
    AFFX-BioDn-5_at 4321.3 P 0.000044 
    AFFX-BioDn-3_at 9229.4 P 0.00007 
    AFFX-CreX-5_at 21949.9 P 0.000044 
    AFFX-CreX-3_at 26022.8 P 0.000044 
    AFFX-DapX-5_at 1171.1 P 0.00006 

Try 'read.delim("yourfile", comment.char = "#")' – akrun 2015-02-10 16:33:00

Possible duplicate of http://stackoverflow.com/questions/28433328/skip-comment-line-in-csv-file-using-r – akrun 2015-02-10 16:36:07

@akrun, it removes some of the lines with '#', but it merges all the data into a single row – AwaitedOne 2015-02-10 16:38:37

Answer

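In some of the lines the comment character (#) is not the first character. One way is to use grep to drop the lines that contain the comment character ('lines2' below) and then read the rest with read.csv.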

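# Read the raw lines and trim leading/trailing spaces
lines <- readLines('awaited.csv')
lines1 <- gsub('^ +| +$', '', lines)
# Drop every line that contains '#', then parse what is left
lines2 <- lines1[!grepl('^#|^.*#', lines1)]
d1 <- read.csv(text=lines2, check.names=FALSE, stringsAsFactors=FALSE)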
str(d1) 
#'data.frame': 54682 obs. of 4 variables: 
# $ *ID_REF**   : chr "AFFX-BioB-5_at" "AFFX-BioB-M_at" "AFFX-BioB-3_at" "AFFX-BioC-5_at" ... 
# $ VALUE**   : num 758 934 526 2000 2340 ... 
# $ ABS_CALL**  : chr "P" "P" "P" "P" ... 
# $ DETECTION P-VALUE*: num 3.9e-04 9.5e-05 9.5e-05 4.4e-05 4.4e-05 4.4e-05 7.0e-05 4.4e-05 4.4e-05 6.0e-05 ... 
head(d1,3) 
#  *ID_REF** VALUE** ABS_CALL** DETECTION P-VALUE* 
#1 AFFX-BioB-5_at 757.7   P   3.9e-04 
#2 AFFX-BioB-M_at 933.7   P   9.5e-05 
#3 AFFX-BioB-3_at 525.6   P   9.5e-05 
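
As a minimal, self-contained sketch of the same idea, the snippet below keeps a few lines modelled on the excerpt in the question in a character vector. The vector `raw` and the column names are assumptions made for illustration, and the fields are assumed to be whitespace-separated as they appear in the excerpt. Unlike `lines2` above, it drops only the lines whose first non-blank character is `#`, which is what the question literally asks for.

```r
# Hypothetical in-memory sample modelled on the excerpt in the question;
# some comment lines are preceded by spaces.
raw <- c(
  "#ID_REF = The name of the probe set",
  "    #VALUE = The signal value calculated by MAS5",
  "    AFFX-BioB-5_at 757.7 P 0.00039",
  "    AFFX-BioB-M_at 933.7 P 0.000095",
  "    AFFX-BioB-3_at 525.6 P 0.000095"
)

# Keep only the lines whose first non-blank character is not '#'.
keep <- !grepl("^[[:space:]]*#", raw)

# Parse the remaining lines as whitespace-separated fields.
# The column names are illustrative, not taken from the real file.
dat <- read.table(text = raw[keep], header = FALSE,
                  col.names = c("ID_REF", "VALUE", "ABS_CALL", "DETECTION_P_VALUE"),
                  stringsAsFactors = FALSE)
nrow(dat)   # 3
```

Anchoring the pattern with `^[[:space:]]*` keeps a data row that merely contains a `#` somewhere in the middle, whereas a bare `#` pattern would drop it too.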

Or you can use the comment.char='#' argument of read.csv after removing, in each line, all the characters that come before the # (sub(.*...)).

d2 <- read.csv(text=sub('.*(#.*)', '\\1', lines), 
    check.names=FALSE, stringsAsFactors=FALSE, comment.char='#') 
dim(d2) 
#[1] 54682  4 
head(d2,3) 
#  *ID_REF** VALUE** ABS_CALL** DETECTION P-VALUE* 
#1 AFFX-BioB-5_at 757.7   P   3.9e-04 
#2 AFFX-BioB-M_at 933.7   P   9.5e-05 
#3 AFFX-BioB-3_at 525.6   P   9.5e-05 
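
To make the effect of the `sub()` call easier to see, here is a small sketch on an in-memory vector. The sample lines are hypothetical and assumed to be comma-separated, since the real 'awaited.csv' is not shown in the thread.

```r
# Hypothetical comma-separated sample; the real 'awaited.csv' is not shown in the thread.
lines <- c(
  "#ID_REF = The name of the probe set",
  "    #VALUE = The signal value calculated by MAS5",
  "ID_REF,VALUE,ABS_CALL,DETECTION P-VALUE",
  "AFFX-BioB-5_at,757.7,P,0.00039",
  "AFFX-BioB-M_at,933.7,P,0.000095"
)

# Strip whatever precedes '#', so that every comment line starts with '#'.
# Lines without a '#' are returned unchanged.
cleaned <- sub(".*(#.*)", "\\1", lines)
cleaned[2]   # "#VALUE = The signal value calculated by MAS5"

# With comment.char = '#', read.csv skips the comment lines and
# treats the first remaining line as the header.
d2 <- read.csv(text = cleaned, comment.char = "#",
               check.names = FALSE, stringsAsFactors = FALSE)
dim(d2)      # 2 4
```

Note that the greedy `.*` removes everything up to the last `#` on a line, so this only behaves as intended when `#` appears in comment lines alone.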

Same error for me: 'Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 2 did not have 2 elements' – AwaitedOne 2015-02-10 17:16:24

@AwaitedOne Try 'fill = TRUE' – akrun 2015-02-10 17:24:15

I gave it a try, but it doesn't seem to work. I ran your code above. – AwaitedOne 2015-02-10 17:26:07
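
The thread does not establish what caused the "line 2 did not have 2 elements" error. One possibility, offered here only as an assumption, is that the file is not comma-separated, so that `read.csv` sees a different number of fields on different lines. A rough way to check is to count the fields each line yields under a few candidate separators; the sample lines below are hypothetical and tab-separated purely for illustration.

```r
# Hypothetical lines; tab-separated here purely for illustration.
x <- c("ID_REF\tVALUE\tABS_CALL\tDETECTION P-VALUE",
       "AFFX-BioB-5_at\t757.7\tP\t0.00039",
       "AFFX-BioB-M_at\t933.7\tP\t0.000095")

# Count the fields each line would yield under two candidate separators.
n_comma <- lengths(strsplit(x, ",", fixed = TRUE))
n_tab   <- lengths(strsplit(x, "\t", fixed = TRUE))
n_comma   # 1 1 1
n_tab     # 4 4 4

# A separator that gives the same count (> 1) on every line is the likely one;
# read.delim uses tabs by default.
d3 <- read.delim(text = x, check.names = FALSE, stringsAsFactors = FALSE)
dim(d3)   # 2 4
```

If the counts differ from line to line under every candidate separator, inspecting the offending lines directly (for example `x[n_tab != n_tab[1]]`) is usually more informative than adding `fill = TRUE`.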