Reading lines around comments with read.table

I am reading text files that have a few header lines with information about the data above the data rows, like this:

Test file 
# 
File information 
1 2 3 4 
# 
a 2 
b 4 
c 6 
d 8

I want to read each piece of information from this file separately. I can do that just fine like this:

file <- read.table(txt, nrow = 1) 
name <- read.table(txt, nrow = 1, skip = 2) 
vals <- read.table(txt, nrow = 1, skip = 3) 
data <- read.table(txt,   skip = 5)

Because of the two blank comment lines, I could also read the data like this:

file <- read.table(txt, nrow = 1) 
name <- read.table(txt, nrow = 1, skip = 1) # Skip changed from 2 
vals <- read.table(txt, nrow = 1, skip = 3) 
data <- read.table(txt,   skip = 4) # Skip changed from 5

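This works well, but the text files do not always have the same number of blank comment lines; sometimes they are present and sometimes they are not. If one (or both) of the comment lines is missing from the example text file, neither of my solutions keeps working.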

Is there a way to read the text file so that the skip argument never counts the comment lines?

2017-01-13 hfisch

Something like 'lines <- readLines(txt); lines_clean <- lines[substr(lines, 1, 1) != "#"]' –
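The comment's suggestion could be fleshed out roughly as below. This is only a sketch: the sample file is held in a string, and the variable names `n_header`, `file_title`, `name`, `vals`, and `tab` are illustrative. It also assumes that, once the comment lines are dropped, the header always occupies exactly three lines.

```r
# The sample file from the question, held in a string so the sketch is self-contained.
txt <- "Test file
#
File information
1 2 3 4
#
a 2
b 4
c 6
d 8"

con <- textConnection(txt)
lines <- readLines(con)
close(con)

# Keep only the lines whose first character is not "#".
lines_clean <- lines[substr(lines, 1, 1) != "#"]

# Assumption: after filtering, the first 3 lines are header, the rest is data.
n_header   <- 3
file_title <- lines_clean[1]
name       <- lines_clean[2]
vals       <- scan(text = lines_clean[3], quiet = TRUE)
tab        <- read.table(text = lines_clean[-seq_len(n_header)])
```

Because the comment lines are removed before anything is parsed, the positions no longer depend on how many of them the file contains.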

(Assumption: the file metadata is at the top, and once the data starts there are no more comments.)

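(Using textConnection(...) is a trick that lets functions expecting a file connection process a character string. With a real file, replace that call with the file name.)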

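One technique is to read the first n lines of the file (some number "guaranteed" to include all of the comment/non-data lines), find the last comment line, and then process everything before and after it accordingly: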

txt <- "Test file
#
File information
1 2 3 4
#
a 2
b 4
c 6
d 8"
max_comment_lines <- 8
(dat <- readLines(textConnection(txt), n = max_comment_lines))
# [1] "Test file"        "#"                "File information" "1 2 3 4"
# [5] "#"                "a 2"              "b 4"              "c 6"
(skip <- max(grep("^\\s*#", dat)))
# [1] 5

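(Aside: it is probably worth checking that there actually are comment lines. Otherwise grep() returns integer(0) and max() of that yields -Inf with a warning, which is not a usable skip value for the read* functions.)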
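One way to add that check is sketched below. The helper `last_comment_line` is hypothetical and not part of the original answer; it assumes that "no comment lines found" should simply mean "skip nothing".

```r
# Hypothetical helper: line number of the last comment line in `lines`,
# or 0 when no line starts with optional whitespace followed by "#".
last_comment_line <- function(lines) {
  idx <- grep("^\\s*#", lines)
  if (length(idx) == 0L) 0L else max(idx)
}

with_comments    <- c("Test file", "#", "File information", "1 2 3 4", "#", "a 2")
without_comments <- c("a 2", "b 4")

last_comment_line(with_comments)     # position of the second "#"
last_comment_line(without_comments)  # falls back to 0
```

Note that when a file has no comment lines at all, nothing marks where the header ends, so a fallback of 0 only avoids the -Inf value; it does not separate header from data.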

Now that we "know" the last comment found is on line 5, we can use the first 4 lines to get the header information...

meta <- readLines(textConnection(txt), n = skip - 1)
meta <- meta[!grepl("^\\s*#", meta)] # remove the comment rows themselves
meta
# [1] "Test file"        "File information" "1 2 3 4"

...and skip 5 lines to get the data.

dat <- read.table(textConnection(txt), skip = skip)
str(dat)
# 'data.frame':	4 obs. of  2 variables:
#  $ V1: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
#  $ V2: int  2 4 6 8
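For a file on disk, the steps above might be bundled as follows. This is a sketch under the same assumption (all comments sit above the data). The function name `read_with_header`, its return shape, and the temporary file are illustrative and not from the original answer.

```r
# Hypothetical wrapper around the steps above.
read_with_header <- function(path, max_comment_lines = 8) {
  # Read enough leading lines to cover every comment/header line.
  head_lines <- readLines(path, n = max_comment_lines)
  idx  <- grep("^\\s*#", head_lines)
  skip <- if (length(idx) == 0L) 0L else max(idx)
  # Header = everything up to the last comment, minus the comments themselves.
  meta <- head_lines[seq_len(skip)]
  meta <- meta[!grepl("^\\s*#", meta)]
  list(meta = meta, data = read.table(path, skip = skip))
}

# Write the sample to a temporary file so the sketch runs on its own.
path <- tempfile(fileext = ".txt")
writeLines(c("Test file", "#", "File information", "1 2 3 4", "#",
             "a 2", "b 4", "c 6", "d 8"), path)

res <- read_with_header(path)
res$meta
res$data
```

On R 4.0.0 and later, read.table no longer converts strings to factors by default, so V1 comes back as a character column rather than the Factor shown in the str() output above.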

2017-01-13 23:01:01 r2evans

Of course ... duh. Thanks. – r2evans

Thanks for the 'textConnection' trick, that is a nice bonus piece of information! – hfisch

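Technically, 'read.table' and friends have a 'text=' argument that accepts a string instead of looking for a file. Since 'readLines' has no 'text=', I used 'textConnection' for consistency, though it is not required. – r2evans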
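As a small illustration of that last remark, the data part can be read straight from the string with the text argument. The skip value is hard-coded here to the position of the last comment line in the sample, purely to keep the sketch short.

```r
txt <- "Test file
#
File information
1 2 3 4
#
a 2
b 4
c 6
d 8"

# Assumption: the last comment is on line 5, as in the sample file.
skip <- 5

# read.table() accepts the string directly; no textConnection() needed.
dat <- read.table(text = txt, skip = skip)
dat
```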
