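Reading a semi-structured text file in R

I have a tab-delimited text file that I want to read into R. The file is what I would call "semi-structured": most of it consists of 300+ data frames of the same size (32 × 30), each with column names and row names. On the line before each data frame there is a unique ID for that data frame (ID1 ... etc.), and there is a blank line between consecutive data frames.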
I tried to read the file into R with the following:
read.table(file = "my.file", header = TRUE, sep = "\t",
na.strings = " ", blank.lines.skip = FALSE)
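This works, but the whole file is read in as a single factor with levels. Ideally, I would like to end up with the data in a list, with each data frame as one element of the list and each element keyed by its unique ID. Below is an example of the first two matrices read in from the start of the file with the command above (although they are stored as a single factor, the data has the same shape as in the text file). Any ideas on how to read this in and reshape it into a list?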
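One common approach, sketched here under assumptions rather than taken from the question: read the raw lines with `readLines()`, start a new group at every ID line, drop the blank separator lines, and parse each group with `read.table(text = ...)`. The helper name `read_blocks()` is hypothetical. The sketch assumes that ID lines start with `ID` (as in the `ID=2` level above), that fields are tab-separated, that each block's header row lists the column names, and that every data row starts with its row name. Empty cells come back as `NA`.

```r
# Hypothetical helper (not from any package): split a file made of
# "ID line + header row + data rows" blocks into a named list of data frames.
# Assumes tab-separated fields and that each data row starts with its row name.
read_blocks <- function(path, sep = "\t") {
  lines <- readLines(path)

  # A new block starts at every line whose first non-blank text is "ID"
  is_id <- grepl("^[[:space:]]*ID", lines)
  group <- cumsum(is_id)

  # Drop anything before the first ID line, plus the blank separator lines
  keep <- group > 0 & nzchar(trimws(lines))
  chunks <- split(lines[keep], group[keep])

  blocks <- lapply(chunks, function(chunk) {
    # chunk[1] is the ID line; chunk[-1] is the header row plus data rows
    read.table(text = chunk[-1], header = TRUE, sep = sep,
               row.names = 1,        # first field of each row is its name
               check.names = FALSE,  # keep names like "66.5E" unchanged
               strip.white = TRUE, fill = TRUE)
  })
  # Key each data frame by its trimmed ID line, e.g. "ID=2"
  names(blocks) <- vapply(chunks, function(chunk) trimws(chunk[1]),
                          character(1), USE.NAMES = FALSE)
  blocks
}

# Usage on a small made-up file with two 2 x 2 blocks
path <- tempfile(fileext = ".txt")
writeLines(c("ID=1",
             "\t66.5E\t67.5E",
             "8.5N\t0.0\t2.2",
             "9.5N\t0.0\t1.5",
             "",
             "ID=2",
             "\t66.5E\t67.5E",
             "8.5N\t\t0.3",
             "9.5N\t4.6\t0.0"), path)
blocks <- read_blocks(path)
blocks[["ID=2"]]
```

If the real file pads row names or cells with extra leading tabs, the `sep` and related `read.table()` arguments will likely need adjusting.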
Ind <- structure(list(ID.1 = structure(c(2L, 43L, 41L, 39L, 37L, 35L,
33L, 31L, 30L, 29L, 28L, 27L, 26L, 25L, 22L, 20L, 18L, 17L, 16L,
15L, 14L, 13L, 12L, 11L, 10L, 9L, 8L, 7L, 6L, 4L, 3L, 1L, 5L,
2L, 43L, 42L, 40L, 38L, 36L, 34L, 32L, 30L, 29L, 28L, 27L, 26L,
24L, 23L, 21L, 19L, 17L, 16L, 15L, 14L, 13L, 12L, 11L, 10L, 9L,
8L, 7L, 6L, 4L, 3L), .Label = c("", " 66.5E 67.5E 68.5E 69.5E 70.5E 71.5E 72.5E 73.5E 74.5E 75.5E 76.5E 77.5E 78.5E 79.5E 80.5E 81.5E 82.5E 83.5E 84.5E 85.5E 86.5E 87.5E 88.5E 89.5E 90.5E 91.5E 92.5E 93.5E 94.5E 95.5E 96.5E 97.5E",
" 8.5N 0.0 0.0 0.0 ",
" 9.5N 0.0 0.0 0.0 0.0 ",
" ID=2", " 10.5N 0.0 0.0 0.0 0.0 0.0 ",
" 11.5N 0.0 0.0 0.0 0.0 0.0 ",
" 12.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 13.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 14.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 15.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 16.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 17.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 18.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 19.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 20.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 21.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 22.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 22.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 23.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 23.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 1.5 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 24.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 24.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 25.5N 0.0 0.0 0.0 0.0 0.0 0.0 2.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 25.5N 0.0 0.0 0.0 0.0 0.0 1.6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 26.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 27.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0",
" 28.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0",
" 29.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 30.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 31.5N 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 31.5N 0.0 0.0 0.0 2.9 4.6 4.5 ",
" 32.5N 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 32.5N 0.0 0.0 0.0 1.2 5.4 4.2 ",
" 33.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 33.5N 0.0 0.0 0.0 0.0 0.9 0.7 2.5 ",
" 34.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 34.5N 0.0 0.0 0.0 0.0 0.4 0.6 1.5 ",
" 35.5N 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 35.5N 0.0 0.0 0.0 0.0 0.2 0.4 1.0 1.6 ",
" 36.5N 0.0 0.0 0.0 0.0 0.0 0.0 ",
" 36.5N 0.0 0.0 0.0 0.0 0.3 0.6 ",
" 37.5N "
), class = "factor")), .Names = "ID.1", class = "data.frame", row.names = c(NA,
-64L))
(In the data as read in, the row names all line up; I messed up the indentation when pasting this code.)
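Hey, thanks. I looked at readLines; unfortunately, when I read in the first 35 lines, it seems to put each row of the 35-row chunk into a single string. There doesn't seem to be a way to convert this into a data frame. – Alberto 2011-12-14 15:19:27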
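For what it's worth, the character vector that `readLines()` returns (one string per line) can be handed straight to `read.table()` through its `text` argument, which parses the strings as if they were a file. A minimal sketch, assuming the same tab-separated layout as in the question; the temporary file below is made-up stand-in data, not the real file.

```r
# Made-up stand-in for the first lines of "my.file" (tab-separated)
path <- tempfile(fileext = ".txt")
writeLines(c("ID=1",
             "\t66.5E\t67.5E",
             "8.5N\t0.0\t2.2",
             "9.5N\t0.0\t1.5"), path)

chunk <- readLines(path, n = 4)   # character vector: one string per line
# Skip the ID line and let read.table() split each string on tabs
df <- read.table(text = chunk[-1], header = TRUE, sep = "\t",
                 row.names = 1, check.names = FALSE)
str(df)
```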
It's easy to clean this up with grep, substr, the stringr package, etc., once it's loaded into R... – 2011-12-14 15:43:24
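As an illustration of that kind of clean-up in base R (stringr's `str_detect()` and `str_replace()` would work similarly): `grep()` locates the ID lines, `sub()` turns them into list keys, and the lines between consecutive IDs form each block. This is a sketch on a toy vector; the `ID=` prefix is assumed from the sample data above, and the `ID1`-style keys are just one possible choice.

```r
# Toy input standing in for readLines("my.file")
lines <- c("ID=1", "\t66.5E\t67.5E", "8.5N\t0.0\t2.2", "",
           "ID=2", "\t66.5E\t67.5E", "8.5N\t0.3\t0.0")

id_rows <- grep("^ID=", lines)            # line numbers of the ID lines
ids <- sub("^ID=", "ID", lines[id_rows])   # "ID=1" -> "ID1"

# Each block runs from the line after its ID to the line before the next ID
starts <- id_rows + 1
ends <- c(id_rows[-1] - 1, length(lines))
bodies <- Map(function(s, e) {
  body <- lines[s:e]
  body[nzchar(trimws(body))]               # drop the blank separator line
}, starts, ends)
names(bodies) <- ids
```

Each element of `bodies` is then a header row plus data rows that `read.table(text = ...)` can parse.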