Pasting together comma-separated columns in .txt files

I have many .txt files that usually have 5 columns, but some rows have more, for example:

a,b,c,d,e 
a,b,c,d,e 
a,b,c,d,e 
a,b,c,d,e,f,g 
a,b,c,d,e

All I want to do is paste together all columns that extend beyond the fifth column. The example above should result in:

a,b,c,d,e 
a,b,c,d,e 
a,b,c,d,e 
a,b,c,d,e f g 
a,b,c,d,e

How can I program this in R?

2016-08-19 snarble

Are you doing this directly from the file, or have you already read the data in? –

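Hi, thanks a lot, I didn't know that! Your answer is very helpful, and you assumed correctly that I had already loaded my data into R with the 'read.csv' function and that I know 'from' in advance. I also didn't know what I was doing on Stack Overflow! :) Thanks again! – snarble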

I assume you have already read your ".csv" file into R via:

dat <- read.csv(file, header = FALSE, fill = TRUE)

A small test with the data you provided:

x <- "a,b,c,d,e
a,b,c,d,e
a,b,c,d,e
a,b,c,d,e,f,g
a,b,c,d,e"

## no indentation inside the string, otherwise the spaces end up in the first field
dat <- read.csv(text = x, header = FALSE, fill = TRUE)

#  V1 V2 V3 V4 V5 V6 V7
#1  a  b  c  d  e
#2  a  b  c  d  e
#3  a  b  c  d  e
#4  a  b  c  d  e  f  g
#5  a  b  c  d  e

Here is another possibility:

from <- 5
dat[, from] <- trimws(do.call(paste, dat[from:ncol(dat)])) ## merge, overwrite, and strip the padding left by empty fields
dat[, (from+1):ncol(dat)] <- NULL ## drop the columns that were merged

#  V1 V2 V3 V4    V5
#1  a  b  c  d     e
#2  a  b  c  d     e
#3  a  b  c  d     e
#4  a  b  c  d e f g
#5  a  b  c  d     e

My simple approach requires you to know 'from' in advance, but it seems you do.
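The merge-and-drop steps can also be wrapped in a small function. This is only a sketch: the name merge_tail_cols is made up for illustration, and it assumes the data were read with fill = TRUE so that missing fields are "" or NA. Unlike the two-line version, it returns the data unchanged when there is nothing beyond column 'from'.

```r
# Hypothetical helper (not from the answer above): merge columns
# from..ncol(dat) into column `from` and drop the rest.
merge_tail_cols <- function(dat, from) {
  stopifnot(from >= 1, from <= ncol(dat))
  if (from == ncol(dat)) return(dat)           # nothing to merge
  tail_cols <- lapply(dat[from:ncol(dat)], function(col) {
    col <- as.character(col)
    col[is.na(col)] <- ""                      # treat filled-in NA as empty
    col
  })
  # unname() so column names are never mistaken for paste() arguments
  merged <- trimws(do.call(paste, unname(tail_cols)))
  out <- dat[seq_len(from)]
  out[[from]] <- merged
  out
}

x <- "a,b,c,d,e\na,b,c,d,e,f,g\na,b,c,d,e"
dat <- read.csv(text = x, header = FALSE, fill = TRUE)
res <- merge_tail_cols(dat, from = 5)
```

Here res has five columns, and the fifth column of the long row holds "e f g".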

2016-08-19 16:40:23

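We can read the dataset with readLines, split each element of 'lines' on "," into a list ('lst'), and find the length of the shortest list element ('minLength'). We then create a logical index ('i1') for the rows that are longer than 'minLength', paste together the elements beyond 'minLength' for those rows, and fill the results into a vector ('v2') that is "" everywhere else.

lines <- readLines("yourfile.txt")
lst <- strsplit(lines, ",")
minLength <- min(lengths(lst))
i1 <- lengths(lst) > minLength   # rows with extra fields
v1 <- vapply(lst[i1], function(x) paste(x[(minLength+1):length(x)], collapse=" "), character(1))
v2 <- rep("", length(lst))
v2[i1] <- v1   # ifelse(i1, v1, "") would recycle v1 and misplace values when several rows are long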

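Note: this does not require reading the data first and checking how many columns there are. It finds the regular number of columns automatically (taken to be the length of the shortest row) and pastes the remaining ones.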

After creating the vector ('v2'), we can read 'lines' with read.csv and fill = TRUE:

df1 <- read.csv(text = lines, header = FALSE, fill = TRUE)
df1 <- df1[seq_len(minLength)]   # keep only the regular columns
df1$newCol <- v2

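Alternatively, we can read the file directly with read.csv and find the first column that contains an NA or "" value. With hundreds of columns and thousands of rows it is hard to check by eye where the first NA or "" starts, so we compute it (assuming there are no other NA or "" values in the dataset):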

df1 <- read.csv("yourfile.txt", header = FALSE, fill = TRUE)
i1 <- which.max(colSums(df1=="")!=0)   # index of the first column containing ""
#i1 <- which.max(colSums(is.na(df1))!=0) #if it is NA
transform(df1[seq(i1-1)], newCol= do.call(paste, df1[i1:ncol(df1)]))
#  V1 V2 V3 V4 V5 newCol
#1  a  b  c  d  e
#2  a  b  c  d  e
#3  a  b  c  d  e
#4  a  b  c  d  e    f g
#5  a  b  c  d  e

Note: when I first posted this, I used do.call(paste

Another option is to use count.fields:

i1 <- min(count.fields("yourfile.txt", sep=",")) + 1   # first column beyond the regular ones

Then read the dataset with read.csv/read.table and apply transform as in the approach above.
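A sketch of how count.fields could be combined with read.csv is given below. One thing to be aware of: read.table and read.csv determine the number of columns from the first five lines of input unless col.names is longer, so a long row that first appears further down the file would otherwise be wrapped onto an extra row. The demo file, the variable names (n_min, n_max, extra) and the V1..Vn column names are assumptions made for illustration.

```r
# Demo file in which the first long row only appears on line 7
path <- tempfile(fileext = ".txt")
writeLines(c(rep("a,b,c,d,e", 6), "a,b,c,d,e,f,g"), path)

n_fields <- count.fields(path, sep = ",")
n_min <- min(n_fields)   # regular number of columns
n_max <- max(n_fields)   # widest row in the file

# Passing col.names of length n_max makes read.csv allocate every column
df1 <- read.csv(path, header = FALSE, fill = TRUE,
                col.names = paste0("V", seq_len(n_max)))

if (n_max > n_min) {
  # Paste the extra columns row-wise and strip padding from empty fields
  extra <- trimws(do.call(paste, unname(df1[(n_min + 1):n_max])))
  df1 <- transform(df1[seq_len(n_min)], newCol = extra)
}
```

After this, df1 has the five regular columns plus newCol, which is "f g" for the long row and "" elsewhere.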

2016-08-19 16:26:09 akrun

If you are on a Unix-based system, you can simply preprocess the file (e.g. a file ff.txt) before loading it into R:

$ paste -d ',' <(cut -f 1-4 -d ',' ff.txt) <(cut -f 5- -d ',' ff.txt | tr ',' ' ') > ff-mod.txt

This writes the new file ff-mod.txt:

$ cat ff-mod.txt 
a,b,c,d,e 
a,b,c,d,e 
a,b,c,d,e 
a,b,c,d,e f g 
a,b,c,d,e

This file can then easily be read into R:

> read.table('ff-mod.txt', sep=',') 
  V1 V2 V3 V4    V5
1  a  b  c  d     e
2  a  b  c  d     e
3  a  b  c  d     e
4  a  b  c  d e f g
5  a  b  c  d     e
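The same preprocessing can be sketched in R itself, for systems without cut and paste. The function name collapse_extra_fields and the argument n_keep are hypothetical, and the sketch assumes plain comma-separated lines with no quoted fields and no trailing empty fields.

```r
# Hypothetical helper: keep the first n_keep fields comma-separated and
# join everything from field n_keep onwards with spaces.
collapse_extra_fields <- function(lines, n_keep) {
  parts <- strsplit(lines, ",", fixed = TRUE)
  vapply(parts, function(p) {
    head_part <- p[seq_len(min(n_keep, length(p)))]
    if (length(p) > n_keep) {
      head_part[n_keep] <- paste(p[n_keep:length(p)], collapse = " ")
    }
    paste(head_part, collapse = ",")
  }, character(1))
}

lines <- c("a,b,c,d,e", "a,b,c,d,e,f,g", "a,b,c,d,e")
out <- collapse_extra_fields(lines, n_keep = 5)
```

For a real file, lines could come from readLines and the result could be written back with writeLines, or passed straight to read.csv(text = out, header = FALSE).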

2016-08-19 17:57:39 user1981275
