Splitting part of a data frame in R

I want to split a data frame column into multiple columns based on a delimiter. My data frame has one column, which looks like this:

A0017493 .A 11.86 23:59_10/10/2016 1.00 SURVEYED 
A0017493 .A 11.86 23:59_10/11/2016 1.15 DATALOGGER CHANGED 
A0017496 .A 11.82 23:59_11/12/2016 2.06 READING IS WRONG

I would like a data frame with 6 columns, namely Site, File, Variable, Timestamp, Value and Comment, like this:

Site File Variable Timestamp Value Comment 
A0017493 .A 11.86 23:59_10/10/2016 1.00 SURVEYED 
A0017493 .A 11.86 23:59_10/11/2016 1.15 DATALOGGER CHANGED 
A0017496 .A 11.82 23:59_11/12/2016 2.06 READING IS WRONG

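I tried to do this with the tidyr package and its separate function, since the fields in each observation are separated by spaces. The problem is that the comments themselves contain spaces, and I don't want the comments to be split. Is there a way to do this? Any help would be greatly appreciated. Thanks!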


2016-10-10 asmi

If everything except the comment is fixed-width, you could perhaps try read.fwf: https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.fwf.html
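A minimal base R sketch of that suggestion. It assumes every line shares the single-space layout of the three sample lines, so the field widths 8, 2, 5, 16 and 4 are counted from them, and the width of 100 for the last field is an arbitrary upper bound on the comment length. Negative widths in utils::read.fwf() skip the separating spaces. Because read.fwf() passes the cut-out fields to read.table() joined by tabs, spaces inside a comment are left alone.

```r
# The three sample lines from the question (illustrative input)
lines <- c("A0017493 .A 11.86 23:59_10/10/2016 1.00 SURVEYED",
           "A0017493 .A 11.86 23:59_10/11/2016 1.15 DATALOGGER CHANGED",
           "A0017496 .A 11.82 23:59_11/12/2016 2.06 READING IS WRONG")

df <- utils::read.fwf(
  textConnection(lines),
  # positive widths are fields, negative widths skip the single separating space;
  # 100 is an assumed maximum comment length
  widths    = c(8, -1, 2, -1, 5, -1, 16, -1, 4, -1, 100),
  col.names = c("Site", "File", "Variable", "Timestamp", "Value", "Comment"),
  strip.white = TRUE,        # passed on to read.table(): trim padding around fields
  stringsAsFactors = FALSE   # keep text columns as character (needed before R 4.0)
)
str(df)
```

If the real file pads its columns differently, only the widths vector needs to change.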

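We can use the tidyverse packages to do what you want. The key is to split each row on the ' ' (space) character and then merge the comment columns back together. This assumes your raw data are in a data frame called df with a single column named V1.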

library(tidyverse) 

df.new <- strsplit(as.character(df$V1), split = ' ') %>% # split each row into a character vector contained in a list 
    lapply(function(x) data.frame(rbind(x), stringsAsFactors = FALSE)) %>% # turn each vector into a one-row data frame with columns X1, X2, ... 
    bind_rows() %>% # glue together the ragged rows, padding missing columns with NA (plyr::rbind.fill is not loaded by tidyverse) 
    unite('Comment', -(X1:X5), sep = ' ', na.rm = TRUE) %>% # recombine every column that is NOT one of the first 5 (the comment words), dropping the NA padding 
    rename(Site = X1, File = X2, Variable = X3, Timestamp = X4, Value = X5) %>% # relabel columns 
    type_convert() # convert columns to appropriate types 

     Site File Variable  Timestamp Value   Comment 
1 A0017493 .A 11.86 23:59_10/10/2016 1.00   SURVEYED 
2 A0017493 .A 11.86 23:59_10/11/2016 1.15 DATALOGGER CHANGED 
3 A0017496 .A 11.82 23:59_11/12/2016 2.06 READING IS WRONG
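For comparison, here is a package-free base R sketch of the same split-then-recombine idea. Like the answer, it assumes a data frame df with one character column V1, built here from the sample lines. nth() is a hypothetical helper introduced only for this example.

```r
# Hypothetical input matching the answer's assumption: one character column V1
df <- data.frame(V1 = c("A0017493 .A 11.86 23:59_10/10/2016 1.00 SURVEYED",
                        "A0017493 .A 11.86 23:59_10/11/2016 1.15 DATALOGGER CHANGED",
                        "A0017496 .A 11.82 23:59_11/12/2016 2.06 READING IS WRONG"),
                 stringsAsFactors = FALSE)

# Split every row on single spaces; the result is a list of character vectors
tokens <- strsplit(df$V1, split = " ", fixed = TRUE)

# Hypothetical helper: pull the i-th token out of every row
nth <- function(i) vapply(tokens, function(x) x[i], character(1))

df.new <- data.frame(
  Site      = nth(1),
  File      = nth(2),
  Variable  = as.numeric(nth(3)),
  Timestamp = nth(4),
  Value     = as.numeric(nth(5)),
  # everything after the fifth token is the comment, glued back together with spaces
  Comment   = vapply(tokens, function(x) paste(x[-(1:5)], collapse = " "), character(1)),
  stringsAsFactors = FALSE
)
df.new
```

The trade-off is that the column types are set by hand (as.numeric) instead of being guessed by type_convert().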


2016-10-10 20:56:42 jdobres

Another tidyverse answer, this time using tidyr::separate.

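Notice that every row is space-separated, except for the last column (which can contain spaces). In that case, we can split on spaces only up to the number of columns we know about.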

tidyr::separate takes an extra argument that handles exactly this use case: extra = "merge".

library(tidyverse) 

data.raw = "A0017493 .A 11.86 23:59_10/10/2016 1.00 SURVEYED 
A0017493 .A 11.86 23:59_10/11/2016 1.15 DATALOGGER CHANGED 
A0017496 .A 11.82 23:59_11/12/2016 2.06 READING IS WRONG" 

data = read_csv(I(data.raw), col_names = "Col1") # I() marks the string as literal data rather than a file path 

data %>% 
    separate(Col1, into = c("Site", "File", "Variable", "Timestamp", "Value", "Comment"), sep = "\\s", extra = "merge") %>% 
    type_convert() %>% 
    head() 

#> # A tibble: 3 x 6 
#>  Site File Variable  Timestamp Value   Comment 
#>  <chr> <chr> <dbl>   <chr> <dbl>    <chr> 
#> 1 A0017493 .A 11.86 23:59_10/10/2016 1.00   SURVEYED 
#> 2 A0017493 .A 11.86 23:59_10/11/2016 1.15 DATALOGGER CHANGED 
#> 3 A0017496 .A 11.82 23:59_11/12/2016 2.06 READING IS WRONG
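The same "split only on the first five separators" idea can also be written as a single regular expression. Below is a minimal base R sketch using utils::strcapture(). It assumes each line has exactly five space-free fields before the comment. proto (an illustrative name) supplies the output column names and the type each captured group is converted to.

```r
lines <- c("A0017493 .A 11.86 23:59_10/10/2016 1.00 SURVEYED",
           "A0017493 .A 11.86 23:59_10/11/2016 1.15 DATALOGGER CHANGED",
           "A0017496 .A 11.82 23:59_11/12/2016 2.06 READING IS WRONG")

# Five runs of non-space characters, each followed by whitespace, then the rest of the line
pattern <- "^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(.*)$"

# Zero-row prototype: column names plus the type each capture is converted to
proto <- data.frame(Site = character(), File = character(), Variable = numeric(),
                    Timestamp = character(), Value = numeric(), Comment = character(),
                    stringsAsFactors = FALSE)

df <- utils::strcapture(pattern, lines, proto)
df
```

Lines that do not match the pattern come back as rows of NA rather than raising an error, so a quick check with anyNA() is worthwhile on real data.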


2016-10-10 21:06:09

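Hi Michael - I used the statement newdata <- separate(mydata, col = comments, into = c("Site", "File", "Variable", "Timestamp", "Value", "Comment"), sep = " ", extra = "merge", remove = TRUE). However, the resulting data frame contains Site in the first column and everything else in the Comment column, and all the columns in between are empty. Basically it only separates out the Site. Am I doing something wrong? – asmi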
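One possible cause, assuming the real file pads its columns with runs of spaces (the comment does not show the raw data, so this is a guess): splitting on a single space then yields empty strings between consecutive spaces. Those empty strings fill File through Value, and extra = "merge" pushes the rest into Comment. The sketch below uses a hypothetical padded line to show the effect in base R.

```r
# Hypothetical line padded with runs of spaces (an assumed layout, not the asker's actual file)
padded <- "A0017493   .A   11.86   23:59_10/10/2016   1.00   DATALOGGER CHANGED"

# Splitting on a single space leaves empty strings wherever spaces are repeated
single <- strsplit(padded, " ", fixed = TRUE)[[1]]
single[1:4]

# Splitting on one or more whitespace characters treats each run as one separator
strsplit(padded, "\\s+")[[1]]

# Drop the first five fields (and the whitespace after each) to keep the comment intact
sub("^(\\S+\\s+){5}", "", padded)
#> [1] "DATALOGGER CHANGED"
```

In the separate() call above, the corresponding change would be sep = "\\s+" instead of sep = " ". With extra = "merge", the remainder should be kept as written, so the comment keeps its internal spaces.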

This looks like a ragged fixed-width format file, so:

library(readr) 
library(dplyr) # glimpse() comes from dplyr/tibble, not readr 

# start/end character positions (inclusive) of each field, counted on the sample lines; 
# end = NA lets the last field (Comment) run to the end of each line 
pos <- fwf_positions(start = c(1, 10, 13, 19, 36, 41), 
                     end   = c(8, 11, 17, 34, 39, NA), 
                     col_names = c("Site", "File", "Variable", "Timestamp", "Value", "Comment")) 
df <- read_fwf(file = I("A0017493 .A 11.86 23:59_10/10/2016 1.00 SURVEYED 
A0017493 .A 11.86 23:59_10/11/2016 1.15 DATALOGGER CHANGED 
A0017496 .A 11.82 23:59_11/12/2016 2.06 READING IS WRONG"), col_positions = pos) # I() marks literal data 
glimpse(df) 
# Rows: 3 
# Columns: 6 
# $ Site      <chr> "A0017493", "A0017493", "A0017496" 
# $ File      <chr> ".A", ".A", ".A" 
# $ Variable  <dbl> 11.86, 11.86, 11.82 
# $ Timestamp <chr> "23:59_10/10/2016", "23:59_10/11/2016", "23:59_11/12/2016" 
# $ Value     <dbl> 1.00, 1.15, 2.06 
# $ Comment   <chr> "SURVEYED", "DATALOGGER CHANGED", "READING IS WRONG"
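Instead of counting character positions by hand, the positions can be derived from the data. The base R sketch below assumes every line uses the same single-space layout as the first line, so the first five spaces in line 1 mark the field boundaries for all lines. The names gaps, starts, ends and fields are illustrative only. The resulting starts/ends could equally be passed to readr::fwf_positions().

```r
lines <- c("A0017493 .A 11.86 23:59_10/10/2016 1.00 SURVEYED",
           "A0017493 .A 11.86 23:59_10/11/2016 1.15 DATALOGGER CHANGED",
           "A0017496 .A 11.82 23:59_11/12/2016 2.06 READING IS WRONG")

# Positions of the first five spaces in the first line (assumed shared by all lines)
gaps   <- gregexpr(" ", lines[1], fixed = TRUE)[[1]][1:5]
starts <- c(1L, gaps + 1L)   # 1 10 13 19 36 41
ends   <- c(gaps - 1L, NA)   # 8 11 17 34 39 NA (NA = to end of line)

# Cut each field out of every line; the last field runs to the end of its own line
fields <- lapply(seq_along(starts), function(i) {
  last <- if (is.na(ends[i])) nchar(lines) else ends[i]
  substring(lines, starts[i], last)
})
names(fields) <- c("Site", "File", "Variable", "Timestamp", "Value", "Comment")
df <- as.data.frame(fields, stringsAsFactors = FALSE)
df
```

All columns come back as character here, so Variable and Value still need as.numeric() if numbers are wanted.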


2016-10-10 21:06:47 lukeA
