Sequential citation numbers in R: separate numbers with a hyphen if sequential, otherwise add a comma

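I want to generate sequential citation numbers from a vector of numbers in R. Consecutive numbers should be collapsed into a range joined by a hyphen; all other numbers should be separated by commas. For example, the numbers 1, 2, 3, 5, 6, 8, 9, 10, 11 and 13 should come out as 1-3,5,6,8-11,13.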

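This question has previously been answered for C#. I wrote a function that does the job in R, but it could be improved. I am posting this question as a reference for others who might have a similar need. If you find a similar question for R (I did not), please vote to close and I will delete this question.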

The function below is not very elegant, but it seems to do the job. How can I make it shorter and more elegant?

x <- c(1,2,3,5,6,8,9,10,11,13) 

library(zoo) ## the function requires zoo::na.approx function 

##' @title Generate hyphenated sequential citation from an integer vector 
##' @param x integer vector giving citation or page numbers 
##' @importFrom zoo na.approx 

seq.citation <- function(x) { 

## Result if length of the integer vector is 1. 
if(length(x) == 1) return(x) else { 

## Sort 
x <- sort(x) 

## Difference 
df <- diff(x) 

## Index to determine start and end points 
ind <- c("start", rep("no", length(df)-1), "end") 
ind[which(df > 1)] <- "end" 

## Temporary start point vector 
sts <- which(ind == "end") + 1 
ind[sts[sts < length(ind)]] <- "start" 

## Replace the first index element 
ind[1] <- "start" 

## Replace the last index element, if preceding one is "end" 
if(ind[length(ind)-1] == "end") ind[length(ind)] <- "start" 

## Groups for comma separation using "start" as the determining value. 
grp <- rep(NA, length(x)) 
grp[which(ind == "start")] <- 1:length(grp[which(ind == "start")]) 
grp <- zoo::na.approx(grp, method = "constant", rule = 2) 

## Split sequences by group 
seqs <- split(x, grp) 

seqs <- lapply(seqs, function(k) { 
    if(length(k) == 1) k else { 
    if(length(k) == 2) paste(k[1], k[2], sep = ",") else { 
    paste(k[1], k[length(k)], sep = "-") 
    }} 
}) 

## Result 
return(do.call("paste", c(seqs, sep = ","))) 
} 
} 

seq.citation(x) 
# [1] "1-3,5,6,8-11,13"

2017-08-09 Mikko

See also this similar [post](https://stackoverflow.com/questions/34636461/)

You can do this quite easily in base R using tapply:

paste(tapply(x, cumsum(c(1, diff(x) != 1)), function(i) 
    ifelse(length(i) > 2, paste0(head(i, 1), '-', tail(i, 1)), 
          paste(i, collapse = ','))), collapse = ',') 

[1] "1-3,5,6,8-11,13"
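
To see why this works, it may help to look at the grouping index on its own. The sketch below only unpacks the expression above for the question's `x`; the variable names `new_run` and `grp` are illustrative and not part of the answer.

```r
x <- c(1, 2, 3, 5, 6, 8, 9, 10, 11, 13)

# 1 wherever the gap to the previous element is not exactly 1, else 0;
# the leading 1 makes the first element open a run
new_run <- c(1, diff(x) != 1)

# the cumulative sum turns the run starts into one run id per element
grp <- cumsum(new_run)
grp
# [1] 1 1 1 2 2 3 3 3 3 4

# tapply() then passes each run to the formatting function separately
runs <- split(x, grp)
```

Runs longer than two elements are printed as first-last; shorter runs are pasted with commas, which is why 5,6 is not hyphenated.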

2017-08-09 11:48:35 Sotos

Updated. Have a look. – Sotos

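This meets the "elegant" and "short" requirements and works for my actual dataset. Thanks! One could still wrap the result in 'as.character()' so that elements of length 1 are returned as character. – Mikko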
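
For reuse, the tapply approach can be wrapped in a function. This is only a sketch: the name `collapse_citations` is hypothetical, and the sorting, de-duplication and empty-input handling are assumptions added here, not part of the answer. Because the result goes through `paste()`, a single number also comes back as a character string.

```r
# Hypothetical helper; a sketch built on the tapply approach above
collapse_citations <- function(x) {
  x <- sort(unique(x))                       # assumption: input may be unsorted or repeated
  if (length(x) == 0) return(character(0))   # assumption: empty input gives empty output
  grp <- cumsum(c(1, diff(x) != 1))          # run id per element
  parts <- tapply(x, grp, function(i) {
    # runs of three or more become "first-last", shorter runs stay comma-separated
    if (length(i) > 2) paste0(i[1], "-", i[length(i)]) else paste(i, collapse = ",")
  })
  paste(parts, collapse = ",")
}

collapse_citations(c(1, 2, 3, 5, 6, 8, 9, 10, 11, 13))
# [1] "1-3,5,6,8-11,13"
collapse_citations(7)
# [1] "7"
```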

This works for your example and should be fairly general.

# get run lengths of differences, with max value of 2 
r <- rle(c(1, pmin(diff(x), 2))) 

# paste selected x values with appropriate separator 
res <- paste0(x[c(1, cumsum(r$lengths))], c("-", ",")[r$values], collapse="") 

# drop final character, which is a separator 
res <- substr(res, 1, nchar(res)-1)

This returns

res 
[1] "1-3,5-6,8-11,13"

2017-08-09 11:48:47 lmo

@sotos add a 'gsub("-\\d+-", "-", res)' and you should be good to go, almost. – lmo

For some reason, my gsub regex doesn't work with 8-10-11. I haven't had enough coffee to figure out why. – lmo
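
A likely explanation for a leftover such as 8-10-11: `gsub()` replaces non-overlapping matches, so the hyphen that ends one match of `-\\d+-` cannot also start the next match. The sketch below reproduces this on a plain string and shows one possible workaround with a Perl-style lookahead. The lookahead pattern is a suggestion added here, not something from the comment thread.

```r
s <- "8-9-10-11"

# "-9-" is matched and replaced; the scan resumes at "10-11",
# where no leading hyphen is left for "-10-" to match
plain <- gsub("-\\d+-", "-", s)
plain
# [1] "8-10-11"

# the lookahead checks for the following hyphen without consuming it,
# so every interior number is removed
lookahead <- gsub("-\\d+(?=-)", "", s, perl = TRUE)
lookahead
# [1] "8-11"
```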

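Nice! If the first element forms a run of length 1, this solution returns 1-1 at the beginning of 'res'. This should be fixed somehow. I also get a comma at the end of each 'res'. – Mikko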
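
One way to avoid both the 1-1 artefact and the trailing separator is to compute the first and last element of each run directly instead of recycling separators. This is a variant sketch, not lmo's code: the name `collapse_runs` is hypothetical, and it assumes `x` is sorted, free of duplicates and non-empty. Like the answer above, it hyphenates runs of two (5-6).

```r
# Hypothetical helper; assumes x is sorted, unique and non-empty
collapse_runs <- function(x) {
  starts_run <- c(TRUE, diff(x) != 1)      # element opens a new run
  ends_run   <- c(starts_run[-1], TRUE)    # next element opens a run, or x ends here
  first <- x[starts_run]
  last  <- x[ends_run]
  # a run of one number is printed once, longer runs as "first-last"
  paste(ifelse(first == last, first, paste0(first, "-", last)), collapse = ",")
}

collapse_runs(c(1, 3, 4, 5))
# [1] "1,3-5"
collapse_runs(c(1, 2, 3, 5, 6, 8, 9, 10, 11, 13))
# [1] "1-3,5-6,8-11,13"
```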

There is, of course, the seqToHumanReadable function from the "R.utils" package.

library(R.utils) 
seqToHumanReadable(x) 
# [1] "1-3, 5, 6, 8-11, 13" 
seqToHumanReadable(x, tau = 1) ## If you want 5-6 and not 5, 6 
# [1] "1-3, 5-6, 8-11, 13"

The appearance of the result can also be controlled:

seqToHumanReadable(x, delimiter = "...", collapse = " | ") 
# [1] "1...3 | 5 | 6 | 8...11 | 13"

2017-08-09 16:48:30 A5C1D2H2I1M1N2O1R2T1
