2017-11-11 183 views
2

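Function in R that returns ancestors and children in a network

I would like to create a function "f" in R that takes as input a data.frame of edges between individuals and one individual (for example, one called A2), and returns another data.frame containing only the "ancestors" and "children" of A2, along with the ancestors and children of those ancestors and children.

To illustrate my complicated question: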

library(visNetwork) 
nodes <- data.frame(id = c(paste0("A",1:5),paste0("B",1:3)), 
       label = c(paste0("A",1:5),paste0("B",1:3))) 
edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"), 
       to = c("A2","A3","A4","A4","A5","B3","B3")) 
visNetwork(nodes, edges) %>% 
    visNodes(font = list(size=45)) %>% 
    visHierarchicalLayout(direction = "LR", levelSeparation = 500) 


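In this example, the data.frame contains 2 separate, independent networks: one network made of the "A"s and another made of the "B"s.

I would like to implement a function f(data = edges, indiv = "A2") that returns a data.frame containing every row of the edges data.frame that belongs to the network of "A"s:

f(edges, "A2") would return this extract of the edges data.frame: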

head(f(edges,"A2")) 
# from to 
#1 A1 A2 
#2 A1 A3 
#3 A2 A4 
#4 A3 A4 
#5 A4 A5 

I hope this is clear enough for you to help me.

Thank you very much!

+0

What have you tried? What algorithm are you trying to implement? –

+0

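I am not sure I understand exactly what you are asking, but the goal is indeed to return, for each individual, its ancestors and children, together with the ancestors and children of those ancestors and children. Before spending time (certainly hours) writing the code, I wanted to know whether there is a well-known function or package that does this, because it seems to me that it could be a very basic problem for people who (unlike me) are used to working with networks. I did not find anything satisfactory on the internet (only solutions for trees), so I wanted to ask people with more expertise. Thanks – antuki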

+0

I am not a graph analyst, but maybe this could help: http://igraph.org/r/doc/components.html – romles

Answers

1

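I wrote a simple algorithm that finds the whole family linked to an individual (I am sure it can be improved). As @romles suggested, you can do the same thing with an R package such as igraph. In this case, however, my function appears to be faster than the igraph option.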

edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"), 
        to = c("A2","A3","A4","A4","A5","B3","B3"), 
        stringsAsFactors = FALSE) 
f <- function(data, indiv){ 
    children_ancestors <- function(indiv){ 
     # Find children and ancestors of an indiv 
     c(data[data[,"from"]==indiv,"to"],data[data[,"to"]==indiv,"from"]) 
    } 
    family <- indiv 
    new_people <- children_ancestors(indiv) # New people to inspect 
    while(length(diff_new_p <- setdiff(new_people,family)) > 0){ 
     # if the new people aren't yet in the family : 
     family <- c(family, diff_new_p) 
     new_people <- unlist(sapply(diff_new_p, children_ancestors)) 
     new_people <- unique(new_people) 
    } 
    data[(data[,1] %in% family) | (data[,2] %in% family),] 
} 

f(edges, "A2") gives the expected result. Comparison with igraph:

library(igraph) 
library(microbenchmark) 
edges2 <- graph_from_data_frame(edges, directed = FALSE) 
microbenchmark(simple_function = f(edges,"A2"), 
       igraph_option = as_data_frame(subgraph.edges(edges2, subcomponent(edges2, 'A2', 'in'))) 
       ) 
#Unit: microseconds 
#   expr  min  lq  mean median  uq  max neval 
# simple_function 874.411 968.323 1206.037 1123.515 1325.075 2957.931 100 
# igraph_option 1239.896 1451.364 1802.341 1721.227 1984.380 3907.089 100 
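
The family search in f above can also be written so that each pass expands the whole frontier at once with %in%, instead of calling a helper once per individual. This is only a sketch of that idea in base R, not the answer's own code: the name connected_edges is hypothetical, and it assumes the from and to columns are character vectors (stringsAsFactors = FALSE).

```r
# Hypothetical variant of f(): expand the whole frontier in one vectorised step.
connected_edges <- function(data, indiv) {
  family <- indiv     # everyone found so far
  frontier <- indiv   # people whose neighbours still need inspecting
  while (length(frontier) > 0) {
    # Children and ancestors of every individual in the frontier
    neighbours <- c(data$to[data$from %in% frontier],
                    data$from[data$to %in% frontier])
    # Keep only the people not yet in the family
    frontier <- setdiff(unique(neighbours), family)
    family <- c(family, frontier)
  }
  # Every edge that touches a member of the family
  data[data$from %in% family | data$to %in% family, , drop = FALSE]
}

edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"),
                    to   = c("A2","A3","A4","A4","A5","B3","B3"),
                    stringsAsFactors = FALSE)
res_a <- connected_edges(edges, "A2")  # rows of the "A" network
res_b <- connected_edges(edges, "B1")  # rows of the "B" network
```

Starting from any member of a network should return the same rows, so connected_edges(edges, "A5") is expected to match res_a.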
+0

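Thank you very much to all three of you for your answers, which are very useful for understanding both the algorithm I need and the igraph package. I will take the time to study all the solutions you provided! – antuki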

1

This works for me:

library(igraph) 
g <- graph_from_literal(A1--A2, A1--A3, A2--A4, A3--A4, A4--A5, B1--B3, B2--B3) 
sg_a2 <- subcomponent(g, 'A2', 'in') 
as_data_frame(subgraph.edges(g, sg_a2)) 

It gives:

# from to 
#1 A1 A2 
#2 A1 A3 
#3 A2 A4 
#4 A3 A4 
#5 A4 A5 
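
When the edges come from a data.frame rather than a literal, a vertex-based variant may be safer. As far as I can tell from the igraph documentation, subcomponent() returns vertices while subgraph.edges() selects by edge ids, so the two only line up when vertex and edge numbering happen to coincide. Here is a minimal sketch using induced_subgraph(), assuming igraph is installed; the names g, members and res are only illustrative.

```r
library(igraph)

edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"),
                    to   = c("A2","A3","A4","A4","A5","B3","B3"),
                    stringsAsFactors = FALSE)

g <- graph_from_data_frame(edges, directed = TRUE)
# All vertices connected to A2 when edge direction is ignored
members <- subcomponent(g, "A2", mode = "all")
# Keep those vertices and every edge between them, then go back to a data.frame
res <- igraph::as_data_frame(induced_subgraph(g, members), what = "edges")
```

The igraph:: prefix on as_data_frame() is there because other packages export a function with the same name.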
+0

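Thank you very much to all three of you for your answers, which are very useful for understanding both the algorithm I need and the igraph package. I will take the time to study all the solutions you provided! – antuki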

2

You can try filtering to keep only the nodes that are connected to A2 (i.e. whose distance to A2 is not Inf):

library(tidygraph) 
edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"), 
        to = c("A2","A3","A4","A4","A5","B3","B3")) 
as_tbl_graph(edges) %>% 
    filter(is.finite(node_distance_to(name=="A2", mode="all"))) 

which gives

# A tbl_graph: 5 nodes and 5 edges
#
# A directed acyclic simple graph with 1 component
#
# Node Data: 5 x 1 (active)
#   name
#   <chr>
# 1 A1
# 2 A2
# 3 A3
# 4 A4
# 5 A5
#
# Edge Data: 5 x 2
#    from    to
#   <int> <int>
# 1     1     2
# 2     1     3
# 3     2     4
# ... with 2 more rows
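
The result above is a tbl_graph whose edges refer to nodes by integer index, whereas the question asks for a data.frame of names. One possible way to get there is sketched below, assuming tidygraph is installed and using its .N() accessor to look up node names while the edges are active. The column names from_name and to_name and the object names are only illustrative.

```r
library(tidygraph)

edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"),
                    to   = c("A2","A3","A4","A4","A5","B3","B3"),
                    stringsAsFactors = FALSE)

connected <- as_tbl_graph(edges) %>%
  # Keep the nodes at a finite distance from A2, ignoring edge direction
  filter(is.finite(node_distance_to(name == "A2", mode = "all"))) %>%
  # Switch to the edge table and translate node indices back to names
  activate(edges) %>%
  mutate(from_name = .N()$name[from], to_name = .N()$name[to]) %>%
  as_tibble()

# Plain data.frame holding only the name columns
res <- as.data.frame(connected[, c("from_name", "to_name")])
```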
+0

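Thank you very much to all three of you for your answers, which are very useful for understanding both the algorithm I need and the igraph package. I will take the time to study all the solutions you provided! – antuki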