Visualizing how uniquely some fields identify other fields in a dataset

I have a data visualization problem. My data looks like this: {int x, int y, string a, string b, ...}

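I want to show how well {x, y} uniquely identifies {a, b}. That is, if x and y are known, there is often exactly one, and sometimes only a few, possible combinations of a and b. I know this holds for my data, but I want to demonstrate it in a visualization. Assuming about 5000 records, what is the best way to do this?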

Here are a few lines of this data:
2320,1190,T,a 
3051,1680,i,a 
3099,1495,N,v 
3395,1475,C,v 
3395,1475,C,c 
3400,1480,C,a 
3405,1615,A,a 
3430,1630,1f,a 
3440,1480,C1,d 
3440,1640,C1,e 
3450,1640,u,lk

Source

2017-04-10 chet

Can you share the first few rows of your dataset? –

I edited the post to show the data as well as the order of the fields. – chet

Maybe something like this is what you are looking for. From the plot you can tell which entries are non-unique.

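library(ggplot2)

# Read the data; "clipboard" works on Windows only, so use a file path elsewhere.
# skip = 1 drops the first line of the copied text.
df <- read.table(file = "clipboard", sep = ",",
                 header = FALSE, skip = 1, stringsAsFactors = FALSE)

# Make a key from {x, y}; the separator keeps e.g. (12, 345) and (123, 45) distinct
df$key <- with(df, paste(V1, V2, sep = "_"))
Counts <- as.data.frame(xtabs(~key, data = df))   # Row counts for {x, y} pairs

df_merge <- merge(df, Counts, by = "key", all.x = TRUE)      # Merge the tables by key
df_merge$Unique <- ifelse(df_merge$Freq == 1, "Yes", "No")   # Unique: Yes or No

# Plot every record, colored by whether its {x, y} pair is unique
ggplot(df_merge, aes(x = V1, y = V2, color = Unique)) + geom_point()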
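
With around 5000 records the scatter plot can get crowded, so a summary view may help alongside it. The following is only a sketch in base R, not part of the original answer: it counts how many distinct {a, b} combinations each {x, y} pair maps to and draws the distribution of those counts as a bar chart. The column names x, y, a, b and the name n_combos are assumptions made for illustration, and the inline rows are the sample lines from the question.

```r
# Sample rows from the question; column names x, y, a, b are assumed.
df <- read.table(text = "
2320,1190,T,a
3051,1680,i,a
3099,1495,N,v
3395,1475,C,v
3395,1475,C,c
3400,1480,C,a
3405,1615,A,a
3430,1630,1f,a
3440,1480,C1,d
3440,1640,C1,e
3450,1640,u,lk
", sep = ",", header = FALSE,
   col.names = c("x", "y", "a", "b"),
   colClasses = c("integer", "integer", "character", "character"))

# Drop duplicated rows so each {x, y, a, b} combination is counted once.
combos <- unique(df[, c("x", "y", "a", "b")])
combos$n_combos <- 1L

# Number of distinct {a, b} combinations for every {x, y} pair.
per_key <- aggregate(n_combos ~ x + y, data = combos, FUN = sum)

# Distribution: how many {x, y} pairs map to 1, 2, 3, ... combinations.
dist <- table(per_key$n_combos)
print(dist)

# One bar per combination count; a dominant bar at 1 means {x, y}
# nearly always identifies {a, b} uniquely.
barplot(dist,
        xlab = "Distinct {a, b} combinations per {x, y} pair",
        ylab = "Number of {x, y} pairs",
        main = "How uniquely {x, y} identifies {a, b}")
```

For the sample rows, nine of the ten {x, y} pairs map to a single {a, b} combination and one pair (3395, 1475) maps to two. On the full dataset, replace the inline text with the real file and the same bar chart should show at a glance how dominant the "1" bar is.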

Source

2017-04-11 01:33:14 Kgrey
