Aerospike data modeling and queries

Let's say I have the following class in Java:

class Shape { 
    String type; 
    String color; 
    String size; 
}

Now suppose I have the following data, based on the model above:

Triangle, Blue, Small 
Triangle, Red, Large 
Circle, Blue, Small 
Circle, Blue, Medium 
Square, Green, Medium 
Star, Blue, Large

I want to answer the following questions:

Given the type Circle how many unique colors? 
    Answer: 1 
Given the type Circle how many unique sizes? 
    Answer: 2 

Given the color Blue how many unique shapes? 
    Answer: 3
Given the color Blue how many unique sizes? 
    Answer: 3 

Given the size Small how many unique shapes? 
    Answer: 2 
Given the size Small how many unique colors? 
    Answer: 1
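A brute-force baseline makes the six questions precise: filter on one field, then count the distinct values of another. This is a minimal in-memory Java sketch over the sample rows, not a database query; the names `DistinctCountExample` and `countDistinct` are hypothetical. Counted this way, Blue occurs with three distinct shapes (Triangle, Circle, Star).

```java
import java.util.List;
import java.util.function.Function;

class DistinctCountExample {
    // Same three fields as the Shape class in the question, plus a constructor.
    static final class Shape {
        final String type;
        final String color;
        final String size;

        Shape(String type, String color, String size) {
            this.type = type;
            this.color = color;
            this.size = size;
        }
    }

    // The six sample rows from the question.
    static final List<Shape> DATA = List.of(
            new Shape("Triangle", "Blue", "Small"),
            new Shape("Triangle", "Red", "Large"),
            new Shape("Circle", "Blue", "Small"),
            new Shape("Circle", "Blue", "Medium"),
            new Shape("Square", "Green", "Medium"),
            new Shape("Star", "Blue", "Large"));

    // Counts the distinct values of `target` among rows whose `filter` field equals `value`.
    static long countDistinct(List<Shape> rows, Function<Shape, String> filter,
                              String value, Function<Shape, String> target) {
        return rows.stream()
                .filter(s -> filter.apply(s).equals(value))
                .map(target)
                .distinct()
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct(DATA, s -> s.type, "Circle", s -> s.color)); // 1
        System.out.println(countDistinct(DATA, s -> s.type, "Circle", s -> s.size));  // 2
        System.out.println(countDistinct(DATA, s -> s.color, "Blue", s -> s.type));   // 3
        System.out.println(countDistinct(DATA, s -> s.color, "Blue", s -> s.size));   // 3
        System.out.println(countDistinct(DATA, s -> s.size, "Small", s -> s.type));   // 2
        System.out.println(countDistinct(DATA, s -> s.size, "Small", s -> s.color));  // 1
    }
}
```

This scans every row for every question, which is fine for six rows but is exactly what an index or a precomputed layout is meant to avoid at scale.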

I'm not sure whether I should model it the following way, which would work...

set: shapes -> key: type -> bin(s): list of colors, list of sizes 
set: colors -> key: color -> bin(s): list of shapes, list of sizes 
set: sizes -> key: size -> bin(s): list of shapes, list of colors

Or is there a better way to do this? If I do it this way, I need 3x the storage.
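The proposed three-set layout is essentially a precomputed inverted index: every shape is fanned out into all three sets, and each question becomes one record read. Below is a minimal in-memory sketch of that layout, assuming nested Java maps stand in for sets, keys and bins (the class `ThreeSetModel` and its methods are hypothetical; this is not the Aerospike client API).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ThreeSetModel {
    // set name -> record key -> bin name -> distinct values, mirroring
    // "set: colors -> key: color -> bin(s): list of shapes, list of sizes".
    private final Map<String, Map<String, Map<String, Set<String>>>> sets = new HashMap<>();

    private void addTo(String set, String key, String bin, String value) {
        sets.computeIfAbsent(set, s -> new HashMap<>())
            .computeIfAbsent(key, k -> new HashMap<>())
            .computeIfAbsent(bin, b -> new TreeSet<>())
            .add(value);
    }

    // Each shape is written into all three sets (six bin updates); this fan-out is the extra storage.
    void insert(String type, String color, String size) {
        addTo("shapes", type, "colors", color);
        addTo("shapes", type, "sizes", size);
        addTo("colors", color, "shapes", type);
        addTo("colors", color, "sizes", size);
        addTo("sizes", size, "shapes", type);
        addTo("sizes", size, "colors", color);
    }

    // Each question is a single record read: the size of one bin.
    int countDistinct(String set, String key, String bin) {
        return sets.getOrDefault(set, Map.of())
                   .getOrDefault(key, Map.of())
                   .getOrDefault(bin, Set.of())
                   .size();
    }

    static ThreeSetModel sample() {
        ThreeSetModel m = new ThreeSetModel();
        m.insert("Triangle", "Blue", "Small");
        m.insert("Triangle", "Red", "Large");
        m.insert("Circle", "Blue", "Small");
        m.insert("Circle", "Blue", "Medium");
        m.insert("Square", "Green", "Medium");
        m.insert("Star", "Blue", "Large");
        return m;
    }

    public static void main(String[] args) {
        ThreeSetModel m = sample();
        System.out.println(m.countDistinct("shapes", "Circle", "sizes")); // 2
        System.out.println(m.countDistinct("colors", "Blue", "shapes"));  // 3
        System.out.println(m.countDistinct("sizes", "Small", "colors"));  // 1
    }
}
```

Sets are used instead of lists so that a bin's size is already the distinct count; with plain lists, duplicates (two Blue Circles) would have to be removed at read time.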

I also expect each set to have billions of entries. BTW, the model has been edited to protect the innocent ;)

2015-10-05 user432024

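Is the question still open? I agree that your proposed solution is the best approach, and it will work well if you can accept throughput being limited by 3 machines running "hot". To help answer your question, could you add: Are the results and updates computed "online" or "offline" (algorithmically)? Do you need to handle shape deletions (which would require reference counters)? Is a "rough result" acceptable, or do you need 100% accuracy? What throughput do you expect on the base model, without the color/size/type indexes? –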
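The point about deletions can be illustrated: with a plain set of values per bin, removing one Blue Circle cannot tell whether "Blue" should also leave Circle's colors, because another Blue Circle may still exist. A minimal sketch, assuming each bin holds a value-to-count map instead of a set; the class `RefCountedIndex`, its methods, and the string-joined key are hypothetical choices made for brevity.

```java
import java.util.HashMap;
import java.util.Map;

class RefCountedIndex {
    // "field|value|target" -> target value -> number of stored shapes that contribute it.
    // Assumes values never contain '|'.
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    private static String key(String field, String value, String target) {
        return field + "|" + value + "|" + target;
    }

    private void adjust(String field, String value, String target, String targetValue, int delta) {
        counts.computeIfAbsent(key(field, value, target), k -> new HashMap<>())
              .merge(targetValue, delta, (oldCount, d) -> {
                  int updated = oldCount + d;
                  return updated == 0 ? null : updated; // returning null removes the entry
              });
    }

    private void apply(String type, String color, String size, int delta) {
        adjust("type", type, "color", color, delta);
        adjust("type", type, "size", size, delta);
        adjust("color", color, "type", type, delta);
        adjust("color", color, "size", size, delta);
        adjust("size", size, "type", type, delta);
        adjust("size", size, "color", color, delta);
    }

    void insert(String type, String color, String size) {
        apply(type, color, size, 1);
    }

    // Assumes the shape was inserted earlier; deleting an unknown shape would leave a -1 count.
    void delete(String type, String color, String size) {
        apply(type, color, size, -1);
    }

    // Distinct count = number of target values still referenced by at least one shape.
    int countDistinct(String field, String value, String target) {
        return counts.getOrDefault(key(field, value, target), Map.of()).size();
    }

    static RefCountedIndex sample() {
        RefCountedIndex idx = new RefCountedIndex();
        idx.insert("Triangle", "Blue", "Small");
        idx.insert("Triangle", "Red", "Large");
        idx.insert("Circle", "Blue", "Small");
        idx.insert("Circle", "Blue", "Medium");
        idx.insert("Square", "Green", "Medium");
        idx.insert("Star", "Blue", "Large");
        return idx;
    }

    public static void main(String[] args) {
        RefCountedIndex idx = sample();
        System.out.println(idx.countDistinct("type", "Circle", "size")); // 2
        idx.delete("Circle", "Blue", "Medium");
        System.out.println(idx.countDistinct("type", "Circle", "size")); // 1
        System.out.println(idx.countDistinct("color", "Blue", "type"));  // 3, a Blue Circle remains
    }
}
```

Deleting the Medium Blue Circle drops Circle's size count from 2 to 1, while Blue still reports three shapes because the Small Blue Circle keeps "Circle" referenced.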

Data modeling in NoSQL is always about how you plan to retrieve the data, and about throughput and latency.

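There are several ways to model this data; the simplest is to mirror the class structure, with each field becoming a bin. You can define a secondary index on each bin and use aggregation queries to answer your questions (above).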
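The data flow of this simplest model (one record per shape, one bin per field, a secondary index on each bin, then an aggregation over the matches) can be simulated in memory. This sketch does not use the Aerospike Java client: in Aerospike the aggregation step would run on the server as a stream UDF, whereas here `countDistinct` is a plain Java stream. The class `SecondaryIndexSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SecondaryIndexSketch {
    // Primary store: record key -> bins (bin name -> value); one record per shape.
    private final Map<Integer, Map<String, String>> records = new HashMap<>();
    // One secondary index per bin: bin name -> bin value -> keys of matching records.
    private final Map<String, Map<String, List<Integer>>> indexes = new HashMap<>();

    // Assumes each key is written only once (no index maintenance on overwrite).
    void put(int key, String type, String color, String size) {
        Map<String, String> bins = Map.of("type", type, "color", color, "size", size);
        records.put(key, bins);
        for (Map.Entry<String, String> bin : bins.entrySet()) {
            indexes.computeIfAbsent(bin.getKey(), b -> new HashMap<>())
                   .computeIfAbsent(bin.getValue(), v -> new ArrayList<>())
                   .add(key);
        }
    }

    // Index lookup on `bin == value`, then an aggregation that counts distinct `target` values.
    long countDistinct(String bin, String value, String target) {
        List<Integer> keys = indexes.getOrDefault(bin, Map.of()).getOrDefault(value, List.of());
        return keys.stream()
                   .map(k -> records.get(k).get(target))
                   .distinct()
                   .count();
    }

    static SecondaryIndexSketch sample() {
        SecondaryIndexSketch db = new SecondaryIndexSketch();
        db.put(1, "Triangle", "Blue", "Small");
        db.put(2, "Triangle", "Red", "Large");
        db.put(3, "Circle", "Blue", "Small");
        db.put(4, "Circle", "Blue", "Medium");
        db.put(5, "Square", "Green", "Medium");
        db.put(6, "Star", "Blue", "Large");
        return db;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch db = sample();
        System.out.println(db.countDistinct("type", "Circle", "size")); // 2
        System.out.println(db.countDistinct("color", "Blue", "type"));  // 3
    }
}
```

Each question here costs an index lookup plus a pass over every matching record, so its cost grows with how many shapes share the filter value; the question's three-set layout trades storage for a single record read instead.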

But that is only one way; you may need a different data model to meet your latency and throughput requirements.

2015-10-06 22:46:56 Mnemaudsyne

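I modeled the class with each field as a bin. Then I ran a SELECT * FROM Shape WHERE Shape = ? OR Color = ? OR Size = ?, which brings all the data into my application, and then I did the counting there. This was faster than sending 3 separate aggregations to the server. – user432024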
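The fetch-then-count approach from the comment can be sketched as two steps: one OR-filtered fetch, then a single client-side pass that fills six sets. In this sketch a Java stream filter stands in for the database query (how the OR query was actually issued is not stated, so that part is an assumption), and the class `ClientSideCount` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ClientSideCount {
    static final class Shape {
        final String type;
        final String color;
        final String size;

        Shape(String type, String color, String size) {
            this.type = type;
            this.color = color;
            this.size = size;
        }
    }

    static final List<Shape> TABLE = List.of(
            new Shape("Triangle", "Blue", "Small"),
            new Shape("Triangle", "Red", "Large"),
            new Shape("Circle", "Blue", "Small"),
            new Shape("Circle", "Blue", "Medium"),
            new Shape("Square", "Green", "Medium"),
            new Shape("Star", "Blue", "Large"));

    // Stand-in for the one round trip: WHERE type = ? OR color = ? OR size = ?
    static List<Shape> fetch(List<Shape> table, String type, String color, String size) {
        return table.stream()
                .filter(s -> s.type.equals(type) || s.color.equals(color) || s.size.equals(size))
                .collect(Collectors.toList());
    }

    // A single pass over the fetched rows answers all six questions, in the order
    // asked: colors/sizes for the type, shapes/sizes for the color, shapes/colors for the size.
    static int[] answers(List<Shape> rows, String type, String color, String size) {
        Set<String> colorsOfType = new HashSet<>();
        Set<String> sizesOfType = new HashSet<>();
        Set<String> typesOfColor = new HashSet<>();
        Set<String> sizesOfColor = new HashSet<>();
        Set<String> typesOfSize = new HashSet<>();
        Set<String> colorsOfSize = new HashSet<>();
        for (Shape s : rows) {
            if (s.type.equals(type)) {
                colorsOfType.add(s.color);
                sizesOfType.add(s.size);
            }
            if (s.color.equals(color)) {
                typesOfColor.add(s.type);
                sizesOfColor.add(s.size);
            }
            if (s.size.equals(size)) {
                typesOfSize.add(s.type);
                colorsOfSize.add(s.color);
            }
        }
        return new int[] {colorsOfType.size(), sizesOfType.size(), typesOfColor.size(),
                          sizesOfColor.size(), typesOfSize.size(), colorsOfSize.size()};
    }

    public static void main(String[] args) {
        List<Shape> rows = fetch(TABLE, "Circle", "Blue", "Small");
        System.out.println(rows.size());                                               // 4
        System.out.println(Arrays.toString(answers(rows, "Circle", "Blue", "Small"))); // [1, 2, 3, 3, 2, 1]
    }
}
```

The result is exact rather than approximate: every row that matches any one of the three filters is in the OR result, so each per-question subset is complete. The trade-off is that all matching records cross the network, which may stop paying off once a single filter value matches a large share of billions of entries.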
