Cassandra nested key-value. A better approach?

I want to create a nested data model in a Cassandra database, similar to this:

Forums = { 
    forum001: { 
     name: "General News", 
     topics: { 
      topic000001: { 
       subject: "This is what I think", 
       date: "2012-08-24 10:12:13", 
       posts: { 
        post20120824.101213: { username: "tom", content: "Blah blah", datetime: "2012-08-24 10:12:13" } 
        post20120824.101513: { username: "dick", content: "Blah blah blah", datetime: "2012-08-24 10:15:13" } 
        post20120824.103213: { username: "harry", content: "Blah blah", datetime: "2012-08-24 10:32:13" } 
       } 
      }, 
      topic000002: { 
       subject: "OMG Look at this", 
       date: "2012-08-24 10:42:13", 
       posts: { 
        post20120824.104213: { username: "tom", content: "Blah blah", datetime: "2012-08-24 10:42:13" } 
        post20120824.104523: { username: "dick", content: "Blah blah blah", datetime: "2012-08-24 10:45:23" } 
        post20120824.104821: { username: "harry", content: "Blah blah", datetime: "2012-08-24 10:48:21" } 
       } 
      } 
     } 
    }, 
    forum002: { 
     name: "Specific News", 
     topics: { 
      topic000003: { 
       subject: "Whinge whine", 
       date: "2012-08-24 10:12:13", 
       posts: { 
        post20120824.101213: { username: "tom", content: "Blah blah", datetime: "2012-08-24 10:12:13" } 
        post20120824.101513: { username: "dick", content: "Blah blah blah", datetime: "2012-08-24 10:15:13" } 
       } 
      } 
     } 
    } 
}

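The basic design of the data is a bunch of maps nested inside one another. I have read that this is ill-advised because such a structure is difficult to query. What is a better way to structure this kind of data?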

2014-01-14 user2356226

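Can you be a bit more precise about how you are persisting this in Cassandra? Blobs, Supercolumns (only the API is kept; underneath they are actually composites), or Composites? Or a mix of the above? Which parts of the schema shown here are row keys, column names, or column values? –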

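I haven't designed any of this structure yet, because I want to figure out how to structure it properly first. The model above is what I had in mind conceptually, but I care more about functionality and speed. The main query I need the database to answer is: "Give me all posts between [time1] and [time2] with forumID = [ID]". I hope this helps a little with the functionality I need from the database. – user2356226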

Thanks, that helps a lot... I'll post an answer. –

If you want to query by a range over something sortable (such as the date in your example), then that value needs to be in the column name.

First, I would make the forum ID the row key, and the column family would look something like this:

Row: "forum001"
=> column: "name" - value: "General News"
=> column: "post::20120824101213::[some_uuid]" - value: "[serialized blob of data representing everything in the post]"
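
To make the layout concrete, here is a minimal in-memory sketch in Python. It only models one row as a dictionary of column name to value; it does not talk to Cassandra. The helper names `post_column_name` and `build_forum_row`, and the choice of JSON as the serialized blob, are assumptions made for illustration.

```python
import json
import uuid
from datetime import datetime


def post_column_name(posted_at: datetime, post_id: uuid.UUID) -> str:
    """Build a column name of the form post::YYYYMMDDhhmmss::<uuid>."""
    return f"post::{posted_at:%Y%m%d%H%M%S}::{post_id}"


def build_forum_row(name: str, posts: list) -> dict:
    """Return the columns of one forum row as {column name: value}."""
    columns = {"name": name}
    for post in posts:
        posted_at = datetime.strptime(post["datetime"], "%Y-%m-%d %H:%M:%S")
        # The whole post is stored as one serialized blob (JSON is an assumption).
        columns[post_column_name(posted_at, uuid.uuid4())] = json.dumps(post)
    return columns


row = build_forum_row("General News", [
    {"username": "tom", "content": "Blah blah", "datetime": "2012-08-24 10:12:13"},
    {"username": "dick", "content": "Blah blah blah", "datetime": "2012-08-24 10:15:13"},
])
```

Here the key of `row` plays the part of the column name and the forum ID ("forum001") would be the row key under which these columns are stored.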

From this you would ask for the range post::201203* ~ post::201204* to return, for example, all posts for the month of March.
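
The range lookup works because column names within a row are kept sorted. A rough Python sketch of the idea, using a plain sorted list instead of a real Cassandra slice query (the function `slice_columns` and the sample names are hypothetical; the `*` above is shorthand, a real slice uses a start and an end bound):

```python
from bisect import bisect_left


def slice_columns(sorted_names: list, start: str, end: str) -> list:
    """Return every name n with start <= n < end; sorted_names must be sorted."""
    lo = bisect_left(sorted_names, start)
    hi = bisect_left(sorted_names, end)
    return sorted_names[lo:hi]


names = sorted([
    "name",
    "post::20120229235959::a",
    "post::20120301000000::b",
    "post::20120315120000::c",
    "post::20120401000000::d",
])

# All posts from March 2012: start at the March prefix, stop at the April prefix.
march = slice_columns(names, "post::201203", "post::201204")
```

The same call answers "all posts between [time1] and [time2]" for one forum if the two bounds are built from full timestamps.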

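One thing to keep in mind is that rows are stored in effectively random order across your Cassandra cluster (if you keep the recommended Cassandra defaults). The columns of a single row live on the same node and are sorted, which is why you can use columns for range queries.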
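
A toy Python model of that placement, only to illustrate why row keys cannot be range-scanned while columns can. It hashes the row key with MD5 and takes the result modulo the node count; real Cassandra assigns token ranges to nodes rather than using a modulo, so treat `node_for_row` as a simplification, not as how the partitioner is implemented.

```python
import hashlib


def node_for_row(row_key: str, node_count: int) -> int:
    """Pick a node index from a hash of the row key (simplified placement)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % node_count


# Adjacent row keys can land on unrelated nodes, so key order means nothing.
placement = {key: node_for_row(key, 4) for key in ("forum001", "forum002", "forum003")}

# Within one row, all columns share that row's node and are read in sorted order.
columns_of_forum001 = sorted(["post::20120824101513::b", "name", "post::20120824101213::a"])
```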

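For column names, I like to use the type of the object serialized in the column as a prefix (that way I can have many types in the same row). You then have a few choices for how to represent the date in the column name: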

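ISO format date + a random UUID: the ISO format gives you readability for debugging and sorts correctly as a string; the appended UUID guarantees that column names are unique (otherwise you might accidentally overwrite a column during high-traffic periods).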
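[option name missing in the source]: gives you time ordering and uniqueness in one go, but you cannot tell the date from it in the Cassandra console tools.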
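
A small Python sketch of the trade-off between the two options. The first function builds the readable "ISO date + random UUID" name. For the second option I am assuming a time-based (version 1) UUID, which is a guess on my part; the helper names are hypothetical. The sketch shows that the timestamp is still inside such a UUID, but you need code to get it out, which is why it is not readable in a console.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100 ns intervals from this date.
UUID_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def iso_plus_uuid(posted_at: datetime) -> str:
    """Readable name: sortable date string plus a random UUID for uniqueness."""
    return f"post::{posted_at:%Y%m%d%H%M%S}::{uuid.uuid4()}"


def timestamp_of(time_uuid: uuid.UUID) -> datetime:
    """Recover the creation time embedded in a version 1 UUID."""
    return UUID_EPOCH + timedelta(microseconds=time_uuid.time // 10)


readable = iso_plus_uuid(datetime(2012, 8, 24, 10, 12, 13))
opaque = uuid.uuid1()           # time and uniqueness in one value
decoded = timestamp_of(opaque)  # the date is only visible after decoding
```

Note that the string form of a version 1 UUID does not sort by time; ordering by the embedded timestamp has to be done by whatever compares the column names.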

You will have to use different row keys for each other kind of query criterion (author, date, size, ...), so use denormalization.
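
As a sketch of what that denormalization could look like, here is an in-memory Python model where every post is written twice, once under its forum's row and once under its author's row. The row key prefixes `forum::` and `author::` and the function `save_post` are assumptions for illustration, not an established convention.

```python
from collections import defaultdict

# Maps row key -> {column name: value}; a stand-in for one column family.
column_family = defaultdict(dict)


def save_post(forum_id: str, post_id: str, post: dict) -> None:
    """Write the same post under one row per query criterion."""
    stamp = post["datetime"].replace("-", "").replace(":", "").replace(" ", "")
    column = f"post::{stamp}::{post_id}"
    column_family[f"forum::{forum_id}"][column] = post             # query by forum
    column_family[f"author::{post['username']}"][column] = post   # query by author


save_post("forum001", "id-1",
          {"username": "tom", "content": "Blah blah", "datetime": "2012-08-24 10:12:13"})
save_post("forum001", "id-2",
          {"username": "dick", "content": "Blah blah blah", "datetime": "2012-08-24 10:15:13"})
```

Each query then reads exactly one row: `forum::forum001` for everything in a forum, `author::tom` for everything by one user. The cost is the extra write per criterion.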

A very good read (I feel like I have posted this a thousand times) is this two-part article from eBay:
Cassandra Data Modeling Best Practices, Part 1
Cassandra Data Modeling Best Practices, Part 2

2014-01-15 01:08:08

Thank you very much for your help, I will definitely look into this. – user2356226
