2017-02-13 114 views
1

I have a test setup, and I want it to hold a copy of the master data. How can I reliably copy a Cassandra database between two servers?

I am using the Cassandra package from DataStax, version 3.0.9.

I took a dump of the data using cqlsh and restored it on the test setup.

To take a copy of the master data I used

COPY ... TO ... WITH DELIMITER = '\t' AND NULL = 'null' AND QUOTE = '"' AND HEADER = true

and to populate the test setup I used

COPY ... FROM ... WITH DELIMITER = '\t' AND NULL = 'null' AND QUOTE = '"' AND HEADER = true

After COPY FROM, cqlsh reports that it successfully copied all the rows in the file. But when I run a count(*) on the table, a few rows are missing. There is no particular pattern to which rows go missing: if I truncate the table and replay the command, a different set of rows is lost, and the number of missing rows is random.

The table contains list/set columns of user-defined types, and the UDT contents may include null values.

Is there any other reliable way to copy the data, short of programmatically reading individual rows from one database and writing them to the other?
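One cheap sanity check when COPY silently drops rows is to compare the data-row count of the exported file against the "rows exported" figure cqlsh prints and against count(*) after the import. A minimal sketch; the file name `table1.csv` and its contents are fabricated here as a stand-in for a real cqlsh export:

```shell
# Stand-in for an export produced by something like:
#   COPY cypher.table1 TO 'table1.csv' WITH DELIMITER = '\t' AND HEADER = true
# (fabricated contents so the commands are self-contained)
printf 'id\tdata\n1\tcypher\n2\t2_cypher\n3\tthird\n' > table1.csv

# Data rows = total lines minus the HEADER line; compare this number with
# what cqlsh reported on export and with count(*) after COPY FROM.
rows=$(( $(wc -l < table1.csv) - 1 ))
echo "$rows"
```

Note that `wc -l` undercounts if a quoted field contains an embedded newline, which can happen with free-form text columns; for such exports a CSV-aware parser is safer.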


Schema of the table (field names changed):

CREATE TYPE UDT1 (
    field1 text, 
    field2 int, 
    field3 text 
); 

CREATE TYPE UDT2 (
    field1 boolean, 
    field2 float 
); 

CREATE TABLE cypher.table1 (
    id int PRIMARY KEY, 
    list1 list<frozen<UDT1>>, 
    data text, 
    set1 set<frozen<UDT2>> 
) WITH bloom_filter_fp_chance = 0.01 
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} 
    AND comment = '' 
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'} 
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} 
    AND crc_check_chance = 1.0 
    AND dclocal_read_repair_chance = 0.1 
    AND default_time_to_live = 0 
    AND gc_grace_seconds = 864000 
    AND max_index_interval = 2048 
    AND memtable_flush_period_in_ms = 0 
    AND min_index_interval = 128 
    AND read_repair_chance = 0.0 
    AND speculative_retry = '99PERCENTILE'; 
+0

Could you provide the schema of your test setup? –

+0

Added the schema @AnowerPerves – Cypher

Answers

2

Instead of import/export, you can try copying the data files themselves.

  1. Take a snapshot of the data on the original cluster using nodetool snapshot (https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsSnapShot.html).
  2. Create the schema on the test cluster.
  3. Load the snapshot from the original cluster into the test cluster:

    a. If all the nodes in the test cluster hold all the data (single node / 3 nodes with rf=3), or the data volume is small, copy the files from the original cluster into the keyspace/column_family directory and run nodetool refresh (https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsRefresh.html). Make sure the file names do not overlap.

    b. If the test cluster nodes do not hold all the data, or the data volume is large, use sstableloader (https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsBulkloader.html) to stream the files in the snapshot to the test cluster.
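The steps above might be sketched as follows. Every concrete name here is hypothetical (keyspace `cypher`, table `table1`, snapshot tag `for_test`, the staging path, and target host `test-node1`); substitute your own, and note that the data directory layout varies with your installation:

```shell
# 1. On the original cluster: snapshot the keyspace (the tag is arbitrary)
nodetool snapshot -t for_test cypher
# Snapshot files land under the data directory, e.g.:
#   /var/lib/cassandra/data/cypher/table1-<uuid>/snapshots/for_test/

# 2. On the test cluster: create the keyspace and table with the same schema.

# 3b. sstableloader infers keyspace and table from the last two path
#     components, so stage the snapshot files under <keyspace>/<table> first:
mkdir -p /tmp/load/cypher/table1
cp /var/lib/cassandra/data/cypher/table1-*/snapshots/for_test/* /tmp/load/cypher/table1/
# -d takes one or more contact points of the *target* (test) cluster
sstableloader -d test-node1 /tmp/load/cypher/table1
```

Because sstableloader streams through the normal write path, it works even when the two clusters have different topologies or replication factors.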

+0

I have a different topology on the test cluster, so `nodetool refresh` won't work. Will try sstableloader and let you know. – Cypher

0

I tested COPY TO and COPY FROM with your schema in the default mode, without any delimiter options, and it works fine. I have tested it several times and nothing is missing.

cqlsh:cypher> INSERT INTO table1 (id, data, list1, set1) VALUES (1, 'cypher', ['a',1,'b'], {true}) ; 
cqlsh:cypher> SELECT * FROM table1 ; 

 id | data   | list1                                | set1 
----+--------+-----------------------------------------------------------------------------------------------------------------------------------+-------------------------------- 
  1 | cypher | [{field1: 'a', field2: null, field3: null}, {field1: '1', field2: null, field3: null}, {field1: 'b', field2: null, field3: null}] | {{field1: True, field2: null}} 

cqlsh:cypher> INSERT INTO table1 (id, data, list1, set1) VALUES (2, '2_cypher', ['amp','avd','ball'], {true, false}) ; 
cqlsh:cypher> SELECT * FROM table1 ; 

 id | data     | list1                                 | set1 
----+----------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------- 
  1 | cypher   |       [{field1: 'a', field2: null, field3: null}, {field1: '1', field2: null, field3: null}, {field1: 'b', field2: null, field3: null}] |                                {{field1: True, field2: null}} 
  2 | 2_cypher | [{field1: 'amp', field2: null, field3: null}, {field1: 'avd', field2: null, field3: null}, {field1: 'ball', field2: null, field3: null}] | {{field1: False, field2: null}, {field1: True, field2: null}} 

cqlsh:cypher> COPY table1 TO 'table1.csv'; 
Using 1 child processes 

Starting copy of cypher.table1 with columns [id, data, list1, set1]. 
Processed: 2 rows; Rate:  0 rows/s; Avg. rate:  0 rows/s 
2 rows exported to 1 files in 4.358 seconds. 
cqlsh:cypher> TRUNCATE table table1 ; 
cqlsh:cypher> SELECT * FROM table1; 

 id | data | list1 | set1 
----+------+-------+------ 

cqlsh:cypher> COPY table1 FROM 'table1.csv'; 
Using 1 child processes 

Starting copy of cypher.table1 with columns [id, data, list1, set1]. 
Processed: 2 rows; Rate:  2 rows/s; Avg. rate:  3 rows/s 
2 rows imported from 1 files in 0.705 seconds (0 skipped). 
cqlsh:cypher> SELECT * FROM table1 ; 

 id | data     | list1                                 | set1 
----+----------+------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------- 
  1 | cypher   |       [{field1: 'a', field2: null, field3: null}, {field1: '1', field2: null, field3: null}, {field1: 'b', field2: null, field3: null}] |                                {{field1: True, field2: null}} 
  2 | 2_cypher | [{field1: 'amp', field2: null, field3: null}, {field1: 'avd', field2: null, field3: null}, {field1: 'ball', field2: null, field3: null}] | {{field1: False, field2: null}, {field1: True, field2: null}} 

(2 rows) 
cqlsh:cypher> 
+0

Thanks for the effort. With small tables I don't see any problem, but when I use a large table (around 1 million entries) it starts dropping entries. – Cypher
