No rows inserted into the table when importing from CSV in Cassandra

2015-08-28

I'm trying to import a CSV file into a Cassandra table, but I've run into a problem. The insert succeeds, at least that's what Cassandra reports, yet I still can't see any records. Here are the details:

cqlsh:recommendation_engine> COPY row_historical_game_outcome_data FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|'; 

2 rows imported in 0.216 seconds. 
cqlsh:recommendation_engine> select * from row_historical_game_outcome_data; 

customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount 
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------ 

(0 rows) 
cqlsh:recommendation_engine> 

This is what my data looks like:

'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|123123|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0| 
'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|456456|0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0| 

The Cassandra version is apache-cassandra-2.2.0.

EDIT:

CREATE TABLE row_historical_game_outcome_data (
    customer_id int, 
    game_id int, 
    time timestamp, 
    channel text, 
    currency_code text, 
    game_code text, 
    game_name text, 
    game_type text, 
    game_vendor text, 
    progressive_winnings double, 
    stake_amount double, 
    win_amount double, 
    PRIMARY KEY ((customer_id, game_id, time)) 
) WITH bloom_filter_fp_chance = 0.01 
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}' 
    AND comment = '' 
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'} 
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} 
    AND dclocal_read_repair_chance = 0.1 
    AND default_time_to_live = 0 
    AND gc_grace_seconds = 864000 
    AND max_index_interval = 2048 
    AND memtable_flush_period_in_ms = 0 
    AND min_index_interval = 128 
    AND read_repair_chance = 0.0 
    AND speculative_retry = '99.0PERCENTILE'; 

I also tried the suggestion below from uri2x, but still nothing:

select * from row_historical_game_outcome_data; 

customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount 
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------ 

(0 rows) 
cqlsh:recommendation_engine> COPY row_historical_game_outcome_data ("game_vendor","game_id","game_code","game_name","game_type","channel","customer_id","stake_amount","win_amount","currency_code","time","progressive_winnings") FROM '/home/adelin/workspace/docs/re_raw_data2.csv' WITH DELIMITER='|'; 

2 rows imported in 0.192 seconds. 
cqlsh:recommendation_engine> select * from row_historical_game_outcome_data; 

customer_id | game_id | time | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount 
-------------+---------+------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------ 

(0 rows) 
Can you show us your DESCRIBE TABLE? – uri2x

Here you go, I've added the table description. – Adelin

It seems the order of your columns differs from the column order in the CSV file (the first column isn't an int, the third column isn't a date, etc.). Try using COPY with column names to match the order of the CSV file. – uri2x

Answers

Ok, I had to change a few things in your data file to make this work:

SomeName|673|SomeName|SomeName|TYPE|M|123123|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0 
SomeName|673|SomeName|SomeName|TYPE|M|456456|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0 
  • Removed the trailing pipes.
  • Truncated the times down to seconds.
  • Removed all single quotes.
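Those three fixes can be sketched as a small Python helper (the file is processed line by line; the sample row is taken from the question):

```python
# Normalize a raw export line so cqlsh COPY can parse it:
#  - strip the trailing pipe delimiter
#  - remove all single quotes
#  - truncate timestamps to whole seconds
import re

def clean_line(line: str) -> str:
    line = line.rstrip("\n").rstrip("|")        # drop the trailing "|"
    line = line.replace("'", "")                # remove single quotes
    # turn e.g. "00:01:42.19700" into "00:01:42"
    return re.sub(r"(\d{2}:\d{2}:\d{2})\.\d+", r"\1", line)

raw = ("'SomeName'|673|'SomeName'|'SomeName'|'TYPE'|'M'|123123|"
      "0.20000000000000001|0.0|'GBP'|2015-07-01 00:01:42.19700|0.0|")
print(clean_line(raw))
# SomeName|673|SomeName|SomeName|TYPE|M|123123|0.20000000000000001|0.0|GBP|2015-07-01 00:01:42|0.0
```

Applying `clean_line` to every line of the export produces a file in the shape shown above.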

Once I did that, I ran:

aploetz@cqlsh:stackoverflow> COPY row_historical_game_outcome_data 
(game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount, 
win_amount,currency_code , time , progressive_winnings) 
FROM '/home/aploetz/cassandra_stack/re_raw_data3.csv' WITH DELIMITER='|'; 

Improper COPY command. 

This one was a bit tricky. I finally figured out that COPY didn't like the column name time. I altered the table to use the name game_time instead, and re-ran the COPY:

aploetz@cqlsh:stackoverflow> DROP TABLE row_historical_game_outcome_data ; 
aploetz@cqlsh:stackoverflow> CREATE TABLE row_historical_game_outcome_data (
      ...  customer_id int, 
      ...  game_id int, 
      ...  game_time timestamp, 
      ...  channel text, 
      ...  currency_code text, 
      ...  game_code text, 
      ...  game_name text, 
      ...  game_type text, 
      ...  game_vendor text, 
      ...  progressive_winnings double, 
      ...  stake_amount double, 
      ...  win_amount double, 
      ...  PRIMARY KEY ((customer_id, game_id, game_time)) 
      ...); 

aploetz@cqlsh:stackoverflow> COPY row_historical_game_outcome_data 
(game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,stake_amount, 
win_amount,currency_code , game_time , progressive_winnings) 
FROM '/home/aploetz/cassandra_stack/re_raw_data3.csv' WITH DELIMITER='|'; 

3 rows imported in 0.738 seconds. 
aploetz@cqlsh:stackoverflow> SELECT * FROM row_historical_game_outcome_data ; 

customer_id | game_id | game_time    | channel | currency_code | game_code | game_name | game_type | game_vendor | progressive_winnings | stake_amount | win_amount 
-------------+---------+--------------------------+---------+---------------+-----------+-----------+-----------+-------------+----------------------+--------------+------------ 
     123123 |  673 | 2015-07-01 00:01:42-0500 |  M |   GBP | SomeName | SomeName |  TYPE | SomeName |     0 |   0.2 |   0 
     456456 |  673 | 2015-07-01 00:01:42-0500 |  M |   GBP | SomeName | SomeName |  TYPE | SomeName |     0 |   0.2 |   0 

(2 rows) 
  • I'm not sure why it said "3 rows imported," so my guess is that it is counting the header line.
  • Your keys are all partition keys. Not sure if you're aware of that. I only point it out because I can't think of a reason to specify multiple partition keys without also specifying one or more clustering keys.
  • I couldn't find anything in the DataStax docs indicating that "time" is a reserved word. It might be a bug in cqlsh. But seriously, you should probably name your time-based columns something other than "time."
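For contrast, a version of the schema with a composite partition key plus a clustering key might look like this (a sketch, not taken from the original answer; columns trimmed for brevity):

```sql
-- customer_id and game_id together form the partition key;
-- game_time is a clustering column, so all outcomes for one
-- customer/game pair live in the same partition, ordered by time.
CREATE TABLE row_historical_game_outcome_data (
    customer_id int,
    game_id int,
    game_time timestamp,
    win_amount double,
    PRIMARY KEY ((customer_id, game_id), game_time)
);
```

With this layout, a query restricted to customer_id and game_id returns that customer's outcomes for the game in time order, which the all-partition-key version cannot do.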
What you found is true, the problem was with the CSV generated by Informix DB, but Cassandra should be more verbose with its errors. – Adelin

There are two things in your CSV file that trip up cqlsh:

  1. Remove the trailing | at the end of each CSV line.
  2. Remove the microseconds from your time values (precision should be at most milliseconds).

One other comment. COPY in CQL accepts WITH HEADER = TRUE, which causes the header (first) line of the CSV file to be ignored. "time" is not a reserved word in CQL (trust me on this one, as I just updated the CQL reserved words in the DataStax docs myself). However, you do show spaces around the column name "time" in your COPY command's column list, and I believe that's the problem. No spaces, just commas; and do the same for all the lines in the CSV file. (http://docs.datastax.com/en/cql/3.3/cql/cql_reference/keywords_r.html)
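Putting those suggestions together, the COPY command would look something like this (a sketch using the file path from the question; HEADER=TRUE is only needed if the CSV actually begins with a header line):

```sql
COPY row_historical_game_outcome_data
  (game_vendor,game_id,game_code,game_name,game_type,channel,customer_id,
   stake_amount,win_amount,currency_code,time,progressive_winnings)
  FROM '/home/adelin/workspace/docs/re_raw_data2.csv'
  WITH DELIMITER='|' AND HEADER=TRUE;
```

Note that the column list contains no spaces around any name, including time.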

Good point, the COPY command in cqlsh can definitely be tricky. – Aaron