2015-06-21

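Easiest way to import a simple csv file into a graph with OrientDB ETL

I want to import a very simple directed graph file, in CSV form, into OrientDB. Specifically, the file is the roadNet-PA dataset from the SNAP collection: https://snap.stanford.edu/data/roadNet-PA.html. The first lines of the file are as follows: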

# Directed graph (each unordered pair of nodes is saved once) 
# Pennsylvania road network 
# Nodes: 1088092 Edges: 3083796 
# FromNodeId ToNodeId 
0  1 
0  6309 
0  6353 
1  0 
6353 0 
6353 6354 

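There is only one type of vertex (a road intersection), and the edges carry no information (I suppose OrientDB lightweight edges are the best option here). Also note that the two vertex IDs on each line are separated by a tab.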
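
For reference, the layout described above (comment lines starting with `#`, then two integer IDs per line) can be read with a few lines of Python. This is only a sketch of the file format, not part of the OrientDB ETL, and the function name `parse_edges` is made up for illustration.

```python
from typing import Iterable, Iterator, Tuple


def parse_edges(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (from_id, to_id) pairs from SNAP-style edge-list lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#' comment header.
        if not line or line.startswith("#"):
            continue
        # split() with no argument accepts tabs as well as runs of spaces.
        from_id, to_id = line.split()
        yield int(from_id), int(to_id)


# A few lines in the same shape as the sample above.
sample = [
    "# Directed graph (each unordered pair of nodes is saved once)",
    "# FromNodeId\tToNodeId",
    "0\t1",
    "0\t6309",
    "6353\t0",
]
edges = list(parse_edges(sample))
print(edges)
```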

I tried to create a simple ETL configuration to import the file, without success. Here is the ETL:

{ 
    "config": { 
    "log": "debug" 
    }, 
    "source" : { 
    "file": { "path": "/tmp/roadNet-PA.csv" } 
    }, 
    "extractor": { "row": {} }, 
    "transformers": [ 
    { "csv": { "separator": " ", "skipFrom": 1, "skipTo": 4 } }, 
    { "vertex": { "class": "Intersection" } }, 
    { "edge": { "class": "Road" } } 
    ], 
    "loader": { 
    "orientdb": { 
     "dbURL": "remote:localhost/roads", 
     "dbType": "graph", 
     "classes": [ 
     {"name": "Intersection", "extends": "V"}, 
     {"name": "Road", "extends": "E"} 
     ], "indexes": [ 
     {"class":"Intersection", "fields":["id:integer"], "type":"UNIQUE" } 
     ] 
    } 
    } 
} 

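The ETL runs, but it does not import the file the way I expect. I suppose the problem is in the transformers. My idea is to read the CSV line by line and, for each line, create an edge connecting the two vertices, but I don't know how to express that in the ETL file. Any ideas?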
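
The intent described above (one edge per row, with each endpoint created only the first time it is seen) can be written out in plain Python to make the expected result concrete. This is an in-memory illustration only: it does not use any OrientDB API, and `build_graph` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(
    rows: Iterable[Tuple[int, int]]
) -> Tuple[Set[int], Dict[int, List[int]]]:
    """Return (vertices, outgoing adjacency lists) for (from, to) rows."""
    vertices: Set[int] = set()
    out_edges: Dict[int, List[int]] = {}
    for from_id, to_id in rows:
        # Look up or create both endpoints. The set keeps each ID unique,
        # loosely analogous to the UNIQUE index in the ETL config.
        vertices.add(from_id)
        vertices.add(to_id)
        # One directed edge per row.
        out_edges.setdefault(from_id, []).append(to_id)
    return vertices, out_edges


# The six data rows from the sample at the top of the question.
rows = [(0, 1), (0, 6309), (0, 6353), (1, 0), (6353, 0), (6353, 6354)]
vertices, out_edges = build_graph(rows)
print(len(vertices), "vertices")
print(out_edges)
```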

Answers


Try this:

{ 
    "config": { 
    "log": "debug" 
    }, 
    "source" : { 
    "file": { "path": "/tmp/roadNet-PA.csv" } 
    }, 
    "extractor": { "row": {} }, 
    "transformers": [ 
    { "csv": { "separator": "\t", "skipFrom": 1, "skipTo": 4, 
       "columnsOnFirstLine": false, 
       "columns":["id", "to"] } }, 
    { "vertex": { "class": "Intersection" } }, 
    { "merge": { "joinFieldName":"id", "lookup":"Intersection.id" } }, 
    { "edge": { 
     "class": "Road", 
     "joinFieldName": "to", 
     "lookup": "Intersection.id", 
     "unresolvedLinkAction": "CREATE" 
     } 
    }
    ], 
    "loader": { 
    "orientdb": { 
     "dbURL": "remote:localhost/roads", 
     "dbType": "graph", 
     "wal": false, 
     "batchCommit": 1000, 
     "tx": true, 
     "txUseLog": false, 
     "useLightweightEdges" : true, 
     "classes": [ 
     {"name": "Intersection", "extends": "V"}, 
     {"name": "Road", "extends": "E"} 
     ], "indexes": [ 
     {"class":"Intersection", "fields":["id:integer"], "type":"UNIQUE" } 
     ] 
    } 
    } 
} 

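To speed up loading, I suggest you shut down the server and run the ETL import using "plocal:" instead of "remote:". Replace the existing dbURL with something like this: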

 "dbURL": "plocal:/orientdb/databases/roads", 

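Thank you for your answer. I'm not sure whether I did something wrong, but I ran into two errors. First, the skipFrom and skipTo settings had no effect: the first lines were still passed to the transformers. I removed those lines by hand and then hit the second problem: OrientVertex cannot be cast to ODocument. Here is the log: http://pastebin.com/i6QGRcUV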
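
If skipFrom and skipTo have no effect, one workaround is to strip the comment header before running the ETL, which is what was done by hand here. A minimal Python sketch, assuming every header line starts with `#`; the file names and the function name `strip_header` are placeholders, and the demo runs on a temporary file rather than the real dataset.

```python
import tempfile
from pathlib import Path


def strip_header(src: Path, dst: Path) -> int:
    """Copy src to dst without '#' comment lines; return lines written."""
    written = 0
    with src.open("r", encoding="utf-8") as fin, \
            dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            if line.startswith("#"):
                continue  # drop the comment header
            fout.write(line)
            written += 1
    return written


# Demo on a temporary directory so the sketch runs without the dataset.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "roadNet-PA.txt"   # placeholder input name
    dst = Path(tmp) / "roads.csv"        # placeholder output name
    src.write_text(
        "# Pennsylvania road network\n# FromNodeId\tToNodeId\n0\t1\n0\t6309\n",
        encoding="utf-8",
    )
    count = strip_header(src, dst)
    cleaned = dst.read_text(encoding="utf-8")

print(count, repr(cleaned))
```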


Try moving the merge before the vertex transformer. – Lvca


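It finally worked. Following Luca's suggestion, I moved the merge before the vertex line. I also renamed the 'id' field to 'from' to avoid the error "Property key is reserved for all elements: id". Here is the configuration: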

{ 
    "config": { 
    "log": "debug" 
    }, 
    "source" : { 
    "file": { "path": "/tmp/roads.csv" } 
    }, 
    "extractor": { "row": {} }, 
    "transformers": [ 
    { "csv": { "separator": "\t", 
       "columnsOnFirstLine": false, 
       "columns":["from", "to"] } }, 
    { "merge": { "joinFieldName":"from", "lookup":"Intersection.from" } }, 
    { "vertex": { "class": "Intersection" } }, 
    { "edge": { 
     "class": "Road", 
     "joinFieldName": "to", 
     "lookup": "Intersection.from", 
     "unresolvedLinkAction": "CREATE" 
     } 
    }
    ], 
    "loader": { 
    "orientdb": { 
     "dbURL": "remote:localhost/roads", 
     "dbType": "graph", 
     "wal": false, 
     "batchCommit": 1000, 
     "tx": true, 
     "txUseLog": false, 
     "useLightweightEdges" : true, 
     "classes": [ 
     {"name": "Intersection", "extends": "V"}, 
     {"name": "Road", "extends": "E"} 
     ], "indexes": [ 
     {"class":"Intersection", "fields":["from:integer"], "type":"UNIQUE" } 
     ] 
    } 
    } 
}
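
The fix in this thread came down to transformer order: the merge has to run before the vertex transformer. A small Python check along these lines could catch that ordering mistake before a long import is started. It only inspects the JSON structure of a config like the one above; `transformer_order` and `merge_precedes_vertex` are illustrative names, not OrientDB functions.

```python
import json
from typing import List

# Transformer section only, copied in shape from the working config.
CONFIG = """
{
  "transformers": [
    { "csv": { "separator": "\\t", "columnsOnFirstLine": false,
               "columns": ["from", "to"] } },
    { "merge": { "joinFieldName": "from", "lookup": "Intersection.from" } },
    { "vertex": { "class": "Intersection" } },
    { "edge": { "class": "Road", "joinFieldName": "to",
                "lookup": "Intersection.from",
                "unresolvedLinkAction": "CREATE" } }
  ]
}
"""


def transformer_order(config: dict) -> List[str]:
    """Return the transformer names in the order they are listed."""
    # Each transformer entry is an object with a single key: its name.
    return [next(iter(entry)) for entry in config.get("transformers", [])]


def merge_precedes_vertex(config: dict) -> bool:
    """False only if a merge step is listed after the vertex step."""
    order = transformer_order(config)
    if "merge" not in order or "vertex" not in order:
        return True
    return order.index("merge") < order.index("vertex")


config = json.loads(CONFIG)
order = transformer_order(config)
print(order)
print(merge_precedes_vertex(config))
```

Python's `json.loads` is strict and rejects trailing commas, so a config has to be valid JSON before it can be checked this way.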