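Elasticsearch: poor performance on large datasets
I have an Elasticsearch cluster of 7 nodes with 2 indices, both of which use nested object mappings. I am seeing slow inserts into Index2 (via Spark Streaming). I am using bulk inserts, and each batch of ~100k records takes ~8-12 s.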
Node Configuration:
RAM: 64 GB
Cores: 48
HDD : 1 TB
JVM allocated Memory: 32 GB
Marvel Node Status:
CPU Usage: ~10-20%
JVM Memory: ~60-75%
Load Average : ~3-35
Indexing Rate: ~10k/s
Search Rate: ~2k/s
Index1 (Replication 1):
Status: green
Documents: 84.4b
Data: 9.3TB
Total Shards: 400 (could this be the reason for the low performance?)
Index2 (Replication 1):
Status: green
Documents: 1.4b
Data: 35.8GB
Total Shards: 10
Unassigned Shards: 0
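For reference, a rough per-shard size can be worked out from the figures above. This is only a back-of-the-envelope sketch: it assumes that the reported data sizes and shard totals both include replicas and that 1 TB = 1024 GB; the helper name `gb_per_shard` is made up for illustration.

```python
# Rough average shard size derived from the Marvel figures above.
# Assumptions: data size and shard totals both include replicas; 1 TB = 1024 GB.
GB_PER_TB = 1024


def gb_per_shard(total_gb, total_shards):
    """Average store size per shard, in GB (hypothetical helper)."""
    return total_gb / total_shards


index1_gb = gb_per_shard(9.3 * GB_PER_TB, 400)  # Index1: 9.3 TB over 400 shards
index2_gb = gb_per_shard(35.8, 10)              # Index2: 35.8 GB over 10 shards

print(f"Index1: ~{index1_gb:.1f} GB per shard")
print(f"Index2: ~{index2_gb:.2f} GB per shard")
```

Under those assumptions Index1 averages roughly 24 GB per shard and Index2 roughly 3.6 GB per shard, so the two indices differ a lot in total shard count but not by orders of magnitude in shard size.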
Spark streaming configuration:
Executors: 2
Cores per executor: 8
Number of partitions: 16
Batch interval: 10s
Events per batch: ~1k-200k
Elastic Bulk insert count: 100k
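To put these numbers side by side, here is a rough throughput calculation. It is only a sketch: it assumes bulk time scales linearly with the number of records and that the 10 s setting above is the Spark Streaming batch interval.

```python
# Effective bulk throughput implied by "100k records in ~8-12 s".
# Assumption: bulk time scales linearly with the number of records.
bulk_records = 100_000
slow_rate = bulk_records / 12  # docs/s when a bulk takes 12 s
fast_rate = bulk_records / 8   # docs/s when a bulk takes 8 s

# Time needed to index the largest batch (200k events) at those rates.
peak_events = 200_000
batch_interval_s = 10
worst_case_s = peak_events / slow_rate
best_case_s = peak_events / fast_rate

# If even the best case exceeds the interval, peak batches queue up.
falls_behind = best_case_s > batch_interval_s

print(f"Throughput: ~{slow_rate:.0f}-{fast_rate:.0f} docs/s")
print(f"Peak batch: ~{best_case_s:.0f}-{worst_case_s:.0f} s (interval: {batch_interval_s} s)")
print(f"Falls behind at peak: {falls_behind}")
```

The implied rate of roughly 8k-12k docs/s matches the ~10k/s indexing rate Marvel reports, and at that rate a 200k-event batch would not finish within one 10 s interval.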
Index2 mapping:
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "parent_list": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "parents": {
          "type": "nested",
          "properties": {
            "parent_id": {
              "type": "integer",
              "doc_values": false
            },
            "childs": {
              "type": "nested",
              "properties": {
                "child_id": {
                  "type": "integer",
                  "doc_values": false
                },
                "timestamp": {
                  "type": "long",
                  "doc_values": false
                },
                "is_deleted": {
                  "type": "boolean",
                  "doc_values": false
                }
              }
            }
          }
        },
        "other_ID": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": false
        }
      }
    }
  }
}
My queries:
- Get the count by parent ID where at least one child has is_deleted set to false.
- Get the count by child ID where is_deleted is false.
- Get by _id.
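For illustration, the first two queries might be expressed roughly as the request bodies below. This is a sketch based only on the mapping above, not the exact queries in use: the function names `count_by_parent` and `count_by_child` and the sample IDs are made up, and the bodies use plain `nested`, `bool` and `term` queries.

```python
import json


def count_by_parent(parent_id):
    """Body matching a parent_id that has at least one child with is_deleted == false."""
    return {
        "query": {
            "nested": {
                "path": "parents",
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"parents.parent_id": parent_id}},
                            {
                                "nested": {
                                    "path": "parents.childs",
                                    "query": {
                                        "term": {"parents.childs.is_deleted": False}
                                    },
                                }
                            },
                        ]
                    }
                },
            }
        }
    }


def count_by_child(child_id):
    """Body matching a child_id whose is_deleted flag is false."""
    return {
        "query": {
            "nested": {
                "path": "parents",
                "query": {
                    "nested": {
                        "path": "parents.childs",
                        "query": {
                            "bool": {
                                "must": [
                                    {"term": {"parents.childs.child_id": child_id}},
                                    {"term": {"parents.childs.is_deleted": False}},
                                ]
                            }
                        },
                    }
                },
            }
        }
    }


# Sample IDs are placeholders; print the JSON that would be sent.
print(json.dumps(count_by_parent(42), indent=2))
print(json.dumps(count_by_child(7), indent=2))
```

Either body could be sent to the index's `_count` endpoint. Note that a nested query matches top-level documents, so the result would be the number of matching parent_list documents rather than the number of nested objects.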
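A bulk of 100k documents sounds like a lot. Can you lower it and try again?
I tried 10k, but it did not improve much.
@AndreiStefan Index1 has 400 shards. Could that be the reason for the low performance? What insert rate should I expect?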