我們試圖複製此ES插件。原因是AWS ES不允許安裝任何自定義插件。該插件只是做點積和餘弦相似性,所以我猜測它應該非常簡單,在painless
腳本中複製它。它看起來像groovy
腳本在5.0中已棄用。如何在無痛腳本中執行此操作Elasticsearch 5.3
這裏是插件的源代碼。
/**
* @param params index that a scored are placed in this parameter. Initialize them here.
*/
@SuppressWarnings("unchecked")
private PayloadVectorScoreScript(Map<String, Object> params) {
params.entrySet();
// get field to score
field = (String) params.get("field");
// get query vector
vector = (List<Double>) params.get("vector");
// cosine flag
Object cosineParam = params.get("cosine");
if (cosineParam != null) {
cosine = (boolean) cosineParam;
}
if (field == null || vector == null) {
throw new IllegalArgumentException("cannot initialize " + SCRIPT_NAME + ": field or vector parameter missing!");
}
// init index
index = new ArrayList<>(vector.size());
for (int i = 0; i < vector.size(); i++) {
index.add(String.valueOf(i));
}
if (vector.size() != index.size()) {
throw new IllegalArgumentException("cannot initialize " + SCRIPT_NAME + ": index and vector array must have same length!");
}
if (cosine) {
// compute query vector norm once
for (double v: vector) {
queryVectorNorm += Math.pow(v, 2.0);
}
}
}
@Override
public Object run() {
float score = 0;
// first, get the ShardTerms object for the field.
IndexField indexField = this.indexLookup().get(field);
double docVectorNorm = 0.0f;
for (int i = 0; i < index.size(); i++) {
// get the vector value stored in the term payload
IndexFieldTerm indexTermField = indexField.get(index.get(i), IndexLookup.FLAG_PAYLOADS);
float payload = 0f;
if (indexTermField != null) {
Iterator<TermPosition> iter = indexTermField.iterator();
if (iter.hasNext()) {
payload = iter.next().payloadAsFloat(0f);
if (cosine) {
// doc vector norm
docVectorNorm += Math.pow(payload, 2.0);
}
}
}
// dot product
score += payload * vector.get(i);
}
if (cosine) {
// cosine similarity score
if (docVectorNorm == 0 || queryVectorNorm == 0) return 0f;
return score/(Math.sqrt(docVectorNorm) * Math.sqrt(queryVectorNorm));
} else {
// dot product score
return score;
}
}
我想從索引中獲得一個字段開始。但我得到錯誤。
這裏是我的索引的形狀。
我已經啓用delimited_payload_filter
"settings" : {
"analysis": {
"analyzer": {
"payload_analyzer": {
"type": "custom",
"tokenizer":"whitespace",
"filter":"delimited_payload_filter"
}
}
}
}
而且我有一個字段名爲@model_factor
存儲載體。
{
"movies" : {
"properties" : {
"@model_factor": {
"type": "text",
"term_vector": "with_positions_offsets_payloads",
"analyzer" : "payload_analyzer"
}
}
}
}
這是文檔
{
"@model_factor":"0|1.2 1|0.1 2|0.4 3|-0.2 4|0.3",
"name": "Test 1"
}
的形狀,下面是我如何使用腳本
{
"query": {
"function_score": {
"query" : {
"query_string": {
"query": "*"
}
},
"script_score": {
"script": {
"inline": "def termInfo = doc['_index']['@model_factor'].get('1', 4);",
"lang": "painless",
"params": {
"field": "@model_factor",
"vector": [0.1,2.3,-1.6,0.7,-1.3],
"cosine" : true
}
}
},
"boost_mode": "replace"
}
}
}
這是我的錯誤。
"failures": [
{
"shard": 2,
"index": "test",
"node": "ShL2G7B_Q_CMII5OvuFJNQ",
"reason": {
"type": "script_exception",
"reason": "runtime error",
"caused_by": {
"type": "wrong_method_type_exception",
"reason": "wrong_method_type_exception: cannot convert MethodHandle(List,int)int to (Object,String)String"
},
"script_stack": [
"termInfo = doc['_index']['@model_factor'].get('1',4);",
" ^---- HERE"
],
"script": "def termInfo = doc['_index']['@model_factor'].get('1',4);",
"lang": "painless"
}
}
]
問題是如何訪問索引字段以獲得@model_factor
無痛腳本?