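Getting data from Elasticsearch JSON documents into Python

How can I get the query results into a DataFrame whose columns keep the hierarchical structure? Columns like these:
type | postDate | discussionTitle | courses | subjectKeywords | SentiStrength | SentiWordNet | universities | universityKeywords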
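I have an Elasticsearch index with about 1,000,000 JSON documents. I want to use this dataset for natural language processing (NLP) in Python. Could someone please help me with how to get the data from Elasticsearch into Python, and how to write data back to Elasticsearch from Python? Many thanks; right now I can't run any NLP on the dataset because I can't connect to it from Python. This is the structure of the Elasticsearch index: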
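I want to add a new hierarchical field alongside the existing hierarchical "Info" objects, whose values are assigned from a set of keywords I provide. Just like "universityKeywords", each JSON document should also store the set of keywords that was used to tag it. The new category I want to add is "process information": based on keywords found in the post title and the post text, each JSON document should be tagged with one or more of four labels or categories, named Applications, Offers, Enrollment and Requirements.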
```json
{
  "educationforumsenriched2": {
    "mappings": {
      "whirlpool": {
        "properties": {
          "CourseInfo": {
            "properties": {
              "courses": {
                "type": "string",
                "index": "not_analyzed"
              },
              "subjectKeywords": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "SentimentInfo": {
            "properties": {
              "SentiStrength": {
                "type": "float"
              },
              "SentiWordNet": {
                "type": "float"
              }
            }
          },
          "UniversityInfo": {
            "properties": {
              "universities": {
                "type": "string",
                "index": "not_analyzed"
              },
              "universityKeywords": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "postDate": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          },
          "postID": {
            "type": "integer"
          },
          "postText": {
            "type": "string"
          },
          "references": {
            "type": "string"
          },
          "threadID": {
            "type": "integer"
          },
          "threadTitle": {
            "type": "string"
          }
        }
      },
      "atarnotes": {
        "properties": {
          "CourseInfo": {
            "properties": {
              "courses": {
                "type": "string",
                "index": "not_analyzed"
              },
              "subjectKeywords": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "SentimentInfo": {
            "properties": {
              "SentiStrength": {
                "type": "float"
              },
              "SentiWordNet": {
                "type": "float"
              }
            }
          },
          "UniversityInfo": {
            "properties": {
              "universities": {
                "type": "string",
                "index": "not_analyzed"
              },
              "universityKeywords": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "discussionTitle": {
            "type": "string"
          },
          "postDate": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          },
          "postID": {
            "type": "integer"
          },
          "postText": {
            "type": "string"
          },
          "query": {
            "properties": {
              "match_all": {
                "type": "object"
              }
            }
          },
          "threadID": {
            "type": "integer"
          },
          "threadTitle": {
            "type": "string"
          }
        }
      }
    }
  }
}
```
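A minimal sketch of the read side, assuming the official `elasticsearch` Python client (its major version should match the cluster's; the `string`/`not_analyzed` mapping suggests an older, 2.x-era cluster). `helpers.scan` streams every hit through the scroll API, and the pure helpers then flatten each `_source` into dotted column names such as `CourseInfo.courses`, which `split_column` turns into (group, field) pairs for a two-level pandas column index. The function names, the `localhost` address and the `match_all` query are illustrative assumptions.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dotted names, e.g. CourseInfo.courses."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))  # recurse into CourseInfo, SentimentInfo, ...
        else:
            flat[name] = value  # leaves (strings, numbers, lists) are kept as-is
    return flat


def hits_to_rows(hits: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Turn raw search hits into flat rows; "type" is the mapping type (whirlpool/atarnotes)."""
    for hit in hits:
        row: Dict[str, Any] = {"_id": hit.get("_id"), "type": hit.get("_type")}
        row.update(flatten(hit.get("_source", {})))
        yield row


def split_column(name: str) -> Tuple[str, str]:
    """Map "CourseInfo.courses" -> ("CourseInfo", "courses") and "postDate" -> ("", "postDate")."""
    group, _, leaf = name.rpartition(".")
    return (group, leaf)


def fetch_rows(es: Any, index: str = "educationforumsenriched2") -> Iterator[Dict[str, Any]]:
    """Stream every document via the scroll API (requires the elasticsearch package)."""
    from elasticsearch import helpers  # local import: the pure helpers above don't need it

    hits = helpers.scan(es, index=index, query={"query": {"match_all": {}}})
    return hits_to_rows(hits)


# Usage (assumes a reachable cluster and pandas installed):
#   from elasticsearch import Elasticsearch
#   import pandas as pd
#   es = Elasticsearch(["http://localhost:9200"])
#   df = pd.DataFrame(list(fetch_rows(es)))
#   df.columns = pd.MultiIndex.from_tuples([split_column(c) for c in df.columns])
#   df["CourseInfo"]  # -> sub-frame with the courses / subjectKeywords columns
```

`pandas.json_normalize` produces the same dotted names if you prefer to flatten the `_source` dicts with pandas directly. With about 1,000,000 documents, consuming `fetch_rows` in batches may be kinder to memory than building a single DataFrame.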
This is the Java code I used to create the process-information tags; I want to do the same in Python:
```java
import java.util.*;

// Maps each process-information tag to the keywords that trigger it
Map<String, List<String>> processMap = new LinkedHashMap<>();
processMap.put("Applications", new ArrayList<>(Arrays.asList("apply", "applied", "applicant", "applying", "application", "applications")));
processMap.put("Offers", new ArrayList<>(Arrays.asList("offers", "offer", "offered", "offering")));
processMap.put("Enrollment", new ArrayList<>(Arrays.asList("enrolling", "enroled", "enroll", "enrolment", "enrollment", "enrol", "enrolled")));
processMap.put("Requirements", new ArrayList<>(Arrays.asList("requirement", "requirements", "require")));
```
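A Python counterpart to the Java `processMap` could be a plain dict plus a whole-word match over the title and post text. This is a sketch under assumptions: the tokenisation (lower-cased runs of letters), the hypothetical function name `process_info`, and the output field names `processes`/`processKeywords` (modelled on `universities`/`universityKeywords`) are not part of the original code.

```python
import re
from typing import Dict, List

# Same keyword lists as the Java processMap (dicts keep insertion order in Python 3.7+)
PROCESS_MAP: Dict[str, List[str]] = {
    "Applications": ["apply", "applied", "applicant", "applying", "application", "applications"],
    "Offers": ["offers", "offer", "offered", "offering"],
    "Enrollment": ["enrolling", "enroled", "enroll", "enrolment", "enrollment", "enrol", "enrolled"],
    "Requirements": ["requirement", "requirements", "require"],
}

TOKEN = re.compile(r"[a-z]+")  # assumption: a "word" is a run of ASCII letters


def process_info(title: str, text: str,
                 process_map: Dict[str, List[str]] = PROCESS_MAP) -> Dict[str, List[str]]:
    """Return the tags whose keywords occur as whole words in the title or text,
    together with the keywords that actually matched (hypothetical field names)."""
    tokens = set(TOKEN.findall(f"{title} {text}".lower()))
    processes: List[str] = []
    keywords: List[str] = []
    for tag, words in process_map.items():
        matched = [w for w in words if w in tokens]
        if matched:
            processes.append(tag)
            keywords.extend(matched)
    return {"processes": processes, "processKeywords": keywords}
```

For example, `process_info("Applying for law", "Got an offer, now need to enrol")` returns the tags `Applications`, `Offers` and `Enrollment` with the matched keywords `applying`, `offer` and `enrol`. Matching whole tokens rather than substrings keeps `offer` from firing inside unrelated words; a stemmer would be an alternative to listing every inflection by hand.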
[Python Elasticsearch client](https://elasticsearch-py.readthedocs.io/en/master/)?
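pyelasticsearch? I have already installed the package, but I can't figure out how to get this dataset into Python. A small example would be very helpful. This is the mapping structure of my Elasticsearch index:
`"educationforumsenriched2": { "mappings": { "whirlpool": { "properties": { "CourseInfo": {..`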
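For the write-back half of the question, here is a minimal sketch assuming the official `elasticsearch` client's `helpers.bulk`. It turns raw hits (as yielded by `helpers.scan`) into partial-update actions that add one new nested object per document, so existing fields are left alone. The field name `ProcessInfo`, the `tagger` callable (for example one implementing the keyword lists from the Java `processMap`) and the function names are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Optional


def update_actions(
    hits: Iterable[Dict[str, Any]],
    tagger: Callable[[Dict[str, Any]], Optional[Dict[str, Any]]],
    field: str = "ProcessInfo",  # hypothetical name for the new nested object
) -> Iterator[Dict[str, Any]]:
    """Yield one bulk "update" action per hit that the tagger labels.

    `tagger` receives the document's _source and returns the new nested
    object, or None to leave that document untouched.
    """
    for hit in hits:
        value = tagger(hit.get("_source", {}))
        if value is None:
            continue
        action: Dict[str, Any] = {
            "_op_type": "update",  # partial update: only `field` is merged into the document
            "_index": hit["_index"],
            "_id": hit["_id"],
            "doc": {field: value},
        }
        if "_type" in hit:  # hits from older clusters carry a mapping type; pass it through
            action["_type"] = hit["_type"]
        yield action


def write_back(es: Any, actions: Iterable[Dict[str, Any]]) -> Any:
    """Send the actions in bulk (requires the elasticsearch package)."""
    from elasticsearch import helpers  # local import so update_actions works without it

    # Returns (number of successful actions, errors); raises BulkIndexError on failures by default.
    return helpers.bulk(es, actions)
```

Because `update_actions` is a generator, the scan, tagging and bulk upload can be chained without loading all documents into memory. If the new field should be `not_analyzed` like `universityKeywords`, its mapping would need to be added before the first update; otherwise Elasticsearch infers it dynamically.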