2017-08-04 258 views
2

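MongoDB exact match on multiple document fields

I am trying to build a Python script with PyMongo that queries a MongoDB database for exact matches on n objects that may exist in it. Currently I have this setup: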

db.entries.find({'$or': [<list-of-objects>]})

where the list of objects looks like this:

[{'email': '[email protected]', 'zip': '11111'}, {'email': '[email protected]', 'zip': '11112'}, ...] 
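A minimal sketch of how such a filter could be assembled from a list of value pairs. The helper names `build_or_filter` and `find_exact`, the example addresses, and the default field names are assumptions for illustration; `collection` is expected to be a PyMongo collection such as `db.entries`. Note that the `.explain()` output further down filters on `zipcode` rather than `zip`, so the field names are left as a parameter.

```python
def build_or_filter(pairs, fields=("email", "zip")):
    """Build a $or filter with one exact-match clause per tuple in `pairs`."""
    if not pairs:
        # MongoDB rejects an empty $or array, so fail early.
        raise ValueError("pairs must not be empty")
    clauses = [dict(zip(fields, values)) for values in pairs]
    return {"$or": clauses}


def find_exact(collection, pairs, fields=("email", "zip")):
    """Run the filter on a PyMongo collection, hiding _id from the results."""
    query = build_or_filter(pairs, fields)
    return list(collection.find(query, {"_id": False}))


# Hypothetical data, standing in for the real list of objects.
pairs = [("a@example.com", "11111"), ("b@example.com", "11112")]
query = build_or_filter(pairs)
```

Each clause is a plain document, which MongoDB treats as an implicit AND over its fields, so a document only matches when both values of a pair match.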

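Using $or works fine when the list has about 10 items. I am now testing with 100, and it takes a very long time to return. I have considered using multiple $in filters instead, but I don't know whether that is the best option.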

I'm sure there is a better way to handle this, but I'm fairly new to Mongo.

Edit: the .explain() output is as follows:

{ 
    "executionStats": { 
     "executionTimeMillis": 228734, 
     "nReturned": 2, 
     "totalKeysExamined": 0, 
     "allPlansExecution": [], 
     "executionSuccess": true, 
     "executionStages": { 
      "needYield": 0, 
      "saveState": 43556, 
      "restoreState": 43556, 
      "isEOF": 1, 
      "inputStage": { 
       "needYield": 0, 
       "saveState": 43556, 
       "restoreState": 43556, 
       "isEOF": 1, 
       "inputStage": { 
        "needYield": 0, 
        "direction": "forward", 
        "saveState": 43556, 
        "restoreState": 43556, 
        "isEOF": 1, 
        "docsExamined": 5453000, 
        "nReturned": 2, 
        "needTime": 5452999, 
        "filter": { 
         "$or": [{ 
          "$and": [{ 
           "email": { 
            "$eq": "[email protected]" 
           } 
          }, { 
           "zipcode": { 
            "$eq": "11111" 
           } 
          }] 
         }, { 
          "$and": [{ 
           "email": { 
            "$eq": "[email protected]" 
           } 
          }, { 
           "zipcode": { 
            "$eq": "11112" 
           } 
          }] 
         }] 
        }, 
        "executionTimeMillisEstimate": 208083, 
        "invalidates": 0, 
        "works": 5453002, 
        "advanced": 2, 
        "stage": "COLLSCAN" 
       }, 
       "nReturned": 2, 
       "needTime": 5452999, 
       "executionTimeMillisEstimate": 211503, 
       "transformBy": { 
        "_id": false 
       }, 
       "invalidates": 0, 
       "works": 5453002, 
       "advanced": 2, 
       "stage": "PROJECTION" 
      }, 
      "nReturned": 2, 
      "needTime": 5452999, 
      "executionTimeMillisEstimate": 213671, 
      "invalidates": 0, 
      "works": 5453002, 
      "advanced": 2, 
      "stage": "SUBPLAN" 
     }, 
     "totalDocsExamined": 5453000 
    }, 
    "queryPlanner": { 
     "parsedQuery": { 
      "$or": [{ 
       "$and": [{ 
        "email": { 
         "$eq": "[email protected]" 
        } 
       }, { 
        "zipcode": { 
         "$eq": "11111" 
        } 
       }] 
      }, { 
       "$and": [{ 
        "email": { 
         "$eq": "[email protected]" 
        } 
       }, { 
        "zipcode": { 
         "$eq": "11112" 
        } 
       }] 
      }] 
     }, 
     "rejectedPlans": [], 
     "namespace": "db.entries", 
     "winningPlan": { 
      "inputStage": { 
       "transformBy": { 
        "_id": false 
       }, 
       "inputStage": { 
        "filter": { 
         "$or": [{ 
          "$and": [{ 
           "email": { 
            "$eq": "[email protected]" 
           } 
          }, { 
           "zipcode": { 
            "$eq": "11111" 
           } 
          }] 
         }, { 
          "$and": [{ 
           "email": { 
            "$eq": "[email protected]" 
           } 
          }, { 
           "zipcode": { 
            "$eq": "11112" 
           } 
          }] 
         }] 
        }, 
        "direction": "forward", 
        "stage": "COLLSCAN" 
       }, 
       "stage": "PROJECTION" 
      }, 
      "stage": "SUBPLAN" 
     }, 
     "indexFilterSet": false, 
     "plannerVersion": 1 
    }, 
    "ok": 1.0, 
    "serverInfo": { 
     "host": "somehost", 
     "version": "3.4.6", 
     "port": 27017, 
     "gitVersion": "c55eb86ef46ee7aede3b1e2a5d184a7df4bfb5b5" 
    } 
} 
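The key figures in this output are the COLLSCAN stage, totalKeysExamined of 0 (no index was used), and 5,453,000 documents examined to return 2. A small sketch for pulling those numbers out of an explain document follows; `summarize_explain` is a hypothetical helper, and the sample dictionary only copies the three relevant values from the output above.

```python
def summarize_explain(plan):
    """Condense the executionStats section of an explain document."""
    stats = plan["executionStats"]
    examined = stats["totalDocsExamined"]
    returned = stats["nReturned"]
    return {
        "docs_examined": examined,
        "returned": returned,
        # How many documents were read for each document returned.
        "docs_per_result": examined / returned if returned else None,
        "seconds": stats["executionTimeMillis"] / 1000,
    }


# Values copied from the explain output above.
plan = {
    "executionStats": {
        "executionTimeMillis": 228734,
        "nReturned": 2,
        "totalDocsExamined": 5453000,
    }
}
summary = summarize_explain(plan)
```

A ratio of millions of documents read per result is a strong hint that the query is scanning the whole collection.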
Please add the output of '.explain()'. –

@MarkusWMahlberg See the OP. – xtheking

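The query is rather inefficient: you are examining 5,453,000 documents to end up with 2. Why not (1) create an index on a high-cardinality field, which could be the zip code or the email, and (2) use an aggregation pipeline that first selects documents by the field you indexed, so that the new index filters out most of the documents. Hope that helps. – Euclides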
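One way the two steps in this comment might look with PyMongo, as a sketch only. The helper names `build_pipeline` and `run`, the choice of `email` as the indexed field, and the `zipcode` field name (taken from the explain output) are assumptions; `collection` is expected to be a PyMongo collection.

```python
def build_pipeline(pairs, indexed_field="email", other_field="zipcode"):
    """Narrow by the indexed field first, then keep only the exact pairs."""
    indexed_values = sorted({first for first, _ in pairs})
    exact = [{indexed_field: first, other_field: second} for first, second in pairs]
    return [
        # Stage 1: a $in on the indexed field can be served by the index.
        {"$match": {indexed_field: {"$in": indexed_values}}},
        # Stage 2: exact pair matching on the much smaller candidate set.
        {"$match": {"$or": exact}},
        {"$project": {"_id": 0}},
    ]


def run(collection, pairs):
    """Create the single-field index (a no-op if it exists) and run the pipeline."""
    collection.create_index([("email", 1)])
    return list(collection.aggregate(build_pipeline(pairs)))


# Hypothetical data.
pairs = [("a@example.com", "11111"), ("b@example.com", "11112")]
pipeline = build_pipeline(pairs)
```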

Answers

0

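To avoid indexing and re-indexing (this query will not only involve email/zip; it will be dynamic), I build a list of values for each header and use those lists as $in arguments, which I then pass to $and. It seems to work quite well, and no query has taken more than 3 minutes.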

Example:

{'$and': [{'email': {'$in': ['[email protected]', '[email protected]', '[email protected]']}, 'zipcode': {'$in': ['12345', '11111', '11112']}}]} 
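One caveat with this filter: independent $in lists match any combination of the listed values, so a document whose email comes from one requested pair and whose zipcode comes from another also matches. If only the exact pairs are wanted, the results can be filtered again on the client. The sketch below assumes hypothetical helper names (`build_in_filter`, `keep_exact`) and example data, and uses plain dictionaries in place of documents returned by PyMongo.

```python
def build_in_filter(pairs, fields=("email", "zipcode")):
    """Build one $in list per field from the requested pairs."""
    if not pairs:
        # An empty filter would match every document, so fail early.
        raise ValueError("pairs must not be empty")
    columns = zip(*pairs)
    return {field: {"$in": sorted(set(col))} for field, col in zip(fields, columns)}


def keep_exact(docs, pairs, fields=("email", "zipcode")):
    """Drop documents whose field values are not one of the requested pairs."""
    wanted = set(pairs)
    return [doc for doc in docs if tuple(doc.get(f) for f in fields) in wanted]


# Hypothetical data.
pairs = [("a@example.com", "11111"), ("b@example.com", "11112")]
query = build_in_filter(pairs)

# Documents the $in filter would return; the last one mixes two pairs.
candidates = [
    {"email": "a@example.com", "zipcode": "11111"},
    {"email": "b@example.com", "zipcode": "11112"},
    {"email": "a@example.com", "zipcode": "11112"},
]
exact = keep_exact(candidates, pairs)
```

The filter document lists both fields at the top level, which MongoDB treats as an implicit AND, so the single-element $and wrapper in the example above is equivalent.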
1

I suggest creating a new index (a compound index) on the two fields you are searching on:

db.entries.createIndex({"email": 1, "zip": 1}) 

Now run your query with explain() appended, and you should see that it uses IXSCAN instead of COLLSCAN.
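From PyMongo, the same check could be sketched as follows. The helper names `plan_stages` and `check_index` are hypothetical, `collection` is expected to be a PyMongo collection, and the index is created on `zipcode` because that is the field name shown in the question's explain output; if the stored field is really `zip`, the index must name that field instead.

```python
def plan_stages(node):
    """Collect stage names from a winningPlan tree, depth first."""
    stages = [node["stage"]]
    # Stages such as OR carry several children under "inputStages".
    children = list(node.get("inputStages", []))
    if "inputStage" in node:
        children.append(node["inputStage"])
    for child in children:
        stages.extend(plan_stages(child))
    return stages


def check_index(collection, query):
    """Create the compound index, then report whether the query uses an index scan."""
    collection.create_index([("email", 1), ("zipcode", 1)])
    plan = collection.find(query).explain()
    return "IXSCAN" in plan_stages(plan["queryPlanner"]["winningPlan"])


# Shape of the winning plan from the question, before any index exists.
winning_plan = {
    "stage": "SUBPLAN",
    "inputStage": {
        "stage": "PROJECTION",
        "inputStage": {"stage": "COLLSCAN"},
    },
}
stages = plan_stages(winning_plan)
```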