How can I optimize a simple LINQ query over two related tables?

I have two related tables. The first is 'Patients':

{ Id = 1, Surname = Smith1 }
... 
{ Id = 1000, Surname = Smith1000 }

and the second is 'Receptions':

{ PatientId = 1, ReceptionStart = 3/3/2017 1:14:00 AM } 
{ PatientId = 1, ReceptionStart = 1/7/2016 1:14:00 AM } 
... 
{ PatientId = 1000, ReceptionStart = 1/23/2017 1:14:00 AM }

The tables do not come from a database; they are generated with the following sample code:

var rand = new Random();
var receptions = Enumerable.Range(1, 1000).SelectMany(pid => Enumerable.Range(1, rand.Next(0, 10)).Select(rid => new { PatientId = pid, ReceptionStart = DateTime.Now.AddDays(-rand.Next(1, 500)) })).ToList();
var patients = Enumerable.Range(1, 1000).Select(pid => new { Id = pid, Surname = string.Format("Smith{0}", pid) }).ToList();

The question is: what is the best way to select the patients who had a reception before July 1, 2017?

Of course, I can write something like this:

var cured_receptions = (from r in receptions where r.ReceptionStart < new DateTime(2017, 7, 1) select r.PatientId).Distinct();
var cured_patients = from p in patients where cured_receptions.Contains(p.Id) select p;

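However, it is not clear to me what the cured_receptions.Contains(p.Id) code actually does. Does it simply iterate over all the patient ids searching for a match, or does it use something like an index in a database? Could cured_receptions.ToDictionary() or something similar help here?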

2017-07-26 Dmitriano

You could put a join between the two queries and do it in a single step –

Starting over, assuming everything is in memory only...

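Your cured_receptions is not evaluated until Contains calls it, so it is more efficient to put .ToList() at the end of that variable's definition (about 100 times faster).
LINQ does not "search"; Contains does the searching. If you want a binary search or, better, a hash table, you have to create it yourself. If you use a HashSet<int>, you get another 47x speedup. Leaving off the Distinct (the HashSet already handles duplicates) saves 15%.
Keeping constants in variables instead of creating them on the fly (new DateTime...) might save a little more. Even with much more random data, the HashSet version runs too quickly for that difference to be measurable.
Using a join is faster than your initial query, but your query combined with a HashSet is the fastest.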

So the fastest code is:

var cured_receptions = new HashSet<int>((from r in receptions where r.ReceptionStart < endDateTime select r.PatientId)); 
var cured_patients = from p in patients where cured_receptions.Contains(p.Id) select p;

Note: I used LINQPad to generate the timings and the sample data. I changed your date parameter because your values made most of the receptions match.

Here is my LINQPad code:

var rand = new Random(); 
var begin = DateTime.Now; 
var receptions = Enumerable.Range(1, 100000).SelectMany(pid => Enumerable.Range(1, rand.Next(0, 100)).Select(rid => new { PatientId = pid, ReceptionStart = begin.AddDays(-rand.Next(1, 180)) })).ToList(); 
var patients = Enumerable.Range(1, 100000).Select(pid => new { Id = pid, Surname = string.Format("Smith{0}", pid) }).ToList(); 

var startTime = Util.ElapsedTime; 
var endDateTime = new DateTime(2017, 5, 1); 
//var cured_receptions = (from r in receptions where r.ReceptionStart < new DateTime(2017, 5, 1) select r.PatientId).Distinct().ToList(); 
//var cured_receptions = (from r in receptions where r.ReceptionStart < new DateTime(2017, 5, 1) select r.PatientId).Distinct(); 
//var cured_receptions = new HashSet<int>((from r in receptions where r.ReceptionStart < new DateTime(2017, 5, 1) select r.PatientId).Distinct()); 
//var cured_receptions = new HashSet<int>((from r in receptions where r.ReceptionStart < endDateTime select r.PatientId).Distinct()); 
//var cured_receptions = new HashSet<int>((from r in receptions where r.ReceptionStart < new DateTime(2017, 5, 1) select r.PatientId)); 
var cured_receptions = new HashSet<int>((from r in receptions where r.ReceptionStart < endDateTime select r.PatientId)); 
var cured_patients = from p in patients where cured_receptions.Contains(p.Id) select p; 

// var cured_patients = (from r in receptions 
//      where r.ReceptionStart < endDateTime 
//      join p in patients on r.PatientId equals p.Id 
//      select p).Distinct(); 

// var cured_patients = from p in patients 
//      join r in receptions on p.Id equals r.PatientId into rj 
//      where rj.Any(r => r.ReceptionStart < endDateTime) 
//      select p; 

cured_patients.Count().Dump(); 
var endTime = Util.ElapsedTime; 

(endTime - startTime).Dump("Elapsed");
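
Outside LINQPad, Util.ElapsedTime and Dump() are not available. The following is a minimal sketch of the same measurement as a plain console program, assuming System.Diagnostics.Stopwatch for timing. The TimingSketch class, the CountCured method, the fixed seed, the reference date, and the smaller data sizes are illustrative choices and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class TimingSketch
{
    // Hypothetical helper: builds random sample data, then counts the patients
    // that have at least one reception before the cutoff, using a HashSet lookup.
    // Only the query itself is timed, not the data generation.
    public static int CountCured(int patientCount, DateTime cutoff, int seed, out TimeSpan elapsed)
    {
        var rand = new Random(seed);            // fixed seed so runs are repeatable
        var begin = new DateTime(2017, 7, 26);  // assumed reference date
        var receptions = Enumerable.Range(1, patientCount)
            .SelectMany(pid => Enumerable.Range(1, rand.Next(0, 10))
                .Select(rid => new { PatientId = pid, ReceptionStart = begin.AddDays(-rand.Next(1, 180)) }))
            .ToList();
        var patients = Enumerable.Range(1, patientCount)
            .Select(pid => new { Id = pid, Surname = string.Format("Smith{0}", pid) })
            .ToList();

        var watch = Stopwatch.StartNew();
        // The HashSet removes duplicate ids itself, so no Distinct() is needed.
        var cured = new HashSet<int>(
            from r in receptions where r.ReceptionStart < cutoff select r.PatientId);
        int count = patients.Count(p => cured.Contains(p.Id));
        watch.Stop();

        elapsed = watch.Elapsed;
        return count;
    }

    public static void Main()
    {
        TimeSpan elapsed;
        int count = CountCured(10000, new DateTime(2017, 5, 1), 1, out elapsed);
        Console.WriteLine("{0} patients matched in {1} ms", count, elapsed.TotalMilliseconds);
    }
}
```

A single Stopwatch run is noisy; repeating the measurement several times and discarding the first run gives a steadier number.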

2017-07-26 23:16:22 NetMage

My items are in memory, not in a database. – Dmitriano

I didn't spot that in your comment. I have changed the answer to explain the in-memory case. – NetMage

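Good idea about the HashSet! But I can't figure out #1: what is the difference between .ToList() and the original cured_receptions? From my point of view, both are containers with O(N) search, aren't they? – Dmitriano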
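
The difference asked about here can be made visible by counting how often the filter predicate runs. The sketch below is an illustration only: the DeferredDemo class, the CountPredicateCalls method, and the tiny id arrays are hypothetical. Without ToList(), cured is a deferred query that is re-run for every Contains call; with ToList(), the filter runs once and Contains scans a stored list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns how many times the Where predicate ran while filtering the patients.
    public static int CountPredicateCalls(bool materialize)
    {
        var receptionPatientIds = new[] { 5, 1, 2, 2, 3 };  // hypothetical sample ids
        var patientIds = new[] { 1, 2, 3, 4 };
        int calls = 0;

        // Deferred: nothing is evaluated on this line.
        IEnumerable<int> cured = receptionPatientIds
            .Where(id => { calls++; return id < 4; })
            .Distinct();

        if (materialize)
            cured = cured.ToList();  // run the query once and keep the results

        // Each Contains call enumerates cured; for the deferred query that
        // means re-running Where and Distinct from the start.
        patientIds.Where(p => cured.Contains(p)).ToList();
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine("deferred:     {0} predicate calls", CountPredicateCalls(false));
        Console.WriteLine("materialized: {0} predicate calls", CountPredicateCalls(true));
    }
}
```

Both variants still search linearly, so the list is O(N) per lookup as well; what ToList() removes is the repeated filtering and de-duplication work.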

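But it is not clear to me what the 'cured_receptions.Contains(p.Id)' code actually does. Does it simply iterate over all the patient ids searching for a match, or does it use something like an index in a database?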

Case 1: Interacting with a database

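If you are interacting with a database, no query is sent to the database until the second query, cured_patients, is executed, either by calling ToList() on it or by iterating over its items. The query sent to the database will be something along the lines of: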

SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[Surname] AS [Surname] 
FROM [dbo].[Patients] AS [Extent1] 
WHERE EXISTS (SELECT 
    1 AS [C1] 
    FROM [dbo].[Receptions] AS [Extent2] 
    WHERE ([Extent2].[ReceptionStart] < 
    convert(datetime2, '2017-07-01 00:00:00.0000000', 121)) 
    AND ([Extent2].[PatientId] = [Extent1].[Id]) 
)

Will it use any indexes?

Yes. If PatientId, Id, and ReceptionStart are indexed, the database server will use them.

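Case 2: Interacting with items in memory

For the first query, it will iterate over all receptions, find those whose ReceptionStart is before the given date, select the PatientId, and then remove any duplicate PatientId values.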

Then the second query, below:

var cured_patients = 
    from p in patients 
    where cured_receptions.Contains(p.Id) 
    select p;

will iterate over every item in patients and check whether that item's Id is found in cured_receptions. It selects every item whose Id is found there. Contains simply returns true or false.
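
Written out by hand, the in-memory behaviour described above amounts to two nested loops. This is a simplified sketch for illustration, not the actual Enumerable.Contains source; the ContainsSketch class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ContainsSketch
{
    // Roughly what Contains does for a plain sequence: a linear scan
    // that stops at the first match and reports true or false.
    public static bool LinearContains(IEnumerable<int> source, int value)
    {
        foreach (var item in source)
        {
            if (item == value) return true;
        }
        return false;
    }

    // The second query as explicit loops: one scan of curedIds per patient id.
    public static List<int> Filter(IEnumerable<int> patientIds, IEnumerable<int> curedIds)
    {
        var result = new List<int>();
        foreach (var id in patientIds)
        {
            if (LinearContains(curedIds, id)) result.Add(id);
        }
        return result;
    }

    public static void Main()
    {
        var matched = Filter(new[] { 1, 2, 3, 4 }, new[] { 2, 4, 9 });
        Console.WriteLine(string.Join(", ", matched));
    }
}
```

With N patients and M cured ids, this does on the order of N × M comparisons in the worst case, which is why a hash-based lookup helps for larger inputs.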

2017-07-26 23:04:45 CodingYoshi

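Is it possible to optimize the query somehow when my items are in memory rather than in a database? Does LINQ operate on a map or a hash table? Why can't LINQ do a binary search in an ordered set? – Dmitriano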

@user2394762 Why do you want to optimize it? Is it slow, and is this code the bottleneck? .NET has [BinarySearch](https://msdn.microsoft.com/en-us/library/3f90y839(v=vs.110).aspx) – CodingYoshi

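Note that since 'cured_receptions' is an 'IEnumerable<>', the variable definition doesn't iterate over anything; it just creates a stack of LINQ 'IEnumerable' functions. 'Contains' will execute 'cured_receptions' for each 'p', recomputing the elements each time until it finds one matching 'p.Id'. – NetMage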
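
The List<T>.BinarySearch method linked in the comments above could be applied to this problem roughly as follows. This is a sketch and not something posted in the thread: the BinarySearchSketch class and the CuredIds method are hypothetical names, and the id list has to be sorted before BinarySearch is called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BinarySearchSketch
{
    // Returns the patient ids that also appear among the reception patient ids.
    public static List<int> CuredIds(IEnumerable<int> patientIds, IEnumerable<int> receptionPatientIds)
    {
        // Materialize once, then sort: BinarySearch requires a sorted list.
        var sorted = receptionPatientIds.Distinct().ToList();
        sorted.Sort();

        // BinarySearch returns a non-negative index when the value is found
        // and a negative number otherwise.
        return patientIds.Where(id => sorted.BinarySearch(id) >= 0).ToList();
    }

    public static void Main()
    {
        var cured = CuredIds(new[] { 1, 2, 3, 4, 5 }, new[] { 5, 2, 2, 9 });
        Console.WriteLine(string.Join(", ", cured));
    }
}
```

Each lookup is O(log M) after a one-time sort. A HashSet<int>, as suggested in the other answer, avoids the sort and gives lookups that are O(1) on average.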
