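Running the LINQ to Objects GroupBy() method over a very large number of items (gigabytes of data) can consume a lot of memory. If the IEnumerable<T> is already ordered by the key, we could write a GroupBy that saves memory by not holding the whole sequence at once.
Where can I find a library that has such a method?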
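There is nothing in the framework that does this. If you don't need an actual IGrouping<,>, you could use the following (a full IGrouping<,> implementation would be slightly harder):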
static IEnumerable<IList<TElement>> GroupByChanges<TElement, TKey>
    (this IEnumerable<TElement> source,
     Func<TElement, TKey> projection)
{
    // TODO: Argument validation, splitting this into two methods
    // to achieve eager validation.
    // TODO: Allow a custom comparer to be used, possibly even
    // an IComparer<T> instead of an IEqualityComparer<T>
    IEqualityComparer<TKey> comparer = EqualityComparer<TKey>.Default;
    using (IEnumerator<TElement> iterator = source.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
            yield break;
        }
        TKey currentKey = projection(iterator.Current);
        IList<TElement> currentList = new List<TElement> { iterator.Current };
        while (iterator.MoveNext())
        {
            TKey key = projection(iterator.Current);
            if (!comparer.Equals(currentKey, key))
            {
                // Key changed: emit the finished run and start a new one.
                yield return currentList;
                currentList = new List<TElement>();
                // Remember the new key; without this, every later element
                // would still be compared against the first key.
                currentKey = key;
            }
            currentList.Add(iterator.Current);
        }
        // Emit the final run.
        yield return currentList;
    }
}
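One consequence of grouping on key changes is worth seeing concretely: the result only matches GroupBy when the input really is ordered by the key. The sketch below uses a compact stand-in with a hypothetical name, ChunkByKey (it is not the method above verbatim), and compares it with Enumerable.GroupBy on sorted and unsorted input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AdjacentGroupingDemo
{
    // Hypothetical compact stand-in for a "group on key change" method.
    // It buffers only the current run of equal keys.
    public static IEnumerable<List<T>> ChunkByKey<T, TKey>(
        IEnumerable<T> source, Func<T, TKey> keyOf)
    {
        IEqualityComparer<TKey> comparer = EqualityComparer<TKey>.Default;
        List<T> current = null;
        TKey currentKey = default(TKey);
        foreach (T item in source)
        {
            TKey key = keyOf(item);
            if (current != null && !comparer.Equals(currentKey, key))
            {
                yield return current;   // the run ended
                current = null;
            }
            if (current == null)
            {
                current = new List<T>(); // start a new run
                currentKey = key;
            }
            current.Add(item);
        }
        if (current != null)
        {
            yield return current;        // last run
        }
    }

    public static void Demo()
    {
        int[] sorted = { 1, 1, 2, 3, 3 };
        int[] unsorted = { 1, 2, 1 };

        Console.WriteLine(ChunkByKey(sorted, x => x).Count());   // 3 runs
        Console.WriteLine(ChunkByKey(unsorted, x => x).Count()); // 3 runs: key 1 appears twice
        Console.WriteLine(unsorted.GroupBy(x => x).Count());     // 2 groups
    }
}
```

On unsorted input the same key can come back as a separate group, so this approach is only a drop-in replacement for GroupBy when the ordering precondition holds.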
If you do need a real IGrouping<,>, you could always grab my Edulinq implementation. GroupByChanges would then change very little: just change the currentList assignment so that the key is passed to the Grouping constructor:
Grouping<TKey, TElement> currentGroup = new Grouping<TKey, TElement>(currentKey)
    { iterator.Current };
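The Grouping type itself is not shown here. As a minimal sketch of what a class would need for that initializer syntax to compile, and explicitly an assumption rather than the Edulinq source, something like this would do: a constructor that takes the key, a public Add method, and an IGrouping<,> implementation.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a Grouping class; not the actual Edulinq code.
public sealed class Grouping<TKey, TElement> : IGrouping<TKey, TElement>
{
    private readonly TKey _key;
    private readonly List<TElement> _elements = new List<TElement>();

    public Grouping(TKey key)
    {
        _key = key;
    }

    public TKey Key
    {
        get { return _key; }
    }

    // A public Add method together with IEnumerable is what allows the
    // collection-initializer form: new Grouping<K, E>(key) { firstElement }.
    public void Add(TElement element)
    {
        _elements.Add(element);
    }

    public IEnumerator<TElement> GetEnumerator()
    {
        return _elements.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

With a class of this shape, the method would presumably return IEnumerable<IGrouping<TKey, TElement>>, and a new Grouping would be created with the new key each time the key changes.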
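Your problem is very specific, so you are unlikely to find a library that already does this. If your items are ordered by the key you group on, "grouping" the list yourself is a nearly trivial task.
You can easily implement it yourself: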
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class Extensions
{
    public static IEnumerable<IGrouping<TKey, TSource>> GroupByAlreadyOrdered<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector)
    {
        return source.GroupByAlreadyOrdered(keySelector, null);
    }

    public static IEnumerable<IGrouping<TKey, TSource>> GroupByAlreadyOrdered<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        TKey currentKey = default(TKey);
        bool first = true;
        List<TSource> currentGroup = null;
        comparer = comparer ?? EqualityComparer<TKey>.Default;
        foreach (var item in source)
        {
            TKey key = keySelector(item);
            // Start a new group on the first item and whenever the key changes.
            if (first || !comparer.Equals(key, currentKey))
            {
                if (currentGroup != null && currentGroup.Any())
                {
                    yield return new Grouping<TKey, TSource>(currentKey, currentGroup);
                }
                currentGroup = new List<TSource>();
            }
            currentGroup.Add(item);
            first = false;
            currentKey = key;
        }
        // Last group
        if (currentGroup != null && currentGroup.Any())
        {
            yield return new Grouping<TKey, TSource>(currentKey, currentGroup);
        }
    }

    // Simple IGrouping wrapper around an already-built sequence of elements.
    private class Grouping<TKey, TElement> : IGrouping<TKey, TElement>
    {
        private readonly TKey _key;
        private readonly IEnumerable<TElement> _elements;

        public Grouping(TKey key, IEnumerable<TElement> elements)
        {
            _key = key;
            _elements = elements;
        }

        public TKey Key
        {
            get { return _key; }
        }

        public IEnumerator<TElement> GetEnumerator()
        {
            return _elements.GetEnumerator();
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}
Your code works. It is slightly slower than GroupBy, whereas my version is faster than both yours and .NET's. – 2011-05-23 12:56:12
Like Thomas's, but slightly faster:
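// Relies on the Grouping<TKey, TElement> class from the previous answer,
// so it has to live where that class is accessible.
public static IEnumerable<IGrouping<TKey, TSource>> FastGroupBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector)
{
    // Compare through EqualityComparer so that a null key does not throw
    // (calling key.Equals(...) directly would fail on a null key).
    IEqualityComparer<TKey> comparer = EqualityComparer<TKey>.Default;
    using (var enumerator = source.GetEnumerator())
    {
        if (enumerator.MoveNext())
        {
            Grouping<TKey, TSource> grouping;
            List<TSource> list = new List<TSource>();
            TKey key = keySelector(enumerator.Current);
            list.Add(enumerator.Current);
            while (enumerator.MoveNext())
            {
                var currentKey = keySelector(enumerator.Current);
                if (comparer.Equals(key, currentKey))
                {
                    // Same key: keep filling the current group.
                    list.Add(enumerator.Current);
                    continue;
                }
                // Key changed: emit the finished group and start the next one.
                grouping = new Grouping<TKey, TSource>(key, list);
                yield return grouping;
                key = currentKey;
                list = new List<TSource>();
                list.Add(enumerator.Current);
            }
            // Emit the last group.
            grouping = new Grouping<TKey, TSource>(key, list);
            yield return grouping;
        }
    }
}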
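Each implementation above still buffers one group at a time in a List, so peak memory is bounded by the largest group rather than by the whole input. If only an aggregate per group is needed, even that buffer can be dropped. A minimal sketch under that assumption, with a hypothetical name CountByAlreadyOrdered that does not come from any of the answers:

```csharp
using System;
using System.Collections.Generic;

public static class OrderedAggregates
{
    // Hypothetical helper: counts each run of equal adjacent keys without
    // storing the elements of the run. Argument validation is omitted.
    public static IEnumerable<KeyValuePair<TKey, int>> CountByAlreadyOrdered<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector)
    {
        IEqualityComparer<TKey> comparer = EqualityComparer<TKey>.Default;
        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            if (!e.MoveNext())
            {
                yield break; // empty input, no runs
            }
            TKey currentKey = keySelector(e.Current);
            int count = 1;
            while (e.MoveNext())
            {
                TKey key = keySelector(e.Current);
                if (comparer.Equals(currentKey, key))
                {
                    count++; // same run
                }
                else
                {
                    // Run ended: report it and start counting the next one.
                    yield return new KeyValuePair<TKey, int>(currentKey, count);
                    currentKey = key;
                    count = 1;
                }
            }
            yield return new KeyValuePair<TKey, int>(currentKey, count); // last run
        }
    }
}
```

For example, the sequence "a", "a", "b" keyed by itself would yield the pairs ("a", 2) and ("b", 1), holding only one key and one counter in memory at any time.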
I don't think this is all that specific. – 2011-05-23 12:19:53