How to block until an asynchronous job finishes

I am working on a C# library that uses NVIDIA's CUDA to offload certain work tasks to the GPU. One example is adding two arrays together with an extension method:

float[] a = new float[]{ ... };
float[] b = new float[]{ ... };
float[] c = a.Add(b);

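The work in this code is done on the GPU. However, I would like it to be done asynchronously, so that the code running on the CPU blocks only when the result is actually needed (and only if the GPU has not finished it yet). To do this, I created an ExecutionResult class that hides the asynchronous execution. In use, it looks like this: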

float[] a = new float[]{ ... };
float[] b = new float[]{ ... };
ExecutionResult<float> res = a.Add(b);
float[] c = res; // Implicit converter

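On the last line, the program blocks if the data is not ready yet. I am not sure of the best way to implement this blocking behavior inside the ExecutionResult class, since I am not very experienced with thread synchronization and that sort of thing.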

using System;
using System.Threading;

public class ExecutionResult<T>
{
    private T[] result;
    private long computed = 0;

    internal ExecutionResult(T[] a, T[] b, Action<T[], T[], Action<T[]>> f)
    {
        f(a, b, UpdateData); // Async call - 'UpdateData' is the callback method
    }

    internal void UpdateData(T[] data)
    {
        if (Interlocked.Read(ref computed) == 0)
        {
            result = data;
            Interlocked.Exchange(ref computed, 1);
        }
    }

    public static implicit operator T[](ExecutionResult<T> r)
    {
        // This is obviously a stupid way to do it
        while (Interlocked.Read(ref r.computed) == 0)
        {
            Thread.Sleep(1);
        }

        return r.result; // the operator is static, so go through 'r'
    }
}

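The Action passed to the constructor is an asynchronous method that performs the actual work on the GPU. The nested Action is the asynchronous callback method.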

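My main concern is how to handle the waiting in the converter in the best and most elegant way, and also whether there is a more appropriate way to attack the problem as a whole. Leave a comment if anything needs further elaboration or explanation.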

Source

2008-10-31 Morten Christiansen

I wonder whether you can't use the regular Delegate.BeginInvoke/Delegate.EndInvoke here? If not, then a wait handle (such as a ManualResetEvent) might be an option:

using System.Threading;

static class Program
{
    static void Main()
    {
        ThreadPool.QueueUserWorkItem(DoWork);

        System.Console.WriteLine("Main: waiting");
        wait.WaitOne();
        System.Console.WriteLine("Main: done");
    }

    static void DoWork(object state)
    {
        System.Console.WriteLine("DoWork: working");
        Thread.Sleep(5000); // simulate work
        System.Console.WriteLine("DoWork: done");
        wait.Set();
    }

    static readonly ManualResetEvent wait = new ManualResetEvent(false);
}
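
Applied to the ExecutionResult class from the question, a wait handle could take the place of the Thread.Sleep polling loop. The following is only a minimal sketch of that idea, not the asker's final code: the `done` field is a name introduced here for illustration, and the constructor signature is copied from the question.

```csharp
using System;
using System.Threading;

public class ExecutionResult<T>
{
    private T[] result;

    // Hypothetical field: signaled once the callback has stored the result.
    private readonly ManualResetEvent done = new ManualResetEvent(false);

    internal ExecutionResult(T[] a, T[] b, Action<T[], T[], Action<T[]>> f)
    {
        f(a, b, UpdateData); // asynchronous call; UpdateData is the callback
    }

    internal void UpdateData(T[] data)
    {
        result = data;
        done.Set(); // releases every thread blocked in WaitOne
    }

    public static implicit operator T[](ExecutionResult<T> r)
    {
        r.done.WaitOne(); // blocks without polling until Set has been called
        return r.result;
    }
}
```

Because a ManualResetEvent stays signaled after Set, converting the same ExecutionResult a second time returns immediately.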

Note that you can also do this using nothing but a plain object and Monitor, if you really want to:

using System.Threading;

static class Program
{
    static void Main()
    {
        object syncObj = new object();
        lock (syncObj)
        {
            ThreadPool.QueueUserWorkItem(DoWork, syncObj);

            System.Console.WriteLine("Main: waiting");
            Monitor.Wait(syncObj);
            System.Console.WriteLine("Main: done");
        }
    }

    static void DoWork(object syncObj)
    {
        System.Console.WriteLine("DoWork: working");
        Thread.Sleep(5000); // simulate work
        System.Console.WriteLine("DoWork: done");
        lock (syncObj)
        {
            Monitor.Pulse(syncObj);
        }
    }
}

Source

2008-10-31 11:30:23

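It's not clear to me how much of this is a framework you are implementing and how much is you calling into other code, but I would follow the "normal" async pattern in .NET as far as possible.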
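
In current .NET the usual asynchronous pattern is Task-based, rather than the Begin/End pair that was standard when this answer was written. As a sketch under that assumption, the callback from the question could complete a TaskCompletionSource. `GpuOps`, `AddAsync` and the `startWork` parameter are hypothetical names used only for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class GpuOps
{
    // Hypothetical helper: wraps a callback-style operation in a Task.
    // startWork stands in for the asynchronous GPU call from the question.
    public static Task<T[]> AddAsync<T>(
        T[] a, T[] b, Action<T[], T[], Action<T[]>> startWork)
    {
        var tcs = new TaskCompletionSource<T[]>();
        try
        {
            // TrySetResult ignores a second callback, much like the
            // 'computed' flag in the question.
            startWork(a, b, data => tcs.TrySetResult(data));
        }
        catch (Exception ex)
        {
            tcs.TrySetException(ex); // report launch failures through the task
        }
        return tcs.Task;
    }
}
```

The caller can then either `await` the returned task or read its `Result` property, which blocks until the callback has fired.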

Source

2008-10-31 11:30:54

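The solution I found is to pass the ExecutionResult constructor a function that does two things. When run, it starts the asynchronous work, and in addition it returns another function which returns the desired result: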

private Func<T[]> getResult;

internal ExecutionResult(T[] a, T[] b, Func<T[], T[], Func<T[]>> asynchBinaryFunction)
{
    // Starts the asynchronous work and keeps the function that fetches the result
    getResult = asynchBinaryFunction(a, b);
}

public static implicit operator T[](ExecutionResult<T> r)
{
    return r.getResult();
}

The 'getResult' function blocks until the data has been computed and fetched from the GPU. This fits well with how the CUDA driver API is structured.

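It is a fairly clean and simple solution. Since C# allows anonymous functions to be created with access to the local scope, it is simply a matter of replacing the blocking part of the method passed to the ExecutionResult constructor, such that...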

... 

    status = LaunchGrid(func, length); 

    //Fetch result 
    float[] c = new float[length]; 
    status = CUDADriver.cuMemcpyDtoH(c, ptrA, byteSize); 
    status = Free(ptrA, ptrB); 

    return c; 
}

...becomes...

... 

    status = LaunchGrid(func, length); 

    return delegate 
    { 
     float[] c = new float[length]; 
     CUDADriver.cuMemcpyDtoH(c, ptrA, byteSize); //Blocks until work is done 
     Free(ptrA, ptrB); 
     return c; 
    }; 
}
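
Since the snippets above are fragments, here is a self-contained sketch of the same idea with the CUDA calls replaced by a background thread. `FakeGpu.StartAdd` and its Thread-based body are stand-ins invented for illustration. In the real library, the returned delegate would instead make the blocking cuMemcpyDtoH call shown above.

```csharp
using System;
using System.Threading;

public class ExecutionResult<T>
{
    private readonly Func<T[]> getResult;

    internal ExecutionResult(T[] a, T[] b, Func<T[], T[], Func<T[]>> asynchBinaryFunction)
    {
        // Starts the work right away and keeps only the "fetch result" function.
        getResult = asynchBinaryFunction(a, b);
    }

    public static implicit operator T[](ExecutionResult<T> r)
    {
        return r.getResult(); // blocks until the work has finished
    }
}

public static class FakeGpu
{
    // Hypothetical stand-in for the GPU launch: adds the arrays on another thread.
    public static Func<float[]> StartAdd(float[] a, float[] b)
    {
        float[] c = new float[a.Length];
        var worker = new Thread(() =>
        {
            for (int i = 0; i < a.Length; i++)
            {
                c[i] = a[i] + b[i];
            }
        });
        worker.Start(); // returns immediately, like a kernel launch

        return delegate
        {
            worker.Join(); // plays the role of the blocking memory copy
            return c;
        };
    }
}
```

The closure captures `worker` and `c`, so the ExecutionResult class itself needs no synchronization code at all. The blocking lives entirely in the delegate that the launch function hands back.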

Source

2008-11-03 10:25:16

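Using cudaThreadSynchronize() or memcpy() you can perform synchronous operations - suitable for Invoke().

CUDA also lets you request asynchronous memory transfers using callAsync()/sync() - suitable for Begin/EndInvoke() using callAsync().

Source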

2009-09-19 23:44:44
