2015-07-12 84 views
1

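Couchbase NodeUnavailableException in the .NET SDK

We regularly hit this exception in production code, without any increase in the number of requests to Couchbase or any memory pressure on the server itself. The node has 30 GB of RAM allocated and usage peaks at 3 GB, yet the exception is still thrown every so often. The bucket is opened only once per application lifecycle; after that, only get and insert operations are performed. The connection is initialized like this: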

Config = new ClientConfiguration
{
    Servers = serverList,
    UseSsl = false,
    DefaultOperationLifespan = 2500, // milliseconds
    BucketConfigs = new Dictionary<string, BucketConfiguration>
    {
        {
            bucketName, new BucketConfiguration
            {
                BucketName = bucketName,
                UseSsl = false,
                DefaultOperationLifespan = 2500, // milliseconds
                PoolConfiguration = new PoolConfiguration
                {
                    MaxSize = 2000,
                    MinSize = 200,
                    SendTimeout = (int)Configuration.Config.Instance.CouchbaseConfig.Timeout
                }
            }
        }
    }
};

Cluster = new Cluster(Config);
Bucket = Cluster.OpenBucket();

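Could you tell me whether this initialization is correct and, more importantly, what to check on the Couchbase server to find the cause of this problem? I went through all the logs on the server but could not find anything unusual at the times these errors occurred.

Thanks

Stack trace: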

System.Exception.Couchbase exception 
at ###.DataLayer.Couchbase.CouchbaseUserOperations.Get() 
at ###.API.Services.BaseService`1.SetUserID() 
at ###.API.Services.EventsService+<GetResponse>d__0.MoveNext() 
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start() 
at ###.API.Services.EventsService.GetResponse() 
at ###.API.Services.BaseService`1+<Any>d__28.MoveNext() 
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start() 
at ###.API.Services.BaseService`1.Any() 
at lambda_method() 
at ServiceStack.Host.ServiceRunner`1.Execute() 
at ServiceStack.Host.ServiceRunner`1.Process() 
at ServiceStack.Host.ServiceExec`1.Execute() 
at ServiceStack.Host.ServiceRequestExec`2.Execute() 
at ServiceStack.Host.ServiceController.ManagedServiceExec() 
at ServiceStack.Host.ServiceController+<>c__DisplayClass11.<RegisterServiceExecutor>b__f() 
at ServiceStack.Host.ServiceController.Execute() 
at ServiceStack.HostContext.ExecuteService() 
at ServiceStack.Host.RestHandler.ProcessRequestAsync() 
at ServiceStack.Host.Handlers.HttpAsyncTaskHandler.System.Web.IHttpAsyncHandler.BeginProcessRequest() 
at System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() 
at System.Web.HttpApplication.ExecuteStep() 
at System.Web.HttpApplication+PipelineStepManager.ResumeSteps() 
at System.Web.HttpApplication.BeginProcessRequestNotification() 
at System.Web.HttpRuntime.ProcessRequestNotificationPrivate() 
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper() 
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification() 
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion() 
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion() 
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper() 
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification() 
Caused by: System.Exception : Couchbase.Core.NodeUnavailableException: The node 172.31.34.105:11210 that the key was mapped to is either down or unreachable. The SDK will continue to try to connect every 1000ms. Until it can connect every operation routed to it will fail with this exception. 
at ###.DataLayer.Couchbase.CouchbaseUserOperations.Get() 
at ###.API.Services.BaseService`1.SetUserID() 
at ###.API.Services.EventsService+<GetResponse>d__0.MoveNext() 
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start() 
at ###.API.Services.EventsService.GetResponse() 
at ###.API.Services.BaseService`1+<Any>d__28.MoveNext() 
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start() 
at ###.API.Services.BaseService`1.Any() 
at lambda_method() 
at ServiceStack.Host.ServiceRunner`1.Execute() 
at ServiceStack.Host.ServiceRunner`1.Process() 
at ServiceStack.Host.ServiceExec`1.Execute() 
at ServiceStack.Host.ServiceRequestExec`2.Execute() 
at ServiceStack.Host.ServiceController.ManagedServiceExec() 
at ServiceStack.Host.ServiceController+<>c__DisplayClass11.<RegisterServiceExecutor>b__f() 
at ServiceStack.Host.ServiceController.Execute() 
at ServiceStack.HostContext.ExecuteService() 
at ServiceStack.Host.RestHandler.ProcessRequestAsync() 
at ServiceStack.Host.Handlers.HttpAsyncTaskHandler.System.Web.IHttpAsyncHandler.BeginProcessRequest() 
at System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() 
at System.Web.HttpApplication.ExecuteStep() 
at System.Web.HttpApplication+PipelineStepManager.ResumeSteps() 
at System.Web.HttpApplication.BeginProcessRequestNotification() 
at System.Web.HttpRuntime.ProcessRequestNotificationPrivate() 
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper() 
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification() 
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion() 
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion() 
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper() 
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification() 
+0

Do you have a stack trace? – rene

+0

Hi @rene. I have now updated the question with the stack trace. Thanks –

+1

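I'm not a Couchbase user, but I expect you need to look into network connectivity. There is probably not much wrong with your client code or server-side settings; more likely a network component between the client and the server is temporarily refusing connections. – rene

Answers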

1

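A NodeUnavailableException can be returned for any number of network-related problems... However, since you mention that you are running on AWS, it is likely that the TCP keep-alive settings need to be tuned on the client.

Your MinSize of 200 connections is very large. You are unlikely to use all of them, so they sit idle until the AWS LB decides to close them. When that happens, the SDK temporarily (for 1000 ms) marks the failed node as down and then tries to reconnect. During that window, every key mapped to that node fails with this exception.

This blog post describes how to set the TCP keep-alive time and interval: http://blog.couchbase.com/introducing-couchbase-.net-sdk-2.1.0-the-asynchronous-couchbase-.net-client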

var config = new ClientConfiguration 
{ 
    EnableTcpKeepAlives = true, //default is true
    TcpKeepAliveTime = 1000*60*60, //set to 60 minutes (value is in milliseconds)
    TcpKeepAliveInterval = 5000 //keep-alives will be sent every 5 seconds after 1 hour of idle time
}; 
var cluster = new Cluster(config); 
var bucket = cluster.OpenBucket(); 

This assumes you are using version 2.1.0 or later of the client. If you are not, you can do the same thing through ServicePointManager:

//setting keep-alive time to 200 seconds 
ServicePointManager.SetTcpKeepAlive(true, 200000, 1000); 

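You have to set it to a value lower than whatever the AWS LB is set to (I believe this is 60 seconds).

You should probably also set your connection pool minimum and maximum a bit lower, to something like 5 and 10.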
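Putting the two suggestions together (a keep-alive time below the suspected idle timeout, and a smaller pool), a configuration might look like the sketch below. The 30-second keep-alive time, the server URI, the "default" bucket name, and the `CouchbaseConfigSketch` class are assumptions chosen for illustration; the 60-second idle timeout is itself only a guess, so check the actual value in your environment.

```csharp
using System;
using System.Collections.Generic;
using Couchbase.Configuration.Client;

public static class CouchbaseConfigSketch
{
    // Hypothetical helper: builds a configuration with a short keep-alive and a small pool.
    public static ClientConfiguration Build()
    {
        return new ClientConfiguration
        {
            // Assumed node address; replace with your own server list.
            Servers = new List<Uri> { new Uri("http://localhost:8091/pools") },
            EnableTcpKeepAlives = true,
            // 30 seconds: an assumed value, picked to stay below a 60-second idle timeout.
            TcpKeepAliveTime = 30 * 1000,
            TcpKeepAliveInterval = 5000,
            BucketConfigs = new Dictionary<string, BucketConfiguration>
            {
                {
                    "default", new BucketConfiguration
                    {
                        BucketName = "default",
                        PoolConfiguration = new PoolConfiguration
                        {
                            MinSize = 5,
                            MaxSize = 10
                        }
                    }
                }
            }
        };
    }
}
```

The keep-alive properties require client version 2.1.0 or later, as noted above.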

+0

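Hello @jeffrymorris. Thanks for your answer, but unfortunately the changes you suggested did not solve the problem. The Couchbase server is not behind an AWS load balancer, so that cannot be the cause. We also reduced the number of connections, but still no luck. The Couchbase server is installed on an Ubuntu instance. Do you know whether we need to change anything at the OS level? –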

+0

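We monitored the TCP connections, and with a minimum of 5 and a maximum of 20 connections configured we only see 3 ports open (https://www.dropbox.com/s/fkw0rika8a8wtv1/Screenshot%202015-07-14%2018.25.01.png?dl=0). A few minutes ago there were 4 ports, and when the exception occurred one of them disappeared. The main problem is that the connections are not re-created either, and when this happens our database responds very slowly. What do you think? –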

+0

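@RaduCotofana - Which version of the server are you using? Also, if you enable client logging (http://docs.couchbase.com/developer/dotnet-2.1/setting-up-logging.html), you should be able to log the actual exception that triggers the NodeUnavailableException. The slow response is probably the connections timing out and failing, and then rebuilding themselves... that takes 15-20 seconds. – jeffrymorris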

0

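The problem is not completely solved, since we still get timeouts, although at a lower rate. We did improve performance by using the ClusterHelper singleton instance, as follows: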

ClusterHelper.Initialize(
    new ClientConfiguration
    {
        Servers = serverList,
        UseSsl = false,
        DefaultOperationLifespan = 2500,
        EnableTcpKeepAlives = true,
        TcpKeepAliveTime = 1000*60*60,
        TcpKeepAliveInterval = 5000,
        BucketConfigs = new Dictionary<string, BucketConfiguration>
        {
            {
                "default",
                new BucketConfiguration
                {
                    BucketName = "default",
                    UseSsl = false,
                    Password = "",
                    PoolConfiguration = new PoolConfiguration
                    {
                        MaxSize = 50,
                        MinSize = 10
                    }
                }
            }
        }
    });
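
For completeness, here is a minimal sketch of how the bucket might then be fetched from the singleton on each request instead of being opened again. It assumes ClusterHelper.Initialize has already run once at application start with the "default" bucket configured as above; the `UserStore` class and `GetOrNull` method are hypothetical names, not part of the original code.

```csharp
using System;
using Couchbase;
using Couchbase.Core;

public static class UserStore
{
    // Hypothetical helper: reads a string document, returning null on any failure.
    public static string GetOrNull(string key)
    {
        // ClusterHelper caches the bucket, so repeated calls reuse the same connections.
        IBucket bucket = ClusterHelper.GetBucket("default");

        IOperationResult<string> result = bucket.Get<string>(key);
        if (result.Success)
        {
            return result.Value;
        }

        // Log the status and message so that failures can be correlated with the client logs.
        Console.Error.WriteLine("Get failed for key {0}: {1} ({2})",
            key, result.Status, result.Message);
        return null;
    }
}
```

Calling ClusterHelper.Close() once at application shutdown releases the cached cluster and bucket.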