2016-08-11

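pbs_server, E5-2620v4 and general protection

I am trying to install Torque 6.0.2 on Debian 8.5 on an Intel Xeon E5-2620v4. However, when I try to start pbs_server, it dies with a segmentation fault. Backtrace from gdb: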

#1 0x0000000000440ab6 in container::item_container<pbsnode*>::unlock (this=0xb5d900 <allnodes>) at ../../src/include/container.hpp:537 
#2 0x00000000004b787f in mom_hierarchy_handler::nextNode (this=0x4e610c0 <hierarchy_handler>, iter=0x7fffffff98b8) at mom_hierarchy_handler.cpp:122 
#3 0x00000000004b7a7d in mom_hierarchy_handler::make_default_hierarchy (this=0x4e610c0 <hierarchy_handler>) at mom_hierarchy_handler.cpp:149 
#4 0x00000000004b898d in mom_hierarchy_handler::loadHierarchy (this=0x4e610c0 <hierarchy_handler>) at mom_hierarchy_handler.cpp:433 
#5 0x00000000004b8ae8 in mom_hierarchy_handler::initialLoadHierarchy (this=0x4e610c0 <hierarchy_handler>) at mom_hierarchy_handler.cpp:472 
#6 0x000000000045262a in pbsd_init (type=1) at pbsd_init.c:2299 
#7 0x00000000004591ff in main (argc=2, argv=0x7fffffffdec8) at pbsd_main.c:1883 

dmesg:

traps: pbs_server[22249] general protection ip:7f9c08a7a2c8 sp:7ffe520b5238 error:0 in libpthread-2.19.so[7f9c08a69000+18000] 

valgrind:

==22381== Memcheck, a memory error detector 
==22381== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. 
==22381== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info 
==22381== Command: pbs_server 
==22381== 
==22381== 
==22381== HEAP SUMMARY: 
==22381==  in use at exit: 18,051 bytes in 53 blocks 
==22381== total heap usage: 169 allocs, 116 frees, 42,410 bytes allocated 
==22381== 
==22382== 
==22382== HEAP SUMMARY: 
==22382==  in use at exit: 19,755 bytes in 56 blocks 
==22382== total heap usage: 172 allocs, 116 frees, 44,114 bytes allocated 
==22382== 
==22381== LEAK SUMMARY: 
==22381== definitely lost: 0 bytes in 0 blocks 
==22381== indirectly lost: 0 bytes in 0 blocks 
==22381==  possibly lost: 0 bytes in 0 blocks 
==22381== still reachable: 18,051 bytes in 53 blocks 
==22381==   suppressed: 0 bytes in 0 blocks 
==22381== Rerun with --leak-check=full to see details of leaked memory 
==22381== 
==22381== For counts of detected and suppressed errors, rerun with: -v 
==22381== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) 
==22383== 
==22383== Process terminating with default action of signal 11 (SIGSEGV) 
==22383== General Protection Fault 
==22383== at 0x72192CB: __lll_unlock_elision (elision-unlock.c:33) 
==22383== by 0x4E7E1A: unlock_node(pbsnode*, char const*, char const*, int) (u_lock_ctl.c:268) 
==22383== by 0x4B7A66: mom_hierarchy_handler::make_default_hierarchy() (mom_hierarchy_handler.cpp:164) 
==22383== by 0x4B898C: mom_hierarchy_handler::loadHierarchy() (mom_hierarchy_handler.cpp:433) 
==22383== by 0x4B8AE7: mom_hierarchy_handler::initialLoadHierarchy() (mom_hierarchy_handler.cpp:472) 
==22383== by 0x452629: pbsd_init(int) (pbsd_init.c:2299) 
==22383== by 0x4591FE: main (pbsd_main.c:1883) 
==22382== LEAK SUMMARY: 
==22382== definitely lost: 0 bytes in 0 blocks 
==22382== indirectly lost: 0 bytes in 0 blocks 
==22382==  possibly lost: 0 bytes in 0 blocks 
==22382== still reachable: 19,755 bytes in 56 blocks 
==22382==   suppressed: 0 bytes in 0 blocks 
==22382== Rerun with --leak-check=full to see details of leaked memory 
==22382== 
==22382== For counts of detected and suppressed errors, rerun with: -v 
==22382== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) 
==22383== 
==22383== HEAP SUMMARY: 
==22383==  in use at exit: 325,348 bytes in 186 blocks 
==22383== total heap usage: 297 allocs, 111 frees, 442,971 bytes allocated 
==22383== 
==22383== LEAK SUMMARY: 
==22383== definitely lost: 134 bytes in 6 blocks 
==22383== indirectly lost: 28 bytes in 3 blocks 
==22383==  possibly lost: 524 bytes in 17 blocks 
==22383== still reachable: 324,662 bytes in 160 blocks 
==22383==   suppressed: 0 bytes in 0 blocks 
==22383== Rerun with --leak-check=full to see details of leaked memory 
==22383== 
==22383== For counts of detected and suppressed errors, rerun with: -v 
==22383== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) 

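No other software shows this behaviour, and I ran the machine at full load for 2 days without problems. I have already tried updating the processor microcode. Has anyone seen this behaviour with Torque 6.0.2, or in any other situation?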

Best regards.

+1

So it has a bug. Report it to the developers.

+0

I had to recompile glibc with --enable-lock-elision=no. I believe it is a bug in the Intel Xeon microcode. – user3821884

Answers

1

This is not a microcode bug. It is a lock balance problem that lies entirely in whatever software you are running (and NOT in glibc/libpthread).

Do not try to unlock a lock that is already unlocked. That is forbidden behaviour, and it is the cause of the trap.

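For performance reasons, glibc does not bother to test for this and segfault, so a lot of broken code has gone unpunished for a long time. Hardware implementations of lock elision, OTOH, do raise a trap (Intel TSX, IBM Power 8, S390/X ...), so this kind of breakage will become apparent everywhere, very quickly.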