Why doesn't the JVM emit prefetch instructions on Windows x86?

As the title says, why doesn't the OpenJDK JVM emit prefetch instructions on Windows x86? See the OpenJDK Mercurial repository at http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/c49dcaf78a65/src/os_cpu/windows_x86/vm/prefetch_windows_x86.inline.hpp

inline void Prefetch::read (void *loc, intx interval) {} 
inline void Prefetch::write(void *loc, intx interval) {}

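The file has no comments, and I have found no resources other than the source code. I am asking because the Linux x86 version does emit them; see http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/c49dcaf78a65/src/os_cpu/linux_x86/vm/prefetch_linux_x86.inline.hpp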

inline void Prefetch::read (void *loc, intx interval) { 
#ifdef AMD64 
    __asm__ ("prefetcht0 (%0,%1,1)" : : "r" (loc), "r" (interval)); 
#endif // AMD64 
} 

inline void Prefetch::write(void *loc, intx interval) { 
#ifdef AMD64 

    // Do not use the 3dnow prefetchw instruction. It isn't supported on em64t. 
    // __asm__ ("prefetchw (%0,%1,1)" : : "r" (loc), "r" (interval)); 
    __asm__ ("prefetcht0 (%0,%1,1)" : : "r" (loc), "r" (interval)); 

#endif // AMD64 
}

Source

2017-06-04 naze

There is also a prefetch for Solaris x86_64: vm/solaris_x86_64.il https://github.com/openjdk-mirror/jdk7u-hotspot/blob/50bdefc3afe944ca74c3093e7448d6b889cd20d1/src/os_cpu/solaris_x86/vm/solaris_x86_64.il#L122; but none of the prefetches listed are used for emitting prefetches into generated code. They are prefetches used by the HotSpot JVM's own machine code. Emitting prefetches in generated (JITted) code happens in the x86 code shared by all operating systems: https://github.com/openjdk-mirror/jdk7u-hotspot/blob/50bdefc3afe944ca74c3093e7448d6b889cd20d1/src/cpu/x86/vm/c1_LIRAssembler_x86.cpp#L1335 `LIR_Assembler::prefetchr` / `LIR_Assembler::prefetchw` – osgx

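Thanks, that at least explains some things. Maybe add this as an answer and I will accept it. I am still looking for the part where the JVM decides to insert prefetch instructions. – naze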

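All the files you cite contain assembly snippets (inline assembler) that are used by C/C++ code inside HotSpot itself (as apangin, the JVM expert, pointed out, mostly GC code). There really is a difference: the Linux, Solaris and BSD x86_64 variants of HotSpot have prefetch in HotSpot's own code, while on Windows it is disabled/unimplemented. That is partly strange and partly unexplained, and it may make the JVM a bit slower on Windows (some percent; more on platforms without hardware prefetch), but it still would not help sell more Sun/Oracle paid support contracts for Solaris. Ross also guessed that the inline asm syntax may not be supported by the MS C++ compiler, but _mm_prefetch should be (who will open a JDK bug to add it to the file?).

HotSpot is a JIT, and JITted code is emitted (generated) by the JIT as bytes. (A JIT could also copy code from its own functions into the generated code, or emit a call to a support function, but in HotSpot prefetches are emitted as bytes.)

How can we find out how they are emitted? A simple online way is to find a searchable online copy of jdk8u (or, better, a cross-reference like metager), for example on GitHub: https://github.com/JetBrains/jdk8u_hotspot, and search for prefetch, prefetch emit, prefetchr or lir_prefetchr. There are some relevant results: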

The actual bytes emitted for the JVM's C1 compiler/LIR are in jdk8u_hotspot/src/cpu/x86/vm/assembler_x86.cpp:

void Assembler::prefetch_prefix(Address src) { 
    prefix(src); 
    emit_int8(0x0F); 
} 

void Assembler::prefetchnta(Address src) { 
    NOT_LP64(assert(VM_Version::supports_sse(), "must support")); 
    InstructionMark im(this); 
    prefetch_prefix(src); 
    emit_int8(0x18); 
    emit_operand(rax, src); // 0, src 
} 

void Assembler::prefetchr(Address src) { 
    assert(VM_Version::supports_3dnow_prefetch(), "must support"); 
    InstructionMark im(this); 
    prefetch_prefix(src); 
    emit_int8(0x0D); 
    emit_operand(rax, src); // 0, src 
} 

void Assembler::prefetcht0(Address src) { 
    NOT_LP64(assert(VM_Version::supports_sse(), "must support")); 
    InstructionMark im(this); 
    prefetch_prefix(src); 
    emit_int8(0x18); 
    emit_operand(rcx, src); // 1, src 
} 

void Assembler::prefetcht1(Address src) { 
    NOT_LP64(assert(VM_Version::supports_sse(), "must support")); 
    InstructionMark im(this); 
    prefetch_prefix(src); 
    emit_int8(0x18); 
    emit_operand(rdx, src); // 2, src 
} 

void Assembler::prefetcht2(Address src) { 
    NOT_LP64(assert(VM_Version::supports_sse(), "must support")); 
    InstructionMark im(this); 
    prefetch_prefix(src); 
    emit_int8(0x18); 
    emit_operand(rbx, src); // 3, src 
} 

void Assembler::prefetchw(Address src) { 
    assert(VM_Version::supports_3dnow_prefetch(), "must support"); 
    InstructionMark im(this); 
    prefetch_prefix(src); 
    emit_int8(0x0D); 
    emit_operand(rcx, src); // 1, src 
}
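
To make the byte patterns above concrete, here is a small standalone sketch (not HotSpot code) that builds the encoding for the simplest addressing form, `[base]` with no displacement. It relies on the x86 encoding in which the prefetch hint is selected by the `reg` field of the ModRM byte, which is why the assembler passes `rax`, `rcx`, `rdx` or `rbx` (register numbers 0 to 3) to `emit_operand`. The `Hint` enum and `encode_prefetch` are hypothetical names, and REX prefixes, SIB bytes and displacements are deliberately left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// The hint is not a register operand: it lives in the 3-bit "reg" field of ModRM.
enum class Hint : std::uint8_t {
    NTA = 0,  // 0F 18 /0
    T0  = 1,  // 0F 18 /1
    T1  = 2,  // 0F 18 /2
    T2  = 3,  // 0F 18 /3
    R   = 4,  // 0F 0D /0  (3DNow! PREFETCH)
    W   = 5   // 0F 0D /1  (PREFETCHW)
};

// Encode "prefetch<hint> [base]" for a low base register.
// base_reg: 0=rax, 1=rcx, 2=rdx, 3=rbx, 6=rsi, 7=rdi.
// 4 (rsp) needs a SIB byte and 5 (rbp) needs a displacement, so both are rejected.
inline std::array<std::uint8_t, 3> encode_prefetch(Hint hint, unsigned base_reg) {
    assert(base_reg < 8 && base_reg != 4 && base_reg != 5);
    const unsigned h = static_cast<unsigned>(hint);
    const bool is_3dnow = h >= 4;
    const std::uint8_t opcode = is_3dnow ? 0x0D : 0x18;
    const unsigned reg_field = is_3dnow ? h - 4 : h;
    // ModRM = mod (2 bits) | reg (3 bits) | rm (3 bits); mod=00 means [base], no displacement.
    const std::uint8_t modrm = static_cast<std::uint8_t>((reg_field << 3) | base_reg);
    return {0x0F, opcode, modrm};
}
```

For example, `encode_prefetch(Hint::T0, 0)` yields the bytes `0F 18 08`, which is `prefetcht0 (%rax)`.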

They are used in C1 LIR: src/share/vm/c1/c1_LIRAssembler.cpp

void LIR_Assembler::emit_op1(LIR_Op1* op) { 
    switch (op->code()) { 
... 
    case lir_prefetchr: 
     prefetchr(op->in_opr()); 
     break; 

    case lir_prefetchw: 
     prefetchw(op->in_opr()); 
     break;

Now we know the opcode lir_prefetchr and can search for it (or for lir_prefetchw), and find the only use in src/share/vm/c1/c1_LIR.cpp

void LIR_List::prefetch(LIR_Address* addr, bool is_store) { 
    append(new LIR_Op1(
      is_store ? lir_prefetchw : lir_prefetchr, 
      LIR_OprFact::address(addr))); 
}

Prefetch instructions are also defined elsewhere (for C2, as noted by apangin), in src/cpu/x86/vm/x86_64.ad:

// Prefetch instructions. ... 
instruct prefetchr(memory mem) %{ 
    predicate(ReadPrefetchInstr==3); 
    match(PrefetchRead mem); 
    ins_cost(125); 

    format %{ "PREFETCHR $mem\t# Prefetch into level 1 cache" %} 
    ins_encode %{ 
    __ prefetchr($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%} 

instruct prefetchrNTA(memory mem) %{ 
    predicate(ReadPrefetchInstr==0); 
    match(PrefetchRead mem); 
    ins_cost(125); 

    format %{ "PREFETCHNTA $mem\t# Prefetch into non-temporal cache for read" %} 
    ins_encode %{ 
    __ prefetchnta($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%} 

instruct prefetchrT0(memory mem) %{ 
    predicate(ReadPrefetchInstr==1); 
    match(PrefetchRead mem); 
    ins_cost(125); 

    format %{ "PREFETCHT0 $mem\t# prefetch into L1 and L2 caches for read" %} 
    ins_encode %{ 
    __ prefetcht0($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%} 

instruct prefetchrT2(memory mem) %{ 
    predicate(ReadPrefetchInstr==2); 
    match(PrefetchRead mem); 
    ins_cost(125); 

    format %{ "PREFETCHT2 $mem\t# prefetch into L2 caches for read" %} 
    ins_encode %{ 
    __ prefetcht2($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%} 

instruct prefetchwNTA(memory mem) %{ 
    match(PrefetchWrite mem); 
    ins_cost(125); 

    format %{ "PREFETCHNTA $mem\t# Prefetch to non-temporal cache for write" %} 
    ins_encode %{ 
    __ prefetchnta($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%} 

// Prefetch instructions for allocation. 

instruct prefetchAlloc(memory mem) %{ 
    predicate(AllocatePrefetchInstr==3); 
    match(PrefetchAllocation mem); 
    ins_cost(125); 

    format %{ "PREFETCHW $mem\t# Prefetch allocation into level 1 cache and mark modified" %} 
    ins_encode %{ 
    __ prefetchw($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%} 

instruct prefetchAllocNTA(memory mem) %{ 
    predicate(AllocatePrefetchInstr==0); 
    match(PrefetchAllocation mem); 
    ins_cost(125); 

    format %{ "PREFETCHNTA $mem\t# Prefetch allocation to non-temporal cache for write" %} 
    ins_encode %{ 
    __ prefetchnta($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%} 

instruct prefetchAllocT0(memory mem) %{ 
    predicate(AllocatePrefetchInstr==1); 
    match(PrefetchAllocation mem); 
    ins_cost(125); 

    format %{ "PREFETCHT0 $mem\t# Prefetch allocation to level 1 and 2 caches for write" %} 
    ins_encode %{ 
    __ prefetcht0($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%} 

instruct prefetchAllocT2(memory mem) %{ 
    predicate(AllocatePrefetchInstr==2); 
    match(PrefetchAllocation mem); 
    ins_cost(125); 

    format %{ "PREFETCHT2 $mem\t# Prefetch allocation to level 2 cache for write" %} 
    ins_encode %{ 
    __ prefetcht2($mem$$Address); 
    %} 
    ins_pipe(ialu_mem); 
%}

Source

2017-06-04 16:21:14 osgx

One of the more interesting parts, where the JVM actually decides whether to prefetch, is https://github.com/JetBrains/jdk8u_hotspot/blob/435f973f98771edfa2126d5e6b6dea9bbf272e86/src/share/vm/opto/macro.cpp – naze

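I am actually working on a scientific paper that includes sentences like "the JVM JIT prefetches". Since there are no real papers about JVM internals, I have to dig for evidence even when something is common knowledge. Academia just does not work that way :) – naze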

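naze, I can't find how PrefetchAllocationNode is turned into a real opcode; it has some strange ABIO flag. You may need to build the JVM/JDK locally so that all generated files exist, and then search the complete code (perhaps with a C++ cross-reference tool; but note that non-C++ files such as asm and ad are not covered by cross-referencing and can only be searched with `grep`). – osgx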

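As JDK-4453409 indicates, prefetching in the HotSpot JVM was implemented in JDK 1.4 to speed up GC. That was 15 years ago, and nobody will remember now why it was not implemented on Windows. My guess is that Visual Studio (which has always been used to build HotSpot on Windows) basically did not understand prefetch instructions at that time. It looks like a place for improvement.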
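
As a rough illustration of what such an improvement might look like, here is a minimal sketch that uses the `_mm_prefetch` intrinsic (suggested in the other answer as something the MS compiler should support) instead of GCC-style inline assembly. The names `prefetch_read` and `sum_with_prefetch` are hypothetical and are not part of HotSpot, and the fallback branches are my assumptions about non-MSVC compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <xmmintrin.h>   // _mm_prefetch, _MM_HINT_T0
#define SKETCH_USE_MM_PREFETCH 1
#endif

// Hypothetical counterpart of Prefetch::read(loc, interval): hint that the
// cache line at loc + interval will be read soon. Prefetch hints do not
// fault, so the target address does not have to be mapped.
inline void prefetch_read(const void* loc, std::intptr_t interval) {
    const std::uintptr_t addr =
        reinterpret_cast<std::uintptr_t>(loc) + static_cast<std::uintptr_t>(interval);
    const char* p = reinterpret_cast<const char*>(addr);
#if defined(SKETCH_USE_MM_PREFETCH)
    _mm_prefetch(p, _MM_HINT_T0);   // MSVC on x86/x64
#elif defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(p, 0, 3);    // read access, high temporal locality
#else
    (void)p;                        // unknown compiler: no-op, like the Windows file
#endif
}

// Example caller: sum an array while prefetching a fixed distance ahead.
inline long long sum_with_prefetch(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    const std::size_t ahead = 16;   // arbitrary distance, in elements
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + ahead < n) {
            prefetch_read(&data[i], static_cast<std::intptr_t>(ahead * sizeof(int)));
        }
        sum += data[i];
    }
    return sum;
}
```

The prefetch only affects timing, never the result, so `sum_with_prefetch` returns the same value with or without the hint. Whether it is any faster would have to be measured.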

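Anyway, the code you are asking about is used internally by the JVM garbage collector. It is not what the JIT generates. The C2 JIT code generator rules are in the architecture definition file x86_64.ad, and there are rules that translate PrefetchRead, PrefetchWrite and PrefetchAllocation nodes into the corresponding x64 instructions.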

The odd fact is that PrefetchRead and PrefetchWrite nodes are not created anywhere in the code. They exist only to support the Unsafe.prefetchX intrinsics, which, however, were removed in JDK 9.

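The only case where the JIT generates prefetch instructions is the PrefetchAllocation node. You can verify with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly that PREFETCHNTA is indeed generated after object allocation, on both Linux and Windows.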

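import java.util.Arrays;

class Test {
    public static void main(String[] args) {
        byte[] b = new byte[0];
        // Allocate a slightly larger array on every iteration so that
        // the JIT-compiled allocation path shows up in the output.
        for (;;) {
            b = Arrays.copyOf(b, b.length + 1);
        }
    }
}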

java.exe -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Test

# {method} {0x00000000176124e0} 'main' '([Ljava/lang/String;)V' in 'Test'
...
0x000000000340e512: cmp $0x100000,%r11d
0x000000000340e519: ja 0x000000000340e60f
0x000000000340e51f: movslq 0x24(%rsp),%r10
0x000000000340e524: add $0x1,%r10
0x000000000340e528: add $0x17,%r10
0x000000000340e52c: mov %r10,%r8
0x000000000340e52f: and $0xfffffffffffffff8,%r8
0x000000000340e533: cmp $0x100000,%r11d
0x000000000340e53a: ja 0x000000000340e496
0x000000000340e540: mov 0x60(%r15),%rbp
0x000000000340e544: mov %rbp,%r9
0x000000000340e547: add %r8,%r9
0x000000000340e54a: cmp 0x70(%r15),%r9
0x000000000340e54e: jae 0x000000000340e496
0x000000000340e554: mov %r9,0x60(%r15)
0x000000000340e558: prefetchnta 0xc0(%r9)
0x000000000340e560: movq $0x1,0x0(%rbp)
0x000000000340e568: prefetchnta 0x100(%r9)
0x000000000340e570: movl $0x200000f5,0x8(%rbp) ; {metadata({type array byte})}
0x000000000340e577: mov %r11d,0xc(%rbp)
0x000000000340e57b: prefetchnta 0x140(%r9)
0x000000000340e583: prefetchnta 0x180(%r9) ;*newarray
                                           ; - java.util.Arrays::[email protected] (line 3236)
                                           ; - Test::[email protected] (line 9)
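
The four `prefetchnta` displacements in the listing (0xc0, 0x100, 0x140 and 0x180 relative to `%r9`, which appears to hold the new allocation top) form a simple arithmetic sequence. The sketch below reproduces them from a start distance, a step and a line count. Mapping those three parameters onto the HotSpot flags `AllocatePrefetchDistance`, `AllocatePrefetchStepSize` and `AllocatePrefetchLines` is my assumption, and the values 192, 64 and 4 are read off this listing rather than taken from any documented default. `allocation_prefetch_offsets` is a hypothetical helper, not a HotSpot function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the byte offsets (relative to the new allocation top) that would be
// prefetched: `lines` offsets, starting at `distance`, spaced `step` bytes apart.
inline std::vector<std::size_t> allocation_prefetch_offsets(std::size_t distance,
                                                            std::size_t step,
                                                            std::size_t lines) {
    assert(step > 0);
    std::vector<std::size_t> offsets;
    offsets.reserve(lines);
    for (std::size_t i = 0; i < lines; ++i) {
        offsets.push_back(distance);
        distance += step;  // the next prefetch targets the following cache line
    }
    return offsets;
}
```

With `allocation_prefetch_offsets(192, 64, 4)` the result is 0xc0, 0x100, 0x140 and 0x180, matching the displacements seen above.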

Source

2017-06-04 18:20:06 apangin

I would really like to know why this was downvoted. – EJP

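+1 for finding that prefetch is only used in the context of allocation. I would have guessed that prefetching is also performed when iterating over existing arrays. It seems my assumption was wrong. Thanks for clarifying. – naze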

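@naze, there is prefetching when iterating over arrays; but it is hardware prefetch, not software prefetch. You can turn it off and measure its impact on Intel: https://software.intel.com/zh-cn/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors (use `wrmsr -p N 0x1a4` for each core); "older processor models use 0x1A0 bits 9 and 19" - https://stackoverflow.com/a/36339469. Intel's hardware prefetch is aggressive but limited to 4 KB pages: if it catches two memory accesses A and B, where N = B-A is the ptrdiff, and B + N is within the same 4 KB page, then it prefetches. – osgx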
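
The rule of thumb in the comment above can be written down as a tiny predicate. This is only a toy model of that one sentence, not a description of how any particular Intel prefetcher is implemented; `would_prefetch_next` and `kPageSize` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kPageSize = 4096;  // 4 KB page, as in the comment

// Given two observed accesses, first a and then b, extrapolate the next
// address b + (b - a) and report whether it stays in the same 4 KB page as b.
inline bool would_prefetch_next(std::uint64_t a, std::uint64_t b) {
    assert(a != b);  // a zero stride gives nothing to extrapolate
    // Unsigned wraparound makes this work for descending strides too.
    const std::uint64_t next = b + (b - a);
    return (next / kPageSize) == (b / kPageSize);
}
```

Under this model, accesses at 0x1000 and 0x1040 predict 0x1080 in the same page, so a prefetch would be issued, while accesses at 0x1F80 and 0x1FC0 predict 0x2000, which crosses into the next page, so none would be.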
