2016-03-18

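How can Python be faster than PyPy?

I ran a piece of code with Python 2.7 and cProfile reported 35 s, while cProfile under PyPy reported 73 s! Given that PyPy is supposed to be the faster interpreter, how is this possible? The code computes the BWT of an input stream. I have two files: bwt.py, whose functions are called from fm.py. I ran them like this: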

pypy -m cProfile fm.py inputfile

and then

python -m cProfile fm.py inputfile

The code from bwt.py is as follows:

import random  # used by writefile()
import string  # used by writefile()

def rotations(t):
    ''' Return list of rotations of input string t ''' 
    tt = t * 2 
    return [ tt[i:i+len(t)] for i in xrange(0, len(t)) ] 

def bwm(t): 
    return sorted(rotations(t)) 

def bwtViaBwm(t): 
    ''' Given T, returns BWT(T) by way of the BWM ''' 
    return ''.join(map(lambda x: x[-1], bwm(t))) 

def rankBwt(bw): 
    ''' Given BWT string bw, return parallel list of B-ranks. Also 
     returns tots: map from character to # times it appears. ''' 
    tots = dict() 
    ranks = [] 
    for c in bw:
        if c not in tots: tots[c] = 0
        ranks.append(tots[c])
        tots[c] += 1
    return ranks, tots

def firstCol(tots): 
    ''' Return map from character to the range of rows prefixed by 
     the character. ''' 
    first = {} 
    totc = 0 
    for c, count in sorted(tots.iteritems()):
        first[c] = (totc, totc + count)
        totc += count
    return first 

def reverseBwt(bw): 
    ''' Make T from BWT(T) ''' 
    ranks, tots = rankBwt(bw) 
    first = firstCol(tots) 
    rowi = 0 # start in first row 
    t = '$' # start with rightmost character 
    while bw[rowi] != '$':
        c = bw[rowi]
        t = c + t # prepend to answer
        # jump to row that starts with c of same rank
        rowi = first[c][0] + ranks[rowi]
    return t 



def suffixArray(s): 
    satups = sorted([(s[i:], i) for i in xrange(0, len(s))]) 
    print satups 
    return map(lambda x: x[1], satups) 

def bwtViaSa(t): 
    # Given T, returns BWT(T) by way of the suffix array 
    bw = [] 
    for si in suffixArray(t):
        if si == 0:
            bw.append('$')
        else:
            bw.append(t[si-1])
    return ''.join(bw) # return string-ized version of list bw 



def readfile(sd): 
    s="" 
    with open(sd,'r') as myfile:
        s = myfile.read()
    return s.rstrip('\n')

def writefile(sd,N):
    with open(sd, "wb") as sink:
        sink.write(''.join(random.choice(string.ascii_uppercase + string.digits) for _ in xrange(N)))
        sink.write('$')
    return 



def main(): 
    data=readfile('inp') 
    b=bwtViaBwm(data) 
    ranks,tots = rankBwt(b) 
    print "Input stream = "+ data 
    print "BWT = " + bwtViaSa(data) 
    print '\n'.join(bwm(data)) 
    print ("Lc ranking:") 
    print zip(b,ranks) 

    fc=[x[0] for x in bwm(data)] 
    fc= ''.join(fc) 
    print ("First column="+ fc) 
    ranks,tots = rankBwt(fc) 
    print("Fc ranking:") 
    print zip(fc,ranks) 

    print reverseBwt(bwtViaSa(data)) 

if __name__=='__main__': 
    main() 

This is the code from fm.py, which I run with PyPy:

import bwt 
import sys 
from collections import Counter 

def build_FM(fname): 
    stream=bwt.readfile(fname) 
    #print bwt.suffixArray(stream) 
    b=bwt.bwtViaBwm(stream) 
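    ranks,tots = bwt.rankBwt(b)
    lc=zip(b,ranks)
    fc=[x[0] for x in bwt.bwm(stream)]
    fc= ''.join(fc)
    fcranks,_ = bwt.rankBwt(fc)  # rank the first column itself, as main() in bwt.py does
    fc= zip(fc,fcranks)
    #print lc,fc
    return lc, fc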


def main(): 
    fname= sys.argv[1] 
    build_FM(fname) 
    return 


if __name__=='__main__': 
    main() 

Please post an example. – kilojoules


If it takes more time to run under PyPy, then it seems your assumption is incorrect... for this particular code and data. –


@WilliamPursell Hmm, I thought PyPy was always faster, so I was wrong. I need to find out what kind of code PyPy actually speeds up. – curious

Answers


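PyPy is not guaranteed to run a program faster. First, the optimizations it performs take time (sometimes a long time) to kick in: the JIT has to warm up before compiled code starts paying off. Second, not all code runs faster under PyPy, although most code does.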
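One way to see the warm-up effect is to time the same call several times in a single process. The sketch below is a hypothetical harness, not code from the question: `bwt_naive` condenses the question's `rotations()`/`bwm()`/`bwtViaBwm()` into one function, and `time_repeated` and the `'ACGT' * 500 + '$'` input are made up for illustration. It avoids Python-2-only syntax so it runs under CPython 2.7, CPython 3 and PyPy. Under PyPy one would typically expect the first run or two to be slower than later ones while the JIT compiles the hot paths, but the actual numbers depend on the machine and the PyPy version.

```python
import time


def bwt_naive(t):
    """Condensed copy of the question's bwtViaBwm(): sort all rotations of t."""
    tt = t * 2
    n = len(t)
    # Builds n strings of length n, so memory use is O(n**2).
    return ''.join(r[-1] for r in sorted(tt[i:i + n] for i in range(n)))


def time_repeated(func, arg, repeats=5):
    """Return the wall-clock duration of each of `repeats` consecutive calls."""
    timings = []
    for _ in range(repeats):
        start = time.time()
        func(arg)
        timings.append(time.time() - start)
    return timings


if __name__ == '__main__':
    data = 'ACGT' * 500 + '$'  # hypothetical input, not the asker's file
    for i, secs in enumerate(time_repeated(bwt_naive, data)):
        print('run %d: %.4f s' % (i, secs))
```

Comparing the later runs rather than only the first one gives a better picture of steady-state speed under each interpreter.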

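Also, the relative speed of profiled code can differ a lot between the two: the code PyPy's JIT produces is low-level, so adding profiling may slow it down much more (relatively speaking) than it slows CPython. What are the timings without profiling?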
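To answer the "without profiling" question directly, the same function can be timed once plainly and once under `cProfile.Profile.runcall()`. This is a sketch under stated assumptions: `rank_bwt` is a copy of the question's `rankBwt()`, chosen because it makes a call per character, which is the kind of code where per-call profiler overhead tends to add up; `plain_vs_profiled` is a hypothetical helper and the input size is arbitrary. Under PyPy the first, unprofiled call also pays for JIT warm-up, so it is worth repeating each measurement a few times before drawing conclusions.

```python
import cProfile
import time


def rank_bwt(bw):
    """Copy of the question's rankBwt(): per-character rank plus character totals."""
    tots = {}
    ranks = []
    for c in bw:
        if c not in tots:
            tots[c] = 0
        ranks.append(tots[c])
        tots[c] += 1
    return ranks, tots


def plain_vs_profiled(func, arg):
    """Return (seconds without a profiler, seconds under cProfile) for func(arg)."""
    start = time.time()
    func(arg)
    plain = time.time() - start

    profiler = cProfile.Profile()
    start = time.time()
    profiler.runcall(func, arg)  # enables the profiler only around this call
    profiled = time.time() - start
    return plain, profiled


if __name__ == '__main__':
    data = 'ACGT' * 25000 + '$'  # hypothetical input size
    plain, profiled = plain_vs_profiled(rank_bwt, data)
    print('without cProfile: %.3f s' % plain)
    print('with cProfile:    %.3f s' % profiled)
```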

We need to see your program before we can say more.


I have edited my question to include the code. – curious


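Your script allocates an insane amount of memory in rotations(): it builds N rotations of N characters each, which is O(N**2) memory, where N is the size of the input file. As both cProfile and vmprof show, that is where most of the time is spent.

So what you are seeing is a difference in how PyPy and CPython handle memory. My guess is that your machine is swapping and that PyPy uses more memory.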
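To put the O(N**2) figure in numbers: for a hypothetical 100,000-character input, rotations() materializes 100,000 strings of 100,000 characters, about 10**10 bytes (10 GB) of character data before any per-object overhead. Note that the question's own suffixArray() also slices s[i:] for every i, so it has the same problem. One way to keep memory at O(N) is to sort integer positions instead of strings. The sketch below swaps in a different technique from the question's code, building the suffix array by prefix doubling; `suffix_array` and `bwt_via_sa` are hypothetical names. It assumes, as the question's writefile() produces, that the input ends with a single '$' that sorts before every other character, in which case sorting suffixes gives the same order as sorting rotations. It is pure Python and costs O(N log² N) time, so it is not meant to be fast, only to avoid the quadratic memory.

```python
def suffix_array(s):
    """Suffix array by prefix doubling: sorts integer positions, never slices s."""
    n = len(s)
    if n < 2:
        return list(range(n))
    rank = [ord(c) for c in s]  # initial rank of suffix i: its first character
    sa = list(range(n))
    k = 1
    while True:
        # Sort key for suffix i this round: (rank of its first k chars,
        # rank of the next k chars); -1 means "past the end", so a suffix
        # that is a prefix of another sorts first.
        def key(i):
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            prev, cur = sa[j - 1], sa[j]
            new_rank[cur] = new_rank[prev] + (key(cur) != key(prev))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: suffixes fully sorted
            break
        k *= 2
    return sa


def bwt_via_sa(t):
    """BWT of t; assumes t ends with a unique '$' that sorts before all other chars."""
    # For i == 0, t[i - 1] is t[-1], i.e. the '$' sentinel.
    return ''.join(t[i - 1] for i in suffix_array(t))


if __name__ == '__main__':
    print(bwt_via_sa('banana$'))
```

For 'banana$' this prints annb$aa, the same result as sorting the rotations, while storing only integer ranks and positions.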