2014-09-26

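(PyPDF2) Error when trying to merge PDFs

I have been trying to add a watermark, as shown in "Add text to Existing PDF using Python", but I keep getting an error related to the PDF data coming from reportlab. Is there a problem with the input PDF?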

Setup: Python 3.3 (Anaconda distribution), Windows 7

from PyPDF2 import PdfFileMerger, PdfFileReader, PdfFileWriter 
from six import BytesIO 
from reportlab.lib.units import inch 
from reportlab.pdfgen.canvas import Canvas 
from reportlab.lib.pagesizes import letter 

# Render watermark layer 
stream = BytesIO() 
c = Canvas(stream, pagesize=letter) 
c.drawString(1 * inch, 8 * inch, "Hello World! " * 3) 
c.showPage() 
c.save() 

stream.seek(0) 
overlay = PdfFileReader(stream) 
source = PdfFileReader("test.pdf") 
writer = PdfFileWriter() 

# Merge source and watermark pages
page0 = source.getPage(0) 
page0.mergePage(overlay.getPage(0)) 
writer.insertPage(page0, 0) 

# Write result to file 
with open('merged.pdf', 'wb') as fp: 
    writer.write(fp) 

I get the following error:

Traceback (most recent call last): 
    File "D:\IBP_Scripts\bsouthga\PDF Merge\merge.py", line 73, in <module> 
    pageSelectionPDF("./merged_pdfs/FB1_report.pdf", [44,52]) 
    File "D:\IBP_Scripts\bsouthga\PDF Merge\merge.py", line 64, in pageSelectionPDF 
    page0.mergePage(overlay.getPage(0)) 
    File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\pdf.py", line 1996, in mergePage 
    self._mergePage(page2) 
    File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\pdf.py", line 2042, in _mergePage 
    page2Content = PageObject._pushPopGS(page2Content, self.pdf) 
    File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\pdf.py", line 1956, in _pushPopGS 
    stream = ContentStream(contents, pdf) 
    File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\pdf.py", line 2428, in __init__ 
    stream = BytesIO(b_(stream.getData())) 
    File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\generic.py", line 831, in getData 
    decoded._data = filters.decodeStreamData(self) 
    File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\filters.py", line 317, in decodeStreamData 
    data = ASCII85Decode.decode(data) 
    File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\filters.py", line 256, in decode 
    data = [y for y in data if not (y in ' \n\r\t')] 
    File "D:\Users\bsouthga\AppData\Local\Continuum\Anaconda\envs\py33\lib\site-packages\PyPDF2\filters.py", line 256, in <listcomp> 
    data = [y for y in data if not (y in ' \n\r\t')] 
TypeError: 'in <string>' requires string as left operand, not int 

Answers

Switching to Python 2.7 (again, the Anaconda distribution) seems to work, so it must be a problem with Python 3.

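This is a problem in the PyPDF2 library under Python 3. If you want to use Python 3, you need to patch the ASCII85Decode class in filters.py. I ran into the same problem; borrowing the ASCII85 decoding code from ascii85.py in pdfminer3k (a port of pdfminer to Python 3) and pasting it into the decode def in filters.py fixed it. The issue is that in Python 3 the decoder needs to return bytes, which the old Python 2 code does not do. There is a pull request on GitHub to merge the change. I thought I would answer here just in case.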
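To illustrate why the old code fails on Python 3, here is a minimal sketch that mimics the list comprehension shown at line 256 of filters.py in the traceback. The sample `data` value is made up for the demonstration; the only assumption is that the stream data reaches the decoder as bytes, which is what the traceback suggests.

```python
# Made-up ASCII85-looking sample; on Python 3 the stream data is bytes.
data = b"87cURD]i,\"Ebo80\n~>"

error = None
try:
    # Iterating over bytes yields ints on Python 3, so `y in ' \n\r\t'`
    # tests an int against a str and raises TypeError.
    stripped = [y for y in data if not (y in ' \n\r\t')]
except TypeError as exc:
    error = exc
    print(exc)

# Testing each int against a bytes object is allowed, so this variant works.
stripped = bytes(y for y in data if y not in b' \n\r\t')
print(stripped)
```

On Python 2 the same comprehension iterates over one-character strings, which is why the original code works there.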

Replace the body of the ASCII85Decode decode def in filters.py of the PyPDF2 library with this code from pdfminer3k:

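# Requires `import struct` at the top of filters.py.
if isinstance(data, str):
    data = data.encode('ascii')
n = b = 0
out = bytearray()
for c in data:  # on Python 3, iterating over bytes yields ints
    if ord('!') <= c and c <= ord('u'):
        # Accumulate one base-85 digit; five digits make one 4-byte group.
        n += 1
        b = b*85+(c-33)
        if n == 5:
            out += struct.pack(b'>L',b)
            n = b = 0
    elif c == ord('z'):
        # 'z' is shorthand for four zero bytes.
        assert n == 0
        out += b'\0\0\0\0'
    elif c == ord('~'):
        # End-of-data marker: pad the partial group with 'u' and truncate.
        if n:
            for _ in range(5-n):
                b = b*85+84
            out += struct.pack(b'>L',b)[:n-1]
        break
return bytes(out)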
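
To check the patch outside PyPDF2, the same logic can be wrapped in a standalone function and round-tripped against the standard library's encoder. This is only a sketch: the name `ascii85_decode` is hypothetical (it is not a PyPDF2 or pdfminer3k API), and the `~>` terminator is appended by hand on the assumption that the stream ends with the usual Adobe end-of-data marker.

```python
import base64
import struct


def ascii85_decode(data):
    """Decode ASCII85 data terminated by '~>' and return bytes (hypothetical helper)."""
    if isinstance(data, str):
        data = data.encode('ascii')
    n = b = 0
    out = bytearray()
    for c in data:  # c is an int on Python 3
        if ord('!') <= c <= ord('u'):
            n += 1
            b = b * 85 + (c - 33)
            if n == 5:
                out += struct.pack('>L', b)
                n = b = 0
        elif c == ord('z'):
            assert n == 0
            out += b'\0\0\0\0'
        elif c == ord('~'):
            if n:
                for _ in range(5 - n):
                    b = b * 85 + 84
                out += struct.pack('>L', b)[:n - 1]
            break
    return bytes(out)


# Round trip: encode with the standard library, decode with the patched logic.
payload = b"Hello World! " * 3
encoded = base64.a85encode(payload) + b"~>"
decoded = ascii85_decode(encoded)
print(decoded == payload)
```

The function accepts both str and bytes input and always returns bytes, which is the behaviour the Python 3 code path in PyPDF2 expects.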