2017-10-05

How can I build a URL path from a list of parts?

from urllib.parse import urljoin 
urljoin('https://site/folder', 'page') 

This returns https://site/page. Fine, I can append a / to the base. But when my variable already ends with / and I append another one, I get a double slash:

urljoin('https://site/folder//', 'page') 
>>> 'https://site/folder//page' 

Isn't it a bug that urljoin keeps this double slash // when joining URLs?

How can I join a list of URL parts so that I get results like these?

urljoin('https://site/folder', 'page', 'otherpage') 
> https://site/folder/page/otherpage 

urljoin('https://site/folder', 'page', 'otherpage.jsf') 
> https://site/folder/page/otherpage.jsf 

urljoin('https://site/folder/' , 'page.htm',) 
> https://site/folder/page.htm 

urljoin('https://site/folder//', '/page', '///otherpage') 
> https://site/folder/page/otherpage 

urljoin('https://site/folder//', '//page/', '//otherpage.php' ) 
> https://site/folder/page/otherpage.php 

urljoin('https://site/folder//', 'page', '/otherpage////') 
> https://site/folder/page/otherpage 

Answers


There are certainly several ways to do this; here is one:

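from urllib.parse import urljoin 
from functools import reduce  # in Python 3, reduce lives in functools

def clean_url(url): 
    # Drop leading/trailing slashes, then add exactly one trailing slash
    # so urljoin treats every part as a directory.
    return url.strip('/') + '/' 

def joinurllist(urls): 
    # Fold the cleaned parts together pairwise with urljoin, then drop the
    # trailing slash that clean_url added to the last part.
    return reduce(urljoin, map(clean_url, urls)).rstrip('/') 

joinurllist(['https://site/folder//', 'page', '///otherpage/'])
# 'https://site/folder/page/otherpage'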
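To see what reduce is doing here, itertools.accumulate with the same urljoin function exposes each intermediate result. This is only an illustration: the part list below is assumed to be already cleaned the way clean_url would clean it.

```python
from itertools import accumulate
from urllib.parse import urljoin

# Parts as clean_url would produce them: exactly one trailing slash each.
parts = ['https://site/folder/', 'page/', 'otherpage/']

# accumulate applies urljoin pairwise, like reduce, but yields every step.
steps = list(accumulate(parts, urljoin))
for step in steps:
    print(step)
```

Each step keeps the previous result as a directory because it ends in /, which is why the final rstrip('/') is needed to get rid of the last one.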

This behavior is documented in the Python docs for urllib.parse.urljoin.

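Keeping a trailing slash on the base is the sensible way to make urljoin append a path component instead of replacing the last one.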
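urljoin resolves the second argument as a relative reference against the base; it does not concatenate strings. A few calls (with placeholder host names) show why the trailing slash on the base matters, and why leading slashes on later parts must be stripped as well:

```python
from urllib.parse import urljoin

# Base ends in '/': 'folder' is kept as a directory and 'page' is appended.
a = urljoin('https://site/folder/', 'page')
# No trailing '/': 'folder' is treated as a file and replaced.
b = urljoin('https://site/folder', 'page')
# A leading '/' makes 'page' an absolute path, so the base path is discarded.
c = urljoin('https://site/folder/', '/page')
# A full URL replaces the base entirely.
d = urljoin('https://site/folder/', 'https://other/x')

for result in (a, b, c, d):
    print(result)
```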


A path containing // is a legal URI path: empty path segments are allowed.

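urljoin checks whether the preceding element ends with /. If it does, urljoin treats that element as a branch (a directory) rather than a leaf (a file), and appends to it instead of replacing it.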

So:

>>> urljoin('/foo/bar/','page') 
'/foo/bar/page' 

>>> urljoin('/foo/bar', 'page') 
'/foo/page' 

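If you really want to avoid the extra /, rstrip() the base and then append a single / again; stripping alone turns the last segment back into a leaf: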

>>> urljoin('/foo/bar/'.rstrip('/'), 'page') 
'/foo/page' 

>>> urljoin('/foo/bar///'.rstrip('/') + '/', 'page') 
'/foo/bar/page' 

What you probably want is something like this:

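from functools import reduce
from urllib.parse import urljoin

L = ['root', 'part1', '/part2/', '//part3//'] 
# urljoin takes exactly two arguments, so fold the list with reduce.
# Strip slashes on both ends: a part still starting with '//' would be
# read as a new host rather than a path segment.
reduce(urljoin, [p.strip('/') + '/' for p in L])
# 'root/part1/part2/part3/'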
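Stripping only the right-hand slashes is not enough when a part begins with //. urljoin reads a reference that starts with // as a network-path reference, that is, a new host. A quick check with placeholder names:

```python
from urllib.parse import urljoin

base = 'https://site/folder/'

# rstrip('/') leaves the two leading slashes of '//page//' in place...
part = '//page//'.rstrip('/') + '/'
# ...and a reference starting with '//' names a new host, not a path.
wrong = urljoin(base, part)

# Stripping both ends keeps it a relative path segment.
right = urljoin(base, '//page//'.strip('/') + '/')

print(part, wrong, right)
```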

I wrote this URL-joining function for it:

def _clean_urljoin(url): 
    # Recursively remove leading and trailing '/' and ' ' characters.
    if url.startswith('/') or url.startswith(' '): 
        url = url[1:] 
        url = _clean_urljoin(url) 

    if url.endswith('/') or url.endswith(' '): 
        url = url[0:-1] 
        url = _clean_urljoin(url) 

    return url 


def clean_urljoin(*urls): 
    # Clean every part, then join them with exactly one '/' between each.
    fixed_urls = [] 

    for url in urls: 
        fixed_urls.append(_clean_urljoin(url)) 

    return "/".join(fixed_urls) 


print(clean_urljoin('https://site/folder', 'page', 'otherpage')) 
print(clean_urljoin('https://site/folder', 'page', 'otherpage.jsf')) 
print(clean_urljoin('https://site/folder/', 'page.htm')) 
print(clean_urljoin('https://site/folder//', '/page', '///otherpage')) 
print(clean_urljoin('https://site/folder//', '//page/', '//otherpage.php')) 
print(clean_urljoin('https://site/folder//', 'page', '/otherpage////')) 

Running this prints:

$ python3 test.py 
https://site/folder/page/otherpage 
https://site/folder/page/otherpage.jsf 
https://site/folder/page.htm 
https://site/folder/page/otherpage 
https://site/folder/page/otherpage.php 
https://site/folder/page/otherpage
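Because the string-join approach treats the whole base URL as plain text, a query string on the base ends up in the middle of the result. A minimal sketch of an alternative, assuming only the path component should change and every empty segment should be dropped (join_url_parts is a hypothetical helper, not a standard function), splits the base with urllib.parse.urlsplit and rebuilds it with urlunsplit:

```python
from urllib.parse import urlsplit, urlunsplit


def join_url_parts(base, *parts):
    """Hypothetical helper: append path parts to the path of `base`.

    Only the path is touched, so scheme, host, query and fragment are kept.
    Empty segments (from doubled slashes) are dropped, and a leading '/' on
    a part does NOT reset the path the way it would with urljoin.
    """
    scheme, netloc, path, query, fragment = urlsplit(base)
    segments = [s for s in path.split('/') if s]
    for part in parts:
        segments.extend(s for s in part.split('/') if s)
    new_path = '/' + '/'.join(segments) if segments else ''
    return urlunsplit((scheme, netloc, new_path, query, fragment))


print(join_url_parts('https://site/folder//', '//page/', '//otherpage.php'))
print(join_url_parts('https://site/folder?lang=en', 'page'))
```

The second call keeps ?lang=en at the end, where clean_urljoin would produce https://site/folder?lang=en/page. Like clean_urljoin, this sketch does not percent-encode the parts.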