2016-11-16

Perfect Amazon link regex

I'm trying to build the perfect Amazon link regex for use in JavaScript. Here's what I have so far:

var reg = /https?:\/\/(www|smile)\.amazon\.com\/(?:(?:[\w-]+\/)?(?:dp|gp\/product)\/(\w{10})\/)?/; 

I want this to fully match all of the following URLs:

http://smile.amazon.com/dp/B0005ZH4QI/?tag=menasheh02-20&psc=1&smid=ATVPDKIKX0DER 
http://www.amazon.com/gp/family/signup/info/?ie=UTF8&camp=1789&creative=9325&linkCode=ur2&ref_type=generic&refcust=5FNWKEJKP63HFBSY6JGLXL4XIQ&tag=menasheh02-20&linkId=HR76ZTGJKWO5ED2N 
http://www.amazon.com/gp/redirect.html?ie=UTF8&location=https%3A%2F%2Fwww.amazon.com%2Fgp%2Fsubscribe-and-save%2Fmanager%2Fviewsubscriptions%3Fie%3DUTF8%26ref_%3Dya%255FT15%255F33&tag=menasheh02-20&linkCode=ur2&camp=1789&creative=390957 
http://www.amazon.com/gp/student/signup/info?ie=UTF8&refcust=7EATHY4IXOFTTEMLIHVC3YL6DI&ref_type=generic 
http://www.amazon.com/gp/video/primesignup?tag=menasheh02-20 
https://smile.amazon.com/dp/B0005ZH4QI/?tag=menasheh02-20&psc=1&smid=ATVPDKIKX0DER 
https://smile.amazon.com/s/ref=s9_acss_gb_cg_HTLLPCGB_3d1?fst=as%3Aoff&rh=n%3A165793011%2Cn%3A!2334111011%2Cn%3A!2334173011%2Cn%3A15539865011%2Cp_n_age_range%3A165936011%2Cp_72%3A1248963011&bbn=15539865011&ie=UTF8&qid=1476851901&rnid=1248961011&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=events-center-c-4&pf_rd_r=8MKN8SY6C5ZP4NC1C0RB&pf_rd_t=701&pf_rd_p=e4acec8d-70de-466a-be44-05291b40a5d4&pf_rd_i=HTL_desktop 
https://www.amazon.com/b/ref=s9_acss_gb_cg_HTLLPCGB_11a1?node=13521759011&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=events-center-c-4&pf_rd_r=8MKN8SY6C5ZP4NC1C0RB&pf_rd_t=701&pf_rd_p=e4acec8d-70de-466a-be44-05291b40a5d4&pf_rd_i=HTL_desktop 
https://www.amazon.com/Doctor-Vortex-Manipulator-Sonic-Screwdriver/dp/B001PR1ZII/ref=gbph_tit_e-7_fb02_fc8a0d34?smid=AOUT97QIB451U&pf_rd_p=8e268714-ad3d-444b-b0df-d51d8825fb02&pf_rd_s=events-center-c-7&pf_rd_t=701&pf_rd_i=HTL_desktop&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=8MKN8SY6C5ZP4NC1C0RB 
https://www.amazon.com/dp/B0005ZH4QI/?tag=menasheh02-20&psc=1&smid=ATVPDKIKX0DER 
https://www.amazon.com/gp/coupon/skippy-baking-sale/A2UI00T2I5JAV3?ie=UTF8&heroAsin=B0005ZH4QI&source=grid_db_13285418011&pf_rd_p=782d30de-8b22-4b3d-9009-0f7a0cb995d3&pf_rd_s=merchandised-search-3&pf_rd_t=Landing&pf_rd_i=13285418011&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=PPNJHXVZRMM4XP9KXGGG 
https://www.amazon.com/Monster-High-School-Playset/dp/B006O6F932/ref=gbph_tit_e-7_fb02_85d3d028?smid=A3CXJV2JYTL237&pf_rd_p=8e268714-ad3d-444b-b0df-d51d8825fb02&pf_rd_s=events-center-c-7&pf_rd_t=701&pf_rd_i=HTL_desktop&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=8MKN8SY6C5ZP4NC1C0RB 
https://www.amazon.com/s/ref=s9_acss_gb_cg_HTLLPCGB_3d1?fst=as%3Aoff&rh=n%3A165793011%2Cn%3A!2334111011%2Cn%3A!2334173011%2Cn%3A15539865011%2Cp_n_age_range%3A165936011%2Cp_72%3A1248963011&bbn=15539865011&ie=UTF8&qid=1476851901&rnid=1248961011&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=events-center-c-4&pf_rd_r=8MKN8SY6C5ZP4NC1C0RB&pf_rd_t=701&pf_rd_p=e4acec8d-70de-466a-be44-05291b40a5d4&pf_rd_i=HTL_desktop 

and none of these:

https://www.google.com/search?safe=active&site=&source=hp&q=bad+regex&oq=bad+regex&gs_l=hp.3..0j0i22i30k1l9.724.2089.0.2265.10.9.0.0.0.0.269.1091.0j4j2.6.0....0...1c.1.64.hp..4.5.821.0..0i20k1j0i131k1j0i10k1.k62wRudUpsw 
https://sellercentral.amazon.com/B53C945A8D?randomstuff=34341&otherrandomstuff=2 

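Right now it doesn't match either of the bad ones; that part is relatively easy. (It also doesn't match the gp/redirect.html? part of that URL.) The tricky part is getting the match to return each useful part of the URL separately, especially with all the if/elses and the #.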
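To check the "match all of these, none of those" requirement mechanically, a small harness like the one below can be run in Node or a browser console. It is a sketch: the arrays hold only a few of the URLs from the lists above (the Google one shortened), and the names `shouldMatch`, `shouldNotMatch` and `checkLists` are made up for illustration.

```javascript
// The regex from the question, copied as-is. No `g` flag, so .test() keeps no lastIndex state.
const reg = /https?:\/\/(www|smile)\.amazon\.com\/(?:(?:[\w-]+\/)?(?:dp|gp\/product)\/(\w{10})\/)?/;

// Hypothetical test lists: a subset of the URLs from the question.
const shouldMatch = [
  "http://smile.amazon.com/dp/B0005ZH4QI/?tag=menasheh02-20&psc=1&smid=ATVPDKIKX0DER",
  "http://www.amazon.com/gp/video/primesignup?tag=menasheh02-20",
  "https://www.amazon.com/dp/B0005ZH4QI/?tag=menasheh02-20&psc=1&smid=ATVPDKIKX0DER",
];
const shouldNotMatch = [
  "https://www.google.com/search?safe=active&site=&source=hp&q=bad+regex&oq=bad+regex",
  "https://sellercentral.amazon.com/B53C945A8D?randomstuff=34341&otherrandomstuff=2",
];

// Collects the URLs that land on the wrong side of the regex.
function checkLists(regex) {
  return {
    missed: shouldMatch.filter((url) => !regex.test(url)),
    falsePositives: shouldNotMatch.filter((url) => regex.test(url)),
  };
}

console.log(checkLists(reg)); // { missed: [], falsePositives: [] }
```

Note that because the pattern has no `^`/`$` anchors, `test()` returning `true` only means some prefix of the URL matched, not that the whole URL did.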

# Working #

match[1] should equal either "www" or "smile".

match[2] should equal the ASIN, or be blank if the URL has no /dp/%ASIN%, %SEO-string%/dp/%ASIN%, or /gp/product/%ASIN%.
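The two working groups can be inspected with `RegExp.prototype.exec`. One JavaScript detail matters for the "blank" requirement: an optional group that did not take part in the match comes back as `undefined`, not as an empty string. The helper name `workingGroups` is hypothetical, and the sample URLs are shortened versions of ones from the list above.

```javascript
// The regex from the question, copied as-is.
const questionRegex = /https?:\/\/(www|smile)\.amazon\.com\/(?:(?:[\w-]+\/)?(?:dp|gp\/product)\/(\w{10})\/)?/;

// Hypothetical helper: returns [subdomain, asin], or null if nothing matched.
// asin is undefined (not "") when the URL has no product segment.
function workingGroups(url) {
  const m = questionRegex.exec(url);
  return m === null ? null : [m[1], m[2]];
}

workingGroups("https://www.amazon.com/Doctor-Vortex-Manipulator-Sonic-Screwdriver/dp/B001PR1ZII/ref=gbph_tit_e-7_fb02_fc8a0d34");
// → ["www", "B001PR1ZII"]
workingGroups("https://smile.amazon.com/s/ref=s9_acss_gb_cg_HTLLPCGB_3d1?fst=as%3Aoff");
// → ["smile", undefined]
```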

# Not Working #

match[3] should equal the rest of the URL after .com (or after the product part, if an ASIN was found), but not including the # at the end.

match[4] should equal everything from the start of match[3] up to tag= (if present).

match[5] should equal the tag parameter (if present).

match[6] should equal the rest of the URL between the tag parameter (if present; otherwise blank) and the # (if present; otherwise up to the end).

match[7] should equal the # at the end and anything after it, or be blank if there is no #.
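One way to get all seven groups is to nest them: group 3 wraps groups 4 to 6, and every part that has to stop at the fragment uses `[^#]*` instead of `\S*` or `.*`. This is a sketch under assumptions the question leaves open: match[4] includes the `?tag=` or `&tag=` separator itself, the pattern is anchored with `^...$` so it must consume the whole URL, and absent groups are normalised from `undefined` to `""`. `AMAZON_LINK` and `parseAmazonLink` are made-up names.

```javascript
// Hypothetical pattern. Groups: 1 subdomain, 2 ASIN, 3 rest before '#',
// 4 text up to and including "tag=", 5 tag value, 6 text after the tag, 7 '#...'.
const AMAZON_LINK =
  /^https?:\/\/(www|smile)\.amazon\.com\/(?:(?:[\w-]+\/)?(?:dp|gp\/product)\/(\w{10})\/?)?((?:([^#]*?[?&]tag=)([^&#]*))?([^#]*))(#.*)?$/;

// Returns an array indexed like the question's match[0..7], with "" for absent parts,
// or null when the URL is not a www/smile amazon.com link.
function parseAmazonLink(url) {
  const m = AMAZON_LINK.exec(url);
  if (m === null) return null;
  return Array.from(m, (group) => (group === undefined ? "" : group));
}

const parts = parseAmazonLink("https://www.amazon.com/dp/B0005ZH4QI/?tag=menasheh02-20&psc=1#reviews");
console.log(parts.slice(1));
// [ 'www', 'B0005ZH4QI', '?tag=menasheh02-20&psc=1', '?tag=', 'menasheh02-20', '&psc=1', '#reviews' ]
```

The sample URL with `#reviews` is invented to exercise match[7]. The lazy `[^#]*?` makes match[4] stop at the first `tag=`, and wrapping groups 4 and 5 in an optional `(?:...)?` lets match[6] take everything up to the `#` when there is no tag.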

I'm just getting into more complex regexes, and I'm stuck on things like not going all the way to the end of the line when there's a #, etc.

Can anyone more experienced help? Thanks.

Instead of the `RegExp` constructor, use a regex literal. – Tushar

@Tushar For the match function? What do you mean? – Menasheh

`var regex = /https?:\/\/(www|smile)\.amazon\.com\/(?:(?:[\w-]+\/)?(?:dp|gp\/product)\/(\w{10})\/)?([\w\/=-]+)?([\w\/=?-]+)?(\?.+)?(#[\w]+)?/;` Then the # Not Working # issues will be solved. With the `RegExp` constructor, the backslashes need to be escaped when the pattern is passed as a string. – Tushar
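To illustrate the difference this comment points at, here is the prefix of the question's regex written both ways. It is a minimal sketch; the variable names are made up. In a literal, `/` must be escaped because it delimits the literal; in a string passed to `new RegExp`, every regex backslash has to be doubled, otherwise the string literal silently drops it.

```javascript
// Regex literal: forward slashes are escaped as \/ because / ends the literal.
const literal = /https?:\/\/(www|smile)\.amazon\.com\//;

// RegExp constructor: the pattern is a string, so each regex backslash is doubled.
// Forward slashes need no escaping here.
const fromString = new RegExp("https?://(www|smile)\\.amazon\\.com/");

// Pitfall: in a string literal "\." is just ".", so the dot becomes a wildcard.
const tooLoose = new RegExp("https?://(www|smile)\.amazon\.com/");

const url = "https://smile.amazon.com/dp/B0005ZH4QI/";
console.log(literal.test(url), fromString.test(url)); // true true
console.log(tooLoose.test("https://wwwXamazonYcom/")); // true (unintended)
console.log(literal.test("https://wwwXamazonYcom/")); // false
```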

Answer

Try this regex, which works in JavaScript:

https?:\/\/(?=(?:....)?amazon|smile)(www|smile)\S+com(((?:\/(?:dp|gp)\/([A-Z0-9]+))?\S*[?&]?(?:tag=))?\S*?)(?:#)?(\w*?-\w{2})?(\S*)(#?\S*)+ 

I made a few small changes to the group numbering:

Your match[3] and match[4] are now match[2] and match[3].

Your match[2] (the ASIN) is now match[4].
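A quick way to see the renumbering is to run the answer's regex unchanged on one of the question's URLs. The variable names are made up; the second URL is an invented variant with a fragment. Tracing the pattern, group 6's `(\S*)` also consumes any `#fragment`, so group 7 comes back empty as written.

```javascript
// The regex from the answer, copied as-is.
const answerRegex = /https?:\/\/(?=(?:....)?amazon|smile)(www|smile)\S+com(((?:\/(?:dp|gp)\/([A-Z0-9]+))?\S*[?&]?(?:tag=))?\S*?)(?:#)?(\w*?-\w{2})?(\S*)(#?\S*)+/;

const m = answerRegex.exec("https://www.amazon.com/dp/B0005ZH4QI/?tag=menasheh02-20&psc=1&smid=ATVPDKIKX0DER");
// Under the answer's numbering the ASIN is in m[4] and the tag value in m[5].
console.log(m[1], m[4], m[5]); // www B0005ZH4QI menasheh02-20

// Invented URL with a fragment: the '#frag' ends up in group 6, not group 7.
const withHash = answerRegex.exec("https://www.amazon.com/dp/B0005ZH4QI/?tag=menasheh02-20#frag");
console.log(JSON.stringify([withHash[6], withHash[7]])); // ["#frag",""]
```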

Hope it helps.

Demo: https://regex101.com/r/sT2wj8/2

Why are Google and Seller Central in there? They shouldn't match! This is specifically for Amazon sites. – Menasheh

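@Menasheh They aren't. The `?!` in front of them means "don't match what's between the parentheses". Anyway, I updated the code and removed that part. Check the match groups in the demo. – Ibrahim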
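In JavaScript regexes, `(?!...)` is a negative lookahead: a zero-width check that the text ahead does not match, without consuming anything. The pattern below is a hypothetical illustration (not the answer's removed code) that accepts any single-label amazon.com subdomain except `sellercentral`.

```javascript
// Hypothetical pattern: (?!sellercentral\.) fails the match if "sellercentral." comes next,
// then (\w+) consumes the subdomain as usual.
const notSellerCentral = /^https?:\/\/(?!sellercentral\.)(\w+)\.amazon\.com\//;

console.log(notSellerCentral.test("https://smile.amazon.com/dp/B0005ZH4QI/")); // true
console.log(notSellerCentral.test("https://sellercentral.amazon.com/B53C945A8D?randomstuff=34341")); // false
```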