2010-10-18
15

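I really don't understand regular expressions, and I can't find any regex rule for validating culture codes such as en-GB, en-UK, az-AZ-Cyrl, etc. How can I validate a culture code with a regular expression?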

How do I validate these codes with a regular expression?

+1

That link should be removed, because it now leads to a hijacked website. – 2016-06-16 18:08:58

Answers

28

You can validate them with this:

/^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-[a-zA-Z]{4})?)?$/ 

Here is how it works:

^          <- Starts with
[a-z]      <- From a to z (lower-case)
{2,3}      <- Repeated at least 2 times, at most 3
(?:        <- Non-capturing group
    -          <- The "-" character
    [A-Z]      <- From A to Z (upper-case)
    {2,3}      <- Repeated at least 2 times, at most 3
    (?:        <- Non-capturing group
        -          <- The "-" character
        [a-zA-Z]   <- From a to z or A to Z (either case)
        {4}        <- Repeated exactly 4 times
    )          <- End of the group
    ?          <- Optional
)          <- End of the group
?          <- Optional
$          <- Ends here
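A minimal sketch of the pattern above in use, assuming Python's `re` module (the `/.../` delimiters are dropped because Python takes the pattern as a string). The name `is_culture_code` is hypothetical, chosen only for this example.

```python
import re

# The answer's pattern: 2-3 lower-case letters, optionally "-" plus 2-3
# upper-case letters, optionally followed by "-" plus exactly 4 letters.
CULTURE_CODE = re.compile(r"^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-[a-zA-Z]{4})?)?$")


def is_culture_code(code: str) -> bool:
    # fullmatch rejects a trailing "\n", which a bare "$" with re.match would accept
    return CULTURE_CODE.fullmatch(code) is not None


for code in ["en-GB", "en-UK", "az-AZ-Cyrl", "en", "en-gb", "az-Cyrl-AZ"]:
    print(f"{code!r}: {is_culture_code(code)}")
```

With this pattern, en-GB, en-UK, az-AZ-Cyrl and en pass, while en-gb (lower-case region) and az-Cyrl-AZ (script before region) fail. The regex only checks the shape of the string, not whether the language or country code actually exists.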

If the only possible script values are Cyrl and Latn, you can also replace the last non-capturing group with (?:-(?:Cyrl|Latn))?
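A short sketch of that substitution, again assuming Python's `re`: the full pattern with the trailing script group narrowed to the two literal values. `CULTURE_CYRL_LATN` is a hypothetical name.

```python
import re

# Same pattern as above, but the optional script suffix may only be Cyrl or Latn.
CULTURE_CYRL_LATN = re.compile(r"^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-(?:Cyrl|Latn))?)?$")

for code in ["az-AZ-Cyrl", "az-AZ-Latn", "sr-RS-Arab", "en-GB"]:
    print(code, CULTURE_CYRL_LATN.fullmatch(code) is not None)
```

Here az-AZ-Cyrl, az-AZ-Latn and en-GB match, while sr-RS-Arab no longer does.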

+0

Thanks to Colin Hebert and Eumiro as well :) – SameName69 2010-10-18 19:30:52

+0

Why prefer this regex over the one defined by the specification? – 2012-10-22 08:22:50

+0

@Stephane I didn't know the specification defined a regex. Where did you find it? – 2012-10-22 09:56:41

6

This is what I found in the Dublin Core/W3C XSD: http://www.w3.org/2001/XMLSchema

<xs:simpleType name="language" id="language"> 
    <xs:annotation> 
     <xs:documentation 
     source="http://www.w3.org/TR/xmlschema-2/#language"/> 
    </xs:annotation> 
    <xs:restriction base="xs:token"> 
     <xs:pattern 
     value="[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*" 
       id="language.pattern"> 
     <xs:annotation> 
      <xs:documentation 
       source="http://www.ietf.org/rfc/rfc3066.txt"> 
      pattern specifies the content of section 2.12 of XML 1.0e2 
      and RFC 3066 (Revised version of RFC 1766). 
      </xs:documentation> 
     </xs:annotation> 
     </xs:pattern> 
    </xs:restriction> 
    </xs:simpleType> 

So the pattern is:

[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})* 
+1

If you add line anchors and a non-capturing group, it becomes '^[a-zA-Z]{1,8}(?:-[a-zA-Z0-9]{1,8})*$' – 2016-09-15 18:09:31
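A minimal sketch of the anchored XSD pattern in Python's `re`. XML Schema patterns are implicitly anchored to the whole value, which is why the anchors have to be added when the pattern is used outside XSD. `is_xsd_language` is a hypothetical helper name.

```python
import re

# XSD "language" pattern with explicit anchors and a non-capturing group.
XSD_LANGUAGE = re.compile(r"^[a-zA-Z]{1,8}(?:-[a-zA-Z0-9]{1,8})*$")


def is_xsd_language(tag: str) -> bool:
    return XSD_LANGUAGE.fullmatch(tag) is not None


for tag in ["en-GB", "az-AZ-Cyrl", "az-Cyrl-AZ", "es-419",
            "en_GB", "en-", "abcdefgh-12345678"]:
    print(tag, is_xsd_language(tag))
```

This pattern is much more permissive than the first answer's: it accepts both subtag orders and digit subtags such as 419, but it also accepts strings like abcdefgh-12345678 that are not real language tags. It rejects en_GB and en-.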

4

According to https://en.wikipedia.org/wiki/IETF_language_tag the regex could be:

/^[a-z]{2,3}(?:-[a-zA-Z]{4})?(?:-[A-Z]{2,3})?$/ 

From Wikipedia:

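a single primary language subtag based on a two-letter language code from ISO 639-1 (2002) or a three-letter code from ISO 639-2 (1998), ISO 639-3 (2007) or ISO 639-5 (2008), or registered through the BCP 47 process and composed of five to eight letters;

an optional script subtag, based on a four-letter script code from ISO 15924 (usually written in title case);

an optional region subtag based on a two-letter country code from ISO 3166-1 alpha-2 (usually written in upper case), or a three-digit code from UN M.49 for geographical regions;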

+0

"a three-digit code": I don't see any character in that regex that matches digits. – Triynko 2017-07-11 18:06:55
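A hypothetical variant (not from the answer) that follows the three quoted points more literally and so also accepts the three-digit UN M.49 region, sketched with Python's `re`. Assumptions: the script subtag keeps the answer's `[a-zA-Z]{4}`, and the region is two upper-case letters or three digits, so the answer's three-letter region option is dropped. `IETF_SUBSET` is an illustrative name.

```python
import re

# Hypothetical pattern built from the quoted Wikipedia description:
#   language: 2-3 letters (ISO 639) or 5-8 letters (BCP 47 registration)
#   script:   optional, 4 letters (character class kept from the answer)
#   region:   optional, 2 upper-case letters (ISO 3166-1) or 3 digits (UN M.49)
IETF_SUBSET = re.compile(
    r"^(?:[a-z]{2,3}|[a-z]{5,8})"
    r"(?:-[a-zA-Z]{4})?"
    r"(?:-(?:[A-Z]{2}|[0-9]{3}))?$"
)

for tag in ["es-419", "zh-Hant-TW", "en-GB", "az-AZ-Cyrl", "abcd-GB", "en-gb"]:
    print(tag, IETF_SUBSET.fullmatch(tag) is not None)
```

Here es-419, zh-Hant-TW and en-GB match, while az-AZ-Cyrl (region before script), abcd-GB (four-letter language) and en-gb fail. BCP 47 tags are case-insensitive, so enforcing the conventional casing, as this sketch and the answer's regex both do, rejects tags that are technically valid. It also ignores variants, extensions and private-use subtags.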

-3

^(?i:AF|AX|AL|DZ|AS|AD|AO|AI|AQ|AG|AR|AM|AW|AU|AT|AZ|BS|BH|BD|BB|BY|BE|BZ|BJ|BM|BT|BO|BQ|BA|BW|BV|BR|IO|BN|BG|BF|BI|KH|CM|CA|CV|KY|CF|TD|CL|CN|CX|CC|CO|KM|CG|CD|CK|CR|CI|HR|CU|CW|CY|CZ|DK|DJ|DM|DO|EC|EG|SV|GQ|ER|EE|ET|FK|FO|FJ|FI|FR|GF|PF|TF|GA|GM|GE|DE|GH|GI|GR|GL|GD|GP|GU|GT|GG|GN|GW|GY|HT|HM|VA|HN|HK|HU|IS|IN|ID|IR|IQ|IE|IM|IL|IT|JM|JP|JE|JO|KZ|KE|KI|KP|KR|KW|KG|LA|LV|LB|LS|LR|LY|LI|LT|LU|MO|MK|MG|MW|MY|MV|ML|MT|MH|MQ|MR|MU|YT|MX|FM|MD|MC|MN|ME|MS|MA|MZ|MM|NA|NR|NP|NL|NC|NZ|NI|NE|NG|NU|NF|MP|NO|OM|PK|PW|PS|PA|PG|PY|PE|PH|PN|PL|PT|PR|QA|RE|RO|RU|RW|BL|SH|KN|LC|MF|PM|VC|WS|SM|ST|SA|SN|RS|SC|SL|SG|SX|SK|SI|SB|SO|ZA|GS|SS|ES|LK|SD|SR|SJ|SZ|SE|CH|SY|TW|TJ|TZ|TH|TL|TG|TK|TO|TT|TN|TR|TM|TC|TV|UG|UA|AE|GB|US|UM|UY|UZ|VU|VE|VN|VG|VI|WF|EH|YE|ZM|ZW)$

+0

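While this code snippet may solve the problem, [including an explanation](http://meta.stackexchange.com/questions/114762/explaining-entirely-code-based-answers) really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. Also, it is not formatted correctly, and last but not least, it does not answer the question, because the culture code format described in the question looks more like '[a-z]{2,2}-[A-Z]{2,4}' – Clijsters 2017-11-22 12:48:17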
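As the comment says, the country-code regex only matches a bare region code such as GB, so on its own it cannot validate a whole culture code. One possible way to still use such a list, sketched here in Python under my own assumptions, is to check the overall shape with the first answer's pattern and then look up the region subtag in a set of ISO 3166-1 alpha-2 codes. The set below is a deliberately abbreviated subset of the list above, and `REGION_CODES` and `has_known_region` are hypothetical names.

```python
import re

# Shape check taken from the first answer: language[-REGION[-Script]].
CULTURE_SHAPE = re.compile(r"^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-[a-zA-Z]{4})?)?$")

# Illustrative subset only: a few ISO 3166-1 alpha-2 codes from the list
# above. Substitute the full list for real use.
REGION_CODES = {"AZ", "DE", "FR", "GB", "US"}


def has_known_region(code: str) -> bool:
    """True if code has the expected shape and, when a region subtag
    is present, that region is in REGION_CODES."""
    if CULTURE_SHAPE.fullmatch(code) is None:
        return False
    parts = code.split("-")
    # A bare language code such as "en" has no region to look up.
    return len(parts) == 1 or parts[1] in REGION_CODES
```

With this sketch en-GB and az-AZ-Cyrl pass, but en-UK from the question fails: the ISO 3166-1 alpha-2 code for the United Kingdom is GB, not UK, and a pure shape regex cannot catch that.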