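I need to call this function very frequently. Basically, it replaces all accented characters with their unaccented equivalents, changes spaces to "_", converts everything to lowercase, and strips the remaining foreign characters, so the result is safe to use in file names, URL paths, and so on. The problem is that, as you can see, it looks terrible from a performance point of view. Can anyone suggest a cleaner, faster alternative to this regex-based algorithm?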
public static String makeValidPathName(String rawString) {
if (rawString==null) return null;
rawString = rawString.replaceAll("[ÁÀÂÄáàäaàáâãäå]","a");
rawString = rawString.replaceAll("æ","ae");
rawString = rawString.replaceAll("[çÇ]","c");
rawString = rawString.replaceAll("[ÈÉÊËèéêë]","e");
rawString = rawString.replaceAll("[ìíîïÍÌÎÏ]","i");
rawString = rawString.replaceAll("[ñÑ]","n");
rawString = rawString.replaceAll("[ÓÓÖÔòóôõö]","o");
rawString = rawString.replaceAll("œ","oe");
rawString = rawString.replaceAll("[ÙÚÛÜùúûü]", "u");
rawString = rawString.replaceAll("[ýÿ]","y");
rawString= rawString.replaceAll("[^a-z A-Z 0-9 \\_ \\+]","");
rawString = rawString.replaceAll(" ","_");
return rawString.toLowerCase();
}
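--- EDIT
OK guys... I ran a performance test on all 4 cases:
- Case 1) The original routine, as posted here.
- Case 2) The regex pre-compilation suggested by @WChargin.
- Case 3) The lookup table suggested by @devconsole, with my own optimization of using a SparseArray.
- Case 4) The Normalizer approach suggested by @Erik Pragt.
And... the winner is... TADAAA.....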
D/REPLACEMENT_TEST(18563): *** Running Tests (1000 iterations)
D/REPLACEMENT_TEST(18563): Original REGEX : 13533 ms
D/REPLACEMENT_TEST(18563): Compiled REGEX : 12563 ms
D/REPLACEMENT_TEST(18563): Character LUT : 1840 ms
D/REPLACEMENT_TEST(18563): Java NORMALIZER : 2416 ms
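- Interestingly, the pattern compilation optimization doesn't help much.
- I see I was also completely wrong in my assumptions about the speed of regexes, and devconsole was right in his educated guess that the Normalizer would outperform the regexes.
- It is amazing how slow regexes are. A difference of an order of magnitude really surprises me. I will try to avoid them in Java.
- The lookup table is the fastest option, by a short margin. I'll stick with this solution because, with the Normalizer, I would still have to replace some characters manually (spaces into "_") and then convert to lowercase.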
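To make the last point concrete, here is a rough sketch of what a complete Normalizer-based version might look like once the manual steps are added. It is not the code that was benchmarked: the class name `NormalizerPathNames` is hypothetical, the two extra patterns are compiled once up front, and lowercasing before the strip step is my own simplification. Note that NFD does not decompose the ligatures æ and œ, so they still need explicit handling, and that the Normalizer removes accents from more characters than the hand-written lists do.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the benchmarked code. */
final class NormalizerPathNames {
    // Compiled once, instead of on every String.replaceAll call.
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
    // Everything except lowercase letters, digits, '_', '+' and space.
    private static final Pattern FORBIDDEN = Pattern.compile("[^a-z0-9_+ ]");

    static String makeValidPathName(String raw) {
        if (raw == null) return null;
        // Ligatures have no canonical decomposition, so NFD leaves them alone.
        String s = raw.replace("\u00E6", "ae")   // ae ligature
                      .replace("\u0153", "oe");  // oe ligature
        // Split accented letters into base letter + combining mark, drop the marks.
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        s = DIACRITICS.matcher(s).replaceAll("");
        s = s.toLowerCase(Locale.ROOT);
        // Strip whatever is still not allowed, then turn spaces into underscores.
        s = FORBIDDEN.matcher(s).replaceAll("");
        return s.replace(' ', '_');
    }
}
```

With this sketch, `NormalizerPathNames.makeValidPathName("Ñandú café")` yields `nandu_cafe`. It still makes three regex passes over the string, which is consistent with the lookup table coming out ahead in the numbers above.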
The tests were run on a Samsung Galaxy Tab v1 10.1.
The source code of the test case follows.
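import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import android.util.Log;
import android.util.SparseArray;

public class Misc {
/* Test 2 (@WCChargin's Regex compilation) initialization */
// LinkedHashMap keeps insertion order, so the accent replacements always run before the catch-all strip pattern
static Map<Pattern, String> patterns = new LinkedHashMap<Pattern, String>();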
static {
patterns.put(Pattern.compile("[ÁÀÂÄáàäaàáâãäå]") ,"a");
patterns.put(Pattern.compile("æ"),"ae");
patterns.put(Pattern.compile("[çÇ]"),"c");
patterns.put(Pattern.compile("[ÈÉÊËèéêë]"),"e");
patterns.put(Pattern.compile("[ìíîïÍÌÎÏ]"),"i");
patterns.put(Pattern.compile("[ñÑ]"),"n");
patterns.put(Pattern.compile("[ÓÓÖÔòóôõö]"),"o");
patterns.put(Pattern.compile("œ"),"oe");
patterns.put(Pattern.compile("[ÙÚÛÜùúûü]"), "u");
patterns.put(Pattern.compile("[ýÿ]"),"y");
patterns.put(Pattern.compile("[^a-z A-Z 0-9 \\_ \\+]"),"");
patterns.put(Pattern.compile(" "),"_");
}
/* Test 3 (@devconsole's Lookup table) initialization */
static SparseArray<String> homeBrewPatterns=new SparseArray<String>();
/** helper function to fill the map where many different chars map to the same replacement */
static void fillMap(String chars, String replacement) { for (int i=0,len=chars.length(); i<len; i++) homeBrewPatterns.put(chars.charAt(i), replacement); }
static {
// fill the sparsearray map with all possible substitutions: Any char code gets its equivalent, ie, ä->a. a->a. A->a
// this also does the toLowerCase automatically. If a char is not in the list, it is forbidden and we skip it.
fillMap("ÁÀÂÄáàäaàáâãäå","a");
fillMap("æ","ae");
fillMap("çÇ","c");
fillMap("ÈÉÊËèéêë","e");
fillMap("ìíîïÍÌÎÏ","i");
fillMap("ñÑ","n");
fillMap("ÓÓÖÔòóôõö","o");
fillMap("œ","oe");
fillMap("ÙÚÛÜùúûü","u");
fillMap("ýÿ","y");
fillMap(" ","_");
for (char c='a'; c<='z'; c++) fillMap(""+c,""+c); // fill standard ASCII lowercase -> same letter
for (char c='A'; c<='Z'; c++) fillMap(""+c,(""+c).toLowerCase()); // fill standard ASCII uppercase -> lowercase letter
for (char c='0'; c<='9'; c++) fillMap(""+c,""+c); // fill numbers
}
/** FUNCTION TO TEST #1: Original function, no pattern compilation */
public static String makeValidPathName(String rawString) {
if (rawString==null) return null;
rawString = rawString.replaceAll("[ÁÀÂÄáàäaàáâãäå]","a");
rawString = rawString.replaceAll("æ","ae");
rawString = rawString.replaceAll("[çÇ]","c");
rawString = rawString.replaceAll("[ÈÉÊËèéêë]","e");
rawString = rawString.replaceAll("[ìíîïÍÌÎÏ]","i");
rawString = rawString.replaceAll("[ñÑ]","n");
rawString = rawString.replaceAll("[ÓÓÖÔòóôõö]","o");
rawString = rawString.replaceAll("œ","oe");
rawString = rawString.replaceAll("[ÙÚÛÜùúûü]", "u");
rawString = rawString.replaceAll("[ýÿ]","y");
rawString = rawString.replaceAll("[^a-z A-Z 0-9 \\_ \\+]","");
rawString = rawString.replaceAll(" ","_");
return rawString.toLowerCase();
}
/** FUNCTION TO TEST #2: @WCChargin's suggestion: Compile patterns then iterate a map */
public static String makeValidPathName_compiled(String rawString) {
for (Map.Entry<Pattern, String> e : patterns.entrySet()) {
rawString=e.getKey().matcher(rawString).replaceAll(e.getValue());
}
return rawString.toLowerCase();
}
/** FUNCTION TO TEST #3: @devconsole's suggestion: Create a LUT with all equivalences then append to a stringbuilder */
public static String makeValidPathName_lut(String rawString) {
StringBuilder purified=new StringBuilder(rawString.length()); // to avoid resizing
String aux; // to avoid creating objects inside the for
for (int i=0, len=rawString.length(); i<len; i++) {
aux=homeBrewPatterns.get(rawString.charAt(i));
if (aux!=null) purified.append(aux);
}
return purified.toString();
}
/** FUNCTION TO TEST #4: @Erik Pragt's suggestion on the use of a Normalizer */
public static String makeValidPathName_normalizer(String rawString) {
return rawString == null ? null
: Normalizer.normalize(rawString, Form.NFD)
.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}
/** Test Runner as a Runnable, just do a Handler.post() to run it */
public static Runnable msStringReplacementTest=new Runnable() {
public void run() {
String XTAG="REPLACEMENT_TEST";
Log.d(XTAG, "*** Running Tests");
int ITERATIONS=1000;
String[] holder=new String[4];
/* http://www.adhesiontext.com/ to generate dummy long text ... its late n im tired :) */
String tester="e arte nací valse ojales i demediada cesé entrañan domó reo ere fiaréis cinesiterapia fina pronto mensuraré la desatufases adulantes toree fusca ramiro hez apolíneo insalvable atas no enorme mí ojo trola chao fas eh borda no consignataria uno ges no arenque ahuyento y daca pío veló tenle baúl ve birria filo rho fui tañe mean taz raicita alimentarías ano defunciones u reabráis repase apreciaran cantorales ungidas naftalina ex guió abomba o escribimos consultarás des usó saudí mercó yod aborrecieses guiri lupia ña adosado jeringara fe cabalgadura tú descasar diseñe amar limarme escobero latamente e oreó lujuria niñez fabularios da inviernen vejé estañarán dictará sil mírales emoción zar claudiquéis ó e ti ñ veintén dañen ríase esmeraras acató noté as mancharlos avisen chocarnos divertidas y relata nuera usé fié élitro baba upa cu enhornan da toa hechizase genesíacos sol fija aplicó gafa pi enes fin asé deal rolar recurran cacen ha id pis pisó democristiano oes eras lañó ch pino fijad ñ quita hondazos ñ determinad vid corearan corrompimiento pamema meso fofas ocio eco amagados pian bañarla balan cuatrimestrales pijojo desmandara merecedor nu asimiladores denunciándote jada ñudos por descifraseis oré pelote ro botó tu sí mejorado compatibilizaciones enyerba oyeses atinado papa borbón pe dé ñora semis prosada luces leí aconteciesen doy colmará o ve te modismos virola garbillen apostabas abstenido ha bajá le osar cima ají adormecéis ñu mohecí orden abrogándote dan acanilladas uta emú ha emporcara manila arribeña hollejo ver puntead ijadeáis chalanesca pechugón silbo arabescos e i o arenar oxidas palear ce oca enmaderen niñez acude topó aguanieves i aconsejaseis lago él roía grafito ceñir jopo nitritos mofe botáis nato compresores ñu asilo amerizan allanándola cuela ó han ice puya alta lío son de sebo antieconómicas alá aceza latitud faca lavé colocándolos concebirlo miserea ñus gro mearé enchivarse";
long time0=System.currentTimeMillis();
for (int i=0; i<ITERATIONS; i++) holder[0]=makeValidPathName(tester); // store in an array to avoid possible JIT optimizations
long elapsed0=System.currentTimeMillis()-time0;
Log.d(XTAG, "Original REGEX : "+elapsed0+" ms");
long time1=System.currentTimeMillis();
for (int i=0; i<ITERATIONS; i++) holder[1]=makeValidPathName_compiled(tester); // store in an array to avoid possible JIT optimizations
long elapsed1=System.currentTimeMillis()-time1;
Log.d(XTAG, "Compiled REGEX : "+elapsed1+" ms");
long time2=System.currentTimeMillis();
for (int i=0; i<ITERATIONS; i++) holder[2]=makeValidPathName_lut(tester); // store in an array to avoid possible JIT optimizations
long elapsed2=System.currentTimeMillis()-time2;
Log.d(XTAG, "Character LUT : "+elapsed2+" ms");
long time3=System.currentTimeMillis();
for (int i=0; i<ITERATIONS; i++) holder[3]=makeValidPathName_normalizer(tester); // store in an array to avoid possible JIT optimizations
long elapsed3=System.currentTimeMillis()-time3;
Log.d(XTAG, "Java NORMALIZER : "+elapsed3+" ms");
Log.d(XTAG, "Output 0: "+holder[0]);
Log.d(XTAG, "Output 1: "+holder[1]);
Log.d(XTAG, "Output 2: "+holder[2]);
Log.d(XTAG, "Output 3: "+holder[3]);
}
};
}
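The runner above times each variant with `System.currentTimeMillis()` and no warm-up pass, which is fine for spotting an order-of-magnitude gap but can blur smaller differences. As a rough sketch only, a reusable timing helper could look like the following. It assumes Java 8 or later for `java.util.function` (which was not available on Android at the time of the test), and the names `MicroBench` and `timeMillis` are made up for illustration.

```java
import java.util.function.UnaryOperator;

/** Hypothetical timing helper for illustration; not the harness used for the numbers above. */
final class MicroBench {
    // Results are written here so the calls cannot be optimized away as dead code.
    private static volatile String sink;

    static long timeMillis(UnaryOperator<String> fn, String input,
                           int warmupRuns, int timedRuns) {
        // Untimed runs first, so class loading and JIT compilation are not measured.
        for (int i = 0; i < warmupRuns; i++) sink = fn.apply(input);

        // nanoTime is monotonic, unlike the wall-clock currentTimeMillis.
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) sink = fn.apply(input);
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

A call such as `MicroBench.timeMillis(Misc::makeValidPathName_lut, tester, 100, 1000)` would then replace each hand-written timing block. For serious measurements on a desktop JVM, a dedicated harness is still the safer choice.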
Hi everyone, and thanks a lot for your suggestions :) I love Stack Overflow :)
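Why not iterate over the characters of the string the old-fashioned way, looking at each character only once, converting it (or skipping it) and appending the result to a StringBuilder? – devconsole 2013-05-08 23:15:11
Hmm... frankly, I don't know... Say the string has 300 characters and only one accent: would this iteration still be faster than the regexes? – rupps 2013-05-08 23:21:29
If you do benchmark, please try @devconsole's solution. I'd guess it will be faster, because you are using regexes to match patterns when all you really need is a simple lookup for each character conversion. – 2013-05-08 23:46:03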
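For readers outside Android, the single-pass idea from these comments can also be sketched with a plain array instead of a `SparseArray`. The string is walked exactly once, whether it contains one accent or three hundred, whereas the regex version scans it once per pattern. Everything below is an illustrative assumption rather than code from the thread: the class name `ArrayLutPathNames`, the table size, and the choice to cover whole Latin-1 ranges (slightly more characters than the original lists) and to keep `_` and `+` as the regex version does.

```java
/** Hypothetical array-backed lookup table; a sketch, not code from the thread. */
final class ArrayLutPathNames {
    // Large enough for Latin-1 plus the oe ligature (U+0153); null means "drop this char".
    private static final String[] LUT = new String[0x180];

    private static void fillRange(int from, int to, String replacement) {
        for (int c = from; c <= to; c++) LUT[c] = replacement;
    }

    static {
        for (char c = 'a'; c <= 'z'; c++) LUT[c] = String.valueOf(c);
        for (char c = 'A'; c <= 'Z'; c++) LUT[c] = String.valueOf(Character.toLowerCase(c));
        for (char c = '0'; c <= '9'; c++) LUT[c] = String.valueOf(c);
        LUT[' '] = "_";
        LUT['_'] = "_";
        LUT['+'] = "+";
        fillRange(0xC0, 0xC5, "a"); fillRange(0xE0, 0xE5, "a"); // A/a with accents
        LUT[0xE6] = "ae";                                        // ae ligature
        LUT[0xC7] = "c"; LUT[0xE7] = "c";                        // C/c cedilla
        fillRange(0xC8, 0xCB, "e"); fillRange(0xE8, 0xEB, "e"); // E/e with accents
        fillRange(0xCC, 0xCF, "i"); fillRange(0xEC, 0xEF, "i"); // I/i with accents
        LUT[0xD1] = "n"; LUT[0xF1] = "n";                        // N/n tilde
        fillRange(0xD2, 0xD6, "o"); fillRange(0xF2, 0xF6, "o"); // O/o with accents
        LUT[0x153] = "oe";                                       // oe ligature
        fillRange(0xD9, 0xDC, "u"); fillRange(0xF9, 0xFC, "u"); // U/u with accents
        LUT[0xFD] = "y"; LUT[0xFF] = "y";                        // y acute, y diaeresis
    }

    static String makeValidPathName(String raw) {
        if (raw == null) return null;
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            // Characters beyond the table, or without an entry, are skipped.
            String replacement = c < LUT.length ? LUT[c] : null;
            if (replacement != null) out.append(replacement);
        }
        return out.toString();
    }
}
```

Each character costs one bounds check and one array read, so the work is proportional to the string length alone. The trade-off is a fixed table of 384 references, which is negligible here but would grow if the table had to cover the whole Basic Multilingual Plane.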