2010-05-06
2

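AppleScript: clean a string

I have a string containing illegal characters that I want to remove, but I don't know in advance which characters might be present.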

I built a list of the characters I don't want filtered out, and I put together this script (adapted from another one I found online).

on clean_string(TheString)
    --Store the current TIDs. To be polite to other scripts.
    set previousDelimiter to AppleScript's text item delimiters
    set potentialName to TheString
    set legalName to {}
    set legalCharacters to {"a", "b", "c", "d", "e", "f", ¬
        "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", ¬
        "s", "t", "u", "v", "w", "x", "y", "z", "A", "B", "C", "D", "E", ¬
        "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", ¬
        "S", "T", "U", "V", "W", "X", "Y", "Z", "1", "2", "3", "4", "5", ¬
        "6", "7", "8", "9", "0", "?", "+", "-", "Ç", "ç", "á", "Á", "é", ¬
        "É", "í", "Í", "ó", "Ó", "ú", "Ú", "â", "Â", "ã", "Ã", "ñ", "Ñ", ¬
        "õ", "Õ", "à", "À", "è", "È", "ü", "Ü", "ö", "Ö", "!", "$", "%", ¬
        "/", "(", ")", "&", "€", "#", "@", "=", "*", "+", "-", ",", ".", ¬
        "–", "_", " ", ":", ";", ASCII character 10, ASCII character 13}

    --Whatever you want to eliminate.
    --Now iterate through the characters checking them.
    repeat with thisCharacter in the characters of potentialName
        set thisCharacter to thisCharacter as text
        if thisCharacter is in legalCharacters then
            set the end of legalName to thisCharacter
            log (legalName as string)
        end if
    end repeat
    --Make sure that you set the TIDs before making the
    --list of characters into a string.
    set AppleScript's text item delimiters to ""
    --Check the name's length.
    if length of legalName is greater than 32 then
        set legalName to items 1 thru 32 of legalName as text
    else
        set legalName to legalName as text
    end if
    --Restore the current TIDs. To be polite to other scripts.
    set AppleScript's text item delimiters to previousDelimiter
    return legalName
end clean_string

The problem is that this script is slow as hell and gives me timeouts.

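What I'm doing is going through the string character by character and comparing each one against the legalCharacters list. If the character is in the list, fine. If not, it is ignored.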

Is there a fast way to do this?

Something like: "look at every character of TheString and remove those that are not in legalCharacters".

Thanks for any help.

Answers

3

Which non-ASCII characters are you running into? What is your file's encoding?

It is much more efficient to process text with a shell script using tr, sed, or perl. All of them are installed by default on OS X.

You can strip returns with a shell script that uses tr (as in the example below), and you can also strip spaces with sed (not shown in the example):

set clean_text to do shell script "echo " & quoted form of the_string & "| tr -d '\\r\\n' " 

Technical Note TN2065: do shell script in AppleScript
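The same tool can implement the question's rule directly: `tr -c` complements a character set and `-d` deletes, so `tr -cd SET` removes every character that is not in SET in a single pass. A minimal sketch, assuming an ASCII-only subset of legalCharacters is acceptable; the function name `clean_string` here is a hypothetical shell helper, not part of the answer:

```shell
#!/bin/sh
# Hypothetical helper: keep only whitelisted characters, delete the rest.
# "tr -cd SET" deletes every character NOT in SET (complement + delete).
# Assumption: the set is an ASCII-only subset of the question's legalCharacters.
# LC_ALL=C makes tr work byte by byte, so every non-ASCII byte (including the
# accented letters and the euro sign from the original list) is deleted too.
clean_string() {
    # printf '%s' passes the text through unchanged (echo would add a newline
    # and may mangle a leading "-n" or backslashes)
    printf '%s' "$1" | LC_ALL=C tr -cd 'a-zA-Z0-9?+!$%/()&#@=*,.:;_ \n\r-'
}
```

The `-` sits at the end of the set so tr reads it as a literal dash rather than a range. Keeping the accented letters would need a locale-aware tool; this sketch deliberately trades them away for predictable behavior.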

Or, with Perl, this strips every character that is not alphanumeric or whitespace:

set x to quoted form of "Sample text. smdm#$%%&" 
set y to do shell script "echo " & x & " | perl -pe 's/[^[:alnum:][:space:]]//g'"

Search around SO for other examples of using tr, sed, and perl to process text from AppleScript, or search MacScripter / AppleScript | Forums.

2

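Iteration in AppleScript is always slow, and there really is no faster way around that. Logging inside a loop is a guaranteed way to slow things down, so use the log command sparingly.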

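In your particular case, however, you have a length limit, and moving the length check into the repeat loop can cut the processing time considerably (it takes less than a second when run in Script Debugger, regardless of the length of the text):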

on clean_string(TheString)
    set potentialName to TheString
    set legalName to {}
    set legalCharacters to {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "1", "2", "3", "4", "5", "6", "7", "8", "9", "0", "?", "+", "-", "Ç", "ç", "á", "Á", "é", "É", "í", "Í", "ó", "Ó", "ú", "Ú", "â", "Â", "ã", "Ã", "ñ", "Ñ", "õ", "Õ", "à", "À", "è", "È", "ü", "Ü", "ö", "Ö", "!", "$", "%", "/", "(", ")", "&", "€", "#", "@", "=", "*", "+", "-", ",", ".", "–", "_", " ", ":", ";", ASCII character 10, ASCII character 13}
    --Coerce the list to text with no separator; restore the TIDs afterwards.
    set previousDelimiter to AppleScript's text item delimiters
    set AppleScript's text item delimiters to ""
    with timeout of 86400 seconds --86400 seconds = 24 hours
        repeat with thisCharacter in the characters of potentialName
            set thisCharacter to thisCharacter as text
            if thisCharacter is in legalCharacters then
                set the end of legalName to thisCharacter
                --Stop as soon as 32 legal characters have been collected.
                if length of legalName is greater than or equal to 32 then exit repeat
            end if
        end repeat
    end timeout
    set cleanName to legalName as text
    set AppleScript's text item delimiters to previousDelimiter
    return cleanName
end clean_string
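For comparison, the same "filter, then cap at 32 characters" behavior can be pushed into a single shell pipeline called once from `do shell script`, so no AppleScript loop runs at all. A sketch under stated assumptions: `clean_name` is a hypothetical helper, the whitelist is an ASCII-only subset of legalCharacters, and the optional second argument (default 32) is the length cap:

```shell
#!/bin/sh
# Hypothetical shell counterpart of clean_string: filter, then cap the length.
clean_name() {
    max_len=${2:-32}
    # LC_ALL=C: tr deletes every byte outside the set, so the result is pure
    # ASCII and "head -c" (which counts bytes) counts characters as well.
    printf '%s' "$1" \
        | LC_ALL=C tr -cd 'a-zA-Z0-9?+!$%/()&#@=*,.:;_ \n\r-' \
        | head -c "$max_len"
}
```

`head -c` is not in POSIX, but both the BSD head shipped with OS X and GNU coreutils support it.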
+0

Thanks, but this loop gives me this error: error "AppleEvent timed out." number -1712 ... I guess the text is too long and AppleScript won't wait for it to finish. – SpaceDog 2010-05-06 22:00:02

+0

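I've added a timeout block to the code, but you shouldn't be getting that error here (I believe the default timeout is 60 seconds). I ran the code on the full text of this page without any problems. I think you may have to wrap the timeout around the call to the subroutine, or somewhere higher up the stack. – 2010-05-07 00:03:51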

2

Another shell script approach could be:

set clean_text to do shell script "echo " & quoted form of the_string & "|sed \"s/[^[:alnum:][:space:]]//g\"" 
This uses sed to delete everything that is not an alphanumeric character or whitespace. More regular expression reference here.

+0

This is also a good one for processing text. – markratledge 2010-05-08 15:51:23

+0

Short and sweet... +1 – Marlon 2015-08-19 15:51:32

0

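BBEdit or TextWrangler would be much faster at this. Download TextWrangler (it's free), open your file, and run Text -> Zap Gremlins... on it. Does that do what you need? If so, celebrate with a cold drink. If not, try BBEdit (it's not free): create a new Text Factory with as many "Replace All" conditions as you need, then open the file and run the Text Factory on it.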