Find all strings in a PHP code base

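I have a PHP code base of several million lines with no real separation of display and logic, and I am trying to extract every string in the code for the purpose of localization. Separating display and logic is a long-term goal, but for now I just want to be able to localize.

The strings appear in every format PHP allows, so I need a theoretical (or practical) way to parse our entire source and at least locate where each string lives. Ideally, of course, I would replace every string with a function call, e.g.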

"this is a string"

would be replaced with

_("this is a string")

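Of course I would need to support both the single-quoted and double-quoted formats. I am not too worried about the others; they appear so rarely that I can change them by hand.

Also, I would of course not want to localize array indexes. So a string like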

$arr["value"]

should not become

$arr[_("value")]

Can anyone help me get started with this?

2009-02-21 Ray

You can use token_get_all() to get all the tokens from a PHP file, e.g.:

<?php 

$fileStr = file_get_contents('file.php'); 

foreach (token_get_all($fileStr) as $token) { 
    // single-character tokens such as '[' are plain strings, not arrays
    if (is_array($token) && $token[0] === T_CONSTANT_ENCAPSED_STRING) {
        echo "found string {$token[1]}\r\n";
        // $token[2] is the line number of the string
    }
}

You could do a really dirty check that the string is not being used as an array index with something like:

$fileLines = file('file.php'); 

//inside the loop and if 
$line = $fileLines[$token[2] - 1]; 
if (false === strpos($line, "[{$token[1]}]")) { 
    //not an array index 
}

But you will really struggle to do this properly, because someone may have written something you do not expect, such as:

$str = 'string that is not immediately an array index'; 
doSomething($array[$str]);

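Edit: As Ant P says, for the second part of this answer you would probably be better off looking for [ and ] in the surrounding tokens rather than using my strpos hack, something like this: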

$tokens = token_get_all(file_get_contents('file.php'));
$num = count($tokens);
for ($i = 0; $i < $num; $i++) {
    $token = $tokens[$i];

    // single-character tokens such as '[' are plain strings, not arrays
    if (!is_array($token) || $token[0] !== T_CONSTANT_ENCAPSED_STRING) {
        // not a string literal, ignore
        continue;
    }

    // ?? null guards the first and last token positions
    if (($tokens[$i - 1] ?? null) === '[' && ($tokens[$i + 1] ?? null) === ']') {
        // immediately used as an array index, ignore
        continue;
    }

    echo "found string {$token[1]}\r\n";
    // $token[2] is the line number of the string
}
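
To go from finding the strings to actually wrapping them, one option is to rebuild the file from its tokens, since concatenating the output of token_get_all() reproduces the source. What follows is a rough sketch, not a finished tool: wrapStringLiterals is a made-up name, it assumes PHP 7 or later, and it skips whitespace tokens when checking for a surrounding [ and ] so that spacing inside the brackets does not matter.

```php
<?php
// Hypothetical helper (not part of the original answer): wraps plain string
// literals in _() unless they are used directly as an array index.
function wrapStringLiterals(string $source): string
{
    $tokens = token_get_all($source);
    $count = count($tokens);

    // Return the nearest non-whitespace token in direction $step (+1 or -1).
    $neighbour = function (int $i, int $step) use ($tokens, $count) {
        for ($j = $i + $step; $j >= 0 && $j < $count; $j += $step) {
            if (!is_array($tokens[$j]) || $tokens[$j][0] !== T_WHITESPACE) {
                return $tokens[$j];
            }
        }
        return null;
    };

    $out = '';
    foreach ($tokens as $i => $token) {
        if (!is_array($token)) {
            $out .= $token; // single-character token such as '[' or ';'
            continue;
        }
        $isLiteral = $token[0] === T_CONSTANT_ENCAPSED_STRING;
        $isIndex = $neighbour($i, -1) === '[' && $neighbour($i, 1) === ']';
        // Every other token is copied through unchanged.
        $out .= ($isLiteral && !$isIndex) ? '_(' . $token[1] . ')' : $token[1];
    }
    return $out;
}
```

Treat the output as something to review, not to commit blindly: this also skips literals in short array syntax such as ['a'], and it will happily wrap strings where a function call is invalid (constant expressions) or unwanted (SQL).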

2009-02-21 00:23:03

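+1 Never knew about this function. Awesome. – cletus 2009-02-21 00:26:12

The only thing is, for $_SESSION["logsession"] it actually gives me found string "logsession", which is of course not something I want to localize. – Ray 2009-02-21 00:33:14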

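Rather than trying to solve this with an overly clever command-line hack using perl or grep, you should write a program to do it :)

Write a perl/python/ruby/whatever script that searches each file for a pair of single or double quotes. Every time it finds a match, it prompts you to replace it with the underscore function, and you can either let it do so or skip to the next match.

In a perfect world you would write something that does all of this for you, but this approach will probably take less time in the end, and you will run into fewer errors.

Pseudocode: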

for fname in yourBigFileList:
    open a file handle for the actual source file
    open a temp file handle (like fname + ".tmp" or something)
    for fline in source file:
        get quoted strings in fline
        for qstring in quoted_strings:
            show it in context, i.e. the entire line of code
            ask: replace with _()?
            if Y, replace qstring in fline
        write fline (modified or not) to the tmp file
    close file handles
    rename the original file to its current name + ".old"
    rename the ".tmp" file to the name of the original file

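I'm sure there is a more *nix-fu way of doing this, but this method lets you look at each instance yourself and decide. If it is a million lines, every line contains a string, and every line takes 1 second to evaluate, that is 1,000,000 s / 3,600 ≈ 278 hours to get through the whole thing... maybe you should just ignore this post :)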

2009-02-21 00:26:32 inkedmn

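Besides associative arrays, there may be other cases in your code base that you will completely break by doing an automated search and replace.

SQL queries: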

$myname = "steve"; 
$sql = "SELECT foo FROM bar WHERE name = " . $myname;

Indirect variable references:

$bar = "Hello, World"; // a string that needs localization 
$foo = "bar"; // a string that should not be localized 
echo($$foo);

SQL string manipulation:

$sql = "SELECT CONCAT('Greetings, ', firstname) as greeting from users where id = ?";

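There is no automated way to filter out all the possibilities. Perhaps the solution is to write an application that builds a "moderation" queue of candidate strings and displays each one highlighted, in the context of a few lines of code. You can then glance at the code to decide whether it is a string that needs localization and press a single key to localize or ignore it.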
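
The data side of such a queue could be sketched along these lines, assuming token_get_all() is used to find the candidates. buildReviewQueue and its $context parameter are invented names for illustration, and the highlighting and key handling are left out entirely.

```php
<?php
// Hypothetical helper: collects every plain string literal together with a
// few surrounding source lines so that a person can review it.
function buildReviewQueue(string $source, int $context = 2): array
{
    $lines = preg_split('/\R/', $source); // source split into lines
    $queue = [];
    foreach (token_get_all($source) as $token) {
        if (!is_array($token) || $token[0] !== T_CONSTANT_ENCAPSED_STRING) {
            continue; // not a plain string literal
        }
        $line = $token[2]; // 1-based line on which the literal starts
        $start = max(1, $line - $context);
        $queue[] = [
            'line' => $line,
            'string' => $token[1], // literal text, quotes included
            'context' => array_slice($lines, $start - 1, $line + $context - $start + 1),
        ];
    }
    return $queue;
}
```

A small command-line or web front end could then walk the queue, show each 'context' block, and record a localize/ignore decision per entry before any file is rewritten.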

2009-02-21 00:53:34 postfuturist
