Get all integers from an irregular string in C

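I'm looking for a (relatively) simple way to parse a random string, extract all of the integers from it, and put them into an array. This differs from some of the other similar questions because my strings have no standard format.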

Example:

pt112parah salin10n m5:isstupid::42$%&%^*%7first3

I need to end up with an array with these contents:

112 10 5 42 7 3

And I'd like the method to be more efficient than going through the string character by character.

Thanks for your help.

Source

2011-06-07 RBaxter

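I'm pretty sure the only way is to search character by character, unless you know the specific numbers you are searching for. – 2011-06-07 21:03:33

There is no method more efficient than going character by character. However, you may find a library function that hides the loop under the hood. – 2011-06-07 21:03:44

All I know is that each number will be non-negative and less than 256. I could just find the index of a digit character, call sscanf at that position, and repeat, but I would have thought there was a more efficient (or at least more concise) way to do it. – RBaxter 2011-06-07 21:05:10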

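Quick solution. I'm assuming that no number exceeds the range of long, and that there are no minus signs to worry about. If either is a problem, you need to do more work analyzing the results of strtol(), and you need to detect '-' followed by a digit.

The code does loop over all the characters; I don't think you can avoid that. But it uses strtol() to process each sequence of digits (once the first digit is found), and then resumes from where strtol() left off (strtol() is kind enough to tell us exactly where it stopped its conversion).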

#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>

int main(void)
{
    const char data[] = "pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
    long results[100];
    int nresult = 0;

    const char *s = data;
    char c;

    while ((c = *s++) != '\0')
    {
        /* isdigit() needs a value representable as unsigned char */
        if (isdigit((unsigned char)c))
        {
            char *end;
            /* s was already advanced, so the digit run starts at s-1 */
            results[nresult++] = strtol(s-1, &end, 10);
            s = end; /* resume where strtol() stopped */
        }
    }

    for (int i = 0; i < nresult; i++)
        printf("%d: %ld\n", i, results[i]);
    return 0;
}

Output:

0: 112
1: 10
2: 5
3: 42
4: 7
5: 3

2011-06-07 21:24:32

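I notice that the variable 'end' is superfluous; the loop body could be the single statement 'results[nresult++] = strtol(s-1, &s, 10);'. I also notice that I didn't include an overflow check on the array of integers; that should be in there too. – 2011-06-07 21:36:21

Correction: using 'end' is better because it is const-correct. Using '&s' instead of '&end' would cause a compilation warning at the call to 'strtol()'. – 2011-06-07 22:09:28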
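
The two follow-up points above (the missing bounds check on the results array, and the need to inspect the result of strtol() when a number might not fit in a long) could be handled roughly as follows. This is only a sketch: the function name extract_longs and its cap parameter are hypothetical, not part of the answer, and skipping out-of-range values is just one possible policy.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from the answer): stores up to `cap` integers
 * found in `s` into `out` and returns how many were stored.
 * Digit runs that overflow a long are skipped. */
size_t extract_longs(const char *s, long *out, size_t cap)
{
    size_t n = 0;

    while (*s != '\0' && n < cap)    /* n < cap is the bounds check */
    {
        if (isdigit((unsigned char)*s))
        {
            char *end;
            errno = 0;
            long value = strtol(s, &end, 10);
            if (errno != ERANGE)     /* keep only values that fit in a long */
                out[n++] = value;
            s = end;                 /* resume where strtol() stopped */
        }
        else
        {
            s++;
        }
    }
    return n;
}
```

Keeping 'end' as a separate char * preserves the const-correctness noted in the correction, and passing the capacity in lets the caller choose the array size.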

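Just because I've been writing Python all day and I want a break. Declaring an array will be tricky. Either you have to run it twice to work out how many numbers you have (and then allocate the array), or just use the numbers one by one as in this example.

Note that the ASCII characters '0' to '9' are 48 to 57 (i.e. consecutive).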

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdbool.h>

int main(void)
{
    const char *input = "pt112par0ah salin10n m5:isstupid::42$%&%^*%7first3";

    size_t length = strlen(input);
    int value = 0;
    size_t i;
    bool gotnumber = false;
    for (i = 0; i <= length; i++) // <= so the terminating '\0' flushes a number at the end of the string
    {
        if (input[i] >= '0' && input[i] <= '9')
        {
            gotnumber = true;
            value = value * 10; // shift up a column
            value += input[i] - '0'; // convert the digit character to its numeric value
        }
        else if (gotnumber) // we hit this the first time we encounter a non-number after we've had numbers
        {
            printf("Value: %d \n", value);
            value = 0;
            gotnumber = false;
        }
    }

    return 0;
}

EDIT: the previous version did not handle 0.

2011-06-07 21:10:38 Joe

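Interesting solution. I'm adapting it to my code, thanks for your help! – RBaxter 2011-06-07 21:15:25

What is "how many" for? It isn't used. – 2011-06-07 21:19:03

No problem. If it answers your question, please tick it! – Joe 2011-06-07 21:21:05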
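
The answer mentions the alternative of running over the string twice: once to count the numbers, then again to fill an array of exactly that size. A minimal sketch of that idea follows. The names count_numbers and collect_numbers are hypothetical, and the code assumes every number fits in an int.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: counts the runs of consecutive digits in `s`. */
size_t count_numbers(const char *s)
{
    size_t count = 0;
    int in_number = 0;

    for (; *s != '\0'; s++)
    {
        int digit = isdigit((unsigned char)*s) != 0;
        if (digit && !in_number)
            count++;               /* first digit of a new run */
        in_number = digit;
    }
    return count;
}

/* Hypothetical helper: returns a malloc'd array holding the integers in
 * `s` and stores its length in *n. Returns NULL (with *n == 0) if there
 * are no numbers or the allocation fails. The caller frees the result.
 * Assumption: every number fits in an int. */
int *collect_numbers(const char *s, size_t *n)
{
    *n = count_numbers(s);         /* pass 1: size the array */
    if (*n == 0)
        return NULL;

    int *out = malloc(*n * sizeof *out);
    if (out == NULL)
    {
        *n = 0;
        return NULL;
    }

    size_t i = 0;
    int value = 0, in_number = 0;
    for (;; s++)                   /* pass 2: same digit arithmetic as the answer */
    {
        if (isdigit((unsigned char)*s))
        {
            value = value * 10 + (*s - '0');
            in_number = 1;
        }
        else
        {
            if (in_number)
                out[i++] = value;  /* a run just ended, possibly at the '\0' */
            value = 0;
            in_number = 0;
            if (*s == '\0')
                break;
        }
    }
    return out;
}
```

The second pass treats the terminating '\0' like any other non-digit, so a number at the very end of the string is still stored.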

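More efficient than going through the text character by character?

Not possible, because you have to look at every character to know that it is not part of an integer.

Now, given that you have to go through the string character by character anyway, I would suggest simply casting each character to an int and checking: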

//string tmp = ""; declared outside of loop. 
//pseudocode for inner loop: 
int intVal = (int)c; 
if(intVal >=48 && intVal <= 57){ //0-9 are 48-57 when char casted to int. 
    tmp += c; 
} 
else if(tmp.length > 0){ 
    array[?] = (int)tmp; // ? is where to add the int to the array. 
    tmp = ""; 
}

The array will then contain your solution.

2011-06-07 21:12:32 Phil

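The cast to 'int' is pointless, you can use 'c' directly; also, 48 and 57 are obscure and correct only for ASCII-based code pages. Just use '0' and '9', or the 'isdigit' function. – 2011-06-07 21:22:20

I can't see how this gets values such as "112". – Joe 2011-06-07 21:26:25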
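
For readers wondering how the pseudocode arrives at multi-digit values such as 112: the digits are appended to tmp one at a time and only converted once a non-digit is reached. A rough C rendering, using isdigit as the first comment suggests, might look like this. The name digits_to_array and the MAX_DIGITS limit are assumptions made for this sketch, not part of the answer.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_DIGITS 9   /* assumption: 9 decimal digits always fit in a 32-bit int */

/* Hypothetical helper: collects each run of digits into a small buffer
 * (the pseudocode's `tmp`) and converts it when the run ends. Stores at
 * most `cap` values in `out` and returns how many were stored. Digits
 * beyond MAX_DIGITS in a single run are dropped. */
size_t digits_to_array(const char *s, int *out, size_t cap)
{
    char tmp[MAX_DIGITS + 1];
    size_t len = 0, n = 0;

    for (;; s++)
    {
        if (isdigit((unsigned char)*s))
        {
            if (len < MAX_DIGITS)
                tmp[len++] = *s;        /* tmp += c */
        }
        else
        {
            if (len > 0 && n < cap)
            {
                tmp[len] = '\0';
                out[n++] = atoi(tmp);   /* the pseudocode's (int)tmp */
            }
            len = 0;                    /* tmp = "" */
            if (*s == '\0')
                break;
        }
    }
    return n;
}
```

For the digit run "112", tmp holds "1", then "11", then "112", and the conversion happens when the following 'p' is seen.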

Another solution is to use the strtok function:

/* strtok example */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[] = "pt112parah salin10n m5:isstupid::42$%&%^*%7first3";
    char *pch;
    printf("Splitting string \"%s\" into tokens:\n", str);
    pch = strtok(str, " abcdefghijklmnopqrstuvwxyz:$%&^*");
    while (pch != NULL)
    {
        printf("%s\n", pch);
        pch = strtok(NULL, " abcdefghijklmnopqrstuvwxyz:$%&^*");
    }
    return 0;
}

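Gives:

Splitting string "pt112parah salin10n m5:isstupid::42$%&%^*%7first3" into tokens:
112
10
5
42
7
3

Perhaps not the best solution for this task, since you have to specify every character that is to be treated as a token delimiter. But it is an alternative to the other solutions.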

2011-06-07 21:29:50

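'strtok()' immediately rules out scanning string literals or other constant strings, because it modifies the array it is parsing. – 2011-06-07 21:31:39

True, but that was not part of the problem description. Since the OP asked for something other than a loop over every character, I added this because it is an interesting function :-) – 2011-06-07 21:42:45

Independent of my reservations about using 'strtok()' (and parsing code should, IMNSHO, never modify the input string without explicit permission to do so), the code shown is not resilient to new characters appearing in the input. For example, it will run into trouble at the first capital or accented letter. Making it resilient requires a 245-character second argument to 'strtok()' (256 - 10 digits - NUL). – 2011-06-07 21:53:38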
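
Both objections in these comments (the input is modified, and the delimiter list must name every non-digit) can be sidestepped by describing the digits instead of the delimiters. One way to do that, offered only as a sketch, uses the standard strcspn() and strspn() functions. The name scan_digit_runs is hypothetical, and the code assumes each number fits in a long.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DIGITS "0123456789"

/* Hypothetical helper: stores up to `cap` integers from `s` in `out` and
 * returns how many were stored. The input is never modified, so string
 * literals are fine. Assumption: every number fits in a long. */
size_t scan_digit_runs(const char *s, long *out, size_t cap)
{
    size_t n = 0;

    s += strcspn(s, DIGITS);           /* skip leading non-digits */
    while (*s != '\0' && n < cap)
    {
        out[n++] = strtol(s, NULL, 10);
        s += strspn(s, DIGITS);        /* step over the digits just converted */
        s += strcspn(s, DIGITS);       /* skip to the next digit, or to the '\0' */
    }
    return n;
}
```

Because only the ten digit characters are listed, capital letters, accented letters, and any other bytes are all skipped without needing a 245-character delimiter string.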

#include <stdio.h>
#include <string.h>
#include <math.h>

int main(void)
{
    const char *input = "pt112par0ah salin10n m5:isstupid::42$%&%^*%7first3";
    const char *pos = input;
    // The maximum possible number of integers is half the length of the string (rounded up), due to the smallest number of digits possible per integer being 1 and the smallest number of characters between two different integers also being 1
    int integers[(strlen(input) + 1) / 2];
    unsigned int numInts = 0;

    while ((pos = strpbrk(pos, "0123456789")) != NULL) // strpbrk() prototype in string.h
    {
        sscanf(pos, "%d", &(integers[numInts])); // pos points at a digit, so the value is non-negative

        if (integers[numInts] == 0)
            pos++;
        else
            pos += (int) log10(integers[numInts]) + 1; // requires math.h; assumes no leading zeros

        numInts++;
    }

    for (unsigned int i = 0; i < numInts; i++)
        printf("%d ", integers[i]);

    return 0;
}

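Finding the integers is done through repeated calls to strpbrk() on the offset pointer. After each integer is read, the pointer is offset again by the number of digits in that integer, which is calculated by taking the base-10 logarithm of the integer found and adding 1 (with a special case for when the integer is 0). There is no need to use abs() on the integer when calculating the logarithm, since you stated the integers will be non-negative. If you want to be more space-efficient, you could use unsigned char integers[] instead of int integers[], since you said the integers will all be < 256, but that isn't necessary.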

2011-06-07 21:29:58 JAB

Made my answer a bit more proper... but there are probably ways to simplify it. – JAB 2011-06-08 15:04:37
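
One possible simplification, which is my own assumption rather than anything JAB proposed, is to let sscanf() report how many characters it consumed through the standard %n conversion instead of computing the digit count with log10(). The function name scan_with_n and the cap parameter are hypothetical, and the sketch assumes each number fits in an int.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: stores up to `cap` integers from `s` in `out` and
 * returns how many were stored. Assumption: every number fits in an int. */
size_t scan_with_n(const char *s, int *out, size_t cap)
{
    size_t n = 0;
    const char *pos = s;

    while (n < cap && (pos = strpbrk(pos, "0123456789")) != NULL)
    {
        int used = 0;
        if (sscanf(pos, "%d%n", &out[n], &used) != 1)
            break;         /* not expected: pos points at a digit */
        pos += used;       /* %n stored the number of characters consumed */
        n++;
    }
    return n;
}
```

This removes the dependency on math.h and the special case for 0, and it also steps over leading zeros correctly because the advance is based on characters consumed, not on the value.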

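If you don't mind using C++ instead of C (usually there is no good reason not to), you can reduce your solution to just two lines of code (using the AXE parser generator):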

vector<int> numbers; 
auto number_rule = *(*(axe::r_any() - axe::r_num()) 
    & *axe::r_num() >> axe::e_push_back(numbers));

Now test it:

std::string str = "pt112parah salin10n m5:isstupid::42$%&%^*%7first3"; 
number_rule(str.begin(), str.end()); 
std::for_each(numbers.begin(), numbers.end(), [](int i) { std::cout << "\ni=" << i; });

Sure enough, you get your numbers back.

As a bonus, you don't need to change anything when parsing Unicode wide strings:

std::wstring str = L"pt112parah salin10n m5:isstupid::42$%&%^*%7first3"; 
number_rule(str.begin(), str.end()); 
std::for_each(numbers.begin(), numbers.end(), [](int i) { std::cout << "\ni=" << i; });

Sure enough, you get the same numbers back.

2011-06-08 03:02:39

P.S. It is easy to modify the rule to extract negative numbers, floating-point numbers, hexadecimal numbers, decimal numbers... – 2011-06-08 03:16:40
