2012-03-30 68 views
0

Regex to capture the text around a word

I have a text such as:

Title A 
some description on a few lines, there may be empty lines here 
some description on a few lines 
Status: some random text 
Title B 
some description on a few lines, there may be empty lines here 
some description on a few lines 
Status: some other random text 
Title C 
some description on a few lines, there may be empty lines here 
some description on a few lines 
Status: some other random text 

I want to parse the text based on the word Status: and get an array of items, each with a title, a description and a status. I am using C# 4.0.

+5

[What have you tried?](http://mattgemmell.com/2008/12/08/what-have-you-tried/) – 2012-03-30 14:31:19

+1

Is the text given as a 'string', a 'string[]', or some other structure? Can you give an example of what the result should look like? – 2012-03-30 14:36:21

+0

To tell you the 'truth', I haven't tried anything, because the task is beyond my regex skills... – Lincoln 2012-03-30 14:37:03

Answers

1

Here is how I would do it (assuming the text is read from a text file):

// Requires: using System.IO; using System.Text.RegularExpressions;
Regex regStatus = new Regex(@"^Status:");
// The sample titles look like "Title A" (no colon), so match "Title" followed by whitespace.
Regex regTitle = new Regex(@"^Title\s");
string line;
string[] descriptionLine;
string[] statusLine;
string[] titleLine;
using (TextReader reader = File.OpenText("file.txt"))
{
    // ReadLine returns null once the end of the file is reached
    while ((line = reader.ReadLine()) != null)
    {
        if (regStatus.IsMatch(line))
        {
            // status line: convert to array; the first element ("Status:") can be dropped
            statusLine = line.Split(' ');
            // do stuff with array
        }
        else if (regTitle.IsMatch(line))
        {
            // title line: convert to array; the first element ("Title") can be dropped
            titleLine = line.Split(' ');
            // do stuff with array
        }
        else
        {
            // description line, so just split into array
            descriptionLine = line.Split(' ');
            // do stuff with array
        }
    }
}

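You can then take the arrays and store them in a class of some kind if you want; I'll leave that part to you. The code just uses a simple regex to check whether a line starts with "Status:" or "Title". Truth be told, even that isn't needed. You could just do this: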

if (line.StartsWith("Status:")) {}
if (line.StartsWith("Title ")) {} // the sample titles look like "Title A", with no colon

to check whether each line starts with a status or a title.
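A minimal sketch of that prefix check, pulled into a helper. `LineKind`, `LineClassifier` and `Classify` are hypothetical names, and the title marker is assumed to be the word "Title" followed by a space, as in the sample text. `StringComparison.Ordinal` is passed explicitly because the single-argument `StartsWith` overload performs a culture-sensitive comparison.

```csharp
using System;

// Hypothetical names, introduced only for this illustration.
public enum LineKind { Title, Status, Description }

public static class LineClassifier
{
    // Classifies one input line by its prefix.
    // Assumption: titles start with "Title " and statuses with "Status:".
    public static LineKind Classify(string line)
    {
        if (line.StartsWith("Status:", StringComparison.Ordinal))
            return LineKind.Status;
        if (line.StartsWith("Title ", StringComparison.Ordinal))
            return LineKind.Title;
        // Anything else, including an empty line, counts as description text.
        return LineKind.Description;
    }
}
```

A caller would then switch on the result of `LineClassifier.Classify(line)` instead of running a regex against every line.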

1

If the content is structured the way you describe, you can buffer the text:

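// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
string myRegEx = "^Status:.*$";
var buffer = new List<string>();   // lines of the item currently being read
var items = new List<string[]>();  // one entry per completed item

// loop through each line in text ("lines" is assumed to hold the input lines)
foreach (string line in lines)
{
    if (Regex.IsMatch(line, myRegEx))
    {
        // the status line closes the item: save the buffer into the array
        buffer.Add(line);
        items.Add(buffer.ToArray());
        // clear the buffer
        buffer.Clear();
    }
    else
    {
        // save the text into the buffer
        buffer.Add(line);
    }
}

1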

Declare an item type

public class Item 
{ 
    public string Title { get; set; } 
    public string Status { get; set; } 
    public string Description { get; set; } 
} 

Then split the text into lines

string[] lines = text.Split(new[] { "\r\n" }, StringSplitOptions.None); 

Or read the lines from a file with

string[] lines = File.ReadAllLines(path); 

Create the list in which the result will be stored

var result = new List<Item>(); 

Now we can do the parsing

Item item = null; // must be initialized, otherwise the compiler reports use of an unassigned local variable
for (int i = 0; i < lines.Length; i++) { 
    string line = lines[i]; 
    if (line.StartsWith("Title ")) { 
     item = new Item(); 
     result.Add(item); 
     item.Title = line.Substring(6); 
    } else if (line.StartsWith("Status: ")) { 
     item.Status = line.Substring(8); 
    } else { // Description 
     if (item.Description != null) { 
      item.Description += "\r\n"; 
     } 
     item.Description += line; 
    } 
} 

Note that this solution has no error handling. The code assumes the input text is always well formed.
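For input that may not be well formed, the same loop can be given a few guards. The following is only a sketch under assumptions: `ItemParser` and `Parse` are hypothetical names, throwing `FormatException` is one possible policy among several, and empty lines before the first title are silently skipped.

```csharp
using System;
using System.Collections.Generic;

public class Item
{
    public string Title { get; set; }
    public string Status { get; set; }
    public string Description { get; set; }
}

// Hypothetical helper: the same line-by-line parse, with checks for malformed input.
public static class ItemParser
{
    public static List<Item> Parse(IEnumerable<string> lines)
    {
        var result = new List<Item>();
        Item item = null;
        foreach (string line in lines)
        {
            if (line.StartsWith("Title ", StringComparison.Ordinal))
            {
                // The previous item must have been closed by a Status line.
                if (item != null && item.Status == null)
                    throw new FormatException("Item '" + item.Title + "' has no Status line.");
                item = new Item { Title = line.Substring(6).Trim() };
                result.Add(item);
            }
            else if (line.StartsWith("Status:", StringComparison.Ordinal))
            {
                if (item == null)
                    throw new FormatException("Status line found before any Title line.");
                item.Status = line.Substring(7).Trim();
            }
            else if (item != null)
            {
                // Description line: append it, separating lines with CRLF.
                item.Description = item.Description == null
                    ? line
                    : item.Description + "\r\n" + line;
            }
            else if (line.Trim().Length > 0)
            {
                throw new FormatException("Text found before the first Title line.");
            }
        }
        // The last item also needs its Status line.
        if (item != null && item.Status == null)
            throw new FormatException("Item '" + item.Title + "' has no Status line.");
        return result;
    }
}
```

It can be called as `ItemParser.Parse(File.ReadAllLines(path))`. A description line that itself begins with "Title " or "Status:" would still be misread; the format gives no way to tell such lines apart.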

+0

I simplified the code by assigning 'lines[i]' to 'line' and fixed a bug in the description part. – 2012-03-30 15:45:38

0
string data = @"Title A 


Status: Nothing But Net! 
Title B 
some description on a few lines, there may be empty lines here 
some description on a few lines 
Status: some other random text 
Title C 
Can't stop the invisible Man 
Credo Quia Absurdium Est 
Status: C Status"; 

string pattern = @" 
^(?:Title\s+) 
(?<Title>[^\s]+) 
(?:[\r\n\s]+) 
(?<Description>.*?) 
    (?:^Status:\s*) 
    (?<Status>[^\r\n]+) 
"; 

// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
// IgnorePatternWhitespace just lets us lay the pattern out over multiple lines and comment it.
Regex.Matches(data, pattern, RegexOptions.Singleline | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace) 
    .OfType<Match>() 
    .Select (mt => new 
     { 
      Title = mt.Groups["Title"].Value, 
      Description = mt.Groups["Description"].Value.Trim(), 
      Status = mt.Groups["Status"].Value.Trim() 
     }) 
     .ToList() // This is here just to do the display of the output 
     .ForEach(item => Console.WriteLine ("Title {0}: ({1}) and this description:{3}{2}{3}", item.Title, item.Status, item.Description, Environment.NewLine)); 

Output:

Title A: (Nothing But Net!) and this description: 


Title B: (some other random text) and this description: 
some description on a few lines, there may be empty lines here 
some description on a few lines 

Title C: (C Status) and this description: 
Can't stop the invisible Man 
Credo Quia Absurdium Est
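
The pattern above captures a title made of a single word (`[^\s]+`). If titles may contain several words, one possible variation is to capture the rest of the title line instead. The sketch below assumes that, and fills a typed list rather than an anonymous type; `RegexItemParser` and `Parse` are hypothetical names, and the pattern is a variant of the one above, not the original.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Item
{
    public string Title { get; set; }
    public string Status { get; set; }
    public string Description { get; set; }
}

// Hypothetical helper built around a variant of the pattern shown above.
public static class RegexItemParser
{
    private static readonly Regex ItemPattern = new Regex(
        @"^Title[ \t]+(?<Title>[^\r\n]*?)[ \t]*\r?\n" + // title: rest of the line, trailing blanks dropped
        @"(?<Description>.*?)" +                        // description: everything up to the status line
        @"^Status:[ \t]*(?<Status>[^\r\n]*)",           // status: rest of the line
        RegexOptions.Singleline | RegexOptions.Multiline);

    public static List<Item> Parse(string text)
    {
        var items = new List<Item>();
        foreach (Match m in ItemPattern.Matches(text))
        {
            items.Add(new Item
            {
                Title = m.Groups["Title"].Value,
                Description = m.Groups["Description"].Value.Trim(),
                Status = m.Groups["Status"].Value.Trim()
            });
        }
        return items;
    }
}
```

As with the original pattern, a description line that begins with "Status:" ends the item early, so this only suits input where that cannot happen.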