Extract text between quotes in Notepad++

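My file contains more than 2,000 abstracts with over 18,000 sentences, each beginning with an opening tag and ending with a closing tag. I want to pull information out of it using Notepad++. A sample of the file looks like this: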

<abstract> 
<sentence>Activation of the <cons lex="CD28_surface_receptor" sem="G#protein_family_or_group"><cons lex="CD28" sem="G#protein_molecule">CD28</cons> surface receptor</cons> provides a major costimulatory signal for <cons lex="T_cell_activation" sem="G#other_name">T cell activation</cons> resulting in enhanced production of <cons lex="interleukin-2" sem="G#protein_molecule">interleukin-2</cons> (<cons lex="IL-2" sem="G#protein_molecule">IL-2</cons>) and <cons lex="cell_proliferation" sem="G#other_name">cell proliferation</cons>.</sentence>
<sentence>In <cons lex="primary_T_lymphocyte" sem="G#cell_type">primary T lymphocytes</cons> we show that <cons lex="CD28" sem="G#protein_molecule">CD28</cons> ligation leads to the rapid intracellular formation of <cons lex="reactive_oxygen_intermediate" sem="G#inorganic">reactive oxygen intermediates</cons> (<cons lex="ROI" sem="G#inorganic">ROIs</cons>) which are required for <cons lex="CD28-mediated_activation" sem="G#other_name"><cons lex="CD28" sem="G#protein_molecule">CD28</cons>-mediated activation</cons> of the <cons lex="NF-kappa_B" sem="G#protein_molecule">NF-kappa B</cons>/<cons lex="CD28-responsive_complex" sem="G#protein_complex"><cons lex="CD28" sem="G#protein_molecule">CD28</cons>-responsive complex</cons> and <cons lex="IL-2_expression" sem="G#other_name"><cons lex="IL-2" sem="G#protein_molecule">IL-2</cons> expression</cons>.</sentence>
<sentence>Delineation of the <cons lex="CD28_signaling_cascade" sem="G#other_name"><cons lex="CD28" sem="G#protein_molecule">CD28</cons> signaling cascade</cons> was found to involve <cons lex="protein_tyrosine_kinase_activity" sem="G#other_name"><cons lex="protein_tyrosine_kinase" sem="G#protein_family_or_group">protein tyrosine kinase</cons> activity</cons>, followed by the activation of <cons lex="phospholipase_A2" sem="G#protein_molecule">phospholipase A2</cons> and <cons lex="5-lipoxygenase" sem="G#protein_molecule">5-lipoxygenase</cons>.</sentence>
<sentence>Our data suggest that <cons lex="lipoxygenase_metabolite" sem="G#protein_family_or_group"><cons lex="lipoxygenase" sem="G#protein_molecule">lipoxygenase</cons> metabolites</cons> activate <cons lex="ROI_formation" sem="G#other_name"><cons lex="ROI" sem="G#inorganic">ROI</cons> formation</cons> which then induce <cons lex="IL-2" sem="G#protein_molecule">IL-2</cons> expression via <cons lex="NF-kappa_B_activation" sem="G#other_name"><cons lex="NF-kappa_B" sem="G#protein_molecule">NF-kappa B</cons> activation</cons>.</sentence>
<sentence>These findings should be useful for <cons lex="therapeutic_strategies" sem="G#other_name">therapeutic strategies</cons> and the development of <cons lex="immunosuppressants" sem="G#other_name">immunosuppressants</cons> targeting the <cons lex="CD28_costimulatory_pathway" sem="G#other_name"><cons lex="CD28" sem="G#protein_molecule">CD28</cons> costimulatory pathway</cons>.</sentence>
</abstract>

I want to extract the text between the quotation marks. For example, my desired output looks like this:

"CD28_surface_receptor" "G#protein_family_or_group" "CD28" "G#protein_molecule" 
"primary_T_lymphocyte" "G#cell_type"

I am hoping there is an easier way to do this using a regex in Notepad++, or some way to extract the text based on its color in Notepad++.

Source

2015-04-02 Shaheen Gul

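The first line of your desired output contains only 4 quoted strings, while the first line of input that contains them has more than 4 (I count 12). Do you need only those 4, or all of them? – Jerry 2015-04-02 10:32:05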

Try the following:

"\w+"|"G#\w+"

The OR operator can make text tasks like this easier. The OR operator | is only used in Notepad++.
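
For readers who want to check the pattern outside Notepad++, here is a minimal Python sketch that applies it to one line. It is an illustration, not part of the original answer. The `sample` string is a trimmed excerpt modeled on the first sentence in the question, and `PERMISSIVE_PATTERN` and `quoted_strings` are hypothetical names introduced here. The sketch also shows one limitation: `\w` does not match a hyphen, so values such as "interleukin-2" are skipped unless a broader pattern like `"[^"]*"` is used.

```python
import re

# Pattern from the answer above.
ANSWER_PATTERN = re.compile(r'"\w+"|"G#\w+"')
# Assumption (not from the answer): accept anything between two quotes,
# so that hyphenated values such as "interleukin-2" are also captured.
PERMISSIVE_PATTERN = re.compile(r'"[^"]*"')


def quoted_strings(line, pattern=PERMISSIVE_PATTERN):
    """Return every quoted attribute value on one line, quotes included."""
    return pattern.findall(line)


# Trimmed excerpt modeled on the first <sentence> of the question.
sample = (
    '<sentence>Activation of the <cons lex="CD28_surface_receptor" '
    'sem="G#protein_family_or_group"><cons lex="CD28" '
    'sem="G#protein_molecule">CD28</cons> surface receptor</cons> and '
    '<cons lex="interleukin-2" sem="G#protein_molecule">interleukin-2</cons>'
    '</sentence>'
)

strict = quoted_strings(sample, ANSWER_PATTERN)
loose = quoted_strings(sample)

# One output line per input line, values separated by spaces.
print(" ".join(strict))
print(" ".join(loose))
```

Running the same function over every line of the file gives one output line per sentence, which is close to the layout requested in the question.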

Source

2015-04-02 10:17:16

"The OR operator | is only used in Notepad++": that is completely untrue. – Jerry 2015-04-02 10:32:34
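
To illustrate the point of this comment, here is a small sketch showing that alternation with `|` works the same way in Python's `re` module, so it is not specific to Notepad++. The `text` string and variable names are made up for this illustration.

```python
import re

# Hypothetical one-tag sample in the style of the question's data.
text = 'lex="CD28" sem="G#protein_molecule"'

# Each branch on its own matches only part of the data...
plain_only = re.findall(r'"\w+"', text)
label_only = re.findall(r'"G#\w+"', text)

# ...while the alternation matches whatever either branch matches.
either = re.findall(r'"\w+"|"G#\w+"', text)

print(plain_only)
print(label_only)
print(either)
```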
