Python: Getting a certain no. of strings from a dictionary

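I have a dictionary in the following format. I used the split function to split out the different elements (wherever a comma (,) occurs), and now I am trying to extract the names from the list... I have tried using regular expressions, but evidently I am failing miserably, as I am new to Python... The names come in the following formats...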

first name (space) last name
name (space) name (space) name
x.name
xyname
name (space) x. (space) name

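where x and y stand for name initials, such as J. for John, etc. Also, if you could guide me on removing the "\t" while keeping the other information intact, that would be great too. Any kind of help would be more than welcome... Thanks, everyone.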

[[' I. Antonov', ' I. Antonova', ' E. R. Kandel', ' and R. D. Hawkins. Activity-dependent presynaptic facilitation and hebbian ltp are both required and interact during classical conditioning in aplysia. Neuron', ' 37(1):135--47', ' Jan 2003.'], ['\tSander M. Bohte ', ' Joost N. Kok', ' Applications of spiking neural networks', ' Information Processing Letters', ' v.95 n.6', ' p.519-520'], [' L. J. Eshelman. The CHC Adaptive Search Algorithm: How to Have Safe Search When Engaging in Nontraditional Genetic Recombination. Foundations Of Genetic Algorithms', ' pages 265-283', ' 1990.'], ['Wulfram Gerstner ', ' Werner Kistler', ' Spiking Neuron Models: An Introduction', ' Cambridge University Press', ''], [' D. O. Hebb. Organization of behavior. New York: Wiley', ' 1949.'], [' D. Z. Jin. Spiking neural network for recognizing spatiotemporal sequences of spikes. Physical Review E', '69', ' 2004.'], ['Wolfgang Maass ', ' Christopher M. Bishop', ' Pulsed Neural Networks', ' MIT Press', ' '], ['Wolfgang Maass ', ' Henry Markram', ' Synapses as dynamic memory buffers', ' Neural Networks', ' v.15 n.2', ' p.'], [' H. Markram', ' Y. Wang', ' and M. Tsodyks. Differential signaling via the same axon of neocortical pyramidal neurons. Neurobiology', ' 95:5323--5328', ' April 1998.'], ['\t\tD. E. Rumelhart ', ' G. E. Hinton ', ' R. J. Williams', ' Learning internal representations by error propagation', ' Parallel distributed processing: explorations in the microstructure of cognition', ' vol. 1: foundations', ' MIT Press', ' Cambridge', ' MA', ' 1986 </a> \t\t\t\t\t\t\t\t\t'], ['\t J. D. Schaffer', ' L. D. Whitley', ' and L. J. Eshelman. Combinations of genetic algorithms and neural networks: A survey of the state of the art. In Combinations of Genetic Algorithms and NeuralNetworks', ' 1992.', ' COGANN-92. International Workshop on', ' pages 1--37', ' Philips Labs.', ' Briarcliff Manor', ' NY', ' 6 Jun 1992.'], ['\t S. Song', ' K. D. Miller', ' and L. F. Abbott. 
Competitive hebbian learning through spike-timing-dependent synaptic plasticity. Nature Neuroscience', ' 3(9):919--926', ' 2000.'], ['\t L. Watts. Event-driven simulation of networks of spiking neurons. Advances in Neural Information Processing Systems', ' 6:927--934', ' 1994.']]

Source

2012-01-06 irfanbukhari

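It looks like you will have to tailor this to your input. Because the text being parsed contains so many different words and structures, you probably won't get 100% accuracy with whatever rules you create. Here is an example, though, assuming your original input text is called input_text (I don't think using the split() method is really useful, because commas don't only delimit names):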

import re

# input_text is assumed to hold the raw, unsplit citation text
regexes = (
    r'[A-Z][a-z]+ [A-Z][a-z]+',  # capitalized first and last name
    r'[A-Z]\. [A-Z][a-z]+',      # capitalized initial, then last name
)
names = []

for regex in regexes:
    names += re.findall(regex, input_text)

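You will obviously want to write additional, specific regexes for the various kinds of names. This does a good job of finding names, but it also yields a lot of false positives (by these rules, Information Processing looks a lot like a name). It should give you a starting point.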
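As one possible extension, the two patterns can be folded into a single pattern that also covers multiple initials ("E. R. Kandel") and a middle initial ("Sander M. Bohte"). This is only a sketch: NAME_PATTERN and find_name_candidates are hypothetical names, and the pattern still produces false positives on capitalized titles.

```python
import re

# Hypothetical pattern: two or more space-separated tokens, where each token
# is either an initial ("J.") or a capitalized word ("John").
NAME_PATTERN = re.compile(
    r"\b(?:[A-Z]\.|[A-Z][a-z]+)(?: (?:[A-Z]\.|[A-Z][a-z]+))+"
)

def find_name_candidates(text):
    """Return every substring that looks like a name under the pattern above."""
    return NAME_PATTERN.findall(text)

candidates = find_name_candidates(" I. Antonov, I. Antonova, E. R. Kandel")
print(candidates)
```

Capitalized phrases such as "Information Processing Letters" still match, so the results are candidates to filter further, not confirmed names.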

Source

2012-01-06 08:35:19

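The solution above proved really useful... I wanted to ask you about something similar, but in a different situation... The thing is, my list has names in the format ['John A, Smith, William Tell, Jacob Oram']. I want to correct the names: John A. Smith is one complete name, but it comes out this way because of the format of the raw data... Using the method above, I find all such names with ab = re.search((r'[A-Z][a-z]+ [A-Z]\, [A-Z][a-z]+'), zz[uy]), where zz[uy] is the position of the string... – irfanbukhari 2012-01-18 08:30:19

Now all I want is to replace it with the correctly formatted name, John A. Smith, William Tell, Jacob Oram, in the list... please help – irfanbukhari 2012-01-18 08:34:04

Basically all I want to do is replace the "," after the "A" with a "." and put it back in the string – irfanbukhari 2012-01-18 09:12:15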
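One way to do the replacement asked about in the comment above is re.sub with a capture group instead of re.search. This is a minimal sketch: the input string is assumed from the comment's example, and fix_initial_commas is a hypothetical helper name.

```python
import re

# Assumed input, reconstructed from the example in the comment.
raw = "John A, Smith, William Tell, Jacob Oram"

# Group 1 captures "Firstname X"; the comma after it is the one to replace.
# The lookahead requires a capitalized surname to follow without consuming it.
INITIAL_COMMA = re.compile(r"\b([A-Z][a-z]+ [A-Z]),(?= [A-Z][a-z]+)")

def fix_initial_commas(text):
    """Turn 'John A, Smith' into 'John A. Smith', leaving other commas alone."""
    return INITIAL_COMMA.sub(r"\1.", text)

fixed = fix_initial_commas(raw)
print(fixed)
```

To update a whole list in place of a single string, something like zz = [fix_initial_commas(s) for s in zz] should work, where zz is the list from the comment.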

To remove tabs and extra whitespace, use strip():

>>> "\t foobar \t\t\t".strip() 
'foobar'

Source

2012-01-06 08:21:35

To strip the tabs (and any other whitespace at the start or end of the strings):

stripped = [s.strip() for s in mylist]
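Since the data in the question is a list of lists, not a flat list of strings, the comprehension needs one more level. A small sketch, with strip_nested as a hypothetical helper name and a shortened sample taken from the question's data:

```python
def strip_nested(rows):
    """Strip leading/trailing whitespace (tabs included) from each string in a list of lists."""
    return [[s.strip() for s in row] for row in rows]

sample = [
    ['\tSander M. Bohte ', ' Joost N. Kok'],
    ['\t\tD. E. Rumelhart ', ' G. E. Hinton '],
]
cleaned = strip_nested(sample)
print(cleaned)
```

Only the surrounding whitespace is removed; the text inside each string is left untouched.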

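To be honest, if you are trying to extract names, splitting the lines like this won't help. Notice that some names are still combined with the title. You would be better off building a good regular expression that matches names and using re.findall on each line.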

Source

2012-01-06 08:23:25

It may also be easier to find an online source of information where this work has already been done. For example, at places such as this or this.

Source

2012-01-06 09:40:41

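strip all the strings
identify strings that are surely not names (very long ones, ones that include numbers, and those that come after one of these in the list)
identify strings that are surely names (short strings at the start of the list, strings that start with the pattern ^[A-Z][a-z]{0,3}\.?\s (Dr., Miss, Mr, Prof, etc.))
study the remaining strings that match none of these rules, and try fuzzy rules by creating a certitude coefficient: a short string near the start of the list gets a high score, a long one near the end a low score. Add criteria like these and set a minimum score.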
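The steps above can be sketched roughly as follows. Everything numeric here is an assumption made for illustration (the 30-character cutoff, the equal weighting of position and length, the 0.5 minimum score), and name_score and likely_names are hypothetical names. The rule about strings that follow a known non-name is left out to keep the sketch short.

```python
import re

# Short capitalized prefix followed by whitespace, e.g. "Dr. ", "Mr ", "Prof ".
TITLE_RE = re.compile(r"^[A-Z][a-z]{0,3}\.?\s")
MAX_NAME_LENGTH = 30  # assumed cutoff for "very long" strings
MIN_SCORE = 0.5       # assumed minimum certitude

def name_score(s, position, total):
    """Return a certitude between 0.0 and 1.0 that s is a name."""
    s = s.strip()
    # Surely not a name: empty, contains a digit, or very long.
    if not s or any(ch.isdigit() for ch in s) or len(s) > MAX_NAME_LENGTH:
        return 0.0
    # Surely a name under this heuristic: starts with a title-like prefix.
    if TITLE_RE.match(s):
        return 1.0
    # Fuzzy rule: earlier and shorter strings score higher.
    position_score = 1.0 - position / total
    length_score = 1.0 - len(s) / MAX_NAME_LENGTH
    return (position_score + length_score) / 2

def likely_names(entry):
    """Keep the stripped strings of one citation that reach MIN_SCORE."""
    total = len(entry)
    return [s.strip() for i, s in enumerate(entry)
            if name_score(s, i, total) >= MIN_SCORE]

entry = ['\tSander M. Bohte ', ' Joost N. Kok',
         ' Applications of spiking neural networks',
         ' Information Processing Letters', ' v.95 n.6', ' p.519-520']
print(likely_names(entry))
```

Note that the title pattern also accepts any short capitalized word followed by a space (for instance "New York"), so the "surely a name" rule is itself only approximate.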

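If you need high precision, look for a names database and a Bayesian filter.

This won't be perfect: it is very hard to tell the difference between "Name Name" and "Word Word".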

Source

2012-01-06 11:44:00
