2011-01-13 81 views
2

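Separating first, middle and last names (Python)

I have a list of a few hundred members that I want to split into first, middle and last names, but some members have a prefix (denoted by 'P'). All possible combinations: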

First Middle Last 
P First Middle Last 
First P Middle Last 
P First p Middle Last 

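How can I separate the first name (with its P, if present), the middle name (with its P, if present) and the last name in Python? This is what I came up with, but it doesn't work.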

import csv 
inPath = "input.txt" 
outPath = "output.txt" 

newlist = [] 

file = open(inPath, 'rU') 
if file: 
    for line in file: 
     member = line.split() 
     newlist.append(member) 
    file.close() 
else: 
    print "Error Opening File." 

file = open(outPath, 'wb') 
if file: 
    for i in range(len(newlist)): 
     print i, newlist[i][0] # Should get the First Name with Prefix 
     print i, newlist[i][1] # Should get the Middle Name with Prefix 
     print i, newlist[i][-1] 
    file.close() 
else: 
    print "Error Opening File." 

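What I want is:

  1. Get the first and middle names together with their prefixes (if present).
  2. Write each (first, middle, last) to a delimited txt file or, preferably, a CSV file.

Thank you very much for your help.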

+3

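It isn't clear from the examples what a "prefix" is; for example, how do you tell whether `A B C D` is `("A B", "C", "D")` or `("A", "B C", "D")`? Please give a more complete example and explain more specifically what a "prefix" is. – 2011-01-13 08:47:30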

+0

If the prefixes are one letter long and no name is one letter long, you could try filtering on `len()` and grouping each prefix with its respective name. Just an idea. – soulseekah 2011-01-13 08:55:55

+0

There are only three prefixes: "M", "Shk" and "BS" – 3zzy 2011-01-13 09:12:54

Answers

2

How about this complete, tested script:

import sys 

def process(file):
    for line in file:
        arr = line.split()
        if not arr:
            continue
        last = arr.pop()
        n = len(arr)
        if n == 4:
            first, middle = ' '.join(arr[:2]), ' '.join(arr[2:])
        elif n == 3:
            if arr[0] in ('M', 'Shk', 'BS'):
                first, middle = ' '.join(arr[:2]), arr[-1]
            else:
                first, middle = arr[0], ' '.join(arr[1:])
        elif n == 2:
            first, middle = arr
        else:
            continue
        print 'First: %r' % first
        print 'Middle: %r' % middle
        print 'Last: %r' % last

if __name__ == '__main__':
    process(sys.stdin)

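If you are on Linux, run this, type in some example lines, and then press Ctrl+D to signal the end of input. On Windows, use Ctrl+Z instead of Ctrl+D. Of course, you can also pipe a file in.

The following input file: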

First Middle Last 
M First Middle Last 
First Shk Middle Last 
BS First M Middle Last 

gives this output:

First: 'First' 
Middle: 'Middle' 
Last: 'Last' 
First: 'M First' 
Middle: 'Middle' 
Last: 'Last' 
First: 'First' 
Middle: 'Shk Middle' 
Last: 'Last' 
First: 'BS First' 
Middle: 'M Middle' 
Last: 'Last' 
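
Since the question asks for CSV output, here is a minimal sketch of how the same branching could feed `csv.writer` instead of `print`. It is written for Python 3, and the names `split_names` and `write_csv` are made up for this illustration; they are not part of the script above.

```python
import csv
import io

PREFIXES = ('M', 'Shk', 'BS')  # the three prefixes named in the comments

def split_names(lines):
    """Yield (first, middle, last) using the same branching as process()."""
    for line in lines:
        arr = line.split()
        if not arr:
            continue  # skip blank lines
        last = arr.pop()
        n = len(arr)
        if n == 4:
            # prefix on both the first and the middle name
            first, middle = ' '.join(arr[:2]), ' '.join(arr[2:])
        elif n == 3:
            if arr[0] in PREFIXES:
                first, middle = ' '.join(arr[:2]), arr[-1]
            else:
                first, middle = arr[0], ' '.join(arr[1:])
        elif n == 2:
            first, middle = arr
        else:
            continue  # lines that do not fit any pattern are dropped
        yield first, middle, last

def write_csv(lines, out):
    """Write one CSV row per parsable line to the file-like object out."""
    writer = csv.writer(out, lineterminator='\n')
    for row in split_names(lines):
        writer.writerow(row)

if __name__ == '__main__':
    buf = io.StringIO()
    write_csv(['M First Middle Last', 'First Shk Middle Last'], buf)
    print(buf.getvalue(), end='')
```

To write a real file, something like `open('output.csv', 'w', newline='')` could be passed as `out` in place of the `io.StringIO` buffer.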
+0

Awesome! Works like a charm! :D – 3zzy 2011-01-13 09:49:53

1
names = [('A', 'John', 'Paul', 'Smith'), 
('Matthew', 'M', 'Phil', 'Bond'), 
('A', 'Morris', 'O', 'Reil', 'M', 'Big')] 

def getItem(): 
    for name in names: 
     for (pos,item) in enumerate(name): 
      yield item 

itembase = getItem() 

for i in enumerate(names): 
    element = itembase.next() 
    if len(element) == 1: firstName = element+" "+itembase.next() 
    else: firstName = element 
    element = itembase.next() 
    if len(element) == 1: mName = element+" "+itembase.next() 
    else: mName = element 
    element = itembase.next() 
    if len(element) == 1: lastName = element+" "+itembase.next() 
    else: lastName = element 

    print "First Name: "+firstName 
    print "Middle Name: "+mName 
    print "Last Name: "+lastName 
    print "--" 

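This seems to work. Replace the `len(element) == 1` condition (I didn't know you only needed to check for three prefixes, so I made it match any single letter) with a condition that looks for your three prefixes.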

**Output** 
First Name: A John 
Middle Name: Paul 
Last Name: Smith 

First Name: Matthew 
Middle Name: M Phil 
Last Name: Bond 

First Name: A Morris 
Middle Name: O Reil 
Last Name: M Big 
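
The substitution suggested for `len(element) == 1` could look roughly like the sketch below, which tests membership in a set of the three prefixes from the comments. It is written for Python 3 (so `next(items)` replaces `itembase.next()`), and the helper name `group_prefixes` is invented for this example.

```python
PREFIXES = {'M', 'Shk', 'BS'}  # assumed to be the only prefixes

def group_prefixes(words):
    """Attach each prefix to the word that follows it and return the parts."""
    items = iter(words)
    parts = []
    for element in items:
        if element in PREFIXES:
            # next(items, '') avoids StopIteration when a prefix ends the name
            element = (element + ' ' + next(items, '')).strip()
        parts.append(element)
    return parts

names = [('First', 'Middle', 'Last'),
         ('Matthew', 'M', 'Phil', 'Bond'),
         ('Shk', 'First', 'M', 'Middle', 'Last')]

for name in names:
    print(group_prefixes(name))
```

Unlike the single-letter test, this treats "Shk" and "BS" as prefixes and leaves other one-letter words alone.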
+0

Doesn't seem to work for this: `Firts Middle Last | M First Middle | First Shk Middle Last | Shk First M Middle Last` – 3zzy 2011-01-13 09:31:38

+0

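I said you have to replace `len(element) == 1` with the condition you need. I can't do all the work for you; this is just an example. Others have provided better ones; we are all here to learn. – soulseekah 2011-01-13 10:12:01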

-1

Here is another solution (obtained by modifying the source code given in the question):

import csv 
inPath = "input.txt" 
outPath = "output.txt" 

newlist = [] 

file = open(inPath, 'rU') 
if file: 
    for line in file: 
     member = line.split() 
     newlist.append(member) 
    file.close() 
else: 
    print "Error Opening File." 

file = open(outPath, 'wb') 
if file: 
    for fullName in newlist: 
     prefix = "" 
     for name in fullName: 
      if name == "P" or name == "p": 
       prefix = name + " " 
       continue 
      print prefix+name 
      prefix = "" 
     print 
    file.close() 
else: 
    print "Error Opening File." 
1

Here you go, in an object-oriented way:

class Name(object): 
    def __init__(self, fullname): 
     self.full = fullname 
     s = self.full.split() 

     try: 
      self.first = " ".join(s[:2]) if len(s[0]) == 1 else s[0] 
      s = s[len(self.first.split()):] 

      self.middle = " ".join(s[:2]) if len(s[0]) == 1 else s[0] 
      s = s[len(self.middle.split()):] 

      self.last = " ".join(s[:2]) if len(s[0]) == 1 else s[0] 
     finally: 
      pass 

names = [ 
    "First Middle Last", 
    "P First Middle Last", 
    "First P Middle Last", 
    "P First p Middle Last", 
] 

for fullname in names: 
    name = Name(fullname) 
    print (name.first, name.middle, name.last) 
1

If "M", "Shk" and "BS" are not valid first/last names, i.e. you don't care about their exact position, you can filter them out with one line of code:

first, middle, last = filter(lambda x: x not in ('M','Shk','BS'), yourNameHere.split()) 

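where, of course, yourNameHere is the string containing the name you want to parse.

Warning: for this code I assume that you always have a middle name, as in the examples you gave above. If not, you have to get the whole list and count its elements to determine whether there is a middle name.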
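
A minimal sketch of the counting step mentioned in the warning, written for Python 3. The function name `strip_prefixes` and the choice of an empty string for a missing middle name are assumptions made for this example.

```python
PREFIXES = ('M', 'Shk', 'BS')

def strip_prefixes(full_name):
    """Drop the prefixes, then decide by count whether a middle name exists."""
    words = [w for w in full_name.split() if w not in PREFIXES]
    if len(words) == 3:
        first, middle, last = words
    elif len(words) == 2:
        first, last = words
        middle = ''  # assumption: a missing middle name becomes an empty string
    else:
        raise ValueError('unexpected number of name parts: %r' % full_name)
    return first, middle, last

print(strip_prefixes('BS First M Middle Last'))
print(strip_prefixes('M First Last'))
```

As with the one-liner, this only makes sense if nobody's actual name is "M", "Shk" or "BS".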

Edit: a variant that only drops a prefix when it appears at index 0 or 2:

first, middle, last = map(
    lambda x: x[1], 
    filter(
     lambda (i,x): i not in (0, 2) or x not in ('M','Shk','BS'), 
     enumerate(yourNameHere.split()))) 
-2

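I would use regular expressions, which are designed particularly for this kind of task. This solution is easy to maintain and understand.

It's worth a try. http://docs.python.org/library/re.html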

import re

# patterns
# First Middle Last
first = re.compile(r"^([\w]+) +([\w]+) ([\w]+)$")
# P First Middle Last
second = re.compile(r"^(M|Shk|BS) +([\w]+) +([\w]+) ([\w]+)$")
# First P Middle Last
third = re.compile(r"^([\w]+) +(M|Shk|BS) +([\w]+) ([\w]+)$")
# P First P Middle Last
forth = re.compile(r"^(M|Shk|BS) +([\w]+) +(M|Shk|BS) +([\w]+) ([\w]+)$")

# you_string is the name to parse; each branch reads the groups of the
# pattern that actually matched and glues a prefix onto its name
if first.search(you_string):
    parsed = first.search(you_string)
    print parsed.group(1), parsed.group(2), parsed.group(3)
elif second.search(you_string):
    parsed = second.search(you_string)
    print parsed.group(1) + " " + parsed.group(2), parsed.group(3), parsed.group(4)
elif third.search(you_string):
    parsed = third.search(you_string)
    print parsed.group(1), parsed.group(2) + " " + parsed.group(3), parsed.group(4)
elif forth.search(you_string):
    parsed = forth.search(you_string)
    print parsed.group(1) + " " + parsed.group(2), parsed.group(3) + " " + parsed.group(4), parsed.group(5)
else:
    print "not match at all"

This should be faster because the patterns are precompiled.
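
The four patterns can probably be folded into one by making each prefix optional. The following is a sketch for Python 3 that assumes the three prefixes from the comments; `NAME_RE` and `parse` are names chosen for this example.

```python
import re

# Each of first/middle may start with an optional prefix followed by whitespace.
NAME_RE = re.compile(
    r'^(?P<first>(?:(?:M|Shk|BS)\s+)?\w+)\s+'
    r'(?P<middle>(?:(?:M|Shk|BS)\s+)?\w+)\s+'
    r'(?P<last>\w+)$'
)

def parse(name):
    """Return (first, middle, last), or None if the name does not match."""
    m = NAME_RE.match(name.strip())
    if m is None:
        return None
    # ' '.join(s.split()) collapses repeated spaces inside a captured group
    return tuple(' '.join(m.group(k).split())
                 for k in ('first', 'middle', 'last'))

for line in ['First Middle Last', 'M First Middle Last',
             'First Shk Middle Last', 'BS First M Middle Last']:
    print(parse(line))
```

The named groups keep each prefix attached to its name, so no per-pattern branching is needed when reading the result.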

0
import csv 

class CsvWriter(object): 
    """ 
    Wraps csv.writer in a partial file-API compatibility layer 
    """ 
    def __init__(self, fname, mode='w', *args, **kwargs): 
     super(CsvWriter, self).__init__() 
     self.f = open(fname, mode) 
     self.writer = csv.writer(self.f, *args, **kwargs) 

    def write(self, *args): 
     """ 
     Writes a row of data to the csv file 

     Can be called as 
      .write()   puts a blank row 
      .write(2)  puts a single cell 
      .write([1,2,3]) puts 3 cells 
      .write(1,2,3) puts 3 cells 
     """ 
     if len(args)==1 and hasattr(args[0], ('__iter__')): 
      # single argument, and it's a sequence - let it be the row data 
      rowdata = args[0] 
     else: 
      rowdata = args 

     self.writer.writerow(rowdata) 

    def close(self): 
     self.writer = None 
     self.f.close() 

    def __enter__(self): 
     return self 

    def __exit__(self, *exc): 
     self.close() 

class NameSplitter(object): 
    def __init__(self, pre=None): 
     super(NameSplitter, self).__init__() 

     # list of accepted prefixes 
     if pre is None: 
      self.pre = set(['m','shk','bs']) 
     else: 
      self.pre = set([s.lower() for s in pre]) 

     # is-a-prefix word tester 
     self.isPre = lambda x,p=self.pre: x.lower() in p 

     jn = lambda *args: ' '.join(args) 

     # signature-based dispatch table 
     self.match = {} 
     self.match[(3,())] = lambda w,j=jn: (w[0],   w[1],   w[2]) 
     self.match[(4,(0,))] = lambda w,j=jn: (j(w[0],w[1]), w[2],   w[3]) 
     self.match[(4,(1,))] = lambda w,j=jn: (w[0],   j(w[1],w[2]), w[3]) 
     self.match[(5,(0,2))] = lambda w,j=jn: (j(w[0],w[1]), j(w[2],w[3]), w[4]) 

    def __call__(self, nameStr): 
     words = nameStr.split() 

     # build hashable signature 
     pres = tuple(n for n,word in enumerate(words) if self.isPre(word)) 
     sig = (len(words), pres) 

     try: 
      do = self.match[sig] 
      return do(words) 
     except KeyError: 
      return None 

def process(inf, outf, fn): 
    for line in inf: 
     res = fn(line) 
     if res is not None: 
      outf.write(res) 

def main(): 
    infname = "input.txt" 
    outfname = "output.csv" 

    with open(infname,'rU') as inf: 
     with CsvWriter(outfname) as outf: 
      process(inf, outf, NameSplitter()) 

if __name__=="__main__": 
    main() 
0

Complete script:

import sys 

def f(a,b): 
    if b in ('M','Shk','BS'): 
      return '%s %s' % (b,a) 
    else: 
      return '%s,%s' % (b,a) 

for line in sys.stdin: 
    sys.stdout.write(reduce(f, reversed(line.split(' ')))) 

Input:

First Middle Last 
M First Middle Last 
First Shk Middle Last 
BS First M Middle Last 

CSV output:

First,Middle,Last 
M First,Middle,Last 
First,Shk Middle,Last 
BS First,M Middle,Last 