Flatten a list of lists with a twist

I have the following data structure:

a= [ 
     [u'happy', u'thursday', u'from', u'my', u'big', u'sweater', u'and', u'this', 
     u'ART', u'@', u'East', u'Village', u',', u'Manhattan', u'https', 
     u':', u'//t.co/5k8PUInmqK'], 
     [u'RT', u'@', u'MayorKev', u':', u'IM', u'SO', u'HYPEE', u'@', u'calloutband', 
     u'@', u'FreakLikeBex', u'#', u'Callout', u'#', u'TheBitterEnd', u'#', 
     u'Manhattan', u'#', u'Music', u'#', u'LiveMusic', u'#', u'NYC', 
     u'#', u'NY', u'#', 
     u'Jersey', u'#', u'NJ', u'http', u':', u'//t.co/0\u2026'] 
    ]

As I see it, this is a list of lists of strings, but it is wrapped in an extra pair of []. The double [] is the result of this statement:

a = [nltk.tokenize.word_tokenize(tweetL) for tweetL in tweetList]

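Ultimately, I need to flatten this structure into a list of strings and then run some regular expressions and counting operations on the text, but the outer double [] is preventing this.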

I tried using:

list.extend()

and

ll = len(a) 
for n in xrange(ll): 
    print 'list - ', a[n], 'number = ', n

but I still get the same result:

list - [ number = 1 
list - u number = 2 
list - ' number = 3 
list - h number = 4 
list - a number = 5 
list - p number = 6 
list - p number = 7

As you can see, the code treats every single character as an element of the list, rather than treating each whole string as an element.

How can this be done efficiently?

I also tried this:

flat_list = [i for sublist in a for i in sublist] 
for i in flat_list: 
    print 'element - ', i

Result (partial):

element - h 
element - a 
element - p 
element - p 
element - y 
element - 
element - t
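For reference, the end goal stated in the question (flatten, then clean with a regular expression and count) can be sketched roughly as below. This is a minimal Python 3 sketch assuming the data really is a list of lists of strings; the sample data, the regex, and the helper name `count_words` are all hypothetical choices for illustration, not part of the question.

```python
import re
from collections import Counter

# Hypothetical cleaning rule: keep only tokens that start with a letter or digit.
# This drops tokens such as '@', '#', ',', ':' and URL tails like '//t.co/...'.
WORD_RE = re.compile(r"\w")


def count_words(token_lists):
    """Flatten a list of token lists, keep tokens matching WORD_RE, and count them."""
    flat = [token for tokens in token_lists for token in tokens]
    words = [token.lower() for token in flat if WORD_RE.match(token)]
    return Counter(words)


a = [
    ['happy', 'thursday', 'from', 'Manhattan', '@', ',', '//t.co/5k8PUInmqK'],
    ['RT', '@', 'MayorKev', '#', 'Manhattan', '#', 'NYC'],
]

counts = count_words(a)
print(counts.most_common(3))
```

The flattening step is the same nested comprehension suggested in the answers; the regex and lowercasing are only one possible cleaning policy.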


2015-10-15 Toly

I think that somewhere in your code the line is being cast to a string instead of staying a list of lists. It is not an extra bracket. –

Your output also doesn't look right: is there another line that says 'list - [ number = 0'? – zehnpaard

Possible duplicate of [Making a flat list out of list of lists in Python](http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python) – TigerhawkT3

A nested list comprehension should solve your first problem.

a = [token for tweetL in tweetList for token in nltk.tokenize.word_tokenize(tweetL)]

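This construct lets you iterate over elements as you would in nested for loops. The for clauses are written in the same order as the nested loops: the outermost loop comes first, then the next one in, and so on, with the innermost loop last.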

It may help to know that this is equivalent to:

a = [] 
for tweetL in tweetList: 
    for token in nltk.tokenize.word_tokenize(tweetL): 
        a.append(token)
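To check the equivalence without installing NLTK, here is a small self-contained sketch. It uses a hypothetical `tokenize` helper built on `str.split` as a stand-in for `nltk.tokenize.word_tokenize`; that is an assumption for illustration only, since `split` breaks on whitespace and does not separate punctuation the way NLTK does.

```python
def tokenize(text):
    # Hypothetical stand-in for nltk.tokenize.word_tokenize: whitespace split only.
    return text.split()


tweetList = ["happy thursday from Manhattan", "RT @MayorKev IM SO HYPEE"]

# Nested list comprehension: outer loop first, inner loop second.
flat_comp = [token for tweetL in tweetList for token in tokenize(tweetL)]

# The equivalent explicit nested loops.
flat_loop = []
for tweetL in tweetList:
    for token in tokenize(tweetL):
        flat_loop.append(token)

print(flat_comp)
```

Both versions build the same flat list of whole-word strings, because each `token` is already a complete string when it is appended.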

In Python 2, you can encode the Unicode strings with UTF-8. This converts them from the unicode type to the str type, which should fix the UnicodeEncodeError.

Example:

u'\u2713'.encode('utf-8')

For more details on Unicode in Python 2, see: https://docs.python.org/2/howto/unicode.html
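A short sketch of what the encode call does. Note that it is written for Python 3, which is an assumption about the reader's setup: there `str` is already Unicode and `.encode('utf-8')` returns a `bytes` object, whereas in Python 2 the same call turns a `unicode` object into a byte `str`.

```python
check = '\u2713'                  # CHECK MARK as a text string
encoded = check.encode('utf-8')   # UTF-8 bytes for the same character
print(type(encoded), encoded)     # <class 'bytes'> b'\xe2\x9c\x93'

# Decoding with the same codec gives the original text back.
assert encoded.decode('utf-8') == check
```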


2015-10-16 00:33:33 Shashank

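Thanks! Can I include the encoding in your nested statement? I don't actually want to print it. The end result I'm after is everything in a list of strings. I intend to run regular expressions on those strings (to clean them) and eventually get some statistics. – Toly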

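@Toly Yes, any valid Python expression can be used in the leftmost part of a list comprehension, so token.encode('utf-8') can simply replace token, like this: a = [token.encode('utf-8') for tweetL in tweetList for token in nltk.tokenize.word_tokenize(tweetL)] – Shashank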

Wow! Absolutely great! That solved all my problems! Thanks Shashank and everyone who helped! Much appreciated, and I learned a lot! – Toly
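Shashank's one-liner can be tried in isolation with the sketch below, again using a hypothetical `tokenize` helper based on `str.split` as a stand-in for `nltk.tokenize.word_tokenize`. Under Python 3 (an assumption here) every element comes out as `bytes`; if the next step is regex cleaning and counting, keeping the tokens as `str` is usually simpler on Python 3, so the encode step matters mainly on Python 2.

```python
def tokenize(text):
    # Hypothetical stand-in for nltk.tokenize.word_tokenize (whitespace split only).
    return text.split()


tweetList = ['happy thursday', 'RT @MayorKev \u2026']

# Any expression may appear on the left of the comprehension; here each token is encoded.
a = [token.encode('utf-8') for tweetL in tweetList for token in tokenize(tweetL)]
print(a)
```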

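I'm not sure I fully understand your question, so let me know if I'm way off. Based on the input you provided, you have a list of lists. Moreover, if this is the structure you always have, you can just pull out the part you need: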

a = a[0]

This gives you just the contents of that one inner list.

Then you can simply iterate over it:

for i in a: 
    print(i)

However, if that is just a sample and you actually have something like this:

[[],[],[],[]]

and you want to flatten it completely into a single list, then the comprehension you want is:

flat_list = [i for sublist in a for i in sublist]

That leaves you with a single list, such as: [1, 2, 3, 4]
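A runnable version of that comprehension on a small made-up list of lists (the numbers are only sample data). It also shows that sublists of different lengths, including empty ones, are handled the same way.

```python
a = [[1, 2], [3], [], [4]]  # sample data: sublists of any length, including empty

# Read left to right: for each sublist in a, for each item i in that sublist, keep i.
flat_list = [i for sublist in a for i in sublist]
print(flat_list)  # [1, 2, 3, 4]
```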

Then you just iterate over it however you like:

for i in flat_list: 
    print(i)

Alternatively, if you want to print the index as well, you can do this:

for i, v in enumerate(flat_list): 
    print("{}: {}".format(i, v))

One final comment on your use of extend.

As the method's help text states:

extend(...) 
    L.extend(iterable) -- extend list by appending elements from the iterable

So it is used to "extend" a list, as in this example:

a = [1, 2, 3] 
b = [4, 5, 6] 
a.extend(b) 
# a will now be [1, 2, 3, 4, 5, 6]
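Because extend appends every element of its argument, calling it once per sublist is another way to flatten a list of lists. A minimal sketch with sample data shortened from the question; note that it only does what you want when `a` really is a list of lists. If `a` were a string, each `sub` would be a single character and the result would be a list of characters.

```python
a = [['happy', 'thursday', 'from'], ['RT', '@', 'MayorKev']]

flat = []
for sub in a:
    # extend adds each element of sub individually (append would add sub as one item)
    flat.extend(sub)

print(flat)  # ['happy', 'thursday', 'from', 'RT', '@', 'MayorKev']
```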

Running my code with your input:

a = [[u'happy', u'thursday', u'from', u'my', u'big', u'sweater', u'and', u'this', u'ART', u'@', u'East', u'Village', u',', u'Manhattan', u'https', u':', u'//t.co/5k8PUInmqK'], [u'RT', u'@', u'MayorKev', u':', u'IM', u'SO', u'HYPEE', u'@', u'calloutband', u'@', u'FreakLikeBex', u'#', u'Callout', u'#', u'TheBitterEnd', u'#', u'Manhattan', u'#', u'Music', u'#', u'LiveMusic', u'#', u'NYC', u'#', u'NY', u'#', u'Jersey', u'#', u'NJ', u'http', u':', u'//t.co/0\u2026']]

produces this output:

0: happy 
1: thursday 
2: from 
3: my 
4: big 
5: sweater 
6: and 
7: this 
8: ART 
9: @ 
10: East 
11: Village 
12: , 
13: Manhattan 
14: https 
15: : 
16: //t.co/5k8PUInmqK


2015-10-15 23:22:53 idjaw

Make that first line a = a[0]. – Prune

Cheers @Prune. Thanks for that. – idjaw

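@idjaw - Unfortunately I had already tried that, along with other approaches. It still returns single characters (rather than "words") for my data structure. This is what I get when I try the solution above: h a p p as a column – Toly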

a= [[u'happy', u'thursday', u'from', u'my', u'big', u'sweater', u'and', u'this', u'ART', u'@', u'East', u'Village', u',', u'Manhattan', u'https', u':', u'//t.co/5k8PUInmqK'], [u'RT', u'@', u'MayorKev', u':', u'IM', u'SO', u'HYPEE', u'@', u'calloutband', u'@', u'FreakLikeBex', u'#', u'Callout', u'#', u'TheBitterEnd', u'#', u'Manhattan', u'#', u'Music', u'#', u'LiveMusic', u'#', u'NYC', u'#', u'NY', u'#', u'Jersey', u'#', u'NJ', u'http', u':', u'//t.co/0\u2026']] 

from itertools import chain 

flat_a = list(chain.from_iterable(a)) 

print(flat_a)

['happy', 'thursday', 'from', 'my', 'big', 'sweater', 'and', 'this', 'ART', '@', 'East', 'Village', ',', 'Manhattan', 'https', ':', '//t.co/5k8PUInmqK', 'RT', '@', 'MayorKev', ':', 'IM', 'SO', 'HYPEE', '@', 'calloutband', '@', 'FreakLikeBex', '#', 'Callout', '#', 'TheBitterEnd', '#', 'Manhattan', '#', 'Music', '#', 'LiveMusic', '#', 'NYC', '#', 'NY', '#', 'Jersey', '#', 'NJ', 'http', ':', '//t.co/0…']
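For comparison, a small sketch of two itertools forms: `chain.from_iterable(a)` takes the outer list as a single argument, while `chain(*a)` unpacks the sublists into separate arguments. On a genuine list of lists they give the same flat list; both are available in Python 2.6+ and Python 3. The sample data is shortened from the question.

```python
from itertools import chain

a = [['happy', 'thursday'], ['RT', '@', 'MayorKev']]

flat_a = list(chain.from_iterable(a))  # lazily walks each sublist in turn
flat_b = list(chain(*a))               # same result; sublists passed as separate arguments
print(flat_a)
```

`chain.from_iterable` avoids building an argument tuple, which is why it is generally preferred when the outer list is long.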


2015-10-15 23:45:38 LetzerWille

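Unfortunately, I still have the same problem. I get this as output: ['[', 'u', "'", 'h', 'a', 'p', 'p', 'y', "'", ',', ' ', 'u', "'", 't', 'h', 'u', 'r', 's', 'd', 'a', 'y', "'", ',', ' ', 'u', "'", 'f', 'r', 'o', 'm', "'", ',', ' ', 'u', "'", 'm', 'y', "'", ',', ' ', 'u', "'", 'b', 'i', 'g', "'", ',', ' ', 'u', "'", 's', 'w', 'e', 'a'. I really don't understand why it doesn't work. I'm using Python 2.7, in case that matters. – Toly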

So on 2.7 it doesn't even flatten the list... Strange, it works for me on Python 3. Try running this: flat_a = list(chain(*a)) – LetzerWille

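I take that back. It does work when the outer [] "envelope" is there! The problem is that an earlier command, wordTokenLw = ','.join(map(str, wordToken)), removed the [] envelope, and the data now looks like [u'happy', u'thursday', u'from', u'my', ], [u'big', u'sweater', u'and', u'this', u'art', u'@', u]. Now I need to figure out how to work with this structure. I tried wordTokenLw[1] and wordTokenLw[2] but only got u and '. Sorry, my mistake!! – Toly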
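The cause described in this comment can be reproduced with the sketch below (Python 3, sample data shortened). Joining the `str()` of each sublist produces one long string, and iterating or indexing a string yields single characters. In Python 2 the reprs carry a `u` prefix, which is why index 1 gave `u` there, while in Python 3 it gives a quote. The usual fix is to skip the join and keep the list of lists; if only the string is available, `ast.literal_eval` can parse it back, assuming the string consists exactly of comma-joined list reprs as in this example.

```python
import ast
from itertools import chain

wordToken = [['happy', 'thursday'], ['RT', '@']]

# The join turns the list of lists into a single string.
wordTokenLw = ','.join(map(str, wordToken))
print(repr(wordTokenLw))  # "['happy', 'thursday'],['RT', '@']"
print(wordTokenLw[1])     # a single character: '

# Flattening a string "works", but it only yields characters.
print(list(chain.from_iterable(wordTokenLw))[:5])

# Recovering the structure: wrap in brackets so it parses as one list literal.
recovered = ast.literal_eval('[' + wordTokenLw + ']')
print(list(chain.from_iterable(recovered)))  # ['happy', 'thursday', 'RT', '@']
```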

a= [[u'happy', u'thursday', u'from', u'my', u'big', u'sweater', u'and', u'this', u'ART', u'@', u'East', u'Village', u',', u'Manhattan', u'https', u':', u'//t.co/5k8PUInmqK'], [u'RT', u'@', u'MayorKev', u':', u'IM', u'SO', u'HYPEE', u'@', u'calloutband', u'@', u'FreakLikeBex', u'#', u'Callout', u'#', u'TheBitterEnd', u'#', u'Manhattan', u'#', u'Music', u'#', u'LiveMusic', u'#', u'NYC', u'#', u'NY', u'#', u'Jersey', u'#', u'NJ', u'http', u':', u'//t.co/0\u2026']] 
for L in a: 
    for e in L: 
        print "element " + e


element happy 
element thursday 
element from 
element my 
element big 
element sweater 
element and 
element this 
element ART 
element @ 
element East


2015-10-16 00:12:33 skalp

Elegant! And I have no idea why it works :) – Toly
