2014-01-10 23 views
1

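How do I sort this data?

So, I'm working on a project in which I have to sort a large 34 MB text file full of song data. Each line of the text file has a year, a unique number, an artist, and a song. What I can't figure out is how to efficiently sort the data into other text files. I want to sort by artist name and song name. Sadly, this is all I have: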

#Opening the file to read here
with open('tracks_per_year.txt', 'r', encoding='utf8') as in_file:
    #Creating 'lists' to put information from array into
    years=[]
    uics=[]
    artists=[]
    songs=[]

    #Filling up the 'lists'
    for line in in_file:
        year,uic,artist,song=line.split("<SEP>")
        years.append(year)
        uics.append(uic)
        artists.append(artist)
        songs.append(song)
        print(year)
        print(uic)
        print(artist)
        print(song)

#Sorting:
with open('artistsort.txt', 'w', encoding='utf8') as artist:

    for x in range(1,515576):

        if artists[x]==artists[x-1]:
            artist.write(years[x])
            artist.write(" ")
            artist.write(uics[x])
            artist.write(" ")
            artist.write(artists[x])
            artist.write(" ")
            artist.write(songs[x])
            artist.write("\n")


with open('Onehitwonders.txt', 'w', encoding='utf8') as ohw:

    for x in range(1,515576):

        if artists[x]!=artists[x-1]:
            ohw.write(years[x])
            ohw.write(" ")
            ohw.write(uics[x])
            ohw.write(" ")
            ohw.write(artists[x])
            ohw.write(" ")
            ohw.write(songs[x])
            ohw.write("\n")

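Please keep in mind that I'm a beginner, so try to keep your explanations as simple as possible. If you have any other ideas, I'd be happy to hear them too. Thanks!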

+1

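You shouldn't use 'range' for this. If the number of entries in the file changes, it will break your logic. You can use 'for line in artists:' to make sure you always iterate over every line. – IanAuld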

+0

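@IanAuld Thanks for the suggestion, but that is what I did at the start. The problem was that nothing got written to the artistsort.txt file that way, and the one-hit-wonders file became too large (~32 MB). – Bobbert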

+0

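That has nothing to do with the 'for' loop. As in your previous question, there is a problem in your logic that prevents anything from being written to that file. The for loop only iterates over your data; what comes after it decides what actually happens to your data. – IanAuld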
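
As an illustration of the kind of logic issue discussed in these comments: comparing 'artists[x]' with 'artists[x-1]' only detects repeats when rows by the same artist sit next to each other, and the file name suggests the rows may be ordered by year instead (an assumption, not confirmed in the thread). A minimal sketch of a split that does not depend on row order is shown here; the helper name 'split_by_song_count' and the sample rows are hypothetical.

```python
from collections import Counter

def split_by_song_count(rows):
    """Split (year, uic, artist, song) rows into one-hit wonders and the rest.

    Hypothetical helper: an artist counts as a one-hit wonder when exactly
    one row in the data carries that artist name.
    """
    # Count how many rows each artist has, regardless of row order.
    counts = Counter(artist for _, _, artist, _ in rows)
    one_hit = [row for row in rows if counts[row[2]] == 1]
    others = [row for row in rows if counts[row[2]] > 1]
    return one_hit, others

# Made-up sample rows, deliberately not grouped by artist.
sample = [
    ("1981", "uic1", "Artist A", "Song 1"),
    ("1934", "uic2", "Artist B", "Song 2"),
    ("2004", "uic3", "Artist A", "Song 3"),
]
one_hit, others = split_by_song_count(sample)
```

Each of the two lists can then be sorted and written to its own file.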

Answers

0

You can load the data into a dictionary-based structure, i.e. for each artist and song:

data = {artist_name: {song_name: {'year': year, 'uid': uid},
                      ...},
        ...}

Then, when you write the output, use sorted to get them in alphabetical order:

for artist in sorted(data):
    for song in sorted(data[artist]):
        # use data[artist][song] to access details
        details = data[artist][song]
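
Putting the two steps together, a minimal sketch might look like the following. The function names 'build_index' and 'sorted_rows' and the sample lines are hypothetical; the '<SEP>' separator and the field order are taken from the question.

```python
from collections import defaultdict

def build_index(lines, sep="<SEP>"):
    """Group 'year<SEP>uid<SEP>artist<SEP>song' lines by artist, then song."""
    data = defaultdict(dict)
    for line in lines:
        # Drop the trailing newline so it does not end up in the song name.
        year, uid, artist, song = line.rstrip("\n").split(sep)
        data[artist][song] = {"year": year, "uid": uid}
    return data

def sorted_rows(data):
    """Yield output lines ordered by artist name, then song name."""
    for artist in sorted(data):
        for song in sorted(data[artist]):
            details = data[artist][song]
            yield f"{details['year']} {details['uid']} {artist} {song}"

# Made-up sample lines in the format described in the question.
sample_lines = [
    "1981<SEP>uic1<SEP>Beta<SEP>Zebra\n",
    "1934<SEP>uic2<SEP>Alpha<SEP>Moon\n",
    "2004<SEP>uic3<SEP>Beta<SEP>Apple\n",
]
rows = list(sorted_rows(build_index(sample_lines)))
```

With the real file, 'lines' would be the open file object. Note that in this structure a second entry with the same artist and song name overwrites the first.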
0

Try something like this:

from operator import attrgetter

class Song:
    def __init__(self, year, uic, artist, song):
        self.year = year
        self.uic = uic
        self.artist = artist
        self.song = song

songs = []

with open('tracks_per_year.txt', 'r', encoding='utf8') as in_file:
    for line in in_file:
        # Strip the trailing newline so it is not stored in the song name
        year, uic, artist, song = line.rstrip("\n").split("<SEP>")
        songs.append(Song(year, uic, artist, song))
        print(year)
        print(uic)
        print(artist)
        print(song)

with open('artistsort.txt', 'w', encoding='utf8') as artist:
    # Sort by artist name first, then by song name
    for song in sorted(songs, key=attrgetter('artist', 'song')):
        artist.write(song.year)
        artist.write(" ")
        artist.write(song.uic)
        artist.write(" ")
        artist.write(song.artist)
        artist.write(" ")
        artist.write(song.song)
        artist.write("\n")
+0

Thanks a lot for the idea. The only part I don't get is 'for song in sorted(songs, key=attrgetter("artist", "song")):'. Mind explaining? – Bobbert

+0

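The built-in Python function 'sorted()' returns a new sorted list built from the 'songs' list. The optional 'key' argument is a function that returns the sort key for each element of 'songs'. In this case, 'attrgetter' returns the 'artist' and 'song' fields of each 'Song' object. – vmario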
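
To make that explanation concrete, here is a small sketch showing what 'attrgetter' returns for one object and how 'sorted()' uses it. The 'Song' class mirrors the one in the answer; the sample songs are made up.

```python
from operator import attrgetter

class Song:
    def __init__(self, year, uic, artist, song):
        self.year = year
        self.uic = uic
        self.artist = artist
        self.song = song

# Made-up sample data.
songs = [
    Song("1981", "uic1", "Beta", "Zebra"),
    Song("1934", "uic2", "Alpha", "Moon"),
    Song("2004", "uic3", "Beta", "Apple"),
]

# attrgetter('artist', 'song') builds a function that, given an object,
# returns the tuple (obj.artist, obj.song).
get_key = attrgetter("artist", "song")
first_key = get_key(songs[0])

# sorted() calls get_key on every element and compares the resulting tuples,
# so songs are ordered by artist first and by song name within each artist.
ordered = [(s.artist, s.song) for s in sorted(songs, key=get_key)]
```

Tuples compare element by element, which is why the second field only matters when two songs share the same artist. String comparison is case-sensitive by default, so uppercase letters sort before lowercase ones.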

0

You can't beat the simplicity of pandas. To read your file:

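import pandas as pd

# A multi-character separator needs the Python parsing engine.
# Column names are passed explicitly, assuming the file has no header row.
data = pd.read_csv('tracks_per_year.txt', sep='<SEP>', engine='python',
                   header=None, names=['year', 'uic', 'artist', 'song'])
data
#    year   uic   artist   song
# 0  1981  uic1  artist1  song1
# 1  1934  uic2  artist2  song2
# 2  2004  uic3  artist3  song3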

Then, to sort by a particular column and write the result to a new file, just do:

data.sort_values(by='year').to_csv('year_sort.txt')