2017-04-06 81 views
1

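I am trying to pull 100K random songs from a locally stored MusicBrainz database. I am new to programming and would like to know what is causing my computer to slow down (possibly memory filling up). I am running Ubuntu in a virtual machine. After I run this .py file, the machine gradually gets slower. Please suggest changes that would let me start the script and leave it running unattended while I sleep.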

import psycopg2 
import random 
import MySQLdb 
from contextlib import closing 

conn = psycopg2.connect("dbname='musicbrainz' user='musicbrainz' host='localhost' password='musicbrainz'") 
conn1 = MySQLdb.connect(host = "localhost", user = "root", passwd = "40OZlike", db = "plalyst") 
print("connections and cursors made...") 
cur= conn1.cursor() 
conn1.set_character_set('utf8') 
cur.execute('SET NAMES utf8;') 
cur.execute('SET CHARACTER SET utf8;') 
cur.execute('SET character_set_connection=utf8;') 
cur.close() 

def migrateSongDB():
    try:
        cur1 = conn1.cursor()
        cur1.execute("select count(*) from Song")
        numberOfSongs = cur1.fetchall()[0][0]
        cur1.close()
        print("number of songs in our database is ")
        print(numberOfSongs)
        rnumbers = random.sample(range(1, 22660511), 100000-numberOfSongs)
        print("random numbers generated....")
        for eachnum in rnumbers:
            cur = conn.cursor()
            cur1 = conn1.cursor()
            print(eachnum)
            songName=""
            while(songName==""):
                cur.execute("""select name from track where id = %s """, (eachnum,))
                rows = cur.fetchall()
                print(rows)
                if not len(rows)==0:
                    songName = rows[0][0]
                eachnum+=1
            print("Got the track name:")
            print(songName)
            sql = 'INSERT into Song (name) values ("'+songName+'")'
            print(sql)
            cur1.execute(sql)
            cur1.execute('commit')
            print("inserted into the song table....")
            cur.close()
            cur1.close()

        print("Songs Saved into new Data Base...")
        conn.close()
        conn1.close()
        print("Connections Closed")
    except:
        with conn1 as cursor:
            cursor.execute('select 1;')
            result = cursor.fetchall()
            for cur in result:
                print(cur)
        migrateSongDB()

def main(): 
    migrateSongDB() 
    conn.close() 
    conn1.close() 

if __name__ == "__main__": main() 

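Thank you for taking the time to read this code. Also, if you have any suggestions for improving my coding style, I would be happy to learn. Thanks again.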

+0

Use common system tools (such as top) to identify the cause of the slowdown. –

+1

@KlausD: `top` would be worthless here. – Makoto

+0

@Makoto ...and you know that how? –

Answers

0

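My hunch is that the IDs you are pulling out of a hat eventually wander into ranges where no ID exists. That would leave you running an extremely long loop, which is inefficient.

Instead of that approach, why not fetch the IDs that are known to exist in the database and pick from those?

You can do that with the code below. I borrowed a list-flattening operation from this Stack Overflow answer, which turns the query result into a plain list of IDs.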

cur1.execute("select id from Song") 
result = cur1.fetchall() 
result = [item for sublist in result for item in sublist] 
# result now holds every id found in your Song table 
# use len(result) to get the total number of rows 
rnumbers = random.sample(result, 100000-len(result)) 

You can then get rid of the while loop, because every ID you pick is guaranteed to exist in your database.
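To make the idea concrete, here is a minimal sketch of a loop-free migration. It is not the code from the answer: instead of sampling IDs in Python it swaps in `order by random() limit n` so the database chooses the rows, and it skips rows whose name is empty. The function name `copy_random_songs` and the `ph` placeholder parameter are made up for illustration, and `sqlite3` is only a stand-in so the example runs without the PostgreSQL and MySQL servers. With the question's `conn` and `conn1` the default `ph="%s"` should apply. Sorting all tracks randomly may be slow on a table of this size, so treat it as a starting point rather than a tuned solution.

```python
import sqlite3  # stand-in for psycopg2/MySQLdb so the sketch runs anywhere


def copy_random_songs(src, dst, target_total, ph="%s"):
    """Top up dst.Song to target_total rows with random non-empty track names.

    src and dst are DB-API connections. ph is the driver's placeholder:
    "%s" for psycopg2 and MySQLdb, "?" for sqlite3.
    """
    src_cur, dst_cur = src.cursor(), dst.cursor()
    try:
        dst_cur.execute("select count(*) from Song")
        needed = max(0, target_total - dst_cur.fetchone()[0])
        # The database skips empty names and picks the random rows itself.
        src_cur.execute(
            "select name from track where name <> '' "
            "order by random() limit " + ph,
            (needed,),
        )
        rows = src_cur.fetchall()
        # Parameterized insert: quotes inside a name cannot break the SQL.
        dst_cur.executemany("insert into Song (name) values (" + ph + ")", rows)
        dst.commit()  # one commit at the end instead of one per row
        return len(rows)
    finally:
        src_cur.close()
        dst_cur.close()


# Tiny in-memory demo with made-up rows (hypothetical data).
src = sqlite3.connect(":memory:")
src.execute("create table track (id integer primary key, name text)")
src.executemany(
    "insert into track (name) values (?)",
    [("Alpha",), ("",), ('Say "Hi"',), ("Gamma",), (None,), ("Delta",)],
)
dst = sqlite3.connect(":memory:")
dst.execute("create table Song (name text)")
copied = copy_random_songs(src, dst, 3, ph="?")
saved = [row[0] for row in dst.execute("select name from Song")]
print(copied, saved)
```

The sketch deliberately has no `except:` branch that calls the function again. In the question's code every failure (for example a song name containing a double quote, which breaks the hand-built INSERT string) starts a fresh run with a new list of random numbers while the previous call is still on the stack, which may be one reason memory use keeps growing.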

+0

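I understand what you are saying, but the thing is that all the ids exist; it is the name attribute that is empty in the musicbrainz DB. Is there any way to check how much memory the variables or objects take up while this program runs? Forgive me if I am wrong, I am a newbie. Thanks. –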
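On the question of measuring memory, one option is the standard library's `tracemalloc` module (Python 3.4 and later), which reports how many bytes Python has allocated since tracing started. The sketch below is only an illustration, assuming Python 3. It measures the random-number list built with the range bound and sample size from the question's code, and compares that with `sys.getsizeof`, which counts only the list object itself and not the integers it refers to.

```python
import random
import sys
import tracemalloc

tracemalloc.start()
# Same call shape as in the question, with a fixed sample size of 100000.
rnumbers = random.sample(range(1, 22660511), 100000)
current, peak = tracemalloc.get_traced_memory()  # bytes traced since start()
tracemalloc.stop()

# getsizeof reports the list container only, not the int objects inside it.
list_bytes = sys.getsizeof(rnumbers)

print("traced now: %.1f MiB, peak: %.1f MiB" % (current / 2**20, peak / 2**20))
print("list container alone: %.1f MiB" % (list_bytes / 2**20))
```

Wrapping a suspicious part of the script between `tracemalloc.start()` and `tracemalloc.get_traced_memory()` in the same way gives a rough idea of which step is responsible for the growth.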