Django database to PostgreSQL

There are about 5,000 companies, and each company has about 4,500 prices, for a total of about 22 million prices.

Some time ago, I wrote a model that stores this data in a format like this -

class Endday(models.Model): 
    company = models.TextField(null=True) 
    eop = models.CommaSeparatedIntegerField(blank=True, null=True, max_length=50000)

And the code that stores the data is -

for i in range(1, len(contents)): 
    csline = contents[i].split(",") 
    prices = csline[1:len(csline)] 
    company = csline[0] 
    entry = Endday(company=company, eop=prices) 
    entry.save()

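The code is very slow (obviously), but it did work and stored the data in the database. One day I decided to delete all the contents of Endday and try storing the data again. It did not work and gave me the error Database locked.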

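Anyway, I did some research and learned that MySQL cannot handle this much data. Then how did it get stored in the first place? I came to the conclusion that all these prices had been stored at the very first position in the database, and that this is why that much data would no longer get stored.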

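After some more research, I learned that PostgreSQL should be used, so I switched the database, ran the migrations, and tried the code again, but with no luck. I got an error saying -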

psycopg2.DataError: value too long for type character varying(50000)

OK, so I thought I would try bulk_create and modified the code a bit, but I was greeted with the same error.

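Next, I thought about making two models, one holding the company names and the other holding the prices together with a key to the specific company. So, once again, I changed the code -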

class EnddayCompanies(models.Model): 
    company = models.TextField(max_length=500) 

class Endday(models.Model): 
    foundation = models.ForeignKey(EnddayCompanies, null=True) 
    eop = models.FloatField(null=True)

And then -

to_be_saved = [] 
for i in range(1, len(contents)): 
    csline = contents[i].split(",") 
    prices = csline[1:len(csline)] 
    company = csline[0] 
    companies.append(csline[0]) 
    prices =[float(x) for x in prices] 
    before_save = [] 
    for j in range(len(prices)): 
        before_save.append(Endday(company=company, eop=prices[j])) 
    to_be_saved.append(before_save) 
Endday.objects.bulk_create(to_be_saved)

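To my surprise, however, this was very slow, and partway through it simply stopped at one company. I tried to find which specific piece of code was slowing it down, and it was -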

before_save = [] 
for j in range(len(prices)): 
    before_save.append(Endday(company=company, eop=prices[j])) 
to_be_saved.append(before_save)

OK, now I am back to square one and cannot think of anything else, so I am ringing the SO bell. My questions now -

How should I go about this?
Why did saving work with MySQL?
Is there a better way to do this? (Surely there must be.)
If there is, what is it?

2016-08-02 ThatBird

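The data error you are getting appears to be because eop exceeds 50k characters, rather than being an actual storage problem. The locked database may be related to how you deleted the contents (i.e. if you did it from a separate process). – Sayse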
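To make the length point concrete, here is a rough sketch of how long one company's prices become once they are joined into a single comma-separated string. The two sample price strings and the helper name `joined_length` are made up for illustration; the real width depends on how many digits each price has in the input data.

```python
def joined_length(count, sample):
    """Length of `count` copies of `sample` joined with commas."""
    return len(",".join([sample] * count))


LIMIT = 50000  # max_length of eop in the Endday model from the question
COUNT = 4500   # approximate number of prices per company, from the question

# Hypothetical price widths: 6 characters versus 13 characters per price.
short_total = joined_length(COUNT, "123.45")
long_total = joined_length(COUNT, "123.456789012")

print(short_total, short_total > LIMIT)
print(long_total, long_total > LIMIT)
```

Under these assumptions the shorter prices fit within the limit and the longer ones do not, so whether a row is rejected depends on the formatting of the prices and not on the total amount of data.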

So changing it to 'TextField' works, but it is slow and inefficient – ThatBird

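Although you have asked a very detailed question, it is still not a complete MCVE. For example, what does your input data look like? In another sense, it is too broad. There are 3 or 4 questions in there. Why don't you break it up into separate questions? – e4c5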

I think you could create separate Company and Price models like this:

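from django.db import models

class Company(models.Model): 
    name = models.CharField(max_length=20) 

class Price(models.Model): 
    # on_delete is required from Django 2.0 onward; CASCADE was the old default
    company = models.ForeignKey(Company, related_name='prices', on_delete=models.CASCADE) 
    price = models.FloatField()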

This is how you would save the data:

# Assuming that contents is a list of strings with a format like this: 
# contents = [ 
#     'Company 1, 1, 2, 3, 4...', 
#     'Company 2, 1, 2, 3, 4...', 
#     .... 
# ] 

for content in contents: 
    tokens = content.split(',') 
    company = Company.objects.create(name=tokens[0]) 
    # One bulk_create call inserts all of this company's prices
    Price.objects.bulk_create(
        Price(company=company, price=float(x.strip())) 
        for x in tokens[1:] 
    ) 
    # Then you can call prices now from company 
    company.prices.order_by('price')

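Update: I just noticed that this is similar to your second implementation; the only difference is the way the data is saved. My implementation has fewer iterations.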
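If one bulk_create call per company is still too slow, one further option is to flatten the prices of many companies into a single sequence and insert them in fixed-size batches. The sketch below covers only the plain-Python part (parsing and batching), so it runs without a Django project. The helper names `parse_line`, `flat_rows`, and `batched` are hypothetical, and the input format is assumed to be the comma-separated lines described above.

```python
from itertools import islice


def parse_line(line):
    """Split 'Company 1, 1, 2, 3' into ('Company 1', [1.0, 2.0, 3.0])."""
    name, *raw_prices = line.split(",")
    return name.strip(), [float(x) for x in raw_prices if x.strip()]


def flat_rows(lines):
    """Yield one (company_name, price) pair per price, across all lines."""
    for line in lines:
        name, prices = parse_line(line)
        for price in prices:
            yield name, price


def batched(iterable, size):
    """Yield lists of at most `size` items until the iterable is exhausted."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


sample = ["Company 1, 1, 2, 3", "Company 2, 4, 5"]
batches = list(batched(flat_rows(sample), 2))
# In a Django project, each batch could be converted to Price objects
# and saved with a single Price.objects.bulk_create(...) call.
print(batches)
```

The key point is that the sequence handed to bulk_create is flat (one object per price) and not a list of per-company lists. bulk_create also accepts a batch_size argument, which may make a separate batching helper unnecessary.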

2016-08-02 09:27:12

I will try this and let you know – ThatBird

@ThatBird I would be glad to know the result. :) –
