2017-07-14

How to simplify the DataLoader for an autoencoder in PyTorch

Is there a simpler way to set up the DataLoader, since in an autoencoder the input and target data are identical, and to load the data during training? The DataLoader always requires two inputs.

Currently I define my DataLoader like this:

import numpy.random as rnd
import torch
import torch.utils.data as data_utils

X_train = rnd.random((300, 100))
X_val = rnd.random((75, 100))

# Input and target are identical, so the same tensor is passed twice.
train = data_utils.TensorDataset(torch.from_numpy(X_train).float(), torch.from_numpy(X_train).float())
val = data_utils.TensorDataset(torch.from_numpy(X_val).float(), torch.from_numpy(X_val).float())
train_loader = data_utils.DataLoader(train, batch_size=1)
val_loader = data_utils.DataLoader(val, batch_size=1)

and train like this:

for epoch in range(50):
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = Variable(data), Variable(target).detach()
        optimizer.zero_grad()
        output = model(data)  # was: model(data, x) -- stray second argument
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

Answers


Why not subclass TensorDataset to make it compatible with unlabeled data?

class UnlabeledTensorDataset(TensorDataset):
    """Dataset wrapping unlabeled data tensors.

    Each sample will be retrieved by indexing tensors along the first
    dimension.

    Arguments:
        data_tensor (Tensor): contains sample data.
    """
    def __init__(self, data_tensor):
        self.data_tensor = data_tensor

    def __getitem__(self, index):
        return self.data_tensor[index]

    def __len__(self):
        return self.data_tensor.size(0)

And train your autoencoder along these lines:

X_train = rnd.random((300, 100))
train = UnlabeledTensorDataset(torch.from_numpy(X_train).float())
train_loader = data_utils.DataLoader(train, batch_size=1)

for epoch in range(50):
    for batch in train_loader:
        data = Variable(batch)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, data)  # the batch itself is the target
        loss.backward()
        optimizer.step()
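As a side note: on newer PyTorch releases (where `Variable` has been merged into `Tensor`), the subclass may not be needed at all, since `TensorDataset` accepts a single tensor and yields one-element tuples. A minimal sketch, assuming a recent PyTorch version:

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# A single-tensor TensorDataset; each sample comes back as a 1-tuple.
X_train = np.random.random((300, 100))
train = TensorDataset(torch.from_numpy(X_train).float())
train_loader = DataLoader(train, batch_size=32)

for (data,) in train_loader:  # unpack the 1-element batch
    pass  # feed `data` as both input and target

# 300 samples with batch_size=32 -> last batch holds 300 % 32 = 12 samples
print(data.shape)
```

This avoids both the subclass and the duplicated tensor of the original question.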

I believe that is about as simple as it gets. Beyond that, I think you will have to implement your own dataset. Sample code is below.

import PIL.Image
import torch
from glob import glob

class ImageLoader(torch.utils.data.Dataset):
    def __init__(self, root, tform=None, imgloader=PIL.Image.open):
        super(ImageLoader, self).__init__()
        self.root = root
        self.filenames = sorted(glob(root))
        self.tform = tform
        self.imgloader = imgloader

    def __len__(self):
        return len(self.filenames)

    def __getitem__(self, i):
        out = self.imgloader(self.filenames[i])  # or e.g. io.imread(self.filenames[i])
        if self.tform:
            out = self.tform(out)
        return out

You can then use it as follows.

source_dataset=ImageLoader(root='/dldata/denoise_ae/clean/*.png', tform=source_depth_transform) 
target_dataset=ImageLoader(root='/dldata/denoise_ae/clean_cam_n9dmaps/*.png', tform=target_depth_transform) 
source_dataloader=torch.utils.data.DataLoader(source_dataset, batch_size=32, shuffle=False, drop_last=True, num_workers=15) 
target_dataloader=torch.utils.data.DataLoader(target_dataset, batch_size=32, shuffle=False, drop_last=True, num_workers=15) 

To test the first batch, proceed as follows.

dataiter = iter(source_dataloader)
images = next(dataiter)  # dataiter.next() on Python 2
print(images.size())

Finally, you can enumerate the loaded data in the batch training loop as follows.

for i, (source, target) in enumerate(zip(source_dataloader, target_dataloader), 0): 
    source, target = Variable(source.float().cuda()), Variable(target.float().cuda()) 
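The loop above stops at wrapping the batches; a complete training step might look like the sketch below. The one-layer model, MSE loss, random batches, and CPU tensors (in place of `.cuda()`) are all hypothetical stand-ins, not part of the original answer:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the model and data used in the answer above.
model = nn.Sequential(nn.Linear(100, 16), nn.Linear(16, 100))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

source = torch.randn(32, 100)  # noisy input batch
target = torch.randn(32, 100)  # clean target batch

optimizer.zero_grad()
output = model(source)
loss = criterion(output, target)
loss.backward()   # compute gradients
optimizer.step()  # update weights
print(output.shape)
```

For the denoising setup in the answer, `source` and `target` would come from the zipped dataloaders instead of `torch.randn`.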

Have fun.

PS. The code sample I shared does not load the validation data.
