
I want to train an ssd-inception-v2 model with the TensorFlow Object Detection API. The training dataset I want to use is a set of cropped images of varying sizes with no bounding boxes, since the crop itself is the bounding box.

I followed the create_pascal_tf_record.py example, replacing the bounding box and classification parts accordingly, to generate the TFRecords as follows:

import hashlib
import os

import numpy as np
import tensorflow as tf
from PIL import Image

from object_detection.utils import dataset_util
from object_detection.utils import label_map_util


def dict_to_tf_example(imagepath, label):
    image = Image.open(imagepath)
    if image.format != 'JPEG':
        print("Skipping file: " + imagepath)
        return None
    img = np.array(image)
    with tf.gfile.GFile(imagepath, 'rb') as fid:
        encoded_jpg = fid.read()
    # Store the image dimensions so the serialized string can later be
    # decoded back into the shape the image used to have.
    height = img.shape[0]
    width = img.shape[1]
    key = hashlib.sha256(encoded_jpg).hexdigest()

    # The box is hard-coded to (nearly) the full image, since each crop
    # is itself the bounding box.
    xmin = [5.0 / 100.0]
    ymin = [5.0 / 100.0]
    xmax = [95.0 / 100.0]
    ymax = [95.0 / 100.0]
    class_text = [label['name'].encode('utf8')]
    classes = [label['id']]
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(imagepath.encode('utf8')),
        'image/source_id': dataset_util.bytes_feature(imagepath.encode('utf8')),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
        'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
        'image/object/class/text': dataset_util.bytes_list_feature(class_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
    }))

    return example


def main(_):
    data_dir = FLAGS.data_dir
    output_path = os.path.join(data_dir, FLAGS.output_path + '.record')
    writer = tf.python_io.TFRecordWriter(output_path)
    label_map = label_map_util.load_labelmap(FLAGS.label_map_path)
    categories = label_map_util.convert_label_map_to_categories(
        label_map, max_num_classes=80, use_display_name=True)
    category_index = label_map_util.create_category_index(categories)
    category_list = os.listdir(data_dir)
    # Keep only the categories that have a matching directory of crops.
    gen = (category for category in categories if category['name'] in category_list)
    for category in gen:
        examples_path = os.path.join(data_dir, category['name'])
        examples_list = os.listdir(examples_path)
        for example in examples_list:
            imagepath = os.path.join(examples_path, example)
            tf_example = dict_to_tf_example(imagepath, category)
            if tf_example is not None:  # non-JPEG files are skipped
                writer.write(tf_example.SerializeToString())
            # print(tf_example)

    writer.close()
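
One way to sanity-check the generated file (a minimal sketch; it assumes output_path still points at the .record file written above) is to read a record back and inspect the parsed features:

# Read the first record back and print a few of its features; assumes
# `output_path` is the .record file written by main() above.
for record in tf.python_io.tf_record_iterator(output_path):
    parsed = tf.train.Example.FromString(record)
    feature = parsed.features.feature
    print(feature['image/filename'].bytes_list.value)
    print(feature['image/object/class/label'].int64_list.value)
    print(feature['image/object/bbox/xmin'].float_list.value)
    break  # only inspect the first example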

The bounding boxes are hard-coded to contain the whole image, and the labels are assigned according to each image's parent directory. I use mscoco_label_map.pbtxt for the labels and ssd_inception_v2_pets.config as the basis for my pipeline.

I trained and froze the model using the example Jupyter notebooks. However, the end result is a single box around the entire image. Any idea what went wrong?

Answer


Object detection algorithms/networks typically work by predicting the locations of bounding boxes together with the classes they contain. For that reason, the training data usually needs to include bounding box data. If you feed your model training data in which the bounding box is always the size of the image, then you will most likely get garbage predictions, including a box that always outlines the whole image.

This sounds like a problem with your training data. You should not crop the images; instead, annotate the objects within the full images/scenes. Right now you are essentially training a classifier.

Try training with properly annotated, un-cropped images and see how it goes.
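
As a rough illustration of what per-object annotations would look like, here is a sketch with a hypothetical list of pixel-space boxes from a labeling tool (the names, ids, and coordinates are made up; the ids follow mscoco_label_map.pbtxt), normalized to [0, 1] the way the TFRecord features above expect:

# Hypothetical annotations for one full scene: several objects, each
# with its own pixel-space box (xmin, ymin, xmax, ymax).
annotations = [
    {'name': 'cat', 'id': 17, 'box_px': (276, 25, 630, 373)},
    {'name': 'dog', 'id': 18, 'box_px': (48, 240, 195, 371)},
]
width, height = 640, 480  # full-scene image size (example values)

# Normalize to [0, 1], one entry per object, as the
# 'image/object/bbox/*' features expect.
xmins = [a['box_px'][0] / float(width) for a in annotations]
ymins = [a['box_px'][1] / float(height) for a in annotations]
xmaxs = [a['box_px'][2] / float(width) for a in annotations]
ymaxs = [a['box_px'][3] / float(height) for a in annotations]
class_texts = [a['name'].encode('utf8') for a in annotations]
class_labels = [a['id'] for a in annotations]
# These lists go into the same 'image/object/...' features as in the
# question's code, with several boxes per image instead of one box
# spanning the whole frame.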
