
Converting an image to a CVPixelBuffer for machine learning in Swift

I'm trying to get the sample Apple Core ML models that were demoed at WWDC 2017 to run correctly. I'm using GoogLeNet to try to classify images (see the Apple Machine Learning Page). The model takes a CVPixelBuffer as input. I have an image named imageSample.jpg that I'm using for this demo. My code is below:

var sample = UIImage(named: "imageSample")?.cgImage
let bufferThree = getCVPixelBuffer(sample!)

let model = GoogLeNetPlaces()
guard let output = try? model.prediction(input: GoogLeNetPlacesInput.init(sceneImage: bufferThree!)) else {
    fatalError("Unexpected runtime error.")
}

print(output.sceneLabel)

I always get the unexpected runtime error in the output instead of an image classification. My code to convert the image is as follows:

func getCVPixelBuffer(_ image: CGImage) -> CVPixelBuffer? {
    let imageWidth = Int(image.width)
    let imageHeight = Int(image.height)

    let attributes: [NSObject: AnyObject] = [
        kCVPixelBufferCGImageCompatibilityKey: true as AnyObject,
        kCVPixelBufferCGBitmapContextCompatibilityKey: true as AnyObject
    ]

    var pxbuffer: CVPixelBuffer? = nil
    CVPixelBufferCreate(kCFAllocatorDefault,
                        imageWidth,
                        imageHeight,
                        kCVPixelFormatType_32ARGB,
                        attributes as CFDictionary?,
                        &pxbuffer)

    if let _pxbuffer = pxbuffer {
        let flags = CVPixelBufferLockFlags(rawValue: 0)
        CVPixelBufferLockBaseAddress(_pxbuffer, flags)
        let pxdata = CVPixelBufferGetBaseAddress(_pxbuffer)

        let rgbColorSpace = CGColorSpaceCreateDeviceRGB()
        let context = CGContext(data: pxdata,
                                width: imageWidth,
                                height: imageHeight,
                                bitsPerComponent: 8,
                                bytesPerRow: CVPixelBufferGetBytesPerRow(_pxbuffer),
                                space: rgbColorSpace,
                                bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue)

        if let _context = context {
            _context.draw(image, in: CGRect(x: 0, y: 0, width: imageWidth, height: imageHeight))
        }
        else {
            CVPixelBufferUnlockBaseAddress(_pxbuffer, flags)
            return nil
        }

        CVPixelBufferUnlockBaseAddress(_pxbuffer, flags)
        return _pxbuffer
    }

    return nil
}

I got this code from a previous StackOverflow post (the last answer here). I realize the code may not be correct, but I have no idea how to do this myself. I believe this is the part that contains the error. The model requires the following input type: Image<RGB, 224, 224>
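One likely issue, offered only as a guess: getCVPixelBuffer(_:) creates a buffer at the source image's own dimensions, while the model wants exactly 224 x 224 pixels. A minimal sketch of scaling the CGImage first (the helper name resizedTo224 is made up for illustration, and a plain stretch to 224 x 224 is assumed to be acceptable for this demo):

import UIKit

// Hypothetical helper: redraw a CGImage at the 224 x 224 size GoogLeNetPlaces expects,
// so the CVPixelBuffer produced afterwards matches the model's input description.
func resizedTo224(_ image: CGImage) -> CGImage? {
    let targetSize = CGSize(width: 224, height: 224)
    UIGraphicsBeginImageContextWithOptions(targetSize, true, 1.0)
    defer { UIGraphicsEndImageContext() }
    UIImage(cgImage: image).draw(in: CGRect(origin: .zero, size: targetSize))
    return UIGraphicsGetImageFromCurrentImageContext()?.cgImage
}

With that in place, the call above would become getCVPixelBuffer(resizedTo224(sample!)!) before running the prediction.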

+0

I created a sample project with the full code, available here: https://hackernoon.com/swift-tutorial-native-machine-learning-and-machine-vision-in-ios-11-11e1e88aa397 –

Answers

29

You don't need to do a bunch of image mangling yourself to use a Core ML model with images; the new Vision framework can do that for you.

import Vision
import CoreML

// Wrap the Core ML model in a Vision model and let Vision handle scaling and conversion.
let model = try VNCoreMLModel(for: MyCoreMLGeneratedModelClass().model)
let request = VNCoreMLRequest(model: model, completionHandler: myResultsMethod)
let handler = VNImageRequestHandler(url: myImageURL)
try handler.perform([request])

func myResultsMethod(request: VNRequest, error: Error?) {
    guard let results = request.results as? [VNClassificationObservation]
        else { fatalError("huh") }
    for classification in results {
        print(classification.identifier, // the scene label
              classification.confidence)
    }
}

The WWDC17 session on Vision should have more info; it's tomorrow afternoon.
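Since the question starts from a UIImage in the bundle rather than a file URL, here's a hedged variation using the cgImage-based handler (see the comments below); it assumes an asset named "imageSample" and the GoogLeNetPlaces class generated from the question's model:

import UIKit
import Vision
import CoreML

// Sketch: classify a bundled image with Vision on top of Core ML.
func classifyBundledImage() {
    guard let cgImage = UIImage(named: "imageSample")?.cgImage,
          let visionModel = try? VNCoreMLModel(for: GoogLeNetPlaces().model) else { return }

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        guard let results = request.results as? [VNClassificationObservation] else { return }
        // Print the highest-confidence classification (results are typically sorted by confidence).
        if let top = results.first {
            print(top.identifier, top.confidence)
        }
    }

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}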

+0

Works like a charm (with some modifications), thanks. I hadn't realized Vision has a specific kind of request for models that produce output from image input. I guess I should pay more attention to the documentation... –

+0

For the original question, 'VNImageRequestHandler(cgImage: CGImage)' is more appropriate. – chengsam

+0

@chengsam Not really; the original question starts with a resource on disk. Reading it into a 'UIImage', converting to 'CGImage', and passing that to Vision throws away the metadata, but passing the resource URL keeps that metadata available to Vision. – rickster

9

You can use pure Core ML, but you should resize the input image to (224, 224) yourself:

DispatchQueue.global(qos: .userInitiated).async {
    // Resnet50 expects a 224 x 224 image, so we should resize and crop the source image
    let inputImageSize: CGFloat = 224.0
    let minLen = min(image.size.width, image.size.height)
    let resizedImage = image.resize(to: CGSize(width: inputImageSize * image.size.width / minLen,
                                               height: inputImageSize * image.size.height / minLen))
    let croppedToSquareImage = resizedImage.cropToSquare()

    guard let pixelBuffer = croppedToSquareImage?.pixelBuffer() else {
        fatalError()
    }
    guard let classifierOutput = try? self.classifier.prediction(image: pixelBuffer) else {
        fatalError()
    }

    DispatchQueue.main.async {
        self.title = classifierOutput.classLabel
    }
}

// ... 

extension UIImage {

    // Redraws the image at the given size (the caller pre-computes a size that preserves the aspect ratio).
    func resize(to newSize: CGSize) -> UIImage {
        UIGraphicsBeginImageContextWithOptions(CGSize(width: newSize.width, height: newSize.height), true, 1.0)
        self.draw(in: CGRect(x: 0, y: 0, width: newSize.width, height: newSize.height))
        let resizedImage = UIGraphicsGetImageFromCurrentImageContext()!
        UIGraphicsEndImageContext()

        return resizedImage
    }

    // Crops the image to a centered square whose side is the shorter of the two dimensions.
    func cropToSquare() -> UIImage? {
        guard let cgImage = self.cgImage else {
            return nil
        }
        var imageHeight = self.size.height
        var imageWidth = self.size.width

        if imageHeight > imageWidth {
            imageHeight = imageWidth
        }
        else {
            imageWidth = imageHeight
        }

        let size = CGSize(width: imageWidth, height: imageHeight)

        let x = ((CGFloat(cgImage.width) - size.width) / 2).rounded()
        let y = ((CGFloat(cgImage.height) - size.height) / 2).rounded()

        let cropRect = CGRect(x: x, y: y, width: size.width, height: size.height)
        if let croppedCgImage = cgImage.cropping(to: cropRect) {
            return UIImage(cgImage: croppedCgImage, scale: 0, orientation: self.imageOrientation)
        }

        return nil
    }

    // Renders the image into a 32ARGB CVPixelBuffer, which is what the generated Core ML interface expects.
    func pixelBuffer() -> CVPixelBuffer? {
        let width = self.size.width
        let height = self.size.height
        let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
                     kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary
        var pixelBuffer: CVPixelBuffer?
        let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                         Int(width),
                                         Int(height),
                                         kCVPixelFormatType_32ARGB,
                                         attrs,
                                         &pixelBuffer)

        guard let resultPixelBuffer = pixelBuffer, status == kCVReturnSuccess else {
            return nil
        }

        CVPixelBufferLockBaseAddress(resultPixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
        let pixelData = CVPixelBufferGetBaseAddress(resultPixelBuffer)

        let rgbColorSpace = CGColorSpaceCreateDeviceRGB()
        guard let context = CGContext(data: pixelData,
                                      width: Int(width),
                                      height: Int(height),
                                      bitsPerComponent: 8,
                                      bytesPerRow: CVPixelBufferGetBytesPerRow(resultPixelBuffer),
                                      space: rgbColorSpace,
                                      bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue) else {
            return nil
        }

        // Flip the coordinate system so UIKit drawing comes out right side up.
        context.translateBy(x: 0, y: height)
        context.scaleBy(x: 1.0, y: -1.0)

        UIGraphicsPushContext(context)
        self.draw(in: CGRect(x: 0, y: 0, width: width, height: height))
        UIGraphicsPopContext()
        CVPixelBufferUnlockBaseAddress(resultPixelBuffer, CVPixelBufferLockFlags(rawValue: 0))

        return resultPixelBuffer
    }
}

The expected input image size can be found in the .mlmodel file: [screenshot of the model's image input description in Xcode]
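If you'd rather check the expected input size in code than in the Xcode model viewer, the model description can also be read at runtime. A rough sketch, assuming the GoogLeNetPlaces class from the question (Xcode-generated model classes expose a .model property):

import CoreML

// Sketch: print each image input's expected pixel size from the compiled model's metadata.
let mlModel = GoogLeNetPlaces().model
for (name, feature) in mlModel.modelDescription.inputDescriptionsByName {
    if let constraint = feature.imageConstraint {
        print("\(name): \(constraint.pixelsWide) x \(constraint.pixelsHigh)")
    }
}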

A demo project showing both the pure Core ML and the Vision variants can be found here: https://github.com/handsomecode/iOS11-Demos/tree/coreml_vision/CoreML/CoreMLDemo

+0

I thought I heard in the Vision session (or maybe another ML session) that you don't have to resize the image... perhaps I'm mistaken. – pinkeerach

+3

@pinkeerach: You don't have to resize the image if you use the Vision API ('VNCoreMLRequest', as in my answer), because Vision handles the image processing for you. If you use Core ML directly (without Vision), you have to resize and reformat the image to whatever the specific model you're using expects and convert it to a 'CVPixelBuffer' yourself. – rickster

+0

@mauryat Your example project doesn't do anything. There's really no code in it. – zumzum