Search for and compare two string values in a .txt file in Python?

"cadence_regulatable_result": "completeRecognition", 
    "appserver_results": { 
     "status": "success", 
     "final_response": 0, 
     "payload": { 
      "actions": [{ 
       "speaker": "user", 
       "type": "conversation", 
       "nbest_text": { 
        "confidences": [478, 
        0, 
        0], 
        "words": [[{ 
         "stime": 0, 
         "etime": 1710, 
         "word": "ConnectedDrive\\*no-space-before", 
         "confidence": "0.241" 
        }], 
        [{ 
         "stime": 0, 
         "etime": 1020, 
         "word": "Connected\\*no-space-before", 
         "confidence": "0.0" 
        }, 
        { 
         "stime": 1020, 
         "etime": 1710, 
         "word": "drive", 
         "confidence": "0.0" 
        }], 
        [{ 
         "stime": 0, 
         "etime": 900, 
         "word": "Connect\\*no-space-before", 
         "confidence": "0.0" 
        }, 
        { 
         "stime": 900, 
         "etime": 980, 
         "word": "to", 
         "confidence": "0.0" 
        }, 
        { 
         "stime": 980, 
         "etime": 1710, 
         "word": "drive", 
         "confidence": "0.0" 
        }]], 
        "transcriptions"= ["ConnectedDrive", 
        "Connected drive", 
        "Connect to drive"] 
       } 
      }] 
     } 
    }, 
    "final_response": 0, 
    "prompt": "", 
    "result_format": "appserver_post_results" 
}: form-data;name="QueryResult"Content-Type: application/JSON;charset=utf-8Nuance-Context: efb3d3ce-ef50-4e83-8c31-063c3f5208aa{ 
    "status_code": 0, 
    "result_type": "DRAGON_NLU_ASR_CMD", 
    "NMAS_PRFX_SESSION_ID": "f786f0be-d547-4fca-8d72-96429a30c9db", 
    "NMAS_PRFX_TRANSACTION_ID": "1", 
    "audio_transfer_info": { 
     "packages": [{ 
      "time": "20151221085512579", 
      "bytes": 1633 
     }, 
     { 
      "time": "20151221085512598", 
      "bytes": 3969 
     }], 
     "nss_server": "10.56.11.186:4503", 
     "end_time": "20151221085512596", 
     "audio_id": 1, 
     "start_time": "20151221085512303" 
    }, 
    "cadence_regulatable_result": "completeRecognition", 
    "appserver_results": { 
     "status": "success", 
     "final_response": 1, 
     "payload": { 
      "diagnostic_info": { 
       "adk_dialog_manager_status": "undefined", 
       "nlu_version": "[NLU_PROJECT:NVCCP-eng-USA];[D0160932];[VL-Models:Version: vl.1.100.12-2-GMT20151130160335]", 
       "nlps_host": "mt-dmz-nlps002.nuance.com:8636", 
       "nlps_ip": "10.56.10.51", 
       "application": "AUDI_2017", 
       "nlu_component_flow": "[Input:VoiceJSON] [FieldID|auto_main] [NLUlib|C-eckart-r$Rev$.f20151118.1250] [build|G-r72490M.f20151130.1055] [vlmodel|Version: 2-GMT20151130160335] [Flow|+VlingoTokenized]", 
       "third_party_delay": "0", 
       "nmaid": "AUDI_SDS_2017_EXT_20151203", 
       "nlps_profile": "AUDI_2017", 
       "fieldId": "auto_main", 
       "nlps_profile_package_version": "r159218", 
       "nlu_annotator": "com-GBR.ncs51.VlingoNLU-client-qNVCCP_NCS51", 
       "ext_map_time": "2", 
       "nlu_use_literal_annotator": "0", 
       "int_map_time": "2", 
       "nlps_nlu_type": "nlu_project", 
       "nlu_language": "eng-GBR", 
       "timing": { 
        "finalRespSentDelay": "188", 
        "intermediateRespSentDelay": "648" 
       }, 
       "nlps_profile_package": "AUDI_2017" 
      }, 
      "actions": [{ 
       "Input": { 
        "Interpretations": ["ConnectedDrive"], 
        "Type": "asr" 
       }, 
       "Instances": [{ 
        "nlu_classification": { 
         "Domain": "UDE", 
         "Intention": "Unspecified" 
        }, 
        "nlu_interpretation_index": 1, 
        "nlu_slot_details": { 
         "Name": { 
          "literal": "ConnectedDrive" 
         }, 
         "Search-phrase": { 
          "literal": "connecteddrive" 
         } 
        }, 
        "interpretation_confidence": 4549 
       }], 
       "type": "nlu_results", 
       "api_version": "1.0" 
      }], 
      "nlps_version": "nlps(z):6.1.100.12.2-B359;Version: nlps-base-GMT20151130193521;" 
     } 
    },

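First, I search the .txt file for the words transcriptions and Interpretations (so I use regular expressions). Then I want to compare the first transcriptions value ("Drive me to a charging station") with the Interpretations value ("Drive to a charging station"). If I run the program below, it just prints that the recognition is invalid.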

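import json
import os

# Raw string so the backslashes in the Windows path are not read as escapes
directory = r"C:\Users\hemanth_venkatappa\Desktop\Working\pcm-audio\English"
for subdir, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith(".txt"):
            # json.load needs an open file object, not just the file name
            with open(os.path.join(subdir, file)) as fobj:
                content = json.load(fobj)
            if "status_code" in content:
                if content["status_code"] == 0:
                    print("valid")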


2015-12-21 Hemanth Venkatappa

Can you tell us what the rest of the data file contains?

Sorry, I meant the data file.

The .txt file is very large. I can't post it here.

You can take a look at difflib for comparing text in Python.

The difflib module contains tools for computing and working with differences between sequences. It is especially useful for comparing text, and includes functions that produce reports using several common difference formats.

difflib tutorial

Using this module, you can evaluate the difference between two strings or .txt files like this:

import difflib

a = ["Drive me to a charging station", "Drive me to charging station", "Drive me to a charging Station"]
correct = ["Drive me to a charging station"]

# Identical strings
print(difflib.SequenceMatcher(None, a[0], correct[0]).ratio())
# >> 1.0

# The word "a" is missing
print(difflib.SequenceMatcher(None, a[1], correct[0]).ratio())
# >> 0.965517241379 (Python 3 prints more digits)

# Capital "S" in "Station"
print(difflib.SequenceMatcher(None, a[2], correct[0]).ratio())
# >> 0.966666666667 (Python 3 prints more digits)

As you can see, the .ratio() between a[0] and correct[0] is 1.0 (100%). This means they are the same string.

You can use a loop to evaluate the ratios and, if ratio == 1.0, print "Recognition is VALID".
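A minimal sketch of such a loop, reusing the `a` and `correct` lists from above. The `classify` helper and its `threshold` parameter are names made up for illustration; they are not part of difflib:

```python
import difflib


def classify(candidates, expected, threshold=1.0):
    """Compare each candidate with expected and label it VALID or INVALID.

    classify and threshold are hypothetical names used only in this sketch.
    """
    results = []
    for text in candidates:
        ratio = difflib.SequenceMatcher(None, text, expected).ratio()
        label = "VALID" if ratio >= threshold else "INVALID"
        results.append((text, ratio, label))
    return results


a = ["Drive me to a charging station",
     "Drive me to charging station",
     "Drive me to a charging Station"]
correct = ["Drive me to a charging station"]

for text, ratio, label in classify(a, correct[0]):
    print("Recognition is {}: {!r} (ratio {:.3f})".format(label, text, ratio))
```

With the default threshold only an exact match counts as valid. Passing a lower value, for example `threshold=0.95`, would also accept near matches, which may or may not be what you want for recognition results.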

Also, if you don't want to use .ratio() between the strings, you can use Differ to find the differences:

d = difflib.Differ()
diff = d.compare(a, correct)
print('\n'.join(diff))

This block of code gives me:

  Drive me to a charging station   # no sign at the start: the line is in both lists
- Drive me to charging station     # "-": the line is only in a, not in correct
- Drive me to a charging Station   # same here

Then you will have to find a way to print Recognition is VALID or INVALID according to your expectations.
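One possible way to turn the Differ output into that decision, sketched under the assumption that a single candidate is compared with a single expected string. The `is_same` helper is a hypothetical name, not a difflib function:

```python
import difflib


def is_same(candidate, expected):
    """Return True when Differ reports no added or removed lines.

    is_same is a hypothetical helper for this sketch. Differ prefixes
    unchanged lines with two spaces and changed lines with "- ", "+ " or "? ".
    """
    diff = difflib.Differ().compare([candidate], [expected])
    return all(line.startswith("  ") for line in diff)


expected = "Drive me to a charging station"
for candidate in ["Drive me to a charging station",
                  "Drive me to charging station"]:
    verdict = "VALID" if is_same(candidate, expected) else "INVALID"
    print("Recognition is {}: {}".format(verdict, candidate))
```

For a plain yes/no answer, `candidate == expected` does the same job more directly. Differ is mainly worth it when you also want to show what exactly differs.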


2015-12-21 11:01:36

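Thanks for the reply. But first I want to parse the .txt file to find the right values (in my case: Interpretations and transcriptions), and then I have to compare them. How would you suggest doing that?

Are your 'a' and 'correct' strings inside the same '.txt' file?

I modified the code above. But it shows the error: 'transcriptions' is not defined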

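This appears to be JSON. You should be able to load the whole file into a dictionary:

import json
# f must be an open file object, for example the result of open() on your .txt file
data = json.load(f)

Now data is a dictionary that contains other dictionaries and lists. You need to find your way through it by exploring the keys.

Something like this: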
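As a minimal sketch of that exploration, assume the second JSON object from the question has been isolated as valid JSON. The `raw` string below is a trimmed, hypothetical stand-in for it, and the variable names are chosen only for illustration:

```python
import json

# Trimmed, hypothetical stand-in for the second JSON object in the question
raw = """
{
  "status_code": 0,
  "appserver_results": {
    "status": "success",
    "payload": {
      "actions": [{
        "Input": {"Interpretations": ["ConnectedDrive"], "Type": "asr"},
        "type": "nlu_results"
      }]
    }
  }
}
"""

data = json.loads(raw)

# Walk down one level at a time and print the keys to see where to go next
print(sorted(data))
payload = data["appserver_results"]["payload"]
print(sorted(payload))
first_action = payload["actions"][0]   # "actions" is a list, so index into it
print(sorted(first_action))

interpretations = first_action["Input"]["Interpretations"]
print(interpretations)
```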

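You will need to adjust this to your real data. Play around at the interactive prompt to find out which keys and indices you need to use.

Now you can check whether the interpretation is among the transcriptions: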

if interpretations[0] in transcriptions: 
    print('found', interpretations[0])

Your final program could look similar to this:

import json
import os

def find_interpretations(fobj):
    data = json.load(fobj)
    # Key paths follow the sample in the question; adjust them to your real data.
    # In that sample the two lists sit in separate JSON objects, so both lookups
    # only succeed if your file holds them in one object.
    actions = data["appserver_results"]["payload"]["actions"]
    interpretations = actions[0]["Input"]["Interpretations"]
    transcriptions = actions[0]["nbest_text"]["transcriptions"]
    if interpretations[0] in transcriptions:
        return interpretations[0]
    return None

# directory is the folder defined in the question
for subdir, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith(".txt"):
            file_name = os.path.join(subdir, file)
            with open(file_name) as fobj:
                found = find_interpretations(fobj)
                if found:
                    print('found: {} in file: {}'.format(found, file_name))


2015-12-21 12:13:05

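The two lines below read the .txt file, but the question is how to compare the contents or values of Interpretations and transcriptions? f = open(os.path.join(subdir, file), 'r') a = f.read()

Did this solve your problem?

I modified the code based on your suggestion, but it doesn't work. I get the error: if Interpretations[0] in transcriptions: NameError: name 'Interpretations' is not defined

Since the attempts with difflib and json got you nowhere, here is an approach based on the original one in revision 2 of your question. It basically just uses re.search instead of re.findall to check whether the first of the transcriptions equals the interpretation: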

#!/usr/bin/env python3
import os
import re

directory = os.path.join("../data/English")
for subdir, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith(".txt"):
            # Read the whole file; the with block closes it again
            with open(os.path.join(subdir, file), 'r') as f:
                a = f.read()
            if re.findall('"status_code": 0', a):
                print('Status is Valid')
            else:
                print('Status is Invalid')
            # Capture the first transcription on the "transcriptions" line
            m = re.search('"transcriptions"= ."(.*)"', a)
            # re.escape keeps special characters in the transcription literal
            if m and re.search('"Interpretations": ."' + re.escape(m.group(1)), a):
                print('Recognition is VALID')
            else:
                print('Recognition is INVALID')
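To try the two regular expressions without a data directory, the same check can be sketched as a function and run on an inline sample. The `first_transcription_matches` name and the `sample` string are assumptions for illustration; the sample only mimics the two relevant lines from the question's file. Unlike the script above, this variant also requires the closing quote, so a longer interpretation that merely starts with the transcription does not count:

```python
import re


def first_transcription_matches(text):
    """Return True if the first transcription equals the first interpretation.

    Hypothetical helper for this sketch, using the same patterns as the script.
    """
    m = re.search(r'"transcriptions"= ."(.*)"', text)
    if not m:
        return False
    # Escape the captured text and require the closing quote for an exact match
    pattern = r'"Interpretations": ."' + re.escape(m.group(1)) + '"'
    return re.search(pattern, text) is not None


# Mimics the relevant lines of the data file shown in the question
sample = '''
        "transcriptions"= ["ConnectedDrive",
        "Connected drive",
        "Connect to drive"]
        "Interpretations": ["ConnectedDrive"],
'''

if first_transcription_matches(sample):
    print('Recognition is VALID')
else:
    print('Recognition is INVALID')
```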


2018-01-09 10:45:31 Armali
