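Natural language parser for parsing sports play-by-play data
I'm trying to come up with a parser for football plays. I use the term "natural language" very loosely here, so please bear with me, as I know next to nothing about this field.
Here are some examples of what I'm working with (format: TIME | DOWN & DIST | OFF_TEAM | DESCRIPTION):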
04:39|4th and [email protected]|Dal|Mat McBriar punts for 32 yards to NYJ14. Jeremy Kerley - no return. FUMBLE, recovered by NYJ.|
04:31|1st and [email protected]|NYJ|Shonn Greene rush up the middle for 5 yards to the NYJ21. Tackled by Keith Brooking.|
03:53|2nd and [email protected]|NYJ|Mark Sanchez rush to the right for 3 yards to the NYJ24. Tackled by Anthony Spencer. FUMBLE, recovered by NYJ (Matthew Mulligan).|
03:20|1st and [email protected]|NYJ|Shonn Greene rush to the left for 4 yards to the NYJ37. Tackled by Jason Hatcher.|
02:43|2nd and [email protected]|NYJ|Mark Sanchez pass to the left to Shonn Greene for 7 yards to the NYJ44. Tackled by Mike Jenkins.|
02:02|1st and [email protected]|NYJ|Shonn Greene rush to the right for 1 yard to the NYJ45. Tackled by Anthony Spencer.|
01:23|2nd and [email protected]|NYJ|Mark Sanchez pass to the left to LaDainian Tomlinson for 5 yards to the 50. Tackled by Sean Lee.|
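So far, I've written a dumb parser that handles all the easy stuff (playID, quarter, time, down & distance, offensive team), along with some scripts that fetch this data and sanitize it into the format shown above. Each line becomes a "Play" object that is stored in a database.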
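For illustration, the easy part can be sketched roughly as follows. This is a minimal sketch, not my actual code: the `Play` fields and the `parse_line` helper are hypothetical names, and it assumes every line has exactly four pipe-separated fields followed by a trailing `|`. The down-and-distance field is kept as raw text rather than broken down further.

```python
from dataclasses import dataclass


@dataclass
class Play:
    time: str         # game clock, e.g. "02:43"
    down_dist: str    # down-and-distance field, kept as raw text
    off_team: str     # offensive team abbreviation, e.g. "NYJ"
    description: str  # free-text play description


def parse_line(line: str) -> Play:
    """Split one cleaned-up line into its four fields (hypothetical helper)."""
    # Each line ends with a trailing '|', so drop it before splitting.
    fields = line.strip().rstrip("|").split("|", 3)
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    time, down_dist, off_team, description = (f.strip() for f in fields)
    return Play(time, down_dist, off_team, description)
```

Everything that follows is about the `description` field only.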
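The hard part here (at least for me) is parsing the play description. Here is some of the information I would like to extract from that string:
Example string: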
"Mark Sanchez pass to the left to Shonn Greene for 7 yards to the NYJ44. Tackled by Mike Jenkins."
Result:
turnover = False
interception = False
fumble = False
to_on_downs = False
passing = True
rushing = False
direction = 'left'
loss = False
penalty = False
scored = False
TD = False
PA = False
FG = False
TPC = False
SFTY = False
punt = False
kickoff = False
ret_yardage = 0
yardage_diff = 7
playmakers = ['Mark Sanchez', 'Shonn Greene', 'Mike Jenkins']
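As a rough sketch of how a few of these fields could be pulled out with plain regular expressions: the function name `parse_description`, the `NAME` pattern, and the choice of fields are all assumptions made for illustration. In particular, `NAME` assumes a player name is two or more capitalized words, which will not hold for every name, and the sketch covers only the simple phrasings that appear in the samples above.

```python
import re

# Assumption: a player name is two or more capitalized words in a row.
NAME = r"\b[A-Z][A-Za-z'-]+(?: [A-Z][A-Za-z'-]+)+"
DIRECTION = re.compile(r"\b(?:to the (left|right)|up the (middle))\b")
YARDS = re.compile(r"\bfor (\d+) yards?\b")


def parse_description(desc: str) -> dict:
    """Extract a small subset of the wanted fields from one description."""
    direction = DIRECTION.search(desc)
    yards = YARDS.search(desc)
    is_punt = re.search(r"\bpunts\b", desc) is not None
    return {
        "passing": re.search(r"\bpass\b", desc) is not None,
        "rushing": re.search(r"\brush\b", desc) is not None,
        "punt": is_punt,
        "fumble": "FUMBLE" in desc,
        # group(1) is left/right, group(2) is middle
        "direction": (direction.group(1) or direction.group(2)) if direction else None,
        # Punt distance is not a gain for the offense, so it is left at 0 here.
        "yardage_diff": int(yards.group(1)) if yards and not is_punt else 0,
        "playmakers": re.findall(NAME, desc),
    }
```

Calling `parse_description` on the example string yields `passing = True`, `direction = 'left'`, `yardage_diff = 7` and the three playmakers listed above. Losses, penalties, scoring and returns are not handled by this sketch.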
My initial parser logic went something like this:
# pass, rush or kick
# gain or loss of yards
# scoring play
# Who scored? off or def?
# TD, PA, FG, TPC, SFTY?
# first down gained
# punt?
# kick?
# return yards?
# penalty?
# def or off?
# turnover?
# INT, fumble, to on downs?
# off play makers
# def play makers
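The descriptions can get pretty hairy (multiple fumbles and recoveries, penalties, etc.), and I was wondering whether I could take advantage of some NLP modules out there. Chances are I'll end up spending a few days on a dumb/static state-machine-like parser, but if anyone has suggestions on how to approach this using NLP techniques, I'd like to hear about them.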
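To make the "dumb/static" option concrete, here is one possible first step, sketched under assumptions: split the description into sentences and label each one with the first rule that matches, so that a later stage can walk the events in order. The rule names, the `split_events` helper, and the assumption that sentences are separated by a period followed by whitespace are all mine; names containing periods would break the splitting.

```python
import re
from typing import List, Tuple

# Ordered (event, pattern) rules; the first pattern that matches a sentence wins.
RULES = [
    ("fumble", re.compile(r"^FUMBLE\b")),
    ("tackle", re.compile(r"^Tackled by\b")),
    ("no_return", re.compile(r"\bno return\b")),
    ("punt", re.compile(r"\bpunts\b")),
    ("pass", re.compile(r"\bpass\b")),
    ("rush", re.compile(r"\brush\b")),
]


def split_events(desc: str) -> List[Tuple[str, str]]:
    """Return (event_kind, sentence) pairs in the order they occur."""
    # Assumption: a sentence ends at '.' followed by whitespace or end of text.
    parts = re.split(r"\.\s+|\.$", desc.strip())
    events = []
    for sentence in (p.strip() for p in parts):
        if not sentence:
            continue
        kind = next((name for name, pat in RULES if pat.search(sentence)), "other")
        events.append((kind, sentence))
    return events
```

For the first sample play this gives a `punt` event, then `no_return`, then `fumble`. Because the order is preserved, a play with several fumbles and recoveries becomes several events instead of one overloaded flag.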
Given the topic of the question, I find it amusing that the SO syntax highlighter highlighted all of the people's names... – Jon