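StanfordNLP CRFClassifier taking too much time
I want to create my own serialized trained model for StanfordNLP, but the training code takes a very long time to run.
My configuration is as follows: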
# location of the training file
trainFile=dictionary.tsv
# location where you would like to save (serialize) your
# classifier; adding .gz at the end automatically gzips the file,
# making it smaller, and faster to load
serializeTo=dictionary.ser.gz
# structure of your training file; this tells the classifier that
# the word is in column 0 and the correct answer is in column 1
map=word=0,answer=1
# This specifies the order of the CRF: order 1 means that features
# apply at most to a class pair of previous class and current class
# or current class and next class.
maxLeft=1
# these are the features we'd like to train with
# some are discussed below, the rest can be
# understood by looking at NERFeatureFactory
useClassFeature=true
useWord=true
# word character ngrams will be included up to length 6 as prefixes
# and suffixes only
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useDisjunctive=true
useSequences=true
usePrevSequences=true
# the last 4 properties deal with word shape features
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
type=crf
useQN=true
QNsize=2
featureDiffThresh=0.05
saveFeatureIndexToDisk=true
readerAndWriter=edu.stanford.nlp.sequences.ColumnDocumentReaderAndWriter
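Before sharing the training file, a formatting problem can be ruled out with a quick check. The following is only a minimal sketch, assuming that dictionary.tsv is tab-separated with exactly two columns (word, then answer, as map=word=0,answer=1 suggests) and that blank lines only separate sentences. The class name TrainFileCheck and the method malformedLines are hypothetical helpers, not part of Stanford NLP.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Stanford NLP: checks that every
// non-blank line of the training file has the shape "word<TAB>answer".
public class TrainFileCheck {

    /**
     * Returns the 1-based numbers of lines that are neither blank nor
     * made of exactly two non-empty tab-separated fields.
     */
    public static List<Integer> malformedLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines are sentence separators
            }
            // limit -1 keeps trailing empty fields, so "word<TAB>" is caught
            String[] cols = line.split("\t", -1);
            if (cols.length != 2 || cols[0].isEmpty() || cols[1].isEmpty()) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the trainFile name from the configuration above
        String path = args.length > 0 ? args[0] : "dictionary.tsv";
        List<String> lines =
                Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
        List<Integer> bad = malformedLines(lines);
        System.out.println(lines.size() + " lines read, "
                + bad.size() + " malformed: " + bad);
    }
}
```

If this reports no malformed lines, the file layout at least matches the map setting, and the slow training is more likely related to the size of the data or to the feature settings.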
Can anyone help me with this? Would it help if I shared my training file?
To add, I am getting the following exception: – user3279692