2016-02-25

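Error in data.frame: arguments imply differing number of rows: 2, 5, 19, 7, 1, 11, 4, 6, 9, 3, 13, 14, 22, 26, 27, 30, 31, 29, 35

I am working on an R project. While attempting sentiment analysis, I need to create a data frame (called 'sentiment.df' in my code):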

sentiment.df <- data.frame(text, emotion=emotion, polarity=polarity, stringsAsFactors=FALSE) 

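Here, 'text' is a list of the processed (cleaned) tweets, each split into keywords; 'emotion' holds the emotion label for each tweet; and 'polarity' holds the positive/negative classification. When I run the line above, RStudio throws the following error: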

Error in data.frame(c("httpstcoux1aacnxbk", "endalz"), c("i", "have", : 
    arguments imply differing number of rows: 2, 5, 19, 7, 1, 11, 4, 6, 9, 3, 13, 17, 8, 10, 24, 21, 15, 12, 25, 16, 20, 23, 18, 28, 14, 22, 26, 27, 30, 31, 29, 35 
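A minimal sketch that reproduces the error, assuming toy data with the same shape as in the question (a list of tokenised tweets plus one label per tweet; the three tweets and labels are made up for illustration). Because 'text' is a plain list, data.frame() tries to turn each tweet into its own column, and columns of 2, 5 and 1 rows cannot be combined:

```r
# Hypothetical toy data: three tokenised tweets of unequal length,
# plus one emotion and one polarity label per tweet.
text <- list(c("good", "defense"), c("i", "have", "the", "best", "seats"), "damnit")
emotion <- c("joy", "joy", "anger")
polarity <- c("positive", "positive", "negative")

# Each element of the list 'text' is treated as a separate column,
# so data.frame() sees columns with 2, 5 and 1 rows and stops.
err <- tryCatch(
  data.frame(text, emotion = emotion, polarity = polarity, stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)
cat(err, "\n")
```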

The lengths of the three variables ('text', 'emotion' and 'polarity') are all the same: 2621.
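Equal length() values only show that each object has 2621 top-level elements. What data.frame() trips over is the size of each element inside 'text', which lengths() reports. A quick check, sketched on hypothetical toy data:

```r
# Hypothetical toy data standing in for the real 2621 tweets.
text <- list(c("good", "defense"), c("i", "have", "the", "best", "seats"), "damnit")

n_tweets <- length(text)          # number of tweets (top-level elements)
words_per_tweet <- lengths(text)  # number of words inside each tweet
print(n_tweets)
print(words_per_tweet)
```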

This is what my data looks like:

> str(text) 
List of 2621 
$ : chr [1:2] "httpstcoux1aacnxbk" "endalz" 
$ : chr [1:5] "i" "have" "the" "best" ... 
$ : chr [1:19] "kenny" "easley" "seahawks" "captain" ... 
$ : chr [1:2] "good" "defense" 
$ : chr [1:7] "superbowlxlix" "party" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD>""| __truncated__ "" ... 
$ : chr "ihatetombrady" 
$ : chr [1:11] "coachbourbonusa" "understood" "still" "dont" ... 
$ : chr [1:19] "tiwaworks" "whitney" "houston" "sings" ... 
$ : chr [1:4] "thats" "still" "bae" "<U+2764><U+FE0F>""| __truncated__ 
$ : chr [1:6] "were" "a" "thousand" "miles" ... 
$ : chr [1:7] "dredoo24" "what" "i" "like" ... 
$ : chr [1:2] "bww" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD>""| __truncated__ 
$ : chr [1:9] "i" "seriously" "cant" "wait" ... 
$ : chr [1:3] "flyysociety" "photoshoot<U+2716><U+FE0F>""| __truncated__ "httptcoxkywsj5i2x" 
$ : chr [1:5] "lienne11" "wait" "whos" "performing" ... 
$ : chr [1:13] "game" "on" "go" "wildcats<U+FFFD><U+FFFD>\u2b07<U+FE0F>""| __truncated__ ... 
$ : chr [1:2] "good" "defense" 
$ : chr [1:11] "seattle" "seahawks" "fan" "" ... 
$ : chr [1:9] "realprestonj" "congratulations" "preston" "the" ... 
$ : chr [1:5] "tsu19" "so" "funny" "bruh" ... 
$ : chr [1:4] "drunk" "tweets" "coming" "soon" 
$ : chr "tb12" 
$ : chr [1:13] "hicksville" "schools" "will" "be" ... 
$ : chr [1:5] "but" "momma" "said" "superbowl" ... 
$ : chr [1:4] "raggedy" "ass" "bitch" "" 
$ : chr [1:5] "arbyscares" "arbys" "prairie" "village" ... 
$ : chr [1:17] "lovetruth79" "ltltltloves" "to" "send" ... 
$ : chr [1:8] "「boynamedhxlz""| __truncated__ "quote" "this" "tweet" ... 
$ : chr [1:13] "stretching" "for" "ballet" "now" ... 
$ : chr [1:7] "jerrodflusche" "janabewley" "narnia" "for" ... 
$ : chr [1:8] "here" "goes" "my" "whole" ... 
$ : chr [1:10] "who" "you" "going" "for" ... 
$ : chr [1:3] "good" "stop" "hawks" 
$ : chr [1:5] "brady" "be" "smokin" "blounts" ... 
$ : chr [1:8] "me" "decepcioné" "perdoné" "hice" ... 
$ : chr [1:7] "happy21stbirthdayharry" "" "its" "also" ... 
$ : chr [1:24] "teammic3rd" "sounds" "amazing" "" ... 
$ : chr [1:21] "millions" "of" "people" "packed" ... 
$ : chr [1:8] "missed" "idina" "singing" "by" ... 
$ : chr [1:2] "your" "stupid" 
$ : chr [1:5] "seahawks" "all" "the" "way" ... 
$ : chr [1:4] "takeathillpill" "you" "are" "vile" 
$ : chr [1:3] "lets" "goo" "superbowlixlix" 
$ : chr [1:4] "snow" "day" "nigga" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD>""| __truncated__ 
$ : chr [1:6] "ill" "just" "watch" "total" ... 
$ : chr [1:9] "liveextra" "site" "down" "its" ... 
$ : chr [1:3] "time" "to" "punt" 
$ : chr [1:5] "zachdettloff516" "groans" "at" "terrible" ... 
$ : chr [1:3] "go" "seahawks" "<U+FFFD><U+FFFD>""| __truncated__ 
$ : chr [1:7] "pizza" "friends" "super" "bowl" ... 
$ : chr [1:9] "hold" "onto" "me" "cause" ... 
$ : chr [1:6] "tom" "gonna" "get" "his" ... 
$ : chr [1:6] "lets" "goooooo" "nice" "3rd" ... 
$ : chr [1:15] "2" "fatal" "crashes" "reported" ... 
$ : chr [1:12] "supra" "dope" "atx" "sundayfunday" ... 
$ : chr [1:19] "all" "these" "students" "from" ... 
$ : chr [1:3] "danstricko" "not" "happening" 
$ : chr [1:17] "tom" "brady" "may" "wear" ... 
$ : chr "httptconqabzdezwf" 
$ : chr [1:4] "i" "miss" "you" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD>""| __truncated__ 
$ : chr [1:25] "john" "legend" "and" "idina" ... 
$ : chr [1:13] "snowed" "in" "with" "kadybuchler" ... 
$ : chr [1:6] "that" "bright" "green" "and" ... 
$ : chr [1:9] "ive" "got" "the" "seahawks" ... 
$ : chr [1:9] "sds" "by" "mac" "miller" ... 
$ : chr [1:5] "jakeski52" "rotowire" "or" "roger" ... 
$ : chr "damnit" 
$ : chr "hawks" 
$ : chr [1:7] "my" "nephews" "and" "niece" ... 
$ : chr [1:16] "liking" "your" "own" "posts" ... 
$ : chr [1:2] "bailaconbruce" "fb" 
$ : chr [1:4] "djones7" "hell" "no" "<U+FFFD><U+FFFD>""| __truncated__ 
$ : chr [1:7] "best" "part" "of" "the" ... 
$ : chr [1:13] "holls016" "f" "u" "i" ... 
$ : chr [1:6] "mikebarnicle" "nice" "to" "meet" ... 
$ : chr [1:5] "u" "played" "me" "dirty" ... 
$ : chr [1:13] "my" "bac" "is" "looking" ... 
$ : chr [1:2] "est" "2008" 
$ : chr [1:12] "vacation" "time" "" "thats" ... 
$ : chr [1:3] "<U+FFFD><U+FFFD>""| __truncated__ "ok" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD"| __truncated__ 
$ : chr [1:2] "common" "seattle" 
$ : chr [1:3] "no" "cacc" "talc" 
$ : chr "lob" 
$ : chr [1:3] "cut" "the" "crap" 
$ : chr [1:11] "im" "at" "las" "alitas" ... 
$ : chr [1:3] "backstreets" "back" "alrighttttt" 
$ : chr [1:6] "the" "seahawks" "are" "going" ... 
$ : chr [1:13] "baby" "its" "cold" "outside" ... 
$ : chr [1:15] "i" "have" "sooo" "much" ... 
$ : chr [1:10] "so" "whos" "gonna" "pull" ... 
$ : chr [1:5] "my" "driveway" "tonight" "nwiweather" ... 
$ : chr "fuck" 
$ : chr [1:21] "now" "that" "its" "actually" ... 
$ : chr [1:7] "green" "goats" "<U+FFFD><U+FFFD>""| __truncated__ "" ... 
$ : chr [1:15] "i" "guess" "its" "time" ... 
$ : chr [1:3] "lets" "go" "seattle" 
$ : chr [1:20] "jozybrambila7" "do" "you" "ever" ... 
$ : chr [1:4] "reggiewo" "nice" "choice" "cheers" 
$ : chr [1:20] "i" "enjoy" "super" "bowl" ... 
    [list output truncated] 

> str(emotion) 
chr [1:2621] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "joy" ... 
> str(polarity) 
chr [1:2621] "positive" "positive" "positive" "positive" "positive" "positive" "positive" ... 

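When I posted this error online, people told me that the numbers of rows and columns are not the same, i.e. it is not a square matrix, and that a data frame will not work with a rectangular matrix.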
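For what it's worth, a data frame does not need to be square: any number of rows and columns is fine, as long as every column has the same number of rows (or a length that divides that number evenly, so it can be recycled). A small non-square example with made-up values:

```r
# A 3-row, 2-column (non-square) data frame builds without complaint,
# because both columns have the same number of rows.
df <- data.frame(
  emotion  = c("joy", "joy", "anger"),
  polarity = c("positive", "positive", "negative"),
  stringsAsFactors = FALSE
)
print(dim(df))
```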

I would appreciate it if someone could help me fix this error.

Thanks in advance!

Can you check 'str(emotion)' and 'str(polarity)'? – akrun

Sometimes an object looks fine but has a problem in its internal structure. Look at 'str(text)'. –

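My guess is that you have 'text' stored as a list, so data.frame() is trying to make each element of the list a column. You could try 'data.frame(unlist(text), emotion = emotion, polarity = polarity, stringsAsFactors = FALSE)', depending on your exact data layout. – jeremycg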
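If one row per word is what you want, unlist() alone is not enough, since each tweet's labels also have to be repeated once per word in that tweet. A sketch on hypothetical toy data using rep() with lengths(); the 'tweet' id column is an addition for illustration, not part of the original code:

```r
# Hypothetical toy data with the same shape as the question's objects.
text <- list(c("good", "defense"), c("i", "have", "the", "best", "seats"), "damnit")
emotion <- c("joy", "joy", "anger")
polarity <- c("positive", "positive", "negative")

n_words <- lengths(text)  # words per tweet: 2, 5, 1

# Long format: one row per word. Each tweet's id and labels are repeated
# once for every word it contains, so all columns have sum(n_words) rows.
words.df <- data.frame(
  tweet    = rep(seq_along(text), n_words),
  word     = unlist(text, use.names = FALSE),
  emotion  = rep(emotion, n_words),
  polarity = rep(polarity, n_words),
  stringsAsFactors = FALSE
)
print(words.df)
```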

Answers

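You have 2621 lists in 'text', but they do not all contain the same number of entries: each tweet may contain a different number of words. So even unlist() won't help you, because the total number of words is larger than the number of entries in the 'emotion' and 'polarity' vectors.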
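If the goal is one row per tweet, two common options are sketched below on hypothetical toy data: join each tweet's words back into a single string, or keep the words as a list column by wrapping 'text' in I(). Which one fits depends on what the later sentiment steps expect:

```r
# Hypothetical toy data with the same shape as the question's objects.
text <- list(c("good", "defense"), c("i", "have", "the", "best", "seats"), "damnit")
emotion <- c("joy", "joy", "anger")
polarity <- c("positive", "positive", "negative")

# Option 1: collapse each tweet's tokens into one string (one value per tweet).
sentiment.df <- data.frame(
  text     = vapply(text, paste, character(1), collapse = " "),
  emotion  = emotion,
  polarity = polarity,
  stringsAsFactors = FALSE
)

# Option 2: keep the tokens as a list column; I() stops data.frame()
# from splitting the list into one column per tweet.
sentiment_list.df <- data.frame(
  text     = I(text),
  emotion  = emotion,
  polarity = polarity,
  stringsAsFactors = FALSE
)
print(sentiment.df)
```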

@VKirilenko –
