2013-04-10 80 views

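Beginner: rearranging data in a CSV file

This is really basic, but I have got myself stuck in overly complicated code. I have a CSV file with a column of tests, a column of marks and a column of students. I want to reshape the data so that there is one row per student and one column per test, with the marks as the values.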

I created a separate CSV containing just the students (as numeric codes), named "students.csv", because that was the easy option for now.

I have 52 students and 50 tests.

I can get the following to work for a single student:

matricNumbers <- read.csv("students.csv") 
students <- as.vector(as.matrix(matricNumbers)) 
students 
data <- read.csv("marks.csv") 
studentSubset <- data[data[2] == 1150761,] 
marksSubset <- as.vector(as.matrix(studentSubset[5])) 
ll <- list() 
ll<-c(list(marksSubset), ll) 
dd<-data.frame(matrix(nrow=50,ncol=50)) 
for(i in 1:length(ll)){ 
    dd[i,] <- ll[[i]] 

} 
dd 

But I can't seem to get this working with a for loop that goes through every student:

getMarks <-function(studentNumFile,markFile){ 

matricNumbers <- read.csv(studentNumFile) 
students <- as.vector(as.matrix(matricNumbers)) 


data <- read.csv(markFile) 

for (i in seq_along(students)){ 
    studentSubset <- data[data[2] == i,] 
    marksSubset <- as.vector(as.matrix(studentSubset[5])) 
    ll <- list() 
    ll<-c(list(marksSubset), ll) 
    dd<-data.frame(matrix(nrow=52,ncol=50)) 
    for(i in 1:length(ll)){ 
     dd[i,] <- ll[[i]] 
    } 
} 
return(dd) 
} 

getMarks("students.csv","marks.csv") 

I get the error:

Error in `[<-.data.frame`(`*tmp*`, i, , value = logical(0)) : replacement has 0 items, need 50 

I believe this is caused by the nested for loop, but I can't figure out how to do it any other way.


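What is the value of 'i' when it stops? That should be the one causing the error. Can you show that subset? Also, have you tried replacing 'i' with 'j' in the nested loop for clarity? – 2013-04-10 13:04:50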
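One possible reading of the error, sketched with made-up data: `seq_along(students)` yields the positions 1, 2, 3, ... rather than the student numbers, so `data[data[2] == i,]` matches no rows and the assignment receives an empty vector (the `logical(0)` in the message). The inner loop also reuses `i`, and `ll` and `dd` are re-created on every pass. The column names `student`, `test` and `mark` and the helper `get_marks_matrix` are hypothetical, since the real file layout is not shown in the question.

```r
# Hypothetical long-format data: one row per student/test pair.
marks <- data.frame(
  student = rep(c(1150761, 1150762, 1150763), each = 2),
  test    = rep(c("T1", "T2"), times = 3),
  mark    = c(55, 60, 70, 65, 80, 75)
)
students <- unique(marks$student)

# seq_along() gives positions, not student numbers, so this subset is empty.
i <- seq_along(students)[1]
nrow(marks[marks$student == i, ])            # 0 rows

# Indexing the vector by position recovers the actual student number.
nrow(marks[marks$student == students[i], ])  # 2 rows

# A loop-based sketch: allocate the result once, then fill one row per student.
get_marks_matrix <- function(marks, students) {
  tests <- unique(as.character(marks$test))
  out <- matrix(NA_real_, nrow = length(students), ncol = length(tests),
                dimnames = list(as.character(students), tests))
  for (s in seq_along(students)) {
    rows <- marks[marks$student == students[s], ]
    # Match on test name so the order of rows in the file does not matter.
    out[s, match(as.character(rows$test), tests)] <- rows$mark
  }
  out
}

result <- get_marks_matrix(marks, students)
result
```

Allocating the result before the loop and using a separate loop variable avoids both the overwrite and the index clash; whether this matches the real column layout is an assumption.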

Answers

1

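If I understand the problem correctly, you can get what you want with the reshape package. Since you don't provide sample data, it is hard to test. I suggest you paste the output of dput(head(matricNumbers)) as a code block in your question above.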
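For reference, `dput()` prints an R expression that rebuilds an object, so other people can paste it straight into their own session. A small sketch with a made-up data frame (`example_marks` and its columns are placeholders, not the asker's data):

```r
# Made-up stand-in for the first rows of the real data.
example_marks <- data.frame(
  student = c(1150761, 1150761, 1150762),
  test    = c("T1", "T2", "T1"),
  mark    = c(55, 60, 70),
  stringsAsFactors = FALSE
)

# Prints a structure(...) call that can be copied into a question.
dput(head(example_marks))

# Round trip: evaluating the printed text recreates the data frame.
dput_text <- capture.output(dput(head(example_marks)))
rebuilt <- eval(parse(text = dput_text))
```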

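Still, you should be able to follow the simple example below, which uses some dummy data. I think you may only need one line, and you can forget all the complicated loop stuff!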

# These lines make some dummy data, similar to your matricNumbers (hopefully)
test <- sort(sample(c("Biology", "Maths", "Chemistry"), 10, replace = TRUE))
students <- unlist(sapply(table(test), function(x) { sample(letters[1:x], x) }))
names(students) <- NULL
scores <- data.frame(test, mark = sample(40:100, 10, replace = TRUE), students)
scores
        test mark students
1    Biology   50        c
2    Biology   93        a
3    Biology   83        b
4    Biology   83        d
5  Chemistry   71        b
6  Chemistry   54        c
7  Chemistry   54        a
8      Maths   97        c
9      Maths   93        b
10     Maths   72        a



# Then use reshape to cast your data into the format you require.
# I use 'mean' as the aggregation function. If you have one score for each student/test, mean will just return that score.
# If you do not have a score for a particular student in a test, it will return NaN.
library(reshape)
bystudent <- cast(scores, students ~ test, value = "mark", fun.aggregate = mean)
bystudent
  students Biology Chemistry Maths
1        a      93        54    72
2        b      83        71    93
3        c      50        54    97
4        d      83       NaN   NaN
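If installing a package is not an option, base R's `tapply()` can build the same student-by-test table. The sketch below re-enters the dummy scores printed above by hand (the original draw was random) and is offered as one alternative, not as the method used in this answer.

```r
# Dummy scores copied from the printed example above.
scores <- data.frame(
  test     = c(rep("Biology", 4), rep("Chemistry", 3), rep("Maths", 3)),
  mark     = c(50, 93, 83, 83, 71, 54, 54, 97, 93, 72),
  students = c("c", "a", "b", "d", "b", "c", "a", "c", "b", "a")
)

# Rows are students, columns are tests; each cell is the mean mark.
# Student/test pairs with no score come back as NA.
bystudent <- tapply(scores$mark, list(scores$students, scores$test), mean)
bystudent
```

The result is a plain matrix with students as row names, so `as.data.frame(bystudent)` may be needed if a data frame is wanted before writing it out with `write.csv()`.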

Perfect, that really was too easy! Thanks! – EnduroDave 2013-04-10 15:25:01