Clustering tutorial - problem with spread


While following this very interesting tutorial (https://rpubs.com/hrbrmstr/customer-segmentation-r), I ran into an error that I really don't understand.

The following snippet produces the message "Error: Value column 'n' does not exist in input." in RStudio 1.0.136:

library(readxl) 
library(dplyr) 
library(tidyr) 
library(viridis) 
library(ggplot2) 
library(ggfortify) 

url <- "http://blog.yhathq.com/static/misc/data/WineKMC.xlsx" 
fil <- basename(url) 
if (!file.exists(fil)) download.file(url, fil) 

offers <- read_excel(fil, sheet = 1) 
colnames(offers) <- c("offer_id", "campaign", "varietal", "min_qty", "discount", "origin", "past_peak") 
head(offers, 12) 

transactions <- read_excel(fil, sheet = 2) 
colnames(transactions) <- c("customer_name", "offer_id") 
transactions$n <- 1 
head(transactions) 

left_join(offers, transactions, by="offer_id") %>% 
    count(customer_name, offer_id, wt=n) %>% 
    spread(offer_id, n) %>% 
    mutate_each(funs(ifelse(is.na(.), 0, .))) -> dat

The last statement is the one that causes the problem.

Would anyone know why?


2017-04-17 Romain

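In general, you should post a reproducible example here rather than rely on a link that is likely to break within a few years. Some guidance: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 Also, you should of course work out which tool you are actually using. "spread" is not something in base R. – Frank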

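Of course, my bad. I have edited the original post to include a reproducible example. – Romain

OK, thanks. If it needs to fetch data from a blog, it is still not reproducible in the long term. Also, if you have to load all of those packages when probably only dplyr is relevant, it is not very minimal. The ideal is a [mcve]. In any case, you can start debugging by checking whether the 'count' step produces a column named 'n'. – Frank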

Please have a look at the manual page ?dplyr::count:

Note

The column name in the returned data is usually n, even if you have supplied a weight.

If the data already has a column named n, the output column will be called nn. If the table already has columns called n and nn then the column returned will be nnn, and so on.

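In this case, the original data already has a column named n, so the new column produced by count is named nn. You therefore have to change spread(offer_id, n) %>% to spread(offer_id, nn) %>%. The tutorial was probably written before this behaviour was introduced.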
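Because the name of the counted column depends on the installed dplyr version, a version-independent variant may be safer. The following is only a minimal sketch under stated assumptions: the toy data frame stands in for the transactions sheet, the column name purchases is made up for illustration, and the name argument of count() is assumed to be available, which is only the case in dplyr releases newer than the one used in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the transactions sheet
transactions <- data.frame(
  customer_name = c("Smith", "Smith", "Jones", "Jones", "Jones"),
  offer_id      = c(1, 2, 2, 2, 3),
  stringsAsFactors = FALSE
)
transactions$n <- 1

# Debugging step: check what the counted column is actually called.
# Depending on the dplyr version, the last name is "n" or "nn".
counted <- count(transactions, customer_name, offer_id, wt = n)
names(counted)

# Name the output column explicitly so that spread() never has to guess.
# "purchases" is an illustrative name, not one used by the tutorial.
dat <- transactions %>%
  count(customer_name, offer_id, wt = n, name = "purchases") %>%
  spread(offer_id, purchases, fill = 0)  # fill = 0 replaces the missing combinations

dat
```

With an explicit name, the pipeline no longer depends on whether count() chooses n or nn. The fill = 0 argument of spread() also makes the separate step that replaces NA with 0 unnecessary in this sketch.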


2017-04-18 01:20:54 mt1022
