Contingency test on data with repeated measures

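I'm hoping someone can give me some guidance. I have a dataset of a population that was tested for infection over three years. Some individuals (not all) were sampled in more than one year, so they represent repeated measures. I want to determine whether the prevalence of infection changed over time, but I am having trouble deciding on an appropriate test. A simple contingency test violates the assumption of independence, because individuals are repeated across years. I don't think the Cochran-Mantel-Haenszel test or McNemar's chi-squared test is appropriate, but feel free to correct me if I'm wrong. Below is the dataset I'm working with; the "AnID" variable is a factor identifying a single individual (so if an individual was sampled in multiple years, you will see that number repeated 2 or 3 times).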

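I think one viable approach would be to randomly resample the data many times (without replacement), each time including each individual only once, and run a contingency test across years. If the null hypothesis of no difference is rejected at least 95% of the time, then I can reliably claim that there is a difference. I'm not yet good enough to write my own code for this. Thanks in advance for any help you can offer.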

dput（實施例）結構（列表（ANID =結構（C（37L，37L，45L，45L，45L，55L， 55L，62L，62L，68L，68L，1L，1L，2L， 3L，3L，4L，9L，9L，18L， 18L，18L，19L，19L，19L，20L，20L，21L，22L，22L，23L，24L，24L， 24L，25L，25L，25L，26L， 27L，28L，28L，28L，29L，29L，29L，30L， 31L，32L，32L，33L，34L，35L，36L，38L，38L，39L，39L，40L，41L， 41L，42L，42L， 42L，43L，43L，43L，44L，46L，46L，46L，47L，47L， 47L，48L，48L，48L，49L，49L，49L，50L，51L，52L，52L，53L，53L， 54L， 54L，56L，56L，57L，57L，57L，58L，59L，60L，61L，63L，64L， 65L ，66L，67L，69L，70L，71L，72L，73L，74L，74L，5L，6L，7L， 8L，10L，11L，12L，13L，14L，15L，16L，17L）「10」，「11」，「12」，「13」，「136」，「137」，「138」，「139」，「14」，「140」，「141」，「142」「143」「144」「145」「146」「147」「26」「27」28「29」「30」「31」「37」 38，39，40，41，42，43，44，45，，46，47，48，49，5 50「，51」，52「，」53「，」57「，」58「，」59「，」6「，」60「，」61「，」62「，」63「「64」「65」「66」「67」「69」「7」，「70」，「71」，「72」，「75」，「76」，「77」「8」「82」「83」「84」「85」「86」「9」「90」「94」「95」「96」「97」結構（c）（1L，2L，1L，2L，3L，1L，2L，2L，3L，2L， 3L，2L，3L，2L，2L，3L），2L，2L，3 L，1L，2L，3L，1L，2L，3L， 2L，3L，2L，1L，2L，2L，1L，2L，3L，1L，2L，3L，2L，2L，1L， 2L，3L， 1L，2L，3L，2L，2L，2L，3L，2L，2L，2L，2L，2L，3L， 2L，3L，2L，2L，3L，1L，2L，3L，1L，2L，3L，2L ，1L，2L，3L， 1L，2L，3L，1L，2L，3L，1L，2L，3L，2L，2L，1L，2L，1L，2L， 1L，2L，1L，2L，1L，2L 3L，3L，3L，3L，3L，3L，1L，1L，1L，1L，1L，1L，1L，1L， 3L，3L，3L，3L，3L），...。標籤= c（「2012」，「2013」，「2014」），class =「factor」）， value = c（「Pos」，「Pos」，「Pos」，「Pos」 Neg「，」Neg「，」Pos「，」Pos「，」Pos「，」Pos「，」Pos「，」Pos「，」Neg「，」Neg「，」Pos「，」Neg「 Pos「，」Neg「，」Pos「，」Pos「，」Neg「，」Neg「，」Neg「，」Neg「，」Neg「，」Neg「，」Pos「，」Pos 「Pos」，「Pos」，「Pos」，「Pos」，「Neg」，「Pos」，「Pos」，「Neg」，「Neg」，「Neg」，「Neg」，「Pos」，「Pos」，「Pos」，「Neg」，「Neg」，「Pos」，「Pos」，「Neg」，「Pos」，「Neg」，「Pos」，「Neg」，「Neg」，「Neg」，「Neg」，「Neg」，「Neg」，「Pos」，「Pos」，「Pos」「Neg」，「Neg」，「Pos」，「Neg」，「Neg」，「Neg」，「Neg」，「Neg」，「Neg」，「Neg」，「Neg」，「Pos」 Pos「，」Neg「，」Neg「，」Neg「，」Pos「，」Pos「，」Pos「，」Pos「，」Pos 「Neg」，「Neg」，「Neg」，「Pos」，「Pos」，「Neg」，「Neg」，「Neg」，「Neg」，「Neg」，「Neg」，「Pos 「Neg」，「Neg」，「Neg」，「Neg」，「Neg」，「Neg」，「Neg」，「Pos」，「Pos」，「Pos」，「Pos」，「Pos」，「Neg」，「Neg」，「Pos」，「Neg」，「Pos」，「Neg」）），.Names = c（「AnID」年」，「值」），row.names = 187：306中，class = 「data.frame」）

Source

2017-02-17 giderk

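Remember that an experiment/test design needs an up-front calculation of the required sample size, so that you maximise the probability of detecting a statistically significant difference if one exists. (For more information, see https://en.wikipedia.org/wiki/Sample_size_determination and https://en.wikipedia.org/wiki/Statistical_power.)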
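As a rough illustration of such a calculation, base R's power.prop.test() can estimate the number of individuals needed per group to compare two proportions. The prevalences 0.25 and 0.50 below are hypothetical placeholders, not values taken from the question's data, and the two-group setup is a simplification of the three-year design.

```r
# Minimal sketch: sample size per group for comparing two proportions.
# p1 and p2 are ASSUMED (hypothetical) prevalences, chosen only for illustration.
res <- power.prop.test(p1 = 0.25,        # assumed prevalence in one year
                       p2 = 0.50,        # assumed prevalence in another year
                       power = 0.80,     # desired probability of detecting the difference
                       sig.level = 0.05) # two-sided significance level

# Required number of individuals in EACH group, rounded up
n_per_group <- ceiling(res$n)
n_per_group
```

A smaller expected difference between years would push the required sample size up sharply, which is worth checking before interpreting a non-significant result.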

If all of your users had before/after measurements (e.g. test/control), you could perform McNemar's test for comparing paired proportions (see https://en.wikipedia.org/wiki/McNemar's_test).
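For reference, a minimal sketch of what that would look like in R with mcnemar.test(). The 2 x 2 counts are made up for illustration; they describe a hypothetical set of individuals who were each tested in two years, and do not come from the question's data.

```r
# HYPOTHETICAL paired counts: each individual is tested in two years.
# Rows = result in the first year, columns = result in the second year.
paired <- matrix(c(10,  4,
                    9, 17),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(first_year  = c("Pos", "Neg"),
                                 second_year = c("Pos", "Neg")))

# McNemar's test only uses the discordant cells (Pos->Neg and Neg->Pos)
result <- mcnemar.test(paired)
result$p.value
```

The test needs every individual to appear in both years, which is exactly what fails in the dataset described in the question.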

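However, not all users have repeated measurements, so I chose to randomly pick one year for each user. That way I end up with 3 independent samples of values.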

Assuming dt is your dataset...

library(dplyr) 

set.seed(1) # fix the seed so the random sampling is reproducible

dt %>%      
    mutate(Pos = ifelse(value == "Pos", 1, 0)) %>% # create a binary variable to flag positives 
    group_by(AnID) %>%        # for each user 
    sample_n(1) %>%         # get one row/value randomly 
    group_by(year) %>%        # for each year 
    summarise(N = n(),        # get number of users 
      N_Pos = sum(Pos),      # get number of positive users 
      Prc_Pos = mean(Pos)) %>%    # get percentage of positives 
    print() -> tbl1         # print and save it 

# # A tibble: 3 × 4
#     year     N N_Pos   Prc_Pos
#   <fctr> <int> <dbl>     <dbl>
# 1   2012    23     6 0.2608696
# 2   2013    27     9 0.3333333
# 3   2014    24    13 0.5416667

After looking at the yearly percentages above, you can test them:

# run the statistical comparison of proportions 
prop.test(tbl1$N_Pos, tbl1$N) 

# 3-sample test for equality of proportions without continuity correction 
# 
# data: tbl1$N_Pos out of tbl1$N 
# X-squared = 4.3038, df = 2, p-value = 0.1163 
# alternative hypothesis: two.sided 
# sample estimates: 
# prop 1 prop 2 prop 3 
# 0.2608696 0.3333333 0.5416667

The p-value of this comparison of proportions (0.1163) indicates that we have no evidence that the years differ in the probability of a positive result.

If you had found a difference, you could then run pairwise comparisons between years.

# run pairwise comparisons 
pairwise.prop.test(tbl1$N_Pos, tbl1$N) 

# Pairwise comparisons using Pairwise comparison of proportions 
# 
# data: tbl1$N_Pos out of tbl1$N 
# 
#   1    2
# 2 0.80 -
# 3 0.29 0.45
# 
# P value adjustment method: holm

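The output here is 3 p-values (one for each of the 3 pairwise comparisons). As expected, they all indicate no evidence of a difference between the years.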

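You can wrap the above process in a function and run N simulations, then check how many of those simulations give a statistically significant result.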
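One way this could be sketched is shown below. The data frame toy is a hypothetical stand-in with the same columns as dt (AnID, year, value) but made-up values, and the names one_run, n_sim and p_values are illustrative, not part of the original answer. Replace toy with dt to run it on the real data.

```r
library(dplyr)

# HYPOTHETICAL stand-in for `dt`: same columns (AnID, year, value), made-up values.
set.seed(42) # only so that the toy data are reproducible
toy <- expand.grid(AnID = factor(1:60), year = factor(2012:2014))
toy <- toy[sample(nrow(toy), 120), ]   # keep 120 individual-year rows
toy$value <- sample(c("Pos", "Neg"), nrow(toy), replace = TRUE)

# One simulation: keep one random row per individual, then compare the years.
one_run <- function(data) {
  tbl <- data %>%
    mutate(Pos = ifelse(value == "Pos", 1, 0)) %>% # flag positives
    group_by(AnID) %>%                             # for each individual
    sample_n(1) %>%                                # pick one year at random
    group_by(year) %>%                             # for each year
    summarise(N = n(), N_Pos = sum(Pos))           # totals and positives
  prop.test(tbl$N_Pos, tbl$N)$p.value
}

# Repeat the process; no set.seed() inside one_run, so every run
# draws a different subsample.
n_sim <- 200
p_values <- replicate(n_sim, one_run(toy))

# Share of simulations that reject the null hypothesis at the 5% level
mean(p_values < 0.05)
```

The final line gives the proportion of subsamples with a significant result, which is the quantity the question proposes to compare against a threshold.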

Source

2017-02-17 14:39:18 AntoniosK

Thanks! This works well. I've put your code in a loop to repeat the process 1000 times. – giderk

Make sure you remove 'set.seed' so that you get different random numbers each time. – AntoniosK
