An Eager Avocado


I give myself very good advice, but I very seldom follow it.

.Internal(sample)


.Internal(sample()) has no default values, so all four arguments must be supplied explicitly, in this order: n, size, replacement, probabilities.

If the probabilities argument is not NULL, the first argument must be a single integer n, and the call returns integers from 1 to n. To get output equivalent to that of sample, we need to map the sampled integers back to the desired values.
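As a rough sketch of that mapping for an arbitrary vector of values, the sampled integers can be used as indices into the vector. The helper name internal_sample_values and the example values and weights below are hypothetical and not part of the original benchmark. Since .Internal() is not a public API, this may behave differently across R versions.

```r
# Hypothetical helper (an illustration, not from the original post):
# draw from an arbitrary vector by sampling indices with .Internal(sample())
# and then indexing back into the vector.
# Caveat: .Internal() is not a public API and may change between R versions.
internal_sample_values <- function(values, size, replace, prob) {
    # Sample integer indices in 1..length(values); all four arguments are required
    idx <- .Internal(sample(length(values), as.integer(size), replace, prob))
    # Map the sampled indices back to the desired values
    values[idx]
}

values <- c("a", "b", "c")
value_probs <- c(0.2, 0.3, 0.5)  # illustrative weights, one per value

set.seed(42)
draws <- internal_sample_values(values, 10, TRUE, value_probs)
print(draws)
```

The boolean functions that follow are the two-value special case of the same idea: they sample from 1:2 and turn the result into TRUE/FALSE with a comparison instead of an index lookup.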

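# Draw size booleans; TRUE is returned with probability prob[1].
internal_boolean <- function(size, replace, prob) {
    # Sample integers from 1:2, then map 1 -> TRUE and 2 -> FALSE
    s = .Internal(sample(2, size, replace, prob))
    return(s < 2)
}

# Same draw, but returned as a run-length encoding of the booleans.
internal_boolean_rle <- function(size, replace, prob) {
    s = .Internal(sample(2, size, replace, prob))
    return(rle(s < 2))
}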
library(microbenchmark)  # provides microbenchmark(), used below

N = 100000
probs = c(0.0001, 1-0.0001)
s = internal_boolean(400,TRUE,probs)
print(paste0("P(TRUE) = ", sum(s)/length(s)))
## [1] "P(TRUE) = 0"
set.seed(123456)
microbenchmark(sample(c(TRUE,FALSE),N,TRUE),
               sample(c(TRUE,FALSE),N,TRUE,probs),
               .Internal(sample(c(TRUE,FALSE),N,TRUE,NULL)),
               internal_boolean(N,TRUE,probs),
               internal_boolean_rle(N,TRUE,probs))
## Unit: microseconds
##                                              expr      min        lq
##                   sample(c(TRUE, FALSE), N, TRUE)  982.331  988.1370
##            sample(c(TRUE, FALSE), N, TRUE, probs) 1041.445 1044.9485
##  .Internal(sample(c(TRUE, FALSE), N, TRUE, NULL))  625.999  626.9910
##                  internal_boolean(N, TRUE, probs)  936.505  951.7755
##              internal_boolean_rle(N, TRUE, probs) 2393.221 2605.2505
##       mean    median       uq       max neval cld
##  1140.8815 1066.1005 1301.035  1924.639   100  a 
##  1824.4864 1122.3140 1362.356 35364.789   100  a 
##   693.6163  638.5555  703.311  1167.958   100  a 
##  1171.2414 1209.5350 1278.501  2133.125   100  a 
##  6403.0794 2764.6940 3080.374 31129.819   100   b