In native R, the user sets the seed for random number generation (RNG) with set.seed()
. Random number generators exist in C and C++ too; these need their own seeds, which are not obviously settable by set.seed()
. Good news! It can be done.
pacman::p_load(inline, purrr)
rbernoulli
Base R (or technically the stats
package) provides no rbernoulli()
. It’s a pretty gaping hole in the pantheon of rbeta()
, rbinom()
, rcauchy()
, rchisq()
, rexp()
, rf()
, rgamma()
, etc. Thankfully, Hadley Wickham noticed this and gave us purrr::rbernoulli()
.
set.seed(1)
rbernoulli(5, 0.7)
#> [1] FALSE TRUE TRUE TRUE FALSE
set.seed(1)
rbernoulli(5, 0.7)
#> [1] FALSE TRUE TRUE TRUE FALSE
So it seems like Hadley managed to get set.seed()
to work with rbernoulli()
. How did he do this? Let’s take a closer look at purrr::rbernoulli()
.
purrr::rbernoulli
#> function (n, p = 0.5)
#> {
#> stats::runif(n) > (1 - p)
#> }
#> <bytecode: 0x7fd93bf359c0>
#> <environment: namespace:purrr>
Ah, it seems Hadley just wrapped runif()
; hence, because set.seed()
works with runif()
, it works with his implementation of purrr::rbernoulli()
.
C++ RNG
The C++ standard library provides the <random>
header file, which includes Bernoulli RNG. Let’s give that a whirl.
cpp_rbernoulli <- rcpp(c(n = "integer", p = "numeric", seed = "integer"), '
int n_ = as<int>(n), seed_ = as<int>(seed);
double p_ = as<double>(p);
std::default_random_engine generator(seed_);
std::bernoulli_distribution distribution(p_);
IntegerVector out(n_);
for (std::size_t i = 0; i != n_; ++i) {
out[i] = distribution(generator);
}
return out;
', includes = "#include <random>")
cpp_rbernoulli(6, 0.7, seed = 1)
#> [1] 1 0 1 1 0 1
cpp_rbernoulli(6, 0.7, seed = 1)
#> [1] 1 0 1 1 0 1
cpp_rbernoulli(6, 0.7, seed = 2)
#> [1] 1 0 1 0 1 1
OK, so now we have cpp_rbernoulli()
which is working, but the user has to pass the seed as an argument of the function, there’s no option to use set.seed()
.
get_seed()
If only there was a get_seed()
function that we could use. Well, here it is!
get_seed <- function() {
sample.int(.Machine$integer.max, 1)
}
This gets a positive number in the unsigned 32-bit integer range (which is always a safe bet for a seed) and it is completely determined by set.seed()
. Therefore, it’s fine to use as a seed itself. Let’s take a look.
set.seed(1)
replicate(6, get_seed())
#> [1] 1140350788 312928385 866248189 1909893419 554504146 884616499
set.seed(1)
replicate(6, get_seed())
#> [1] 1140350788 312928385 866248189 1909893419 554504146 884616499
set.seed(2)
replicate(6, get_seed())
#> [1] 794080207 314911494 1906307464 554751325 2010156236 226245929
set.seed(2)
replicate(6, get_seed())
#> [1] 794080207 314911494 1906307464 554751325 2010156236 226245929
So as we can see, setting a seed via set.seed()
determines the seeds that subsequently come out of get_seed()
, so all is well with the world. get_seed()
can now be used to create a version of cpp_rbernoulli()
which uses set.seed()
. For the sake of inflating my own ego, I’ll name this version after myself.
rorybernoulli <- function(n, p) {
cpp_rbernoulli(n, p, get_seed())
}
Let’s check that it’s in working order.
set.seed(1)
rorybernoulli(6, 0.7)
#> [1] 1 1 0 1 0 1
set.seed(1)
rorybernoulli(6, 0.7)
#> [1] 1 1 0 1 0 1
set.seed(2)
rorybernoulli(6, 0.7)
#> [1] 0 1 1 1 1 1
set.seed(2)
rorybernoulli(6, 0.7)
#> [1] 0 1 1 1 1 1
Everything is awesome.
Benchmarking
Lastly, let’s compare the two Bernoulli RNGs that we have now by asking them both to give us a million Bernoulli random numbers with p = 0.5
.
bench::mark(purrr::rbernoulli(1e6, p = 0.5),
rorybernoulli(1e6, p = 0.5),
check = FALSE)
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:t> <dbl> <bch:byt> <dbl>
#> 1 purrr::rbernoulli(1e+06, p = 0.5) 28.36ms 29.17ms 33.3 11.45MB 13.9
#> 2 rorybernoulli(1e+06, p = 0.5) 8.77ms 9.13ms 108. 3.82MB 16.8
Wow, rorybernoulli()
is three times faster! I wasn’t expecting that. Perhaps it’s because there’s a quicker way of generating a Bernoulli random number than by going through a uniform random number (as purrr::rbernoulli()
does). It’s also three times as efficient with memory; the memory and time improvements probably have the same underlying reason. The point of me writing this post was to share this get_seed()
thing with people so that they can use set.seed()
with Rcpp
and the like; purrr::rbernoulli()
was just a cool example of a non-base RNG that popped into my head. Maybe I should submit a pull request to purrr
!