Using R's set.seed() to set seeds for use in C/C++ (including Rcpp)

In native R, the user sets the seed for random number generation (RNG) with set.seed(). Random number generators exist in C and C++ too; these need their own seeds, which are not obviously settable by set.seed(). Good news! It can be done.

We’ll need the inline, purrr and microbenchmark packages.

pkgs <- c("inline", "purrr", "microbenchmark")
for (p in pkgs) if (!require(p, character.only = TRUE)) {
  install.packages(p)
  library(p, character.only = TRUE)
}
#> Loading required package: inline
#> Loading required package: purrr
#> Loading required package: microbenchmark

rbernoulli

Base R (or technically the stats package) provides no rbernoulli(). It’s a pretty gaping hole in the pantheon of rbeta(), rbinom(), rcauchy, rchisq(), rexp(), rf(), rgamma(), etc. Thankfully, Hadley Wickham noticed this and gave us purrr::rbernoulli().

set.seed(1)
rbernoulli(5, 0.7)
#> [1] FALSE  TRUE  TRUE  TRUE FALSE
set.seed(1)
rbernoulli(5, 0.7)
#> [1] FALSE  TRUE  TRUE  TRUE FALSE

So it seems like Hadley managed to get set.seed() to work with rbernoulli(). How did he do this? Let’s take a closer look at purrr::rbernoulli().

purrr::rbernoulli
#> function (n, p = 0.5) 
#> {
#>     stats::runif(n) > (1 - p)
#> }
#> <bytecode: 0x7f9b1c716778>
#> <environment: namespace:purrr>

Ah, it seems Hadley just wrapped runif(); hence, because set.seed() works with runif(), it works with his implementation of purrr::rbernoulli().

C++ RNG

The C++ standard library provides the <random> header file, which includes Bernoulli RNG. Let’s give that a whirl.

cpp_rbernoulli <- rcpp(c(n = "integer", p = "numeric", seed = "integer"), '
                       int n_ = as<int>(n), seed_ = as<int>(seed);
                       double p_ = as<double>(p);
                       std::default_random_engine generator(seed_);
                       std::bernoulli_distribution distribution(p_);
                       IntegerVector out(n_);
                       for (std::size_t i = 0; i != n_; ++i) {
                         out[i] = distribution(generator);
                       }
                       return out;
                       ', includes = "#include <random>")
cpp_rbernoulli(6, 0.7, seed = 1)
#> [1] 1 0 1 1 0 1
cpp_rbernoulli(6, 0.7, seed = 1)
#> [1] 1 0 1 1 0 1
cpp_rbernoulli(6, 0.7, seed = 2)
#> [1] 1 0 1 0 1 1

OK, so now we have cpp_rbernoulli() which is working, but the user has to pass the seed as an argument of the function, there’s no option to use set.seed().

get_seed()

If only there was a get_seed() function that we could use. Well, here it is!

get_seed <- function() {
  sample.int(.Machine$integer.max, 1)
}

This gets a positive number in the unsigned 32-bit integer range (which is always a safe bet for a seed) and it is completely determined by set.seed(). Therefore, it’s fine to use as a seed itself. Let’s take a look.

set.seed(1)
replicate(6, get_seed())
#> [1]  570175513  799129990 1230193230 1950361378  433108649 1929277158
set.seed(1)
replicate(6, get_seed())
#> [1]  570175513  799129990 1230193230 1950361378  433108649 1929277158
set.seed(2)
replicate(6, get_seed())
#> [1]  397031630 1508336757 1231208929  360888751 2026879546 2026097046
set.seed(2)
replicate(6, get_seed())
#> [1]  397031630 1508336757 1231208929  360888751 2026879546 2026097046

So as we can see, setting a seed via set.seed() determines the seeds that subsequently come out of get_seed(), so all is well with the world. get_seed() can now be used to create a version of cpp_rbernoulli() which uses set.seed(). For the sake of inflating my own ego, I’ll name this version after myself.

rorybernoulli <- function(n, p) {
  cpp_rbernoulli(n, p, get_seed())
}

Let’s check that it’s in working order.

set.seed(1)
rorybernoulli(6, 0.7)
#> [1] 1 1 0 0 1 0
set.seed(1)
rorybernoulli(6, 0.7)
#> [1] 1 1 0 0 1 0
set.seed(2)
rorybernoulli(6, 0.7)
#> [1] 0 1 1 1 1 1
set.seed(2)
rorybernoulli(6, 0.7)
#> [1] 0 1 1 1 1 1

Everything is awesome.

Benchmarking

Lastly, let’s compare the two Bernoulli RNGs that we have now by asking them both to give us a million Bernoulli random numbers with p = 0.5.

microbenchmark(purrr::rbernoulli(1e6, p = 0.5),
               rorybernoulli(1e6, p = 0.5))
#> Unit: milliseconds
#>                               expr       min        lq     mean   median
#>  purrr::rbernoulli(1e+06, p = 0.5) 27.238491 29.036677 30.83071 29.35566
#>      rorybernoulli(1e+06, p = 0.5)  8.660844  9.338893 10.06795  9.46709
#>         uq      max neval cld
#>  32.957843 39.44020   100   b
#>   9.803781 15.28896   100  a

Wow, rorybernoulli() is three times faster! I wasn’t expecting that. Perhaps it’s because there’s a quicker way of generating a Bernoulli random number than by going through a uniform random number (as purrr::rbernoulli() does). The point of me writing this post was to share this get_seed() thing with people so that the can use set.seed() with Rcpp and the like; purrr::rbernoulli() was just a cool example of a non-base RNG that popped into my head. Maybe I should submit a pull request to purrr!

comments powered by Disqus