Coping with varying `gcc` versions and capabilities in R packages

The problem

I have a package called strex which is for string manipulation. In this package, I want to take advantage of the regex capabilities of C++11. The reason for this is that in strex, I find myself needing to do a calculation like

x <- list(c("1,000", "2,000,000"),
          c("1", "50", "3,455"))
lapply(x, function(x) as.numeric(stringr::str_replace_all(x, ",", "")))
#> [[1]]
#> [1] 1e+03 2e+06
#> 
#> [[2]]
#> [1]    1   50 3455

A lapply like this can be done faster in C++11, so I’d like to have that speedup in my package. The problem is, this requires the regex capabilities of C++11, which are only supported in gcc >= 4.9. Many people are using an older gcc, e.g. the still popular Ubuntu 14.04 is on gcc 4.8. If these people tried to install the strex which relied on C++11 regex, they’d get a compile error.

The hope

I wanted to provide the faster option to those with a capable gcc and the slower lapply option (which isn’t painfully slow, just a bit slower) to those with an old gcc. This should all happen inside a seamless install.packages() call; the user needn’t be bored by all of this.

The solution

Figuring out which gcc version the user has

The configure step of package installation needed to do different things depending on the gcc version. Kevin Ushey’s configure package (https://github.com/kevinushey/configure) allows you to use R to configure R packages (normally you have to use shell commands). This was a saviour. To get the gcc version, I used the processx package (so I had to add it to Imports in DESCRIPTION) to execute the shell command gcc -v.

gcc_version <- function() {
  out <- tryCatch(processx::run("gcc", "-v", stderr_to_stdout = TRUE),
                  error = function(cnd) list(stdout = ""))
  out <- stringr::str_match(out$stdout, "gcc version (\\d+(?:\\.\\d+)*)")[1, 2]
  if (!is.na(out)) out <- numeric_version(out)
  out
}

This returns the gcc version if gcc is installed and NA otherwise. Then, the statement !is.na(gcc_version()) && gcc_version() < "4.9" returns TRUE if the user’s gcc does not support C++11 regex and FALSE otherwise.

Dealing with the gcc version

I decided that the default code in the package would be for people with an up to date gcc and that the configure step would make alterations to the code for people with an old gcc. Hence, for people with an old gcc, configure needed to remove all of the C++ code that required C++ regex and then replace the body of the R function which .Call()ed that (now removed) C++ code with R code that performed the same function. It took a long time (many days) and a lot of testing on Travis but this was the right strategy and now strex is installing beautifully with new and old gccs.

The code

There’s a little too much code to walk through the steps in this blog post (and the steps are specific to this package), but if you’re curious as to how this was done, first familiarize yourself with Kevin Ushey’s amazing configure package and then read the configuration steps in strex at https://github.com/rorynolan/strex. This includes useful functions like file_replace_R_fun() to change the body of an R function in a file, file_remove_C_fun() to remove a C/C++ function from a file and file_remove_matching_lines() to remove certain lines from a file.

Conclusion

This post is intended to give people an idea of how to deal with this type of problem. If you are struggling with this problem, feel free to contact me; I’m happy to share the limited knowledge that I have.

Related

comments powered by Disqus