Conversion to and from R data

One of the key goals of extendr is to provide a framework that allows you to write Rust functions that interact with R, without having to know the intricacies of R internals or even R’s C facilities. However, some knowledge of them is unavoidable if one wishes to understand why the extendr-api is designed the way it is.

Thus, in introducing extendr, we shall mention some facts about R internals, but these are not necessary to keep in mind going forward.

A fundamental data type in R is the 32-bit integer, int in C and i32 in Rust. Passing that type around is essential, and straightforward:

#[extendr(use_try_from = true)]
fn ultimate_answer() -> i32 {
    42_i32
}

This function is now available in your R session, where calling ultimate_answer() returns 42.

Another fundamental data type in R is numeric, f64 in Rust, which we can also pass back and forth uninhibited, e.g.

#[extendr(use_try_from = true)]
fn return_tau() -> f64 {
    std::f64::consts::TAU
} 

where \(\tau := 2\pi \approx 6.2831853\).

However, passing data from R to Rust must be done with a bit of care. In R, writing a true integer literal requires the suffix L, as in 21L; a bare 21 is a double. Consider:

#[extendr(use_try_from = true)]
fn bit_left_shift_once(number: i32) -> i32 {
    number << 1
}

This function is supposedly a clever way to multiply by two; however, calling bit_left_shift_once(21.1) results in

Error in bit_left_shift_once(21.1): Expected an integer or a float representing a whole number, got 21.1

whereas bit_left_shift_once(21) is 42, as expected, because the double 21 represents a whole number.
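The whole-number check behind that error message can be sketched in plain Rust, without extendr. This is only an illustration of the idea, not extendr's actual implementation; the helper name whole_number_to_i32 and its error text are hypothetical.

```rust
/// Hypothetical helper (not part of extendr): accept a double only if it
/// represents a whole number that fits in an i32.
fn whole_number_to_i32(value: f64) -> Result<i32, String> {
    // `fract()` is 0.0 exactly when there is no fractional part.
    // NaN and infinities are rejected here because their `fract()` is NaN.
    if value.fract() != 0.0 {
        return Err(format!("expected a whole number, got {value}"));
    }
    // Reject whole numbers that lie outside the i32 range.
    if value < i32::MIN as f64 || value > i32::MAX as f64 {
        return Err(format!("{value} does not fit in a 32-bit integer"));
    }
    Ok(value as i32)
}

/// Same body as the exported function above, minus the extendr attribute.
fn bit_left_shift_once(number: i32) -> i32 {
    number << 1
}
```

With these definitions, whole_number_to_i32(21.0).map(bit_left_shift_once) yields Ok(42), while whole_number_to_i32(21.1) yields an Err.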

R also has the concept of missing values, NA, encoded within its data model. However, i32 and f64 do not natively have a representation for NA, e.g.

bit_left_shift_once(NA_integer_)
Error in bit_left_shift_once(NA_integer_): Must not be NA.
bit_left_shift_once(NA_real_)
Error in bit_left_shift_once(NA_real_): Must not be NA.
bit_left_shift_once(NA)
Error in bit_left_shift_once(NA): Must not be NA.
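The reason a plain i32 cannot carry NA is that R reserves one ordinary bit pattern for it: NA_integer_ is stored as the smallest 32-bit integer (INT_MIN in C). A minimal sketch in plain Rust, without extendr, makes the sentinel explicit; the helper names from_r_integer and double_or_na are hypothetical and exist only for illustration.

```rust
/// R stores NA_integer_ as the smallest 32-bit integer (INT_MIN in C).
const NA_INTEGER: i32 = i32::MIN;

/// Hypothetical helper: view a raw R integer as an Option, with None for NA.
fn from_r_integer(raw: i32) -> Option<i32> {
    if raw == NA_INTEGER {
        None
    } else {
        Some(raw)
    }
}

/// Hypothetical helper: double a raw R integer while propagating NA.
/// An overflowing result is also mapped to NA in this sketch.
fn double_or_na(raw: i32) -> i32 {
    match from_r_integer(raw) {
        Some(value) => value.checked_mul(2).unwrap_or(NA_INTEGER),
        None => NA_INTEGER,
    }
}
```

To Rust, the sentinel is just another number, so nothing in the type i32 itself distinguishes a missing value from an ordinary one.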

Instead, we have to rely on extendr’s scalar variants of R types, Rint / Rfloat, to encompass the notion of NA in our functions:

#[extendr(use_try_from = true)]
fn double_me(value: Rint) -> Rint {
    if value.is_na() {
        Rint::na()
    } else {
        (value.inner() << 1).into()
    }
}

which means that we can now handle missing values in the arguments:

double_me(NA_integer_)
[1] NA
double_me(NA_real_)
[1] NA
double_me(NA)
[1] NA

One may notice here that NA_real_ was accepted even for an Rint. The reason is that when you specify a type without & / &mut, the value is coerced in a way similar to how R coerces values. To get strict type-checking at run time, use & / &mut, as in

#[extendr(use_try_from = true)]
fn wrong_input(value: &Rint) -> Rint {
    value.clone()
}
wrong_input(NA_integer_)
Error in wrong_input(NA_integer_): Must not be NA.
wrong_input(NA_real_)
Error in wrong_input(NA_real_): expected 13, got 14
wrong_input(21.0)
Error in wrong_input(21): expected 13, got 14
wrong_input(21L)
[1] 21

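Here, only the last literal, 21L, is accepted as a true Rint. The numbers in the error messages are R's internal type codes: 13 denotes an integer vector (INTSXP) and 14 a double vector (REALSXP).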

Vectors

Most data in R are vectors. Scalar values are in fact vectors of length one, and even lists are a kind of vector. The corresponding growable vector type in Rust is Vec. A Vec has an element type, a length, and a capacity. This means that, if necessary, we may extend any given Vec with more values, and a reallocation happens only when the capacity is exceeded.

Naively, we may define a function like so:
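The relationship between length and capacity can be observed in plain Rust. The following is a small sketch using only the standard library; the helper name push_and_report is hypothetical.

```rust
/// Hypothetical helper: push `extra` onto `values` and report whether the
/// push had to grow the allocation, i.e. whether the vector was already full.
fn push_and_report(values: &mut Vec<i32>, extra: i32) -> bool {
    // Spare room exists when the length is strictly below the capacity.
    let had_spare_room = values.len() < values.capacity();
    values.push(extra);
    !had_spare_room
}
```

For a vector created with Vec::with_capacity(4) and holding three elements, push_and_report returns false, because the fourth element fits into the existing allocation. Once the length equals the capacity, the next push returns true.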

#[extendr(use_try_from = true)]
fn repeat_us(mut values: Vec<i32>) -> Vec<i32> {
    assert_eq!(values.capacity(), values.len(), "must have zero capacity left");
    values[0] = 100;
    values.push(55);
    values
}
x <- c(1L, 2L, 33L)
repeat_us(x)
[1] 100   2  33  55

Even though the argument is declared as mut Vec<_>, the R vector is first converted to an owned Rust type, and it is that copy that we modify and extend, without syncing back to the original data.
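The same ownership behaviour can be reproduced in plain Rust. In this sketch, call_with_copy is a hypothetical stand-in for the conversion step: it copies the caller's data into a fresh Vec before handing it to the function, so the caller's data never changes.

```rust
/// Same body as the exported function above, minus the assertion and the
/// extendr attribute.
fn repeat_us(mut values: Vec<i32>) -> Vec<i32> {
    values[0] = 100;
    values.push(55);
    values
}

/// Hypothetical stand-in for the conversion: the caller's data is copied
/// into an owned Vec, and only that copy is modified.
fn call_with_copy(original: &[i32]) -> Vec<i32> {
    repeat_us(original.to_vec())
}
```

Calling call_with_copy(&[1, 2, 33]) returns a vector holding 100, 2, 33, 55, while the borrowed input still holds 1, 2, 33.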

Of course, a slice, e.g. &[i32] / &mut [i32], could be used instead, and a mutable slice allows us to modify the original data, i.e.

#[extendr(use_try_from = true)]
fn zero_middle_element(values: &mut [i32]) {
    let len = values.len();
    let middle = len / 2;
    values[middle] = 0;
}
x <- c(100L, 200L, 300L)
zero_middle_element(x)
x
[1] 100   0 300

This is great! If we wanted to insert an NA in the middle, we would have to operate on &mut [Rint] instead.

A slice is a view of a sequence of elements that are part of a larger collection. Since a slice represents only part of a collection (a vector, in this case), we cannot add new elements to it. To do so, we have to rely on extendr-provided types that offer a Vec-like API for R’s vector types. These are the Integers, Logicals, Doubles, and Strings types.
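The difference between a borrowed slice and an owned, growable vector can be seen in plain Rust. This is a sketch using only the standard library, not extendr's types; the function append_element is hypothetical, and this version of zero_middle_element additionally guards against empty input.

```rust
/// Modify the caller's data in place through a mutable slice.
/// `get_mut` returns None for an empty slice, so nothing happens then.
fn zero_middle_element(values: &mut [i32]) {
    let middle = values.len() / 2;
    if let Some(element) = values.get_mut(middle) {
        *element = 0;
    }
}

/// Hypothetical helper: growing requires the owning Vec, because a slice
/// has a fixed length and offers no `push`.
fn append_element(values: &mut Vec<i32>, extra: i32) {
    values.push(extra);
}
```

A Vec can be lent out as a slice with &mut x for in-place changes, but adding elements needs access to the Vec itself.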

Strings are special