Modular JavaScript bindings from Rust
I've been working on a Rust library for time series data analysis which comes with both Python and JavaScript bindings. The JavaScript bindings are generated from a Rust crate in the same Cargo workspace. That has been fine so far, but the scope of the project has grown (from just forecasting originally, to outlier detection, clustering, changepoint detection, and more). As a result, the WASM bundle has grown to about 1MB, which is... not enormous, but definitely not ideal.
It's particularly annoying when users only want to use a tiny fraction of the library, but must load the entire WASM bundle first. What I'd really like is for my JS library's package.json to look something like this:
{
  "name": "@bsull/augurs",
  "version": "0.6.0",
  "files": [
    "*.wasm",
    "*.js",
    "*.d.ts",
    "snippets/"
  ],
  "main": "core.js",
  "exports": {
    ".": "./core.js",
    "./clustering": "./clustering.js",
    "./dtw": "./dtw.js",
    "./prophet": "./prophet.js",
    "./outlier": "./outlier.js"
  },
  "types": "augurs.d.ts"
}
Users could then import the parts of the library they need like this:
import { Prophet } from '@bsull/augurs/prophet';
I can think of a few ways to do this:
1. Manually split the JS crate into multiple crates
This approach appears to be the most straightforward. Rather than the JS crate being a single crate with each set of bindings in a module, we split it into multiple crates, each with a single module. We can then use wasm-pack with each crate to generate the JS bindings for that module, shove them all into a single directory, manually generate the package.json file, and we're done.
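The manual package.json step is easy to script. Below is a minimal Node.js sketch. It assumes each wasm-pack output has already been copied into one directory as `<module>.js` (plus its .wasm and .d.ts files) and that `core` is the root entry point. `buildPackageJson` and `renderPackageJson` are hypothetical helpers, and the `files` and `types` values are simply copied from the example package.json above.

```javascript
// Hypothetical helper: build a combined package.json for several wasm-pack
// outputs copied into one directory, assuming each module's bindings are
// named <module>.js and that `main` is the root entry point.
function buildPackageJson({ name, version, modules, main = "core", types = "augurs.d.ts" }) {
  // The root export points at the main module; every other module gets a subpath.
  const exportMap = { ".": `./${main}.js` };
  for (const mod of modules) {
    if (mod !== main) {
      exportMap[`./${mod}`] = `./${mod}.js`;
    }
  }
  return {
    name,
    version,
    files: ["*.wasm", "*.js", "*.d.ts", "snippets/"],
    main: `${main}.js`,
    exports: exportMap,
    types,
  };
}

// Pretty-print with a trailing newline, ready to be written out,
// e.g. with fs.writeFileSync("dist/package.json", text).
function renderPackageJson(options) {
  return JSON.stringify(buildPackageJson(options), null, 2) + "\n";
}
```

Keeping the module list in one place means adding a new subpath export is a one-line change rather than an edit to hand-written JSON.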
This is fine, but it's a bit of a pain to maintain (each original Rust crate has to have a corresponding JS crate). Not only that, but each WASM module is self-contained. If a user wants to use more than one module, there's a lot of duplication among them: all the WASM machinery, serde stuff, tracing, etc. is duplicated in each one. So the overall bundle size is probably larger than what we started with, but if someone only wants to use outlier detection, they save bandwidth. Great.
But. Not all of the modules are self-contained. For example, both the clustering and dtw modules use a shared DistanceMatrix type. It is intended to be opaque to users: it's returned from the distance matrix calculation functions in dtw and consumed by the clustering functions in clustering. This will only work if the WASM modules know how to talk to each other, which they don't. Passing an object returned from one module to another isn't possible, since each module has its own linear memory. We'd need to serialize the data out of one module and deserialize it into the other, which can be quite slow if the data is large.
It's certainly an option, but it's not perfect.
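To make that cost concrete, here is a minimal sketch of the cheapest possible hand-off: copying a dense n × n matrix of f64 distances directly from one module's linear memory into another's. It assumes the matrix is stored contiguously at known byte offsets in each memory. `copyDistanceMatrix` and the offsets are hypothetical, and a real hand-off would also need the receiving module to allocate the destination buffer. Even this touches every one of the n² elements. Going through serde-style serialization would add more work on top.

```javascript
// Hypothetical helper: copy a dense n x n f64 distance matrix from one WASM
// module's linear memory into another's. Each module owns a separate
// WebAssembly.Memory, so a pointer into one means nothing to the other and
// the data itself has to be copied.
// Byte offsets must be multiples of 8 for a Float64Array view.
function copyDistanceMatrix(srcMemory, srcPtr, n, dstMemory, dstPtr) {
  // Views over each module's memory at the (assumed) matrix locations.
  const src = new Float64Array(srcMemory.buffer, srcPtr, n * n);
  const dst = new Float64Array(dstMemory.buffer, dstPtr, n * n);
  // O(n^2) copy: every element crosses from one memory to the other.
  dst.set(src);
  return dst;
}
```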
2. Create a WASM component for each piece of functionality
This feels exactly like the WASM component model's raison d'être: allowing multiple core WASM modules to talk to each other. The idea would be to create a separate WASM component for each module, starting with an interface defined using WIT. We'd then use cargo-component to generate the bindings and implement them in Rust. These would supersede the existing JS crate; we could then use jco to generate the JS bindings for each component.
The feasibility of this approach depends entirely on how jco generates the initialization code for the WASM modules. It's not clear to me how it would know about the dependencies between the modules, or how it would handle the case where one module depends on another. Ideally it'd load exactly what it needs for any given module, but I don't know how to achieve that.
For example, in the situation above, both dtw and clustering depend on DistanceMatrix, but DistanceMatrix is defined in the core module. If someone imports clustering, I'd want the bindings to load and instantiate the core and clustering modules. Then if someone imports dtw, I'd want the bindings to load and instantiate only the dtw module, and use the existing core module.
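The behavior described above amounts to memoized, dependency-first loading. Here is a minimal sketch of that idea. It is not a description of anything jco generates: `createLoader` is hypothetical, and the dependency graph is assumed and acyclic, with DistanceMatrix living in core. `instantiate` stands in for whatever actually fetches and instantiates a single WASM module.

```javascript
// Hypothetical loader: instantiate a module only after its dependencies,
// reusing any module that has already been instantiated.
// `dependencies` maps module name -> array of module names (assumed acyclic);
// `instantiate(name, depInstances)` is a caller-supplied async function.
function createLoader(dependencies, instantiate) {
  // One promise per module, so concurrent imports share a single instantiation.
  const instances = new Map();

  function load(name) {
    if (!instances.has(name)) {
      const deps = dependencies[name];
      if (deps === undefined) {
        return Promise.reject(new Error(`unknown module: ${name}`));
      }
      // Load (or reuse) every dependency first, then instantiate this module
      // with the dependency instances available for building its imports.
      const promise = Promise.all(deps.map((dep) => load(dep))).then((depInstances) =>
        instantiate(name, depInstances)
      );
      instances.set(name, promise);
    }
    return instances.get(name);
  }

  return load;
}
```

In a browser, `instantiate` might fetch `${name}.wasm` and pass the response to `WebAssembly.instantiateStreaming`, building the import object from the dependencies' exports. Caching promises rather than finished instances means two concurrent imports of clustering still trigger only one instantiation of core.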
I've yet to find out if this is possible, but I'll write more about it if I do.
One way I think it might work is by having a final wrapper WASM component which imports and re-exports the other components. This way, the dependency tree of the modules would be known to jco, so it would hopefully be able to generate optimal bindings. The last time I tried this, it didn't work, for two reasons:
- The initialization code generated by jco was very eager to load all of the WASM modules, even if they weren't needed, which is no better than the approach we're currently taking. It is possible to modify the instantiation code somewhat (as mentioned in the instantiation docs), but I struggled to do anything meaningful here.
- The Rust bindings generated by cargo-component produced separate Rust modules with separate types for each interface, so I would have had to write a ton of boilerplate to convert between the types expected by each module. I asked about this on the jco Zulip, and it sounded like this might be fixable by first defining the types in a separate shared WIT file. I'll probably try that next.
And... I think that's it? If you know any other ways to do this, please let me know! My contact details are on the about page.