Using NumPy Arrays on the Frontend
Introduction
It’s true: it’s possible to use NumPy arrays directly on the frontend by using a combination of two technologies: ArrayBuffer and Typed Arrays.
This comes with a lot of caveats: for instance, the arrays are of a fixed size, and (with this approach) are limited to being one-dimensional. You also need to be aware of the endianness of both the backend and the browser’s JS VM, and you may need to do slightly more legwork surrounding data types.
The benefits are compelling, however:
- Typed arrays are fast. The VM does not need to perform any type checking in order to make accesses.
- No overhead with converting in and out of JSON—use the underlying data directly.
- ArrayBuffer-backed arrays implement the [Transferable] attribute, meaning they can be moved in and out of web workers with very low overhead.
- Typed arrays implement the array interface almost completely, making them a drop-in replacement for normal arrays.
- Shipping arrays from the client can be done with fairly little effort.
This is something I’ve been playing with at work, where I’ve been working on a tool that can analyze big data interactively (such as time-series data from satellite downlinks).
An Extremely Brief Tour of Typed Arrays
In essence, typed arrays implement a sort of “view” for an underlying ArrayBuffer. ArrayBuffers themselves do not provide very much functionality at all, they are essentially just byte vectors. With a typed array, you can use the contents of an ArrayBuffer as an array.
// Create a buffer holding the underlying data (all zeroes)
const buf = new ArrayBuffer(1024) // bits
// Create a "view" of that data as 1024 8-bit unsigned ints
const u8_view = Uint8Array(buf);
// Create another view, but this time as 256 32-bit unsigned ints
const u32_view = Uint32Array(buf);
// because these two point at the same array, we can do some weird stuff
u8_view[1] = 1 // set the second byte of the buffer to 1 (0b00000001)
console.log(u32_view[0]) // this is now 256
An Even Briefer Tour of NumPy Internals
Under the hood, a numpy array is just a block of memory + an indexing scheme + a data type:
typedef struct PyArrayObject {
PyObject_HEAD
/* Block of memory */
char *data;
/* Data type descriptor */
PyArray_Descr *descr;
/* Indexing scheme */
int nd;
npy_intp *dimensions;
npy_intp *strides;
/* Other stuff */
PyObject *base;
int flags;
PyObject *weakreflist;
} PyArrayObject;
(from Advanced Numpy)
We’re mainly interested in the data type and the data itself, and since we’re only interested in one-dimensional arrays, we can ignore most of the array object above.
Getting the data back out looks like:
>>> import numpy as np
>>> a = np.array([1,2,3], dtype=np.uint8)
>>> a.data
<memory at 0x7f946950fd08>
>>> bytes(a.data)
b'\x01\x02\x03'
This is the underlying memory region associated with this array. Note that regions can overlap, but this shouldn’t be an issue for our purposes.
An Incredibly Short Note about Endianness
With this approach, you need to keep track of the endianness of your data. You may find that you’ll need to swap the endianness of your dataset once it arrives at the browser.
You can determine the byte order of your server directly with the sys
module:
>>> import sys
>>> sys.byteorder
'little'
On the frontend, determining endianness requires checking directly:
const endianness = () => {
const buf = new ArrayBuffer(2)
const u8 = new Uint8Array(buf)
const u16 = new Uint16Array(buf)
u8[0] = 1
if (u16[1] === 1) {
return 'little'
} else {
return 'big'
}
}
Swapping the byte order manually is kind of a pain, you can maybe use a DataView to swap bytes around or implement something inside of a web worker (because ArrayBuffers are transferable, this should be fast). Or you could even handle it on the backend.
Putting it all together
In this example, we’ll assume that the array is of 32-bit floats, and we’ll just assume that the endianness on the backend matches the frontend (not guaranteed!).
With the aiohttp web server:
from aiohttp import web
from my_analysis import get_dataset
async def serve_array(request):
# load numpy array
arr = get_dataset()
# send off the bytes
return web.Response(bytes=bytes(arr.data))
# start the webserver
app = web.Application()
app.add_routes([web.get('/get/array', handle)])
if __name__ == '__main__':
web.run_app(app)
And then on the frontend:
fetch('https://api.example.com/get/array')
.then(r => r.arrayBuffer())
.then(buf => {
const arr = new Float32Array(buf)
console.log('got array from numpy!', arr)
})