Peer-to-peer updates for edge compute nodes
23 May 2018
Low-cost, low-power edge compute devices and nodes are key components
of Internet of Things (IoT) systems that are embedded in smart homes and
smart cities.
Such systems generally start small but can rapidly scale to many
thousands of nodes. Devices can be inaccessible, mobile, or in private
residential locations, so remote administration is essential to deploy
updates and install new applications. This raises the question: how can
we effectively manage and update such devices?
There are a number of DevOps tools available, all of which tend to
follow one of two patterns:
- Either devices connect to a central server and pull updates, or
- The server contacts each device in turn to push updates.
The push model is effective in data centres, where devices are
connected via a high-performance, professionally administered network
that keeps them reachable from the management server. However, it’s
much less effective when devices have intermittent connectivity and are
located behind Network Address Translators (NATs), firewalls, or other
middleboxes that limit reachability, especially where there’s no
system administrator to correct connectivity problems.
The pull model addresses some of these issues but, in turn,
introduces scalability problems and a single point of failure, unless
a Content Distribution Network (CDN) or other caching solution is used,
which adds cost and complexity.
If we are to update devices for a reasonable cost, irrespective of the
scale or heterogeneity of the system, we must develop more robust,
scalable, and decentralized tools for cluster management.
Tools for cluster management: what’s required
In our paper,
Peer to Peer Secure Update for Heterogeneous Edge Devices
(presented at the IEEE/IFIP International Workshop on Decentralized
Orchestration and Management of Distributed Heterogeneous Things),
we assert that the components of such tools need to include connectivity
discovery and NAT traversal, overlays, and peer-to-peer updates.
Systems deployed in arbitrary edge networks must include robust
connectivity discovery and NAT traversal, to ensure they can
communicate with the outside world. Edge networks generally don’t
accept arbitrary incoming connections, and often extensively filter
outbound traffic. Management systems running in such environments must
systematically probe for external connectivity and discover NAT
bindings using multiple techniques, including the ICE algorithm for
systematic STUN-based UDP hole punching, TURN relays or other indirect
paths, and tunnelling over UDP, TCP, HTTPS, or WebSockets.
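To make the NAT binding discovery step more concrete, here is a minimal Python sketch of a single STUN Binding request and the parsing of its reply. It is an illustration only, not the code used in our system: the function names are hypothetical, only IPv4 is handled, and a full ICE implementation would add candidate gathering, connectivity checks, and retransmission on top of this.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value in every STUN message header
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return (packet, transaction_id) for a Binding request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id, transaction_id


def parse_binding_response(packet, transaction_id):
    """Return the (ip, port) seen by the server, or None if the reply is unusable."""
    if len(packet) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", packet[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    if packet[8:20] != transaction_id:
        return None
    offset, end = 20, min(len(packet), 20 + length)
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", packet[offset:offset + 4])
        value = packet[offset + 4:offset + 4 + attr_len]
        # Family 0x01 is IPv4; port and address are XORed with the magic cookie.
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


def discover_mapping(server_host, server_port=3478, timeout=2.0):
    """Ask a STUN server (caller-supplied, hypothetical) for our public mapping."""
    request, tid = build_binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(request, (server_host, server_port))
            reply, _ = sock.recvfrom(2048)
        except OSError:  # covers timeouts, DNS failures, and filtered UDP
            return None
    return parse_binding_response(reply, tid)
```

A return value of None from discover_mapping is itself useful information: it suggests outbound UDP is filtered, and that the node should fall back to a relay or to tunnelling over TCP or HTTPS.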
These systems also need to probe local connectivity, using techniques
such as multicast DNS service discovery, since there could be devices
behind the same edge firewall that are only indirectly reachable from
the wider network. Protocols such as Universal Plug and Play (UPnP) can
also help discover topology and connectivity.
Discovering devices and pathways to connectivity is key — many tools
have been developed in this space but have not been systematically used
for DevOps. Once devices have been discovered and paths to connectivity
found, an overlay can be built.
Building an overlay
The primary goal of building an overlay is connectivity, not
performance: the intent is to reach all devices irrespective of how
they’re connected, whether or not they’re directly reachable, and
regardless of the presence of NATs or other middleboxes.
The service should be similar to that of HashiCorp’s Serf, an open
source gossip-based tool, pushing update notifications and simple
configuration changes but without tracking membership or failures. Scaling
such a service requires managing devices when they’re available, rather
than tracking and updating them synchronously.
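The following Python sketch gives a rough feel for why gossip suits this setting. It simulates push gossip of a single update notification under simplifying assumptions that are ours for illustration only: synchronous rounds, uniformly random peer selection, and a fixed probability that a contacted node is online. The function and parameter names are hypothetical, and this is neither Serf's protocol nor the one in our paper.

```python
import random


def gossip_rounds(num_nodes, fanout=3, online_prob=1.0, seed=0, max_rounds=100):
    """Simulate push gossip of one notification starting at node 0.

    Returns the number of rounds until every node has heard the
    notification, or None if that does not happen within max_rounds.
    No node keeps a membership list or tracks failures: each informed
    node simply pushes to a few random peers every round.
    """
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < num_nodes:
        if rounds >= max_rounds:
            return None
        rounds += 1
        newly_informed = set()
        for _ in informed:
            # Pick a few peers at random; a peer may already be informed.
            for peer in rng.sample(range(num_nodes), min(fanout, num_nodes)):
                # An offline peer misses this push but may be reached later.
                if rng.random() < online_prob:
                    newly_informed.add(peer)
        informed |= newly_informed
    return rounds


if __name__ == "__main__":
    for prob in (1.0, 0.5):
        print(prob, gossip_rounds(1000, online_prob=prob))
```

Because nodes that miss a push are simply retried by chance in later rounds, the sender never needs to know which devices are currently reachable, which is the property the overlay relies on.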
Finally, an existing peer-to-peer swarming protocol, such as
BitTorrent, is needed to download and install larger updates. In our
study, we augmented the torrent files with public key signatures, a UUID,
and versioning information to ensure download authenticity. BitTorrent
hashes the content to ensure integrity.
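As a sketch of how these checks could fit together on a device, the Python below verifies a signature over the update metadata, compares versions, and then checks per-piece SHA-1 hashes in the style of BitTorrent v1. The UpdateManifest layout, the field names, the byte encoding that is signed, and the verify_signature callable are all assumptions made for illustration; they are not the torrent file format used in our study.

```python
import hashlib
import uuid
from dataclasses import dataclass
from typing import Callable, List, Tuple

PIECE_SIZE = 16384  # illustrative piece length in bytes


@dataclass
class UpdateManifest:
    """Hypothetical metadata carried alongside the swarmed payload."""
    update_id: uuid.UUID
    version: Tuple[int, ...]   # e.g. (1, 4, 2)
    piece_hashes: List[bytes]  # SHA-1 digest of each piece, in order
    signature: bytes           # publisher's signature over signed_bytes()


def hash_pieces(payload: bytes, piece_size: int = PIECE_SIZE) -> List[bytes]:
    """Split the payload into fixed-size pieces and hash each one."""
    return [hashlib.sha1(payload[i:i + piece_size]).digest()
            for i in range(0, len(payload), piece_size)]


def signed_bytes(manifest: UpdateManifest) -> bytes:
    """Bytes covered by the signature (an assumed, simplified encoding)."""
    version = ".".join(str(part) for part in manifest.version).encode()
    return (manifest.update_id.bytes + b"|" + version + b"|"
            + b"".join(manifest.piece_hashes))


def should_install(manifest: UpdateManifest,
                   payload: bytes,
                   installed_version: Tuple[int, ...],
                   verify_signature: Callable[[bytes, bytes], bool]) -> bool:
    """Accept an update only if it is authentic, newer, and intact."""
    # 1. Authenticity: the metadata must be signed by a trusted publisher.
    if not verify_signature(signed_bytes(manifest), manifest.signature):
        return False
    # 2. Freshness: refuse downgrades and replays of the current version.
    if manifest.version <= installed_version:
        return False
    # 3. Integrity: every downloaded piece must match its signed hash.
    return hash_pieces(payload) == manifest.piece_hashes
```

In practice verify_signature would wrap a public-key verification routine from a cryptography library, with the publisher's public key provisioned on the device; it is left as a parameter here so that the sketch does not depend on any particular library.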
The peer-to-peer management system that we developed according to the
above principles is currently being integrated into the
FRμIT testbed,
an experimental federated edge compute testbed built using Raspberry Pi
nodes, where an early prototype is available for download.
The key lesson we learned from this testbed is that managing edge
compute nodes presents significantly different challenges from managing
data centre nodes, and that existing tools are insufficiently robust or
scalable.
DevOps tools need to be able to manage devices that are not directly
reachable, at the time they become available. And if management is to
scale in a cost-effective manner, we must give up on precise control
and tracking of devices.
Many challenges remain, but we believe that peer-to-peer updates are an
essential part of future management tools, as the only way to get
effective and scalable connectivity.
For further details, see our
paper.
This is a reprint of an original post to the
APNIC blog.