“Productising” your Arduino IoT project — Reliability Checklist

Published in Quickbird, Aug 27, 2017

Every year more people get their hands on Arduinos, ESPs, etc. and learn how to make Twitter-controlled lights or hydroponic monitoring systems. It’s an awesome feeling when your contraption comes alive, and since hackers are a practical bunch, they occasionally build something useful. You will find that the more serious and useful the project, the more important it is that the contraption works reliably.

I’ve been working with IoT contraptions for a couple of years and I want to show how they aren’t just toys, but can be used as solutions to one-off problems or made into functional prototypes for products you might later make as a business.

The variety of prototyping kits has gone off the scale in the past few years. This is my collection.

Handling Failure

There are going to be failure modes special to your project, like connection to your server, and I couldn’t possibly tell you how to handle all of them, but I did my best to list the most common ones.
On the other hand, don’t tie yourself in knots trying to handle everything that could possibly go wrong. Think about the ‘cost of failure’ and the cost of just dealing with the occasional failure. In some cases, saying ‘we’ll just replace it if it breaks’ is a perfectly valid answer. After all, there are some truly unique ways to fuck up.

There are truly unique problems out there!

Internet of Shit (@internetofshit) on Twitter: “whatever, put a chip in it.” (twitter.com)

How reliable are Arduinos and ‘educational’ kits?

There is nothing inherently wrong with Arduinos — they have warts such as the undocumented I2C hang, but you just need to know about them to work around them.

The bigger point is that an MCU board is just one part of whatever system you are building, and there are other decisions you are going to make that will determine its reliability as a whole. You have to account for everything that can go wrong and test it. It’s not trivial; in fact, people have written research papers on the subject.
This is my ‘Checklist of Doom’ for microcontrollers, with an emphasis on Arduinos. ‘Proper’ computers like the Raspberry Pi have even more things that can go wrong. It covers the most common problems and gives you a good starting point.

Don’t build/buy just one of Anything

This is a cheeky one, but I cannot emphasise it enough. Many times I was facing a problem and had no way to know if it was caused by a systematic problem in my design, a damaged board, shorting contacts, bad soldering, or evil spirits. Now I always have spare parts and build at least two copies of any contraption.

If you have just one prototype, you aren’t testing reliability, you are testing your luck.

Power Circuit

This is the most important part of your system. Usually you start off with the Arduino or another development board being the source of power for your project, powered directly through the USB connector. Most USB power supplies, be that a phone charger or a port on a computer, do not produce a clean 5 volts: they are usually a little higher and often noisy. In that case your analog readings will come out wrong, because by default the ADC measures against the supply voltage.

When you plug a 12 volt power supply into the barrel jack on an Arduino, you will get a clean and accurate 5 volts, but for every 5 milliwatts of power provided to the Arduino, 7 milliwatts get dumped as heat in the regulator. That’s how linear regulators work. As you consume more power the regulator gets hotter, and it is easy to overstress it without realising; it could die 2 months into operation (happened to me). The same goes for the 3.3 volt power rail, whether you are powered through USB or the barrel jack.

There are many ways to deal with the problem, but for starters there are development boards with good power supplies. Among them, in no particular order, are DFRobot, Ruggeduino, and Olimex.

Arduino-compatible development boards with decent power supplies

Alternatively, you can supply a lower voltage into the barrel jack to reduce the amount of heat dumped on the regulator, for instance 9 volt power supplies are common and easy to find.
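To put numbers on the heat problem, here is a minimal sketch of the arithmetic, assuming an ideal linear regulator where all the input current flows on to the load. The function name and the 100 mA load are illustrative assumptions, and the regulator's own quiescent current is ignored.

```cpp
#include <assert.h>

// Heat dissipated by a linear regulator, in milliwatts.
// Assumption: an ideal linear regulator, so the same current flows in and out
// and the regulator burns (Vin - Vout) * Iload as heat.
double regulatorHeatMilliwatts(double inputVolts, double outputVolts, double loadMilliamps) {
    assert(inputVolts >= outputVolts);  // a linear regulator can only step down
    return (inputVolts - outputVolts) * loadMilliamps;
}
```

With a 100 mA load at 5 volts (500 mW delivered), a 12 volt supply dumps 700 mW into the regulator, while a 9 volt supply dumps 400 mW.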

If you are dealing with motors, robots, relays and the like, they should be supplied with power separately from your microcontroller. For bigger projects, the design of power supply is a whole subject in itself, and you should invest time in learning about it. Make sure you cover the following:

Does the system over-stress the power supply in ‘normal’ operation?
Does the power supply overheat, as is typical for Arduinos?
Do you have bursts of power demand, as is typical for relays, 3G modems, etc.?
Do you have the voltage you need, i.e. 5 volts, not 5.4?
Do you have noise in the system, i.e. does the voltage jitter up and down?
If you have power-hungry items like relays and motors, is your board protected from them?

I2C hang

The I2C bus is a two-wire communication bus that allows a microcontroller to talk to a wide variety of chips (typically sensors), and all of them can share the same pair of wires, which is great.

This is where Arduino has created a pitfall. Every reasonable MCU programming framework, such as MBED, will issue a command to the sensor, and if there is no response within a defined period of time, it will assume that the sensor is disconnected or damaged. The function will return an error and your code will continue working.
However, when Arduino is given a command to talk to an I2C sensor, it will wait for a response forever. Disconnecting or damaging I2C sensors will cause your program to stop dead.
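The difference between the two behaviours comes down to whether the wait loop has a deadline. The following is a minimal sketch of the timeout pattern, not the API of the Arduino Wire library or of any replacement library; `waitForResponse`, `isReady` and `nowMs` are hypothetical names, and on an Arduino the clock would be `millis()`.

```cpp
#include <stdint.h>

enum class BusResult { Ok, Timeout };

// Poll `isReady` until it returns true or `timeoutMs` has elapsed.
// `nowMs` is any millisecond clock. Unsigned subtraction keeps the
// comparison correct even when the clock counter wraps around to 0.
template <typename ReadyFn, typename ClockFn>
BusResult waitForResponse(ReadyFn isReady, ClockFn nowMs, uint32_t timeoutMs) {
    const uint32_t start = nowMs();
    while (!isReady()) {
        if (static_cast<uint32_t>(nowMs() - start) >= timeoutMs) {
            return BusResult::Timeout;  // give up instead of hanging forever
        }
    }
    return BusResult::Ok;
}
```

The caller can then treat `BusResult::Timeout` as "sensor missing", skip that reading, and carry on with the rest of the loop.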

This can be fine if all the I2C devices you are using are integral to your system. For example, if you have a timed irrigation system with an I2C clock, the system can’t work without knowing the time anyway. In other cases it is a major problem: we had I2C soil moisture probes that were connected and disconnected quite often, and having our system get stuck every time would be unacceptable.

There are some re-implementations of the I2C library where this problem does not occur. This is the one I have used in my projects and can recommend.

rambo/I2C: Arduino I2C Master library (originally by Wayne Truchsess), github.com

Watchdog Timer

Sometimes, despite our best efforts, our code gets stuck or malfunctions. This could be caused by an I2C hang, bugs in our code, or cosmic radiation. Whatever the cause, if we deployed an Arduino on a remote island or in space, we can’t afford to keep sending people to click the reset button. This is where the watchdog timer comes in.

A watchdog timer is a countdown mechanism within the microcontroller that will reset the MCU when it reaches 0. Let’s say you activated the Watchdog and set it to a 4-second timer. Somewhere in your code you need to ‘Pet the Dog’ at least once every 4 seconds. If your code ever gets stuck, the Watchdog will no longer get ‘petted’ and will bite, causing the whole microcontroller to reset.
It is up to you to decide when and why you want to pet or not pet the dog; for instance, you could reset the microcontroller if you are experiencing networking problems. A watchdog timer is one of the most effective ways to improve the reliability of your project.
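On AVR-based Arduinos the hardware watchdog is driven with `wdt_enable(WDTO_4S)` and `wdt_reset()` from `<avr/wdt.h>`. The sketch below is not that API; it is a small host-side model of the same countdown, so the behaviour can be tried out on a desktop. `WatchdogModel` and `shouldPet` are hypothetical names, and the "only pet when healthy" policy is just one possible choice.

```cpp
#include <stdint.h>

// Host-side model of a watchdog countdown. Not the hardware API.
class WatchdogModel {
public:
    WatchdogModel(uint32_t timeoutMs, uint32_t nowMs)
        : timeoutMs_(timeoutMs), lastPetMs_(nowMs) {}

    // "Pet the dog": restart the countdown.
    void pet(uint32_t nowMs) { lastPetMs_ = nowMs; }

    // True once the countdown has run out, i.e. when real hardware
    // would reset the MCU. Unsigned subtraction survives clock wrap-around.
    bool wouldReset(uint32_t nowMs) const {
        return static_cast<uint32_t>(nowMs - lastPetMs_) >= timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t lastPetMs_;
};

// Hypothetical policy: pet only when the main loop finished AND the network
// is healthy, so a persistent network failure also ends in a reset.
bool shouldPet(bool loopCompleted, bool networkOk) {
    return loopCompleted && networkOk;
}
```

In a real sketch, the equivalent of `pet()` would be called at the end of each pass through `loop()`, guarded by whatever health checks matter for the project.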

The Reset Line

Regardless of the reason why your microcontroller got reset, if you have other chips attached to your main board, you can’t assume that they got reset as well. This is especially true if you are using a Watchdog — it will only affect the MCU and not any other chips.
You have to write your firmware so that as soon as the microcontroller starts, it sets all the shields, sensors, motors, whatever — into a known state. Usually that just means resetting everything. Normally chips that need resetting have a special pin for it, but it is often left unconnected in different development kits and Arduino shields for ‘ease of use’.

The Arduino Ethernet Shield is a prime example. It has a network chip that manages communication, keeps connections to the server open, etc. It is vital to know what state it is in, and the official Ethernet shield has the reset pin connected to GND, so you can’t reset the chip from software.

The official Arduino shield does not bother to give you control over the network chip, but the guys from DFRobot did their homework.

Many hobbyist shields suffer from similar “improvements”, but the one by DFRobot doesn’t, and you can reset it using digital pin 4. If you are building an Ethernet-connected Arduino device, that shield is a good choice.

Board schematics are really the best documentation for any shield you are buying, so you should get used to reading them. They are usually provided in PDF format.

RAM Fragmentation

Microcontrollers have a small amount of memory and you need to be careful with it. Not only can they run out of memory, but allocating and freeing blocks of memory of different sizes can lead to memory fragmentation.

The issue is similar to file fragmentation: when you write data to memory, the system starts at the beginning and gradually fills it up. If you now free up some space, you might end up with fragments of empty space.

As illustrated here, you could have two fragments of free memory that add up to 10 KB. You would not be able to use them to create one item, like an array, of 7 KB, because it needs a contiguous piece of RAM. ‘Proper’ operating systems have memory paging mechanisms to deal with the issue, but when it comes to MCUs, no-one is holding your hand.

To avoid these problems, allocate all the memory you need statically, or as soon as the system starts, and do not re-allocate it. You can certainly use dynamic memory allocation, but only if you have a very good understanding of your memory layout and allocation mechanisms, and it’s not always worth it.
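As an illustration of static allocation, here is a minimal sketch of a fixed-capacity buffer for sensor readings. All of its storage lives inside the object, so nothing is allocated or freed at run time. The `ReadingBuffer` name, the `int16_t` reading type and the capacity of 4 are assumptions made for the example.

```cpp
#include <stddef.h>
#include <stdint.h>

// Fixed-capacity ring buffer of sensor readings. The array is a member, so a
// global instance gets its memory reserved at build time: no malloc, no new,
// and therefore no fragmentation.
template <size_t Capacity>
class ReadingBuffer {
public:
    // Store a reading, overwriting the oldest one when the buffer is full.
    void push(int16_t value) {
        data_[head_] = value;
        head_ = (head_ + 1) % Capacity;
        if (count_ < Capacity) ++count_;
    }

    size_t size() const { return count_; }

    // Reading at position i, where 0 is the oldest one still stored.
    int16_t at(size_t i) const {
        const size_t oldest = (head_ + Capacity - count_) % Capacity;
        return data_[(oldest + i) % Capacity];
    }

private:
    int16_t data_[Capacity] = {};
    size_t head_ = 0;
    size_t count_ = 0;
};

// Statically allocated instance: its size is known before the program runs.
static ReadingBuffer<4> gReadings;
```

The trade-off is that the capacity has to be chosen up front, but in exchange the memory use is visible at compile time and cannot creep or fragment months into a deployment.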

Permanent Memory

You will often use permanent memory with your Arduinos to save settings, data readings or whatever. It’s important to consider what happens to your memory if there is a power loss or error while you are in the middle of a write operation. Assuming you aren’t writing just one byte, but instead have a struct or some other larger piece of information, you will be left with strange or corrupt data from the unfinished write operation.

Whatever memory type you use, you must deal with this problem in software. One of the simplest ways of doing this is to have two storage locations, A and B, and an indicator to tell you which data is newer. The first write would be to A; if you need to update it, you write to B, then again to A, etc.
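The A/B scheme can be sketched as follows. This is a host-side model in which a plain struct stands in for the two EEPROM or flash regions. The `Settings` fields, the sequence counter used as the "newer" indicator, and the simple additive checksum are all assumptions made for the example; a CRC would be a stronger check, and sequence wrap-around is ignored.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct Settings {            // hypothetical payload
    uint16_t wateringSeconds;
    uint16_t intervalMinutes;
};

struct Slot {
    uint32_t sequence;       // "which is newer" indicator; 0 means never written
    Settings data;
    uint8_t checksum;        // written last, so a half-finished write is detectable
};

struct Store { Slot slots[2]; };  // stands in for storage locations A and B

// Additive checksum over sequence + data, seeded so an all-zero slot is invalid.
uint8_t slotChecksum(const Slot& s) {
    uint8_t bytes[sizeof(s.sequence) + sizeof(s.data)];
    memcpy(bytes, &s.sequence, sizeof(s.sequence));
    memcpy(bytes + sizeof(s.sequence), &s.data, sizeof(s.data));
    uint8_t sum = 0xA5;
    for (size_t i = 0; i < sizeof(bytes); ++i) sum = (uint8_t)(sum + bytes[i]);
    return sum;
}

bool slotValid(const Slot& s) {
    return s.sequence != 0 && s.checksum == slotChecksum(s);
}

// Index of the newest valid slot, or -1 if neither slot holds good data.
int newestValidSlot(const Store& store) {
    int newest = -1;
    for (int i = 0; i < 2; ++i) {
        if (slotValid(store.slots[i]) &&
            (newest < 0 || store.slots[i].sequence > store.slots[newest].sequence)) {
            newest = i;
        }
    }
    return newest;
}

bool loadSettings(const Store& store, Settings* out) {
    const int newest = newestValidSlot(store);
    if (newest < 0) return false;
    *out = store.slots[newest].data;
    return true;
}

// Always overwrite the slot that does NOT hold the newest valid copy, so the
// last good copy survives if this write is interrupted.
void saveSettings(Store* store, const Settings& data) {
    const int newest = newestValidSlot(*store);
    const int target = (newest == 0) ? 1 : 0;
    Slot fresh;
    fresh.sequence = (newest < 0) ? 1 : store->slots[newest].sequence + 1;
    fresh.data = data;
    fresh.checksum = slotChecksum(fresh);
    store->slots[target] = fresh;  // on real hardware: write the checksum byte last
}
```

If power is lost half-way through a save, the damaged slot fails its checksum on the next boot and `loadSettings` falls back to the older copy.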

However, there are also different types of memory, and the differences between them are significant.

EEPROM

This is the few kilobytes of memory internal to the AVR chip, and some other MCUs, that’s separate from the memory that stores your code. You can also buy it as a separate chip, and those usually have an I2C interface.

EEPROM is written byte by byte, so you’ll read memory exactly in the state in which you left it.
If you were writing a float, which consists of 4 bytes, and stopped half-way, you will have a strange value which is neither the old value of a float nor the new one.
Each cell in EEPROM has a limited number of writes, which for Arduino/AVR chips is rated at 100,000 cycles. After that, writes to the same location are no longer guaranteed to work.
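To get a feel for what 100,000 cycles means in practice, here is a rough back-of-the-envelope sketch. The function name is hypothetical, and it assumes every save rewrites the same cell.

```cpp
#include <stdint.h>

// Days until a single EEPROM cell reaches its rated write endurance,
// assuming every save rewrites that same cell.
// ratedCycles defaults to the 100,000 figure quoted above.
uint32_t eepromLifetimeDays(uint32_t writesPerDay, uint32_t ratedCycles = 100000UL) {
    if (writesPerDay == 0) return UINT32_MAX;  // never written, never wears out
    return ratedCycles / writesPerDay;
}
```

Saving a value once a minute (1440 writes a day) reaches the rated limit in about 69 days, while saving once an hour lasts more than 11 years.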

Flash Memory

There are different types of flash memory, but there are a few important points they all share:

Flash memory is typically much larger than EEPROM, often measured in megabytes when bought as separate chips.
Inside the microcontroller flash stores your program code.
It’s erased in blocks, typically of 64 KB.
Its write endurance is typically lower than that of EEPROM.

SD card

SD cards are great in terms of the amount of data you get per dollar, and are easy to connect to an Arduino or any other MCU. They contain a memory controller themselves, and it is typically faster than an Arduino or whatever microcontroller you are using. They can even be hacked to run your code, if you are so inclined.

However, they are not as reliable as a flash chip when it comes to memory storage: SD cards are notorious for becoming corrupt due to sudden power cuts. That’s because they are designed for battery-operated devices, and their controllers incorporate wear-levelling and other algorithms that move data about, over which you have no control. So while on the plus side you don’t typically have to worry about their limited write cycles, you should only use them for non-critical data and when your device is battery-powered.

Solder Joints

For your device to work reliably, the soldering work involved needs to be at least half-decent. This applies both to the kit you buy and to your own work. I found the Soldering Tutorials from EEVBLOG to be helpful.

Protection from Motors

If you use motors or relays, watch out: if they are allowed to dump back-EMF current into the system, they can affect your analog readings or even damage the microcontroller. You need to isolate them from the ‘brains’. For trivial cases, resistors and capacitors might do.

For serious work you should look to opto-isolation. That’s when you have two separate electrical circuits, and communication between them happens only through light. Opto-isolators are chips that package together LEDs and receivers. It is easy to find Arduino-friendly parts that already have isolation built in. Here are some decent motor controllers. The same goes for relays.

Left — relay board with opto-isolation chip (white) and separate power source for the relays. Right — opto-isolated motor driver

Internal Connectors

If you are prototyping with a hobbyist kit, no board will have everything you need, and you will typically end up with several circuit-boards connected by jumper wires.

Loose connectors, shorts and stuff plugged in the wrong place are notorious failure points for hardware projects. I learned my lesson when, just 3 hours before a presentation, a jumper wire came loose, shorted and burnt two circuit-boards in the system. It had worked fine for several months prior to that, but as it just sat in one place undisturbed, that doesn’t prove anything.

Left — Typical Jumper wire connection, Right — connections of the grove system.

To make sure Murphy does not bite you in the ass, you want to move from jumper wires to proper connectors as soon as possible — Sparkfun has a great introduction to connectors.

Left: micro-crimping tool, Center: Dupont connectors for multiple wires, Right: JST-SH

SeeedStudio makes all kinds of stuff with Grove connectors and is worth using when it suits your project. For other cases, get a crimping tool and learn how to make connecting wires for your projects; it’s an essential skill, just like soldering. Connectors come with different numbers of pins, from 2 to 20 and beyond. There are hundreds of different types of connectors, but the following are the most common ones and will get you started:

Use Dupont connectors to group jumper wires. A connection with 4+ pins is much more reliable, and works on any Arduino / dev board.
Use JST-SM when you need a wire-to-wire connection.
JST XHP (2.54 mm / 0.1") is excellent for use with break-out boards instead of those 0.1" rows of pins they typically come with.
JST PHR (2.0 mm) has the same pitch as Grove connectors and is great for replacing them, or for putting on your own breakout boards.

Left: water flow sensor with JST-SM, Center: breakout with 0.1" spaced connection suitable for JST XHP, Right: a sensor with JST PHR connector

Bolt and Box it

Trivial, but important nonetheless: once you have figured out the parts you need, mount all your electronics on a panel. It can be a piece of plywood you’ve cut with a hacksaw, or you can design a mounting panel for your project in Inkscape and order it laser-cut from acrylic for pennies; about 3 mm thick works well. That’s why every circuit board will have mounting holes. Typically they are M3, so you will need some M3 standoffs or spacers and screws.

Left — M3 stand-offs. Right — everything bolted down.

Once you’ve mounted everything on a panel, you can place it inside a box, and there is an entire industry that builds all kinds of enclosures for electronics. You can find all kinds of waterproof boxes made out of different materials. Most of my enclosures are from Hammond and Fibox.

Mounting circuit-boards on a panel will let you enclose your project and protect it from the elements.

External Connectors

Of course you need to pass signals and power in and out of the enclosure.
For that you can use cable glands (which I detest) or connectors. They come in cable-mount, PCB-mount, and panel-mount varieties. You will want the last of these, so that you can install them on the wall of the enclosure.

Left: ‘normal’ barrel jack connector, Center-left: audio plug, Right: SP13 and aviation plug connectors.

As we say in Russian, there are more different varieties of them than there are stray dogs. You’ll want to note the number of contacts, the rated voltage / current, and whether they are waterproof. Unlike internal connectors, these guys generally use solder, not crimps, and you will want heat-shrink tubing to protect the contacts. To get started, get yourself:

2.1 x 5.5 mm Barrel Jack and Sockets to power your project from normal wall-wart power supplies.
‘Aviation Plug’ GX-12 / GX-16 connectors come in 2 to 8 pin varieties, and are very common and cheap.
Weipu SP1310 are similar to aviation plugs but are waterproof.
3.5 mm Audio Jacks are cheap, familiar, common and compact, and are often used for sensor probes. The connectors have a peculiar quality: the contacts will all touch each other while the jack is being inserted or taken out. Make sure that whatever you are connecting is happy with such treatment. Also, their 4-pole varieties often have awkward flat contacts that are difficult to solder by hand, so you’ll need to shop around or have serious soldering-fu.

Isolation

Once you have a box of electronics, typically you’ll want some sensor probes that come out of it. Let’s say you are measuring temperature of water with a probe. What happens if insulation on that water probe is damaged? Unless you take precautions, ground and power will end up shorted, and likely cause your system to fail.

Left: typical smart-home IoT ‘box of crap’. Center: MCU with opto-isolated inputs from Olimex. Right: galvanic isolation board from DFRobot.

To address the problem, limit the amount of current any particular probe can draw by powering it through a resistor or a poly-fuse. That will usually prevent damage to, or shutdown of, the rest of the system. You could also go all the way and power all the probes separately and use opto-isolators for the signals.
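Sizing such a resistor is plain Ohm’s law. The sketch below, with hypothetical function names and example values, shows the two numbers worth checking: the worst-case current if the probe shorts to ground, and the voltage the probe still sees in normal operation.

```cpp
// Worst-case current, in milliamps, through a series resistor when the
// probe's power line is shorted straight to ground.
double shortCircuitMilliamps(double supplyVolts, double resistorOhms) {
    return supplyVolts / resistorOhms * 1000.0;
}

// Voltage left at the probe in normal operation, after the drop across
// the series resistor at the probe's usual current draw.
double probeVolts(double supplyVolts, double resistorOhms, double probeMilliamps) {
    return supplyVolts - resistorOhms * probeMilliamps / 1000.0;
}
```

For example, a 220 ohm resistor on a 5 volt rail limits a shorted probe to roughly 23 mA, while a probe that normally draws 1 mA still sees about 4.78 volts. Whether that drop is acceptable depends on the probe, so check its datasheet.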

Some sensors have to be isolated: pH and EC sensors will interfere with each other and require galvanic isolation to work properly.

Documentation and ERRATA

Your project could fail through no fault of your own. It is typical to encounter terrible documentation that forgets to mention some crucial detail, and to discover it half-way through your project, or never at all.

The [Arduino] delay() function does weird random stuff when combined with PWM outputs in version 12, but not 11. It will also fail if the counter rolls over 0.

The other half of the problem is that the people who make the processors make mistakes too. Those will theoretically be found in the ERRATA section of the datasheet, but sometimes aren’t. And you have to deal with that too.

Electromagnetic Interference

I have no experience with EMI, but if you work in an environment where that is a problem, for example near large electric motors, you should keep this in mind. You might consider having an EMI-protected or aluminium enclosure.