James Brusey

June 19, 2022 / January 10, 2023

Drawing diagrams and figures for research articles and theses

Devising easy-to-understand, clear, concise diagrams for your research article can be a daunting task. On the one hand, expressing your ideas graphically may not come naturally. On the other hand, it can be difficult to get graphic editing tools to produce the look that you want.

However, diagrams are also an important part of doing great research. When your readers first approach your article, the diagrams are likely to act as a gateway to the rest of the content. A good diagram doesn’t just take up space—it can bring the subject matter to life. It can also add a lot of character to your work and provide a personal touch that is so often missing otherwise. The care and attention to detail that you put into your diagrams will hugely impact how your paper is received.

Given the importance of this topic, I have put together some recommendations on producing diagrams based on my experiences with reviewing research papers and supervising doctoral students. This text is currently in draft form—I still need to add in some more example images. If you have any comments, I’d really like to hear from you!

Start with a rough sketch

A good diagram conveys an idea or a set of ideas in a concise way. Your first sketch may not work as a good explanation of the idea, so be prepared to throw away the first version or two. For this reason, it is much easier to sketch by hand before you open a drawing tool. Once you have a workable sketch, you’ll find it easier to lay out the text and graphics neatly using a graphics package.

Figure 1: A rough sketch involving different sorts of elements (a controller is different from a load forecast, which is different again from a solar panel) and different sorts of interconnections (such as, the connection between the load controller (LC) and the main controller versus the transmission of information from the PV forecast to the energy management system).

Size your canvas appropriately before laying out

If you use a package such as Inkscape, it will, by default, give you an A4 page to draw upon. This leads you to draw a large diagram that covers the whole page, which is then shrunk to a width of, say, 3 or 3.5 inches (roughly 7.5 to 9 cm) to fit into a two-column document (suitable for most conferences). You will then end up with tiny text, large amounts of space between boxes, thin and spidery lines and arrows, and large amounts of padding within the boxes around any text.

To avoid this problem, start by sizing the diagram to fit your column width. You should aim to avoid any resizing of the diagram when importing it. If you adjust the canvas size, it is useful to leave a small space (say 1 mm) around the outside of the figure: even if the graphical elements are positioned just inside the edge of the page, aliasing effects can cause them to spill slightly outside, and they will look cut off if they are right on the edge.

What will happen if you don’t take this advice? You might still be able to use scaling to fit a diagram to your page. However, the font sizes won’t match up. One trick to get around this is to use a different font from the one used in the main body of the text. For example, use Helvetica in the diagram and Times Roman in the body text. This way, the font size mismatch will be less obvious.

Careful use of the scale transform applied to the whole diagram can also be used to adjust an existing diagram to fit in your target space. Make sure you scale horizontally and vertically by the same factor. You may need to correct font sizes slightly afterwards (they might end up as 9.1 pt instead of 9 pt, for example).
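The arithmetic behind a proportional rescale can be sketched in a few lines of Python. The function name scale_to_width and the example dimensions are made up for illustration. The only point is that width, height and font size are all multiplied by the same factor, which is why font sizes can end up at awkward values.

```python
def scale_to_width(width_mm, height_mm, font_pt, target_width_mm):
    """Scale a drawing proportionally so that it is target_width_mm wide.

    Hypothetical helper: returns the scale factor, the new height and
    the font size that text will have after scaling.
    """
    factor = target_width_mm / width_mm
    return factor, height_mm * factor, font_pt * factor


# Illustrative numbers only: a 180 mm wide drawing with 18 pt text,
# scaled to an 88.9 mm (3.5 inch) column.
factor, new_height, new_font = scale_to_width(180.0, 90.0, 18.0, 88.9)
print(f"scale by {factor:.3f}: height {new_height:.1f} mm, font {new_font:.2f} pt")
```

In this made-up case the text comes out a little under 9 pt, so you would set it back to exactly 9 pt by hand afterwards.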

Figure 2: In Inkscape, you can find the option to resize the canvas under File / Document properties... and select the Page tab.

For line drawings, output vector graphics

When your drawing is shown on the final printed page, the available final resolution may be thousands of dots per inch (DPI). For this reason, a pixel graphic that works well on the screen may look pixelated and ugly on the printed page. Once you become aware of this issue, such eyesores stick out. To avoid the problem, make sure you can output vector graphics from your drawing tool before you start.

There are a few cases where it is still a good idea to output raster images. For example, if you have a scatter plot graph that contains many thousands of individual dots, then rendering this as a vector image can slow the PDF viewer down a lot when it shows your final page. In this case, I recommend using a PNG (or raster image) instead.

If your tool doesn’t support line drawings of the quality you desire, spend some time investigating some other tools.

Selecting a suitable drawing tool

There are many excellent drawing tools available. All seem to have strengths and weaknesses. The main point here is not to accept the default, or at least, not to accept it without some probing. Here are a few that I’ve found useful but there are plenty more and if you have other favourites, please let me know! Note that I deliberately exclude non-free software here. For example, Microsoft PowerPoint and Microsoft Visio are well regarded but if you lose access to the license, you will no longer be able to edit your image.

Tool	Strong at	Not so good for
Inkscape	Basic line drawings, shadows	Connected elements, generating from code, accuracy
tikz	Accuracy, relative positioning, generated from code	Quick sketches
graphviz	Drawing networks of connected items	Specifying where to draw ovals or boxes or how to route lines
dia	Electronics, data flow diagrams, UML, exporting to LaTeX (pgf) code	Precise diagrams
Google Docs	Including in a Google Slides presentation	Precise diagrams

Use a consistent sizing of fonts and lines

Avoid having some lines thicker than others unless it was your intent to convey extra information this way. In this case, be careful that the reader understands that extra information in the way that you think she does. Try to avoid allowing lines to be too thin or thick (1 pt should generally be considered a minimum).

Avoid small font sizes: some conferences and journals explicitly request nothing smaller than 8pt in graphics.

Avoid overly large font sizes. Scaling of a small diagram can expand text – avoid this by turning off rescaling in LyX or LaTeX and by limiting the maximum font size to around 12 pt or less.

As a general rule, you should try to match the caption font size (typically 9pt).

Type consistency

Aim to have a particular type of graphical element (such as an arrow, box or circle) carry a consistent meaning across the whole diagram. For example, avoid using arrows in one place to indicate transfer of control and in another, transfer of information.

Try to use standard diagrams

If you are representing data flows, use a data flow diagram (DFD). If you are representing class hierarchy, use a UML class diagram. If you can’t find a diagram style that suits, you may need to make your own but consider borrowing strong elements from existing formats.

Be careful with resizing

If you resize text, it can make a fundamental alteration to the font – squashing horizontally and stretching vertically or vice versa. This text will look slightly wrong (but you probably won’t be able to say exactly why it’s wrong unless you look closely).

A similar problem occurs with many other things (such as the widths of lines, which will be altered by squeezing or stretching).

The solution is to completely avoid resizing using the stretch tool. Resizing of boxes and lines can be achieved by using the “edit paths” tool. Text should be resized by changing the font size.

If you have existing text that has been stretched or squashed, the simplest fix is to cut and paste the text to a new text box. You’ll probably be surprised how much it changes!

Drawing arrows between shapes

The way that most computer programs (and most computer scientists) draw arrows between shapes is not, in my view, aesthetically pleasing. They tend to use the rule: draw from the centre of an edge to the centre of the target edge.

However, I (and many other people) prefer that you draw arrows aligned to a line going through the centroid (or centre of mass).

Furthermore, the eye appreciates curves rather than straight lines, so you could keep the centre-of-edge rule but curve the line.

But actually I think that this works poorly when there are many arrows and it is better to draw a curve centroid to centroid.

To construct this last one, you need either to use clipping (I think Inkscape might support this) or to align your arrow with a curve that is drawn centre to centre and that starts out going right and ends going right. The control points need to be set by eye to make a line that you find appropriate.
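For the straight-line case, the clipping idea can be sketched in Python: draw the line from centroid to centroid, then cut it off where it leaves each shape. The sketch below assumes axis-aligned rectangular boxes (ovals would need a different intersection test) and that the other centroid lies outside the box. The function name clip_to_box is hypothetical.

```python
def clip_to_box(centre, half_w, half_h, toward):
    """Point where the line from centre toward another point leaves a box.

    The box is axis-aligned, centred on `centre`, with the given
    half-width and half-height. Assumes `toward` lies outside the box.
    """
    dx = toward[0] - centre[0]
    dy = toward[1] - centre[1]
    if dx == 0 and dy == 0:
        raise ValueError("the two centres coincide")
    # Fraction of the way along the line at which each pair of edges is hit
    tx = half_w / abs(dx) if dx else float("inf")
    ty = half_h / abs(dy) if dy else float("inf")
    t = min(tx, ty)  # the nearer pair of edges is where the line exits
    return (centre[0] + t * dx, centre[1] + t * dy)


# Arrow between two 4 x 2 boxes (made-up coordinates)
a_centre, b_centre = (0.0, 0.0), (10.0, 3.0)
arrow_start = clip_to_box(a_centre, 2.0, 1.0, b_centre)
arrow_end = clip_to_box(b_centre, 2.0, 1.0, a_centre)
print(arrow_start, arrow_end)
```

Both end points lie on the centroid-to-centroid line, so the arrow appears to point at the centre of each box rather than at the middle of an edge.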

Figure 3: An example diagram using GraphViz (specifically, the dot program). Note that arrows meet the surface of the oval so that the line of the arrow points to the centre of the oval. Some further improvements to make here are to ensure that text does not crash into the lines or arrowheads.

Sizing boxes with text

Generally speaking, vertical and horizontal padding between text and the edge of a box surrounding it should be (a) even above and below / left to right and (b) roughly the same between horizontal and vertical.

Choose a good colour scheme

http://colorbrewer2.org/ provides a nice way to choose a colour scheme that is both pleasant and consistent. It also helps with producing diagrams that might also work if they are printed on a black and white printer or are viewed by people who have impaired colour vision.

I must admit that I always found it a bit of a pain to carefully type in the hex codes for each individual item that I wanted to colour. So I was especially pleased when I discovered the export option in colorbrewer. Note that the resulting palette will be named something like ‘CB_qual_Pastel1_5’ and you need to restart Inkscape after moving the file into your palette directory to get it to load.

Figure 4: Colorbrewer2.org also supports exporting a palette, which can be downloaded and inserted into the appropriate directory for Inkscape (see https://inkscape-manuals.readthedocs.io/en/latest/palette.html).

Print out and review

Many small mistakes can be spotted by printing the graphic out in the correct size (i.e., the size that it will eventually be printed at) and examining carefully. There are several things to check for:

Have boxes or lines become pixelated?
Is the text readable (not too small or too large)?
Is it well balanced?
Are there extraneous artefacts (e.g. small graphical elements that are not supposed to be there)?

Check for spelling errors

The print out and review stage is also a good time to make sure that the spelling is correct. Although Inkscape and many other graphical programs will check for errors, they can’t spot substitutions such as “through” to “trough” or “perform” to “preform”. The only way to be sure is to read through all the text carefully.

Make your figure caption informative

Captions help the reader understand a diagram. However, there seems to be a trend towards making captions short and rather cryptic. For example, “System architecture” doesn’t tell you anything. Probably there are some boxes and arrows, but what do the boxes and arrows stand for? Is a box a piece of software or a physical computer or a metaphorical entity that flows over multiple devices? What do arrows really mean? Flow of information, perhaps? Direction of control? Physical connection? If colour is used, or some element is made larger or bolder, was this just a slip of the mouse or an important part of the information that was intended to be conveyed? A good figure caption can help to clarify those elements that are ambiguous.

A common idiom is to start a caption with a noun phrase that identifies the thing that you are looking at. However, you shouldn’t stop there. Aim for about 3 or 4 lines of text but use more or less depending on how easily your diagram can be explained.

A check-list for graphics

The following check-list should be used to ensure your graphics are of good quality:

Is the graphic sized correctly for the target column width (about 3.5 inches, or roughly 9 cm, for double column, e.g.)?
Are fonts consistent and sized so that they are readable?
Is padding around text minimal (not wasting too much space)?
Is the colour scheme appropriate for the use?
Have you printed out and reviewed on paper?
Spell checked?

Some examples

Figure 5: In this original version of an architecture diagram, notice how the arrow heads are unclear. The blob on the right of the diagram is supposed to represent the finite element model of the structure (a subway terminal) but this also is unclear.

Figure 6: In the revised architecture diagram, the wireless sensors are more clearly shown with antennae that actually look like antennae. Clipart from a free clipart website was used here. The finite element model is also more clearly portrayed. Unfortunately, there is an error in the clipart used for showing the computer screen—the month of October has been skipped.

July 2, 2020 / July 4, 2020

Making tables reproducible

Key ideas

DON’T manually transfer table values to LaTeX – DO put the data in a separate file that gets loaded during compilation.
DON’T format your values and truncate decimal places manually – DO use a script to truncate values consistently.
DON’T manually insert units or convert exponents – DO use siunitx to format numbers and units.
DON’T end up with a jumble of scripts – DO tie your workflow together with a Makefile

Introduction

Developing reproducible research is a key element in producing robust results and good science.

To achieve research that is reproducible by others, we must first be able to reproduce it ourselves. That is, we must be able to come back to our source files in 6 months (or more!) and re-run any part of the analysis, reproduce any graph, check the values in the tables, and generally ensure that our results were not a fluke. How important reproducibility is to you will depend on your discipline, but no self-respecting researcher should be publishing a paper with results that cannot even be reproduced given access to the original data.

One element of reproducibility, that I want to focus on here, is to produce tables in such a way that manual error is avoided and that any value in the table can be recalculated.

Manual handling of numbers is a common source of error but it needn’t be. Once scripts are set up to automatically generate tables from the source data, they are easily modified to suit the next table, the next paper, or the next project. If you are not using a tool that supports you producing your tables directly, then this will be the biggest hurdle. However, without this step, it will be hard, not just to automate your tables, but to make your work completely reproducible.

Your tools will dictate, to some extent, how easy automation is to do. Try to avoid tools that encourage manual handling, such as Excel and Word, and switch instead to tools that make automation easier, like R and Python. I also recommend that you make use of GNU Make to tie everything together and make it easy to remember what to do when you come back in 6 months’ time.

Background

I don’t want to provide a detailed literature review here but I do want to make a note of some important trends in science.

The open science movement is leading the way towards more transparent scientific practices. Scientists are starting to realise that the intellectual honesty that goes with open source software should also apply to their outputs. This means more than just making the output (or journal paper) freely available – it means making the “source code” of that output available, including the original data used and analysis scripts.
A 2005 analysis of biomedical research by Ioannidis argued that most published research findings are false. It seems likely that the problems identified for biomedicine are worse, not better, for other disciplines.
A 2016 survey of 1500 scientists asked whether there was a reproducibility crisis. More than half said “yes”, with another third saying that there was a slight crisis. Scientists reported trouble reproducing others’ experiments, and many said they even had trouble reproducing their own when they attempted to do so.

The key message is that reproducibility is of fundamental importance and that we all need to work harder at enabling it for our own research.

Automating the process starting with table loading

Figure 1: Example results file to be converted into a table

Most researchers will manually transcribe this data into a LaTeX (or Word) document, like so:

\begin{tabular}{...}
Policy & Avg. Reward & Comfort score & Energy use (Wh)\\
bang-bang-et & $-2.82$ & $-0.72$ & $1950$ \\
bang-bang-avg & $-2.27$ & $-0.87$ & $721$ \\
...

Note how a few things needed to be manually transformed in this process.

Numbers need to be written in math-mode (using $ signs) to give a consistent font and to ensure that the minus signs look right.
Some rows or columns may not be relevant (the 1.48E+12 is actually to do with the Unix clock time when the result was generated).
The numbers need to be truncated or rounded appropriately. It’s a good question to ask “what’s appropriate?” here. If you have standard deviation or confidence intervals then round appropriately for that. Leaving in a large number of digits suggests that you don’t understand the uncertainty in your data.
Exponents need to be translated into a printable form. Note that the siunitx package has a nice facility for doing this automatically.

With so many little details to be taken care of, automating looks hard. Fortunately, there are some cool tools to help.

If the job is a simple one (or can be made simple), try using csvsimple to load in the table. I won’t describe this here but there’s lots of help on the Internet.

The csvsimple package allows you to put extra commands, such as \si{} (from siunitx), around each table entry, but for specialist needs (such as truncating numbers) you may need to write your own script that writes a .tex file. This .tex file will then need to be included in your main file with \input.
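As a rough idea of what such a script might look like, here is a minimal Python sketch. It assumes the results are in a CSV file with a header row. The column names, the sample values and the function name csv_to_tabular_rows are all made up for illustration. Each numeric cell is rounded and wrapped in \num{} from siunitx, and columns that are not asked for (such as a timestamp) are simply skipped.

```python
import csv
import io


def csv_to_tabular_rows(csv_text, columns, decimals=2):
    """Return LaTeX table rows for the chosen columns (hypothetical helper).

    The first entry of `columns` is treated as a text label; the rest are
    parsed as numbers, rounded, and wrapped in siunitx's \\num{}.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for record in reader:
        cells = [record[columns[0]]]
        for name in columns[1:]:
            value = float(record[name])
            cells.append(r"\num{" + f"{value:.{decimals}f}" + "}")
        lines.append(" & ".join(cells) + r" \\")
    return "\n".join(lines)


# Invented data in an assumed layout, for illustration only.
example = """policy,timestamp,reward,comfort
bang-bang-et,1.48E+12,-2.8213,-0.7190
bang-bang-avg,1.48E+12,-2.2701,-0.8734
"""
print(csv_to_tabular_rows(example, ["policy", "reward", "comfort"]))
```

The returned string could then be written to a file (say results_rows.tex, again a made-up name) and pulled into the tabular environment with \input, so that no number is ever retyped by hand.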

Truncating numbers appropriately

Quoting table values to 10 decimal places is clearly not appropriate. Most experiments, if tried again, will yield slightly different values. We should aim to express numbers in a way that appropriately reflects our uncertainty about the true value.

For example, imagine that we have an experiment that involves 10 trials and we record the mean measurement value from those trials. The standard deviation provides useful information about the likely precision of the mean. Confidence intervals can often be derived from the standard deviation given the sample size and assuming a normal distribution.
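To make this concrete, the following Python sketch computes the mean, the sample standard deviation and an approximate 95% confidence interval for a set of trials. The measurements are invented and the function name mean_sd_ci is hypothetical. It uses the normal-distribution multiplier of 1.96, which is only an approximation for small samples.

```python
import math
import statistics


def mean_sd_ci(samples, z=1.96):
    """Return the mean, sample s.d. and an approximate 95% confidence interval.

    Hypothetical helper. z=1.96 is the normal-distribution quantile; with
    only 10 trials a Student t quantile (about 2.26) gives a wider interval.
    """
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)  # sample s.d. (n - 1 in the denominator)
    half_width = z * sd / math.sqrt(len(samples))
    return mean, sd, (mean - half_width, mean + half_width)


# Made-up measurements from 10 trials, for illustration only.
trials = [10.3, 9.9, 10.4, 10.0, 9.8, 10.5, 10.1, 9.7, 10.2, 10.4]
mean, sd, (low, high) = mean_sd_ci(trials)
print(f"mean {mean:.4f}, s.d. {sd:.4f}, 95% CI [{low:.4f}, {high:.4f}]")
```

The half-width of the interval shrinks with the square root of the number of trials, which is why the s.d. alone is not enough: you also need to report the sample size.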

As a rule of thumb, the standard deviation should be expressed to one significant figure unless the number is between 11 and 19 (times some power of ten) in which case you can use two significant figures.

The measurement value should be expressed to agree in terms of decimal places with the standard deviation.

For example, a value resulting from a spreadsheet calculation of an average and standard deviation might be 10.1298 ± 0.2595. This should be expressed as 10.1 ± 0.3 or 10.1 (0.3) where the number in parentheses is taken to be the estimated standard deviation. The estimate indicates that the value is only known to within three tenths of a unit of measurement. The figures beyond the tenths place are not informative to the reader and should be dropped.

Note that I’m glossing over the details here and it is worth reading more about measurement uncertainty.

The following code roughly obeys the above rules. The trick is to use a nice feature of the python string formatter that allows the number of digits after the decimal point to be parameterised.

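import numpy as np

def mean_string(m, s):
    # digits after the decimal point = position of the s.d.'s first significant figure
    return '{:6.{sig}f} ± {:.1g}'.format(m, s, sig=-int(np.floor(np.log10(s))))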

>>> mean_string(0.016933, 0.005105)
' 0.017 ± 0.005'

This works by taking the base 10 log of the s.d. This number will typically be negative (I haven’t dealt with s.d. > 1!). Taking the floor of this number and negating it tells you how many digits after the decimal point need to be included to format the mean.

For this simple code, the s.d. is simply formatted with 1 significant figure. This might be improved by first finding out if the first two digits of the s.d. are between 11 and 19 inclusive and in that case formatting with 2 significant digits.
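The improvement suggested above might look something like the following sketch. The name mean_sd_string is hypothetical, and the approach (rounding both numbers to the decimal place of the last significant figure kept in the s.d.) is just one way of following the rule of thumb. It uses only the standard library and also copes with an s.d. above 1 by rounding to tens or hundreds.

```python
import math


def mean_sd_string(mean, sd):
    """Format 'mean ± s.d.' following the rule of thumb (hypothetical helper)."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    exponent = math.floor(math.log10(sd))
    # First two significant digits of the s.d. as an integer, e.g. 0.0123 -> 12
    two_digits = round(sd / 10 ** (exponent - 1))
    sig_figs = 2 if 11 <= two_digits <= 19 else 1
    # Decimal place of the last significant figure kept (negative means tens, hundreds, ...)
    decimals = sig_figs - 1 - exponent
    places = max(decimals, 0)
    return f"{round(mean, decimals):.{places}f} ± {round(sd, decimals):.{places}f}"


print(mean_sd_string(10.1298, 0.2595))   # the spreadsheet example from the text
print(mean_sd_string(1.23456, 0.0123))   # s.d. starts with 12, so two figures are kept
```

The first call reproduces the 10.1 ± 0.3 example given earlier. One remaining rough edge is an s.d. such as 0.0995, which rounds up to the next power of ten and so is printed with one more digit than strictly needed.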

Makefiles to tie together and document

I always find that when I come back to a project after leaving it for a few weeks, I cannot remember what I’ve done. Some vague recollection exists, perhaps, of scripts that process one file into another but the details and ordering of the procedure have vanished from my memory.

In theory, one might document the process. However, this still leaves you with a manual process that may get out of step with the last version of the documentation.

A better approach is to weave the documentation and the code together into a master script. GNU Make provides a simple and effective method for doing this.

I should note that GNU Make simply does not handle spaces or special characters in filenames, so don’t even try. However, it is usually not such a burden to use hyphens or underscores.

An excellent tutorial on using Make for reproducible research is provided by Arnold et al.

Conclusions and next steps

Automating your research can seem like cycling up a steep hill. Progress appears slow and it’s much harder work than usual. However, once over the hump, you’ll find the going much easier and generally much more rewarding.

In closing, I’d like to point again to documentation provided by The Turing Way Community as a good general source of information on how to make your research more reproducible. You may also want to look at an online course on reproducible science.

[1] The Turing Way Community, Becky Arnold, Louise Bowler, Sarah Gibson, Patricia Herterich, Rosie Higman, … Kirstie Whitaker. (2019, March 25). The Turing Way: A Handbook for Reproducible Data Science (Version v0.0.4). Zenodo. http://doi.org/10.5281/zenodo.3233986

Acknowledgements

This blog post was produced using Emacs, org-mode, and org2blog.

March 23, 2018 / February 7, 2019

Vagrant-based TinyOS Install

Here’s a quick procedure to get you up and running with TinyOS. Although it’s aimed at Windows users, it should work on Mac and Linux with some tweaks here and there.

The method proposed here runs VirtualBox in “headless” mode, which means that you only see a command line under Ubuntu Linux and not the whole windowing system. Don’t worry, you can use your favourite Windows editor, as all the important files will be accessible under Windows.

Step 1. Check that you have virtualization turned on

https://bce.berkeley.edu/enabling-virtualization-in-your-pc-bios.html

Virtualization is needed for VirtualBox to run normally, but some laptop manufacturers turn it off by default. If you skip this step, you might see errors when you try to start Ubuntu. For example, it might suggest you enable PAE/NX (but don’t do this!). If you have a 64-bit CPU, this shouldn’t be needed and if you don’t, you can switch to using a 32-bit version of Ubuntu by changing the Vagrantfile.

Step 2. Install VirtualBox

Go to https://www.virtualbox.org/wiki/Downloads and download a recent version.

Step 3. Download and install the Oracle VM VirtualBox Extension Pack from the same address

The extension pack is required for USB 3.0, so if you later find that you haven’t got this option enabled, it’s probably because your extension pack is not properly installed. Note that downloading it is not enough; you need to get it installed into VirtualBox.

Step 4. Download and install Vagrant

http://www.vagrantup.com/downloads.html

Vagrant is a great tool for setting up virtual machines automatically.

Step 5. Download this Vagrantfile

https://raw.githubusercontent.com/jbrusey/cogent-house/master/Vagrantfile
After clicking on the link, you’ll see a text file. You need to tell your web-browser to save the page.

Note: Windows wants to put a file type of “.txt” on the end of this file name. You will need to manually rename this after downloading.

ren Vagrantfile.txt Vagrantfile

It contains a script, written in Ruby, that tells Vagrant what to do to install TinyOS on an Ubuntu virtual machine. Here’s the start:

# -*- mode: ruby -*-
# vi: set ft=ruby :
$script = <<SCRIPT

The first line is an instruction to Emacs and the second to Vim, both to set the syntax highlighting etc to Ruby mode. The next line builds a script file (written in bash) and puts that in $script.

The script is run when the virtual box is provisioned or set-up for the first time. Let’s look at the first step:

wget -O - http://tinyprod.net/repos/debian/tinyprod.key | sudo apt-key add -
# add tinyprod to apt sources
tee -a /etc/apt/sources.list >/dev/null <<APTEOF
deb http://tinyprod.net/repos/debian squeeze main
deb http://tinyprod.net/repos/debian msp430-46 main
APTEOF

This downloads the public signing key for the tinyprod repository, installs it, then appends the tinyprod repository to the list of apt sources.
The rest of the script installs various bits of software, including linux-image-extra-virtual, which is needed for the FTDI driver that talks to the mote.

Step 6. Make a folder for your system

Create a folder or directory and put the Vagrantfile that you downloaded in the last step here.

Step 7. Open a terminal window and cd to folder

Open a terminal window. On Windows, this is done by going to the start menu and typing cmd. When you get a command prompt, you need to cd to the directory that you just created. If you don’t know how to do this, try searching for a tutorial on google. When you’ve got to the right directory, use dir to check to see that you can see the Vagrantfile there.

Step 8. Install vbguest plugin

Note: with the latest release, it seems to be better to put off installing the plugin until after Step 11.

In the command window, type

vagrant plugin install vagrant-vbguest

to install the Vagrant VirtualBox guest installer. The VirtualBox Guest Additions are software that will be installed on your guest Ubuntu system. This is important as it allows us to share some disk space between the host (Windows) and the guest (Ubuntu), which is going to make it easier to edit files.

Step 9. Start the main install

From the command window, while in the folder with the Vagrantfile, type this:

vagrant up

This may take a while as it has lots to download. If you are on a slow connection and can move to a faster one, now is the time. Otherwise, time to make a coffee!

Note: if you get a message suggesting you need to do a vagrant init at this stage, it means that you are in the wrong directory. Don’t just do the vagrant init! Rather, you should go back and cd into the directory you created for the Vagrantfile.

Step 10. Reload

Once the vagrant up completes, you should see several recommendations to reboot. You can do this with

vagrant reload

This step is needed because the provisioning process installs kernel drivers that are not loaded until after a reboot.

Step 11. Test it out

To log in to the system (after bringing it up with vagrant up), use

vagrant ssh

This will bring you to a command prompt that looks like this.

Welcome to Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-143-generic x86_64)
...
Setting up for TinyOS
vagrant@vagrant-ubuntu-trusty-64:~$

If the guest additions have installed correctly, ls /vagrant should be non-empty, which means that it has correctly shared the folder that you put Vagrantfile in.

ls /vagrant
Vagrantfile

Now we’ll copy the Blink app into that directory:

cd /vagrant
cp -r /opt/tinyos-main/apps/Blink .

Compiling it is easy:

cd Blink
make telosb

The result should look something like this:

mkdir -p build/telosb
    compiling BlinkAppC to a telosb binary
ncc -o build/telosb/main.exe  -Os -fnesc-separator=__ -Wall -Wshadow -Wnesc-all -target=telosb -fnesc-cfile=build/telosb/app.c -board= -DDEFINED_TOS_AM_GROUP=0x22 -DIDENT_APPNAME=\"BlinkAppC\" -DIDENT_USERNAME=\"vagrant\" -DIDENT_HOSTNAME=\"vagrant-ubuntu-\" -DIDENT_USERHASH=0x08307f04L -DIDENT_TIMESTAMP=0x5ab51409L -DIDENT_UIDHASH=0x787f74f2L  BlinkAppC.nc -lm 
    compiled BlinkAppC to build/telosb/main.exe
            2538 bytes in ROM
              56 bytes in RAM
msp430-objcopy --output-target=ihex build/telosb/main.exe build/telosb/main.ihex
    writing TOS image

Program a Telos mote

The Vagrantfile is set up to recognise Telos motes. Assuming that this is what you have, plugging the mote in and typing motelist should show you something like:

Reference  Device           Description
---------- ---------------- ---------------------------------------------
FTWGRY9A   /dev/ttyUSB0     FTDI MTM-CM5000MSP

If you get “no motes found” at this stage, it probably means that virtualbox is not grabbing the mote for you. See below for more info on how to fix this.

With the mote still plugged in, type

make telosb install

and you should see the same compile output as before, plus:

    found mote on /dev/ttyUSB0 (using bsl,auto)
    installing telosb binary using bsl
tos-bsl --telosb -c /dev/ttyUSB0 -r -e -I -p build/telosb/main.ihex.out
MSP430 Bootstrap Loader Version: 1.39-goodfet-8
Mass Erase...
Transmit default password ...
Invoking BSL...
Transmit default password ...
Current bootstrap loader version: 1.61 (Device ID: f16c)
Changing baudrate to 38400 ...
Program ...
2598 bytes programmed.
Reset device ...

If you have problems here, it is usually something to do with the configuration of the VM (such as, not having USB 3.0 configured).

Once you’ve gotten this far, you can go and do the tutorial at: http://tinyos.stanford.edu/tinyos-wiki/index.php/Getting_Started_with_TinyOS

Things that can go wrong

These instructions depend on components that sometimes get broken. For example, the squeeze repository for tinyprod seems fine whereas the wheezy, not so much. The first rule when things don’t happen as expected is to carefully read the error message and google it if you still don’t understand.

`/vagrant` is empty or not found

This usually means that vboxsf didn’t get installed or is the wrong version for your guest operating system.

`motelist` doesn’t give a `/dev/tty` address

If your mote is plugged in but not being listed by motelist, there are several things to check.

It could be something to do with the USB port. Try another one, if you can.
It could be that the device filter is wrong. The Vagrantfile given has a filter for clone Telos motes only, so you may need to add additional filters to get your host to give the guest VM access, if you have a different sort of device.
If you see a line for the mote, but no /dev/tty, it could be that the FTDI drivers are not installed. Try using dpkg -l linux-image-extra-virtual to see if you have them installed. If not, there may be other elements not installed and it is worth checking everything by looking through the Vagrantfile. Another possible cause here is that your system needs to be rebooted.

You only have a 32-bit processor

There’s a simple fix for this one. Just edit the Vagrantfile and change the line that says

config.vm.box = "ubuntu/trusty64"

so that it points to "ubuntu/trusty32". There is a problem with this image, though, in that the ethernet driver doesn’t work with the default ethernet card in VirtualBox. To fix this, change the ethernet card setting in VirtualBox to emulate a different card.

Vagrant hangs after “VirtualBox Guest Additions: Starting.”

I had this happen after upgrading to Vagrant 2.2.3 and VirtualBox 6.0.4. It seems to be due to a problem with the vagrant-vbguest plugin.

After many hours of testing various options last night, the simplest approach seems to be to press Ctrl-C when the “VirtualBox Guest Additions: Starting.” message appears and then run vagrant reload. Once the machine is up, use vagrant ssh to check that it is working. To test, try

ls /vagrant

and check that you can see files there (it should not be empty).

Also, plug in a Telos mote and type

motelist

which should give something like

Reference  Device           Description
---------- ---------------- ---------------------------------------------
MFVKDTBJ   /dev/ttyUSB0     FTDI MTM-CM5000MSP

More detail on why …

The instruction that is hanging is

modprobe vboxsf

It seems that the vagrant-vbguest plugin uninstalls the virtualbox-guest-* packages while the associated kernel modules are still in memory. Specifically, “vboxvideo” cannot be unloaded, even though I’ve tried lots of ways. The simplest fix is to just reboot.
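To see which VirtualBox kernel modules are still in memory, you can filter the output of lsmod. The sketch below assumes the usual lsmod layout (a header line followed by one module per line). The function name vbox_modules is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads lsmod-style output on stdin and prints the
# names of loaded modules that start with "vbox".
vbox_modules() {
    awk 'NR > 1 && $1 ~ /^vbox/ { print $1 }'
}

# Example use inside the guest:
#   lsmod | vbox_modules
```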

This now seems to work with the latest VirtualBox and Vagrant versions.

August 22, 2017

Calculator < Spreadsheet < R < ? for large n

It has previously been suggested that the introduction of double-entry accounting funded the Renaissance. This seems quite illogical at first sight. How can a change to an accounting method produce more money? I suspect that the answer is that the new method reduced error and thus reduced the number of petty disputes. It also made theft harder for employees. With the new system, less checking was needed; trust was increased; supplier and consumer were in greater harmony. The point is: a new method, even for something as simple as adding numbers, can have a remarkable effect.

I recently watched my sister use a calculator on an iPad to add up the numbers on a bill to check them. Initially, I was impressed by her fiscal diligence. Being me, I tried to convince her to use the spreadsheet that she has built into her iPad. She was quite unimpressed with this idea and kept using her iPad calculator.

And then the numbers didn’t add up. Fortunately, I wasn’t so silly as to keep pressing her to use a spreadsheet and also fortunately, the calculator app being used had a history. It showed that a sneaky “zero times” had been inserted at the front.

The lesson is that, for small tasks, calculators (and mental arithmetic) are just fine but they don’t scale. As soon as the size of the problem increases (even slightly), you are left repeating the calculation several times to check that you have the right answer.

It’s interesting to see that spreadsheet tools have a similar problem.

During a writing course that I teach to engineering and computer science PhD students, I ask them to take some real-life data, involving temperatures in different rooms of a house over time, and produce a meaningful graph. Since the data have more than two dimensions (room, time, temperature) producing a graph with most spreadsheet tools is difficult. Interestingly, very few of the otherwise well-educated students manage to produce a correct graph. Most often, they get a graph like this:

This particular graph is wrong because it doesn’t distinguish between different rooms and truncates the date value incorrectly. Spreadsheet tools produce a wrong graph easily. Unfortunately, it is almost impossible to get such tools to produce a correct graph.

R, and in particular the ggplot2 library for producing graphs, are much better tools for this task. Here is one solution as R code.

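library(ggplot2)
library(readr)
library(dplyr)
# Skip the first two lines (which include the header) and name the columns by hand.
x <- read_csv("getData.csv", skip = 2,
              col_names = c("nodeId", "house", "room", "time", "value"))
# Convert the D/M/Y time strings into POSIXct date-times, treated as GMT.
y <- mutate(x, time = as.POSIXct(time, format = "%d/%m/%Y %H:%M:%S", tz = "GMT"))
# One colour per room, drawn as points joined by lines.
ggplot(y, aes(x = time, y = value, colour = room)) +
  geom_point() + geom_line()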

This code looks complicated but it’s been created step by step in RStudio, a tool that makes it easy to construct the code and test each piece as you go.

Here are the key steps:

read the data into memory
convert / massage data
plot the massaged data

Step one is to read the data into memory. I prefer read_csv from the readr package because it’s fast and it doesn’t make too many assumptions or try to interpret the data. A first attempt might look like:

library(readr)
x <- read_csv("getData.csv")

This almost works but the extra header lines cause a problem. skip = 2 gets rid of these but loses the header information so a simple fix is to manually specify the column names with col_names.
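Before settling on a value for skip, it can help to look at the raw file from a terminal. The following is a rough sketch rather than part of the R solution: the function name peek_csv is hypothetical, and the field count is naive (it simply splits on commas, so quoted fields containing commas would be miscounted).

```shell
#!/bin/sh
# Hypothetical helper: for the first N lines of a CSV file (default 5),
# print the line number, the number of comma-separated fields, and the line.
peek_csv() {
    awk -F, -v n="${2:-5}" 'NR <= n { printf "%d\t%d\t%s\n", NR, NF, $0 }' "$1"
}

# Example use:
#   peek_csv getData.csv 4
```

Lines whose field count differs from that of the data rows are likely to be the extra header lines.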

The next step is to convert the time from a string to a time / date format. R has a number of ways of storing times but POSIXct, which stores date and time as seconds since 1st Jan 1970, is usually a good choice. Note that when converting, you need to say what timezone is involved. Since the data was gathered in the UK, GMT is probably a safe bet.

The conversion requires a format string. This is another improvement over the spreadsheet tools, which guess the format and sometimes wrongly choose M/D/Y when the date is really D/M/Y.

With R, we have a variety of ways to update a column. The following might look simpler to some:

x$time <- as.POSIXct(x$time, format="%d/%m/%Y %H:%M:%S", tz="GMT")

This uses $ to reference a data frame column by name.

This syntax is a bit unreadable though and dplyr offers a nice alternative. For this short script, either will do.

The final step is to plot the data. R has in-built plotting tools but ggplot2 provides much more flexibility. We start by telling it which data frame and what axes. Note that the axes parameters go inside aes() (short for aesthetic).

ggplot(y, aes(x=time, y=value, colour=room)) + geom_point() + geom_line()

ggplot() on its own doesn’t produce a plot. For that, we need to add a layer. Here, we’ve added two: one that produces a point for each data point and one that draws lines between the points. The result looks like this:

Temperature plot created with R and ggplot2

Note that we’ve still got work to do here: for example, the y axis might be better labelled “Temperature (deg. C)”. However, it is a good start and much nicer looking than anything that I’ve been able to produce with a spreadsheet tool alone.

Wrapping up

My conclusion is not that one needs to stop using calculators or spreadsheets but rather that one needs to be aware that as n grows larger (where n is number of items, number of dimensions, or complexity of the task), basic, generic tools fail and more sophisticated and specialised tools are needed. When the problem gets hard, don’t forget to upgrade the tool.

July 30, 2017

Daily planning

I am terribly easily distracted. I can start my day with a firm plan to get on with writing a scholarly paper and end up playing all day with some programming tool that I have just discovered.

Why is this a problem? Well, sometimes it doesn’t matter that I occasionally get distracted and I am able to string together sufficient periods of focus and concentration that I perform the work that I need to. However, it sets a bound on what I can do in a certain period of time that is lower than my real ability.

The solution? Make a plan each day. At least, that’s what the self-help books will say. And they’re not wrong but the quality of the plan and the way that you react to your plan can make a big difference. Let me start by telling you about my planning approach – which is roughly sketched in the mind-map here.

The A5 notebook

Let’s go through this one bit at a time. Start by getting an A5 spiral bound notebook. The A5 size is good because it doesn’t clutter your desk and is big enough to hold your daily plan. Spiral bound is better than other sorts of bindings because it means that you keep a record (better than tear-off sheets) and that it opens flat without having to hold it open.

Divide the page vertically in half. You can do this by folding the page to get a top-to-bottom crease. Another way is to rule a line. On the left-hand side of the line we’ll put the day. I usually start at 8:30 but you can start as early or late as you like. With an A5 page, I reserve one line per half-hour slot by skipping the top line and then writing 9-skip-10-skip … 5, from top to bottom.

Book fixed slots

There are some things in your day that will happen at a particular time and other things that might move about. For example, the time that you arrive in the morning is generally fixed. Similarly, it is good to fix your lunchtime and leaving time. Appointments that have been booked are generally also immobile. These fixed slots can be written into your daily schedule. I generally use a brace sign “}” to indicate slots that occupy more than a half-hour.

So, start by writing in your meetings, lunchtime, start and end times. Add a fifteen minute slot for the planning session (generally at the start but can also be done at the end of the day). Add a further fifteen minute slot for review at the end of the day.

This forms the skeleton of your day.

Schedule your e-mail processing

If you are like me, you are easily distracted by your e-mails. Try scheduling slots when you can look at e-mails. Avoid scheduling e-mail slots at the start of the day, where they can influence your planning process. Instead, try scheduling them, say, before lunch and just before going home. This puts a definite limit on the amount of time spent processing e-mails.

E-mails can hijack your priorities. Don’t let them! Keep your e-mail program shut down until the allocated slot.

If this seems like an advanced concept that maybe you’re not ready for, that’s fine. But when e-mail processing starts to rule your day, it’s time to tame the tiger and restrict how much time you spend on it.

The Task List

The list of tasks goes on the right-hand side of the page. You can start writing close to the crease or ruled line; the annotations that we will add later can go on the left of the line.

Step one – copy forward

Step one is to copy forward any tasks from the previous day. This shouldn’t be done mechanically. If a task is copied forward day after day without being completed, it indicates a problem. Perhaps the task is not really that important? Perhaps it is waiting on something else? Perhaps it is simply too hard?

There may not be simple solutions to such sticky tasks, but here are some things to try:

Do it. Sometimes a much-put-off task is only seemingly scary. Once you start to do it, you find that you get it done quickly and without fuss.
Dump it. Perhaps it is not as important as you thought it was.
Delegate it. You may not be the right person for a certain task. Be ready to seek help from someone else.
Schedule it. Is it waiting on something else? Will it soon become easy to do? Is it being pushed back because it is a low priority task when you have a lot of high priority work to do? Sometimes the answer is to schedule a reminder for next week, when perhaps there will be more time.
Break it down. It may be too large a task to tackle in one day. You may be making progress but because you are not ticking off to-do items for this task each day, it feels like no progress is being made. You can avoid this by treating it like a project with sub-tasks. Sometimes even the sub-tasks will need to be broken down into sub-sub-tasks before it feels like you are making progress.

Step two – copy from master list

Step two is to copy items from your master task list. Personally, I keep a long-term or master list of tasks on my computer but it can also be in paper form, if you prefer. Again, you must choose these tasks carefully so that you don’t avoid difficult tasks but also don’t overload your task list on any one day. We’ll need to check for overloading later on.

Step three – break down large tasks

Step three is to break down larger tasks into smaller tasks. On a day planner, the task “write a paper” is too large a task to do in one day. It needs to be broken down into smaller chunks.

Breaking a large task into smaller chunks can help in a number of ways: It helps give a sense of progress; it makes it easier to estimate how long the task will take overall; it makes it clearer how to go about a task.

Step four – assign priority

Priority helps determine the order in which things should be done. That’s important because, without making a conscious decision about priority, all the easy tasks will get done first and the hard tasks will be left for tomorrow.

The first step is to decide which tasks need to be done today without fail. These are “A” tasks. If you never have A tasks, don’t worry. It’s not a sign that what you do is not important. All other tasks are “B” tasks. There’s really no need to have further categories but as with any of the suggestions here, you can do as you please. (And please write a comment if you have a useful improvement!)

Within the A and B tasks, we now need to set a priority ordering.

Priority has to come from your overall priorities. Do you know what they are? They might be 1. Write X journal papers per year; 2. Help out other team members; 3. Teach class on Y. etc. Whatever they are, it’s useful to have your priority list handy when you are prioritising.

When you’ve finished this stage, all tasks will have a letter (e.g., B) and a number (e.g., 3) to make up a code (B3) that we write to the left of the task.

Step five – estimate task duration

Identifying how long a task will take can be hard. It’s a common mistake to underestimate how long something will take us. There are two solutions to this problem. First, break down large tasks into smaller chunks. Usually these smaller chunks will be easier to estimate. For example, I don’t know how long it will take me to write a paper but I can have a pretty good guess at how long it will take me to write an abstract. I can guess even more accurately if the task is “rough first draft of abstract”.

Second, get some practice and reflect on the results. As with many things, practice will improve your ability. For example, the first time I estimated how long a rough draft of an abstract would take, I guessed 30 minutes (or 1 pomodoro – see below). However, when I came to write that abstract, I realised that I needed to spend 30 minutes brainstorming and organising ideas before I was ready to put finger to keyboard. So the real time was more like 2 pomodoros.

So what are these pomodoros? The term comes from the Pomodoro Technique, which is a wonderful method for staying focused and organising your time. The day planning technique used here is intended to be used in conjunction with the Pomodoro Technique. In PT, a “pomodoro” simply refers to 25 minutes of work followed by 5 minutes of rest.

When you are finished with this stage, you’ll have a duration for each task, measured in pomodoros. Durations of less than 1 may mean that you are breaking your tasks down into chunks that are too small. Durations of more than 6 may indicate that the task is too large.

Carrying out the plan

Once you have a plan, you can start carrying it out. As you do so, note down in your timeline for the day which task was done when. This will help later when you come to review.

Keep to your priority ordering, doing the A tasks first.

Try following the Pomodoro Technique of taking a five-minute break every 30 minutes.

Finally, be aware of your limits. If you have a day full of meetings, don’t expect yourself to also achieve much on the task front.

Review

Before retiring at the end of the day, review progress on tasks. Were the meetings fruitful? Did you progress with the tasks? Even if not everything went to plan, don’t fret. Find three things to be happy about.

Did you do something nice for someone? Note it down. Also you can record your progress on forming good habits, such as regular meditation or exercise.

If you like to have an overall assessment, draw a smiley (or frown-y, or wobbly mouthed) face to reflect on how you feel.

At the end of the week, review progress over the whole period. Again, try to focus on the positives.

Finally, feel free to adapt the approach to suit you. And when you come up with a really useful change, don’t forget to tell me about it!

Acknowledgements

My planning approach is largely based on a paper-based approach originally advocated by Priority Management. My approach differs in a number of respects but is derivative.

FAQs

Why don’t you pre-print sheets with this layout?

This is a personal preference but there are several reasons:

Separate sheets would need to be bound into a book to form a permanent record
Printed or photocopied paper doesn’t absorb ink well and has a rough feel. Plain paper is just so much nicer to write on.
Photocopied sheets tend to be A4, which is unnecessarily large and clutters the desk.

July 28, 2017

What are you looking up here for?

When I was young, a favourite activity on the way home from school was to stop into a magic shop in a tiny arcade in the centre of Melbourne. Sometimes, if we were lucky, the owner would show a group of us some trick to encourage us to spend some money. I remember specifically standing in that shop one afternoon, agog with its delights when I finally spied a sign on the ceiling that said “what are you looking up here for?”

So I can start posting on this blog only if I am completely honest with the reader and admit that I cannot pretend to offer anything yet. Please look in on me some other time and there might be something here of value … but not just yet.
