27 Apr 2017

Bruce's Modelling Gripes, No. 5: None or not many results

The point of a model lies mostly in its results. Indeed, I often only bother to read how a model has been constructed if the results look interesting. One has to remember that how a model is constructed - its basis, the choices you have made, the programming challenges - is far, FAR more interesting to you, the programmer, than to anyone else. Besides, if you have gone to all the trouble of making a model, the least you can do is extract some results and analyse them.

Ideally one should include the following:
  1. Some indicative results to give readers an idea of the typical or important behaviour of the model. This really helps understand a model's nature and also hints at its "purpose". This can be at a variety of levels - whatever helps to make the results meaningful and vivid. It could include visualisations of example runs as well as the normal graphs -- even following the events happening to a single agent, if that helps.
  2. A sensitivity analysis - checking how varying parameters affects the key results (a minimal sketch of a parameter sweep is given below). This involves some judgement, as it is usually impossible to do a comprehensive survey. What kind of sensitivity analysis, and over what dimensions, depends on the nature of the model, but not including ANY sensitivity analysis generally means that you are not (yet?) serious about the model (and if you are not taking it seriously others probably will not either).
  3. In the better papers, some hypotheses about the key mechanisms that seem to determine the significant results are explicitly stated and then tested with some focussed simulation experiments -- trying to falsify the explanations offered. These results, with maybe some statistics, should be exhibited [note 1].
  4. If the simulation is being validated or otherwise compared against data, this should be (a) shown and then (b) measured [note 2].
  5. If the simulation is claimed to be predictive, its success at repeatedly predicting data (unknown to the modeller at the time) should be tabulated. It is especially useful in this context to give an idea of when the model predicts and when it does not.
What you show does depend on your model purpose. If the model is merely to illustrate an idea, then some indicative results may be sufficient for your goal, but more may still be helpful to the reader. If you are aiming to support an explanation of some data then a lot more is required. A theoretical exploration of some abstract mechanisms probably requires a very comprehensive display of results.
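
To make point 2 above concrete, here is a minimal sketch of what a simple one-at-a-time parameter sweep might look like. The model function, parameter names, ranges and number of replicates are all hypothetical placeholders, not a recommendation for any particular simulation:

  # Minimal sketch of a one-at-a-time parameter sweep (hypothetical model and parameters).
  import random
  import statistics

  def run_model(infection_rate, recovery_rate, seed):
      """Stand-in for a real simulation run; returns one summary statistic."""
      rng = random.Random(seed)
      # ... a real model would go here; this just returns noise around a trend ...
      return infection_rate / recovery_rate + rng.gauss(0, 0.1)

  baseline = {"infection_rate": 0.3, "recovery_rate": 0.1}
  sweep = {"infection_rate": [0.1, 0.2, 0.3, 0.4, 0.5],
           "recovery_rate": [0.05, 0.1, 0.2]}
  replicates = 30  # repeat runs per setting so stochastic noise is not mistaken for sensitivity

  for name, values in sweep.items():
      for value in values:
          params = dict(baseline, **{name: value})
          outcomes = [run_model(seed=s, **params) for s in range(replicates)]
          print(f"{name}={value}: mean={statistics.mean(outcomes):.3f}, "
                f"sd={statistics.stdev(outcomes):.3f}")

Varying one parameter at a time like this is only the crudest kind of sensitivity analysis; where parameters may interact, a factorial or sampled design over several dimensions at once is usually more informative.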

If you have no, or very few, results you should ask yourself whether there is any point in publishing. On most occasions it would be better to wait until you do.



Note 1: p-values are probably not relevant here, since by doing enough runs one can pretty much get any p-value one desires. However, checking that you have the right statistical power is probably important. See
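
As a rough, hypothetical illustration of the kind of power check meant here (the effect size, significance level and target power below are arbitrary assumed values, not recommendations):

  # Sketch: how many runs per experimental condition would be needed to detect
  # an assumed effect size with reasonable power? All the numbers here are assumptions.
  from statsmodels.stats.power import TTestIndPower

  n_per_condition = TTestIndPower().solve_power(
      effect_size=0.5,   # assumed standardised ('medium') effect
      alpha=0.05,        # assumed significance level
      power=0.8,         # assumed target power
      alternative="two-sided",
  )
  print(f"runs needed per condition: {n_per_condition:.0f}")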

Note 2: only some aspects of the results will be considered significant, while other aspects will be considered model artefacts - it is good practice to be explicit about which is which. See

20 Apr 2017

Bruce's Modelling Gripes, No. 4: Not being open about code etc.

There is no excuse, these days (at least for academic purposes), not to be totally transparent about the details of the simulation. This is simply good scientific practice, so others can check, probe, play around with and inspect what you have done. The collective good that comes of this far outweighs any personal embarrassment at errors or shortcomings that are discovered.

This should include:
  • The simulation code itself, publicly archived somewhere, e.g. openabm.org
  • Instructions as to how to get the code running (any libraries, special instructions, data needed etc. - a sketch of recording some of this automatically is given after this list)
  • A full description of the simulation, its structures and algorithms (e.g. using the ODD structure or similar)
  • Links or references to any papers or related models
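
One way to make such instructions less burdensome is to have the simulation record its own run details. A minimal sketch follows; the file name, parameter names and layout are assumptions for illustration, not any standard:

  # Sketch: automatically record what someone would need to re-run a simulation.
  # The file name, parameter names and layout are hypothetical, not a standard.
  import json
  import platform
  import random
  import sys
  from importlib import metadata

  SEED = 12345
  random.seed(SEED)

  run_info = {
      "python": sys.version,
      "platform": platform.platform(),
      "seed": SEED,
      "parameters": {"num_agents": 100, "steps": 500},  # hypothetical parameters
      "packages": {d.metadata["Name"]: d.version for d in metadata.distributions()},
  }

  with open("run_info.json", "w") as f:
      json.dump(run_info, f, indent=2)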
Other things that are useful are:
  • A set of indicative/example results
  • A sensitivity analysis
  • An account of how you developed the simulation, what you tried (even if it did not work)
For most academics, somehow or other, public money has been spent on funding you to do the work, so the public have a right to see the results and to expect that you uphold the highest academic standards of openness and transparency. You should do this even if the code is not perfectly documented and rather dodgy! There is no excuse.

For more on this see:
Edmonds, B. & Polhill, G. (2015) Open Modelling for Simulators. In Terán, O. & Aguilar, J. (Eds.) Societal Benefits of Freely Accessible Technologies and Knowledge Resources. IGI Global, 237-254. DOI: 10.4018/978-1-4666-8336-5. (Previous version at http://cfpm.org/discussionpapers/172)

12 Apr 2017

Bruce's Modelling Gripes, No. 3: 'thin' or non-existent validation

Agent-based models (ABM) are the ultimate in being able to flexibly represent the systems we observe. They are also very suggestive in their output - usually more suggestive than their validity would support. For these reasons it is very easy to be taken in by an ABM, either one's own or another's. Furthermore, good practice in the form of visualisations and co-developing the model with stakeholders increases the danger that all those who might critique the model are already cognitively committed to it.

It is to avoid such (self-)deception that validation is undertaken - to make sure our reliance on a model is well-founded and not illusory. Unfortunately, effective validation is resource- and data-intensive and is far from easy. The flexibility in making a model needs to be matched by the strength of the validation checks. To be blunt: it is easy to (unintentionally or otherwise) 'fit' a complex ABM to a single line of data so that it looks convincing. This kind of validation, where one aspect of the simulation outcomes is compared against one line of data, is called 'thin' validation.

In physics and other natural sciences they do do this kind of validation, but there the models have well-validated micro-foundations, so the effective flexibility of the modelling is far less than for a social simulation, where assumptions about the behaviour of the units (typically humans) are not well established. That is why such thin validation is inadequate for social phenomena.

Of course the kind and relevance of validation depends upon your purpose for modelling. If you are not really talking about the observed world but only exploring an entirely abstract world of mechanisms it might not be relevant. If you are only illustrating an idea or possible series of events and interactions then 'thin' validation might be sufficient.

However, if you are attempting to predict unknown data or establish an explanation for what is observed, then you probably need multi-dimensional or multi-aspect validation - checking the simulation in many different ways at once. Thus one might check that the right kind of social network emerges, that statistics about micro-level behaviours are correct, that emergent aggregate time-series are correct, and that snapshots of the simulation show the right kind of distribution of key attributes. This is 'fat' validation, which starts to be adequate to pin down our complex constructions.
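
To give a feel for what checking several aspects at once might look like in code, here is a minimal sketch; the measures, data and tolerances are all invented placeholders standing in for whatever the real model and data provide (and statistics.correlation needs Python 3.10+):

  # Sketch of multi-aspect ('fat') validation: several different kinds of simulation
  # output are each compared against data. Everything below is a hypothetical placeholder.
  import statistics

  def validate(sim, obs):
      checks = {}
      # 1. Network structure: mean degree of the emergent social network.
      checks["mean_degree"] = abs(sim["mean_degree"] - obs["mean_degree"]) < 0.5
      # 2. Micro-level behaviour: proportion of agents performing a key action.
      checks["action_rate"] = abs(sim["action_rate"] - obs["action_rate"]) < 0.05
      # 3. Aggregate dynamics: correlation between simulated and observed time-series.
      checks["time_series"] = statistics.correlation(sim["series"], obs["series"]) > 0.7
      # 4. Snapshot distribution: spread of a key attribute across agents.
      checks["attribute_spread"] = abs(
          statistics.stdev(sim["attribute"]) - statistics.stdev(obs["attribute"])) < 1.0
      return checks

  # Example with made-up numbers:
  sim = {"mean_degree": 4.2, "action_rate": 0.31,
         "series": [1, 2, 4, 7, 11], "attribute": [3, 5, 6, 8, 9]}
  obs = {"mean_degree": 4.0, "action_rate": 0.28,
         "series": [1, 2, 5, 8, 12], "attribute": [2, 5, 7, 8, 10]}
  print(validate(sim, obs))   # the model only 'passes' if every check succeeds

The point is not these particular checks, but that a model which passes several independent checks at once is far harder to have 'fitted' by accident.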

Papers that have thin or non-existent validation (e.g. they rely on plausibility alone as a check) might be interesting but should not be interpreted as saying anything reliable about the world we observe.

10 Apr 2017

Bruce's Modelling Gripes, No. 2: Making specious excuses for pragmatic limitations on modelling

We are limited beings, with limited time as well as computational and mental capacity. Any modelling of very complex phenomena (social, ecological, biological etc.) will thus be limited in terms of: (a) how much time we can spend on it, (b) how much detail we can cope with (checking, validating, understanding etc.), and (c) what assumptions we can check.

This is fine, but instead of simply being honest about these limits there is a tendency to excuse them - to pretend that these limitations are more fundamentally justified. Three examples are as follows:
  • "for the sake of simplicity" (e.g. these articles in JASSS), this implies that simpler models are somehow better in ways beyond that of straightforward pragmatic convenience (e.g. easier to build, check, understand, communicate etc.)
  • Claiming that more complicated models are less complex, e.g. Sun et al. (2016), which shows a graph where "complexity may decrease after a certain threshold of model complicatedness". This is sheer wishful thinking; what is more likely is that it is harder to notice complexity in more complicated models, but that is due to our cognitive limitations in terms of pattern recognition, not anything to do with emergent complexity.
  • Changing English so as to make our achievements sound more impressive than they are, e.g. calling any calculation using a model a "prediction", when everybody else uses this word to mean real prediction (i.e. anticipating unknown data/observations sufficiently accurately using a model).
These weasel words would not matter so much if (a) they were kept purely internal to the field, where everyone understands their meaning, and (b) they were not used in public/policy consultations or grant applications, where they might be taken seriously. Newcomers to the field often take these excuses too literally and so change what they attempt to do, as if the excuses were genuine! This can mean following the easy option when they should be pushing the boundaries of what is possible. When policy makers/grant funders misunderstand these claims, an inevitable disappointment/disillusionment may follow, damaging the reputation of the field.

References

Sun, Z., Lorscheid, I., Millington, J. D., Lauf, S., Magliocca, N. R., Groeneveld, J., ... & Buchmann, C. M. (2016). Simple or complicated agent-based models? A complicated issue. Environmental Modelling & Software, 86, 56-67.