Unfortunately, I've got to announce a bug in pyslim - if you've been using it, update and read on. #tskit
It's possibly a bad one: pyslim.recapitate( ) has been introducing a short bottleneck to size Ne=1 (!) in most cases.
EDIT: I accidentally pushed out the bugfix release without the fix to the bug.It is now fixed FOR REALS in 1.0.3, now on pip and soon on conda. :facepalm:
To fix it, just make sure you're using pyslim >= 1.0.3 (e.g., put this in your requirements.txt). If you've got older code that called <tree_sequence>.recapitate() then you're okay (but do update anyhow).
Overly detailed explanation to follow.
EDIT: changed to 1.0.3
Recapitation uses msprime to simulate back in time from the roots of any uncoalesced trees. To do that for a given ancestral_Ne we had to set up a sensible population model, this model was "all existing populations split from an ancestral population of size ancestral_Ne". I set the time of that split to the value SLiM stored in metadata for how long it ran - which I thought would be the time all the roots would be at.
And, it is, sometimes. But, not always - turns out it depends on the stage the simulation was started in and when the tree sequence was saved. So, turns out for most people the split didn't happen until 1 or 2 generations past where the roots were.
And, I had to set the size of those 'populations' - that weren't supposed to be actually used - to something, so they were set to (diploid effective) size 1.0.
The upshot was that there's often a bottleneck to size Ne=1 for 1 or 2 generations before whatever the desired ancestral Ne was.
<facepalm>
I'm wrorried this will affect people's hard-won results, and feel bad I missed this. We put a lot of time into writing unit tests to make sure this sort of thing doesn't happen, so it's discouraging it did.
On the other hand, it was actually a pretty sublte thing to find, and we never would have found it if not for a wide user base and active engagement. And the hard work of @bodkan@fosstodon.org to identify there was an issue.
So: how does it affect you? (If you're still reading you probably are using the method.) Well, have a look at the description at
https://github.com/tskit-dev/pyslim/blob/main/CHANGELOG.rst
and re-run recapitation steps if necessary.
Might it affect published results? Recapitation provides the "standing genetic variation" present at the start of the SLiM simulation. Happily, most people run SLiM for a long time, so this is a minor contribution by the end of the simulation. So unless what you're doing depends strongly on *that* diversity, you shouldn't be affected much. (And, arguably, if your results are dependent on the details of recapitation, then you should be worried anyhow.)
However, if you're just running a short SLiM simulation and relying on pyslim.recapitate() to provide initial diversity, then you could be thrown off - the bottleneck would only reduce diversity by ~ 50%, but could introduce very long shared haplotypes.
Again - many apologies. Write me if you'd like help figuring out if you were affected or what to do about it. And thanks again to @bodkan@fosstodon.org for spotting it!
@petrelharp oof that’s a subtle bug to find. Bummer but glad you caught it! Wonder if this is part of why the early space is the place results looked so weird with recapitation (also glad we went and ran those really long slim sims instead)
@cj_battey Hello, CJ!! Good thought, but the bug only goes back to the end of 2021, so I don't think so?