Unfortunately, I've got to announce a bug in pyslim - if you've been using it, update and read on. #tskit
It's possibly a bad one: pyslim.recapitate( ) has been introducing a short bottleneck to size Ne=1 (!) in most cases.
EDIT: I accidentally pushed out the bugfix release without the fix to the bug.It is now fixed FOR REALS in 1.0.3, now on pip and soon on conda. :facepalm:
To fix it, just make sure you're using pyslim >= 1.0.3 (e.g., put this in your requirements.txt). If you've got older code that called <tree_sequence>.recapitate() then you're okay (but do update anyhow).
Overly detailed explanation to follow.
EDIT: changed to 1.0.3
Recapitation uses msprime to simulate back in time from the roots of any uncoalesced trees. To do that for a given ancestral_Ne we had to set up a sensible population model, this model was "all existing populations split from an ancestral population of size ancestral_Ne". I set the time of that split to the value SLiM stored in metadata for how long it ran - which I thought would be the time all the roots would be at.
And, it is, sometimes. But, not always - turns out it depends on the stage the simulation was started in and when the tree sequence was saved. So, turns out for most people the split didn't happen until 1 or 2 generations past where the roots were.
And, I had to set the size of those 'populations' - that weren't supposed to be actually used - to something, so they were set to (diploid effective) size 1.0.
The upshot was that there's often a bottleneck to size Ne=1 for 1 or 2 generations before whatever the desired ancestral Ne was.
<facepalm>
I'm wrorried this will affect people's hard-won results, and feel bad I missed this. We put a lot of time into writing unit tests to make sure this sort of thing doesn't happen, so it's discouraging it did.
On the other hand, it was actually a pretty sublte thing to find, and we never would have found it if not for a wide user base and active engagement. And the hard work of @bodkan@fosstodon.org to identify there was an issue.
So: how does it affect you? (If you're still reading you probably are using the method.) Well, have a look at the description at
https://github.com/tskit-dev/pyslim/blob/main/CHANGELOG.rst
and re-run recapitation steps if necessary.
Might it affect published results? Recapitation provides the "standing genetic variation" present at the start of the SLiM simulation. Happily, most people run SLiM for a long time, so this is a minor contribution by the end of the simulation. So unless what you're doing depends strongly on *that* diversity, you shouldn't be affected much. (And, arguably, if your results are dependent on the details of recapitation, then you should be worried anyhow.)
However, if you're just running a short SLiM simulation and relying on pyslim.recapitate() to provide initial diversity, then you could be thrown off - the bottleneck would only reduce diversity by ~ 50%, but could introduce very long shared haplotypes.
Again - many apologies. Write me if you'd like help figuring out if you were affected or what to do about it. And thanks again to @bodkan@fosstodon.org for spotting it!