The genetic landscape of Scotland and the Isles

Significance

Modern genetic analysis has revealed genetic differentiation across the south of Britain and Ireland. This structure demonstrates the impact of hegemonies and migrations from the histories of Britain and Ireland. How this structure compares to the north of Britain, Scotland, and its surrounding Isles is less clear. We present genomic analysis of 2,544 British and Irish, including previously unstudied Scottish, Shetlandic and Manx individuals. We demonstrate widespread structure across Scotland that echoes past kingdoms, and quantify the considerable structure that is found on its surrounding isles. Furthermore, we show the extent of Norse Viking ancestry across northern Britain and estimate a region of origin for ancient Gaelic Icelanders.


Supplementary Data 1 - Methods and Materials
, and were sampled on the basis that each individual's four grandparents were born on the island. These genotypes were generated as part of this study. The plurality of the Irish samples (n=194/398) were sampled from the Irish DNA Atlas cohort, and have previously been reported 6. The Irish DNA Atlas individuals were recruited on the basis that all eight great-grandparents were born

All participants in all studies gave written informed consent. Ethical approval for the GS

We projected the ChromoPainter co-ancestry matrix into lower dimensional space using t-distributed stochastic neighbour embedding (t-SNE) 21,22. We used the R 23 (version 3.5.0) package Rtsne to perform the t-SNE analysis on the co-ancestry matrix with 5,000 iterations, a perplexity of 30, a learning rate of 200 and an initial PCA calculated over 100 dimensions. We found that these parameters were able to visualise the genetic data without spurious artefacts from the t-SNE algorithm. We plotted the dimensions using the ggplot2 R package.

In addition to the t-SNE analysis, we performed principal component analysis on the ChromoPainter haplotype sharing "chunkcount" matrix using scripts in R 23 provided by the authors of

To supplement the population structure analysis of the British Isles and Ireland, we investigated the levels of homozygosity 28 in each of our merged fineStructure 17 clusters, using plink v1.9 14,15 to assess the Runs of Homozygosity (ROH) in each cluster. We used the merged dataset used for fineStructure analysis of 2,554 individuals and 341,923 common markers, and with plink we recorded ROH using a window of 1000 kb, moving every 50 SNPs, with 1 heterozygous position and 5 missing positions allowed within the window. A ROH was called if it had a minimum of 25 SNPs, a maximum inverse density of 50 kb/SNP, and did not contain a gap of more than 100 kb. We have previously found these parameters to work well for detecting ROH 6,29,30. We additionally varied the minimum length for an ROH to be called, investigating 1, 2.5, 5, and 10 Mb. We recorded the average total ROH for each fineStructure k = 42 merged cluster over each of the minimum lengths (Fig. S4). To calculate the inbreeding coefficient F_ROH5 we measured the average total ROH > 5 Mb a population exhibited and divided it by the length of the autosomal genome in our panel of SNPs (2,878,106 kb).
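The F_ROH5 calculation described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the per-individual list-of-segment-lengths input, and the example values are all hypothetical, and the code assumes ROH segments have already been called (for example by plink) and reduced to lengths in kb. Only the genome length of 2,878,106 kb and the 1, 2.5, 5 and 10 Mb thresholds come from the text.

```python
# Autosomal length covered by the SNP panel, as stated in the text (kb).
AUTOSOMAL_LENGTH_KB = 2_878_106


def total_roh_kb(segments_kb, min_length_kb):
    """Sum the lengths (kb) of the ROH segments that are at least min_length_kb long."""
    return sum(length for length in segments_kb if length >= min_length_kb)


def f_roh(individual_segments_kb, min_length_kb=5_000, genome_kb=AUTOSOMAL_LENGTH_KB):
    """Average total ROH above the threshold across individuals, divided by genome length.

    individual_segments_kb: one list of ROH segment lengths (kb) per individual.
    Treating the threshold as inclusive is an assumption of this sketch.
    """
    if not individual_segments_kb:
        raise ValueError("need at least one individual")
    totals = [total_roh_kb(segs, min_length_kb) for segs in individual_segments_kb]
    return (sum(totals) / len(totals)) / genome_kb


# Hypothetical cluster of three individuals; values are ROH segment lengths in kb.
cluster = [
    [1_200, 6_000, 12_000],
    [2_600, 5_500],
    [1_100, 3_000],
]

# Average total ROH per individual at each minimum length examined in the text.
for min_mb in (1, 2.5, 5, 10):
    threshold_kb = int(min_mb * 1_000)
    mean_total = sum(total_roh_kb(s, threshold_kb) for s in cluster) / len(cluster)
    print(f"min {min_mb} Mb: mean total ROH = {mean_total:.1f} kb")

print(f"F_ROH5 = {f_roh(cluster):.6f}")
```

Keeping the threshold as a parameter lets the same function produce the comparison across minimum lengths as well as the single F_ROH5 value.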

We note that ascertaining samples with geographically restricted ancestry biases us towards inflated ROH values across clusters, as we are sampling individuals with closer recent ancestry than is the case for individuals with ancestry from multiple places. However, this should largely not affect the relative levels between clusters.

Whilst we refer to these results in the main text, we explore them in greater detail here as they represent a comprehensive sample of comparative ROH across the British Isles and Ireland.

We observe the lowest levels of autozygosity in the group of clusters we denote as 'England

Ireland shows slightly elevated levels of ROH compared to the majority of England and Scotland, as has been reported previously 29,31. We demonstrate that the highest levels of ROH in the [...] that whilst there appears to be some evidence of elevated levels of short ROH in N Scotland, these levels are not as substantial as found in the west of Scotland. We therefore conclude that the observed structure between this cluster and the rest of Scotland is primarily driven by its Northern Isles ancestry, not simply genetic isolation.

With these data we performed EEMS analysis using the program runeems_snps, using 10 initial independent EEMS runs, each with a different random seed. Each of these initial Markov Chain

Using all ten final replicates as input, we plotted the results of our EEMS analysis in the statistical software language R 23 (version 3.5.0), using the custom package rEEMSplots provided by the authors of EEMS. The average estimated effective migration surface is shown in Fig. 2 of the main manuscript. We show below the posterior probability trace log for all ten final chains (Fig. S5a), the observed vs fitted genetic dissimilarities between pairs of demes (Fig. S5b), and the placement of samples to demes over the estimated effective migration surface (Fig. S5c).

The ten independent EEMS duplicates appear to have all converged, moving around a similar

To our knowledge, we present the first high-density, genome-wide analysis of the genetics of

S6c). Whilst Isle of Man 2 shows slightly elevated levels of short ROH, we do not find a significant difference in ROH levels between the two groups (t-test between ROH >1Mb, p = 0.747).
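A two-group comparison of this kind can be sketched as below. The text does not state which form of t-test was used, so Welch's unequal-variance test is an assumption here, as are the function names and the example values (hypothetical per-individual total ROH > 1 Mb, in kb). The p-value is obtained by numerically integrating the Student's t density so that the sketch needs only the standard library.

```python
import math
from statistics import mean, variance


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t, via Simpson integration of the density."""
    t = abs(t)
    if t == 0.0:
        return 1.0
    # Normalising constant of the t density, computed in log space for stability.
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    norm = math.exp(log_norm) / math.sqrt(df * math.pi)

    def pdf(x):
        return norm * (1 + x * x / df) ** (-(df + 1) / 2)

    # Composite Simpson's rule on [0, t]; steps must be even.
    h = t / steps
    area = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return max(0.0, 1.0 - 2.0 * area)


def welch_t_test(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom, two-sided p)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    if va + vb == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_sided_p(t, df)


# Hypothetical total ROH > 1 Mb per individual (kb) for two groups.
group_1 = [21_000, 18_500, 25_200, 19_800, 22_400]
group_2 = [20_100, 23_900, 19_200, 24_800, 21_700]

t_stat, dof, p_value = welch_t_test(group_1, group_2)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, p = {p_value:.3f}")
```

With small groups such as these, a large p-value indicates only that no difference was detected, not that the groups are equivalent.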

To conclude, we find evidence of further structure within the Isle of Man. Unfortunately, without geographic data on the regional origin of the ancestry of our Isle of Man sample we are more limited in our interpretation of these results than for the other regions in the British Isles and Ireland that we report. We detect three genetic groups of Isle of Man samples in our analyses.

[...] We further excluded markers from five regions of high linkage disequilibrium in our dataset: chr2:135.5Mb-137Mb, chr6:0Kb-750Kb, chr6:25.5Mb-33.55Mb, chr8:7.5Mb-120Mb, and chr11:46Mb-57Mb, leaving a total of 209,028 common markers.
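The exclusion of markers in high linkage disequilibrium regions can be illustrated with a short sketch. The region coordinates are copied as printed in the text; everything else is an assumption made for illustration, including the function names, the (marker ID, chromosome, base-pair position) tuple layout, the inclusive region boundaries, and the example markers.

```python
# Regions of high linkage disequilibrium, copied as printed in the text and
# converted to base pairs: (chromosome, start_bp, end_bp).
HIGH_LD_REGIONS = [
    ("2", 135_500_000, 137_000_000),
    ("6", 0, 750_000),
    ("6", 25_500_000, 33_550_000),
    ("8", 7_500_000, 120_000_000),
    ("11", 46_000_000, 57_000_000),
]


def in_high_ld_region(chrom, pos_bp, regions=HIGH_LD_REGIONS):
    """True if the position falls inside any listed region (boundaries inclusive)."""
    return any(chrom == c and start <= pos_bp <= end for c, start, end in regions)


def exclude_high_ld(markers, regions=HIGH_LD_REGIONS):
    """Keep only markers outside the listed regions.

    markers: iterable of (marker_id, chromosome, position_bp) tuples.
    """
    return [m for m in markers if not in_high_ld_region(m[1], m[2], regions)]


# Hypothetical markers for illustration only.
markers = [
    ("rs_a", "2", 136_000_000),   # inside the chr2 region
    ("rs_b", "6", 1_000_000),     # between the two chr6 regions
    ("rs_c", "11", 50_000_000),   # inside the chr11 region
    ("rs_d", "1", 136_000_000),   # same position as rs_a, different chromosome
]

kept = exclude_high_ld(markers)
print([marker_id for marker_id, _, _ in kept])
```

In practice the same filter is usually applied by passing a range file to the genotype-processing tool, but the logic is the same interval check per marker.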

Whilst in the main text we focus on the genetic affinity of these ancient groups and modern genetic regions in Britain and Ireland, we also investigated the individual affinity of each ancient individual to each modern genetic region (Fig. S8). We observe the same trend shown in Figure

In the main text of our results we discuss the population genetics across the British Isles and Ireland with respect to ancestry both within the Isles and from Continental Europe. Here we discuss in greater detail the geographic distributions, the ancestries, and the wider historical context of the structure that we describe across Britain and Ireland.

Tiree, Colonsay, Bute and Arran will be required to reveal the relationships among them, their degree of isolation and whether they show such fine-scale structure as in the Northern Isles.

The Dark Age Kingdoms of north Britain cast long shadows in the genetics of Scots today. The great northeast to southwest genetic divide which we observe in Scotland reflects remarkably

DNA will be required to clarify the historical context of these correlations.

The availability of well-documented genealogies from the Northern Isles allowed our sampling of grandparents to be based on parishes, rather than council areas as was the case for the majority of