Trajectory and uniqueness of mutational signatures in yeast mutators

Significance. Deficiencies in genome maintenance genes result in increased mutagenesis and genome rearrangements that impact cell viability, species adaptation, and evolvability. The accumulation of somatic mutations is also a hallmark of most tumor cells, but it remains difficult to retrospectively determine their mechanistic origin(s). Here, we took a prospective, reciprocal approach, inactivating evolutionarily conserved genes involved in various genome maintenance processes and characterizing de novo mutations in diploid S. cerevisiae mutation accumulation lines. Our results revealed the diversity, trajectory, complexity, and ultimate uniqueness of the clonal mutational landscapes. Some mutational signatures resemble those found in human tumors.

by Sanger sequencing. RAD51 was subsequently deleted in this strain to form the rad51∆ rev3-D1142A/D1144A double-mutant.
Mutations calling. De novo mutations were detected as described previously (8).
Bioinformatic tools were run through the graphical interface of Institut Curie's Galaxy (9) instance (http://galaxy-public.curie.fr). Freebayes (10) was used to detect SNPs (single-nucleotide polymorphisms), MNPs (multiple-nucleotide polymorphisms), "complex" events (combinations of SNPs and indels) and small indels. g-deNoise (11) filtering was applied to the Freebayes mutation lists to exclude calls located in repeated regions; the U-genome (unique genome) is 11,060,020 bp in size and is 76.5% coding (8,466,395 positions of this genome are located in coding sequence). For each strain, Freebayes de novo mutations were defined as mutations that were (i) absent from the parental strain (for the SK1/BY diploids, all WT BY4741 and SK1 ORT7237 Freebayes calls were also excluded), (ii) specific to one mutation accumulation line, (iii) called with good quality (QUAL>30) and sufficient read depth (at least ~15% of the sample mean depth of coverage, see Dataset S2) and (iv) characterized by an allelic ratio >0.4 and <0.6 (heterozygous calls) or >0.9 (homozygous calls). Positions with two alternate allele calls (NUMALT=2) were counted separately (Dataset S6) and were defined as mutations where (i) each of the two calls was absent from the parental clone, (ii) both calls were heterozygous (allelic ratio >0.4 and <0.6), (iii) the sum of the two calls had an allelic ratio >0.9, (iv) QUAL>30 and (v) the depth of coverage was above threshold (see above). Mutations occurring in more than one clone but not in all are reported in Dataset S5, all with QUAL>30 and depth of coverage above threshold (see above). Potential mutations occurring in the mitochondrial DNA were not considered. Small variants were annotated with SnpEff (12). The depth of coverage and copy number variations were calculated with GATK (13) and Control-FREEC (14), respectively. Structural variants (SV) were detected with Lumpy (15) and Delly (16). The .vcf files were merged with SURVIVOR (17).
de novo mutations calling in lineages. The de novo mutations were identified as described above but without any initial filtering on allelic ratio. Only calls found in at least 2 consecutive passages were kept. Calls with an allelic ratio of ~1/3 or ~2/3 in all passages without evidence of a gain of coverage were discarded. For calls with an ambiguous allelic ratio along the passages, the allelic ratios of flanking markers were examined to confirm the heterozygous or homozygous nature of the de novo mutations. Altogether, we eliminated 4 and 20 potential mutations in the tsa1 and rad27 lineages, respectively. The SK1/BY rad27 clone C lineage lacks data for passage 8 because no growing cells were recovered from the plates kept at +4°C. Passages 2 and 3 showed unresolved LOH profiles compared with the whole lineage, although some de novo mutations and LOH were consistent with the following passages.
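As an illustration of the call-level criteria (i) to (iv) listed under Mutations calling, and of the consecutive-passage rule used for lineages, the following Python sketch may help. It is not the pipeline used in this study: the `Call` fields and all function names are hypothetical, and the "~15%" depth threshold is treated here as an exact fraction of the sample mean depth.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One variant call; field names are illustrative, not taken from the paper."""
    qual: float           # QUAL value of the call
    depth: int            # read depth at the position
    allelic_ratio: float  # alternate reads / total reads
    in_parent: bool       # True if the same call exists in the parental strain
    n_lines: int          # number of mutation accumulation lines carrying the call


def zygosity(ratio):
    """Classify an allelic ratio with the thresholds given in the text."""
    if 0.4 < ratio < 0.6:
        return "heterozygous"
    if ratio > 0.9:
        return "homozygous"
    return None  # ratio falls outside both accepted windows


def is_de_novo(call, mean_depth, min_depth_fraction=0.15):
    """Apply criteria (i) to (iv) to a single call (assumed exact 15% cutoff)."""
    return (
        not call.in_parent                                  # (i)
        and call.n_lines == 1                               # (ii)
        and call.qual > 30                                  # (iii) quality
        and call.depth >= min_depth_fraction * mean_depth   # (iii) depth
        and zygosity(call.allelic_ratio) is not None        # (iv)
    )


def seen_in_consecutive_passages(present, min_run=2):
    """Lineage rule: True if the call is found in >= min_run consecutive passages.

    `present` is a list of booleans, one per sequenced passage, in order.
    """
    run = 0
    for flag in present:
        run = run + 1 if flag else 0
        if run >= min_run:
            return True
    return False
```

For lineages, the text states that no initial allelic-ratio filter was applied, so in this sketch only `seen_in_consecutive_passages` (plus the quality and depth checks) would be relevant there; how a missing passage should be encoded is left open.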
SK1 markers genotyping. The SK1 markers list was defined from Illumina whole-genome sequencing of the SK1/S288c diploid AND1702 (4) (mean depth of coverage = 85X), with variant calling by Freebayes (heterozygous calls, QUAL>30, depth of coverage >15 and g-deNoise filtering (11)) using the S288c sequence as the reference genome. This list was compared with the homozygous variant calls of the SK1 haploid (ORT7237 (4)) and with the BY4741 variant calls, to keep the 53,523 SNPs common to SK1 but absent from BY4741 (Dataset S11). The hybrid diploid clones were genotyped by examining the mutation calls at those 53,523 polymorphic positions of the genome, after read mapping to the S288c reference genome. A given position is genotyped as "SK1" if the allelic ratio is >0.9, as "BY" if the allelic ratio is 0, or as "heterozygous" if the allelic ratio is between 0.4 and 0.6. Calls of low quality (QUAL<30) or with low depth of coverage (lower than 15% of the mean depth of coverage of the sample) were discarded from the genotyping analysis.
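The genotyping rule for a single marker can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function name and the "unassigned" label for ratios outside the three stated windows are hypothetical, and the 0.4 to 0.6 window is taken as exclusive, as in the mutation-calling criteria.

```python
def genotype_marker(allelic_ratio, qual, depth, mean_depth):
    """Genotype one SK1/BY polymorphic position from its SK1 allelic ratio.

    Returns None for calls discarded for low quality or low depth.
    """
    # Discard low-quality calls and calls below 15% of the sample mean depth.
    if qual < 30 or depth < 0.15 * mean_depth:
        return None
    if allelic_ratio > 0.9:
        return "SK1"
    if allelic_ratio == 0:
        return "BY"
    if 0.4 < allelic_ratio < 0.6:
        return "heterozygous"
    # Assumption: anything else (e.g. ~1/3 or ~2/3) is left unassigned here.
    return "unassigned"
```

For example, `genotype_marker(0.5, qual=50, depth=40, mean_depth=100)` gives "heterozygous", whereas the same call with `depth=10` is discarded.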
de novo LOH analysis. Markers were genotyped as described above for each mutation accumulation line. Positions that were scored as homozygous in the parental strain were discarded.
The LOH regions were robustly defined to include at least 3 consecutive homozygous SK1 markers that exhibited the same allele, high variant calling quality (QUAL>30) and a minimum of ~15% of the mean depth of coverage. The heterozygous regions with ~0.5 allelic ratio (>0.4 and <0.6), the regions with allelic ratio ~1/3 (>0 and ≤0.4) and the regions with allelic ratio ~2/3 (≥0.6 and ≤0.9) were also identified by 3 consecutive markers. Consecutive LOH regions carrying the same parental alleles were merged. The LOH present in all clones of one parent were eliminated. An LOH was considered terminal when its first or last homozygous marker was, respectively, the first or the last genotyped marker of a chromosome, and interstitial otherwise.
The copy number of each LOH was calculated using Control-FREEC (14) and corresponds to the copy number at the middle position of the LOH. LOH junctions were defined as the regions between the first/last homozygous SK1 polymorphic marker of the LOH and the adjacent non-homozygous marker.
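The definition of LOH regions from genotyped markers can be illustrated with a short Python sketch. It is a simplified reading of the rules above, not the authors' implementation: the function name, the `(position, genotype)` input layout and the output dictionary keys are hypothetical. It assumes that low-quality markers and markers homozygous in the parent have already been removed, so that adjacent same-allele stretches form a single run, and it does not remove LOH shared by all clones of one parent.

```python
from itertools import groupby

HOMOZYGOUS = {"SK1", "BY"}


def call_loh(markers, min_markers=3):
    """Call LOH regions on one chromosome.

    `markers` is a list of (position, genotype) tuples sorted by position,
    where genotype is e.g. "SK1", "BY" or "heterozygous".
    """
    regions = []
    index = 0  # index of the first marker of the current run
    for genotype, group in groupby(markers, key=lambda m: m[1]):
        run = list(group)
        first, last = index, index + len(run) - 1
        index += len(run)
        # Keep runs of >= min_markers consecutive markers homozygous for one allele.
        if genotype in HOMOZYGOUS and len(run) >= min_markers:
            # Terminal if the run reaches the first or last genotyped marker.
            terminal = first == 0 or last == len(markers) - 1
            regions.append({
                "allele": genotype,
                "start": run[0][0],
                "end": run[-1][0],
                "n_markers": len(run),
                "kind": "terminal" if terminal else "interstitial",
            })
    return regions
```

With markers at hypothetical positions 100 to 900, a run of three "SK1" markers flanked by heterozygous markers would be reported as interstitial, and a run of "BY" markers extending to the last marker as terminal.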

Fig. S3. Number of mutations in haploid and diploid mutations lines (BY background).
For haploids, mutation counts are the sum of SNPs, small indels, aneuploidies and SVs from Serero et al. (2). For diploids, mutation counts are as in Fig. 1C, but MNP+complex mutations were not included because they were not scored in haploids. Data were normalized to the number of clones, passages and nucleotides.
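One plausible reading of this normalization, sketched in Python, divides the total count by the product of the three quantities. The function name and the example numbers are hypothetical, and the legend does not specify whether the nucleotide count for diploids is the haploid or the diploid genome size, so that value is left as a parameter.

```python
def normalized_mutation_count(n_mutations, n_clones, n_passages, n_nucleotides):
    """Mutations per clone, per passage, per nucleotide (assumed normalization)."""
    return n_mutations / (n_clones * n_passages * n_nucleotides)


# Hypothetical example: 120 mutations summed over 4 clones and 25 passages,
# using the 11,060,020 bp unique genome size quoted in the methods.
example = normalized_mutation_count(120, 4, 25, 11_060_020)
```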