Hi,
We have sequenced four different mutant strains of a bacterium (Pseudomonas aeruginosa), and the task is to find the SNPs that are exclusive to each sample. As a first attempt, we aligned the reads to a reference genome. We then did a de novo assembly (using SPAdes) and aligned the reads to that assembly. In both cases, at quite a few positions more than one nucleotide is supported by many reads. At these positions the reads support at most two nucleotides, never three. For example, below are the read counts from IGV at one such position:
CP000744.1:54,471
Total count: 439
A: 265 (60%, 121+, 144-)
C: 0
G: 0
T: 174 (40%, 98+, 76-)
N: 0
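To compare positions like this one more easily, here is a minimal sketch that parses an IGV count line into per-base totals and strand counts and computes allele fractions. It assumes the popup text looks like the line above (`A : 265 (60%, 121+, 144- )`); the function names `parse_igv_counts` and `allele_fractions` are hypothetical, not part of IGV.

```python
import re

# Matches entries like "A : 265 (60%, 121+, 144- )" (format assumed from the IGV
# popup text). Bases with zero reads have no parenthesised strand counts.
ENTRY = re.compile(r"([ACGTN])\s*:\s*(\d+)(?:\s*\(\s*\d+%,\s*(\d+)\+,\s*(\d+)-\s*\))?")

def parse_igv_counts(text):
    """Return {base: (total, forward, reverse)} for bases with at least one read."""
    counts = {}
    for base, total, fwd, rev in ENTRY.findall(text):
        if int(total) > 0:
            counts[base] = (int(total), int(fwd or 0), int(rev or 0))
    return counts

def allele_fractions(counts):
    """Fraction of reads supporting each base at the position."""
    depth = sum(total for total, _, _ in counts.values())
    return {base: total / depth for base, (total, _, _) in counts.items()}

line = ("Total count: 439 A : 265 (60%, 121+, 144- ) C : 0 G : 0 "
        "T : 174 (40%, 98+, 76- ) N : 0")
counts = parse_igv_counts(line)
print(counts)  # {'A': (265, 121, 144), 'T': (174, 98, 76)}
for base, frac in sorted(allele_fractions(counts).items()):
    print(f"{base}: {frac:.1%}")  # A: 60.4%, then T: 39.6%
```

Keeping the forward/reverse split makes it easy to check that both alleles are seen on both strands, as they are here.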
How should we interpret this, given that bacteria are haploid?
One interpretation is that the T reads are sequencing errors and the true base is A. To me, sequencing error seems unlikely because the counts are so high (265 vs. 174) and because the same pattern recurs at other positions.
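The "counts are too high for random error" argument can be made quantitative with a binomial tail probability. This is a minimal sketch under a simplified model: each read independently shows the wrong base with an assumed error rate `p` (1% here, a deliberately pessimistic guess, not a measured value). It does not model systematic artifacts, such as reads from a repeated region mapping to one locus, so a tiny probability only rules out independent random errors. The helper `log10_binom_tail` is hypothetical and works in log space because the raw probability underflows a float.

```python
import math

def log10_binom_tail(k, n, p):
    """log10 P(X >= k) for X ~ Binomial(n, p), computed in log space to avoid underflow."""
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    # log-sum-exp: factor out the largest term before exponentiating
    m = max(log_terms)
    log_tail = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return log_tail / math.log(10)

depth, minor = 439, 174
error_rate = 0.01  # assumed per-read error rate for illustration only
print(f"log10 P(>= {minor} T reads by chance) = "
      f"{log10_binom_tail(minor, depth, error_rate):.1f}")
```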
A second interpretation is contamination: more than one type of cell was present in the sample. This is possible, but I first want to make sure I am not missing some other explanation.
PS: I have also asked this question on Biostars:
https://www.biostars.org/p/238739/
Thanks for reading,
Ambi.