Biopython: Can't use .count() in Biopython
My goal here is to count the number of times 'g' appears in a DNA sequence.
I imported a DNA sequence via Biopython using a list comprehension:
seq = [record for record in SeqIO.parse('sequences/hiv.gbk.rtf', 'fasta')]
I then tried calling the .count() method on the list produced by that comprehension.
I get an error that reads:
NotImplementedError: SeqRecord comparison is deliberately not implemented. Explicitly compare the attributes of interest.
Does anyone know what's going on? Biopython's manual says all standard Python methods should work.
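You are trying to apply count to a list of SeqRecord objects. list.count() compares each element to its argument with ==, and SeqRecord deliberately does not implement comparison, hence the error. You need to apply count to the sequence (the .seq attribute) of each element, e.g. for the first record
print(seq[0].seq.count('g'))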
or, if you want the total across all sequences:
print(sum([s.seq.count('g') for s in seq]))
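To see why the list-level call fails without needing Biopython installed, here is a rough sketch using a stand-in class. `FakeRecord` is hypothetical (it is not part of Biopython) and only mimics the one behaviour that matters here: refusing `==`. The sequences are made-up strings, not real data.

```python
class FakeRecord:
    """Hypothetical stand-in for SeqRecord: holds a sequence and refuses ==."""

    def __init__(self, seq):
        # A plain str here; in Biopython this would be a Bio.Seq.Seq object.
        self.seq = seq

    def __eq__(self, other):
        raise NotImplementedError("comparison is deliberately not implemented")


records = [FakeRecord("ggatcg"), FakeRecord("ttgca")]

# list.count('g') evaluates `element == 'g'` for every element,
# so it hits the __eq__ above and raises.
try:
    records.count("g")
    list_count_failed = False
except NotImplementedError:
    list_count_failed = True

# Counting on each record's sequence works, because str.count
# (like Seq.count) searches inside the sequence itself.
per_record = [r.seq.count("g") for r in records]
total = sum(per_record)

print(list_count_failed, per_record, total)  # True [3, 1] 4
```

The same split applies to the real objects: the list knows nothing about nucleotides, while each record's .seq does.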
Here is a minimal working example
from Bio import SeqIO

# four short protein sequences in FASTA format
txt = """>gnl|TC-DB|O60669|2.A.1.13.5 Monocarboxylate transporter 2 - Homo sapiens (Human).
MPPMPSAPPVHPPPDGGWGWIVVGAAFISIGFSYAFPKAVTVFFKEIQQIFHTTYSEIAW
>gnl|TC-DB|O60706|3.A.1.208.23 ATP-binding cassette sub-family C member 9 OS=Homo sapiens GN=ABCC9 PE=1 SV=2
MSLSFCGNNISSYNINDGVLQNSCFVDALNLVPHVFLLFITFPILFIGWGSQSSKVQIHH
>gnl|TC-DB|O60721|3.A.1.208.23 Sodium/potassium/calcium exchanger 1 OS=Homo sapiens GN=SLC24A1 PE=1 SV=1
MGKLIRMGPQERWLLRTKRLHWSRLLFLLGMLIIGSTYQHLRRPRGLSSLWAAVSSHQPI
>gnl|TC-DB|O60779|2.A.1.13.5 Thiamine transporter 1 (THTR-1) (ThTr1) (Thiamine carrier 1) (TC1) - Homo sapiens (Human).
MDVPGPVSRRAAAAAATVLLRTARVRRECWFLPTALLCAYGFFASLRPSEPFLTPYLLGP"""

filename = 'sequences.fa'
with open(filename, 'w') as f:
    f.write(txt)

seqs = [record for record in SeqIO.parse(filename, 'fasta')]

# total number of 'P' across all records
print(sum([s.seq.count('P') for s in seqs]))
# prints 21

# number of 'P' in the first record only (index the list, then use .seq)
print(seqs[0].seq.count('P'))
# prints 9