I have a FASTA file as shown below. I would like to convert the three-letter amino acid codes to one-letter codes. How can I do this with Python or R?
>2ppo
ARGHISLEULEULYS
>3oot
METHISARGARGMET
Desired output:
>2ppo
RHLLK
>3oot
MHRRM
Your suggestions would be appreciated!
Biopython already has built-in dictionaries to help with such translations. The following commands show the full list of available dictionaries:
import Bio.SeqUtils  # a bare "import Bio" does not load the SeqUtils submodule
help(Bio.SeqUtils.IUPACData)
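The predefined dictionary you are looking for is protein_letters_3to1. Its keys are capitalized as in 'Ala', so uppercase codes such as ARG need .capitalize() before the lookup:
Bio.SeqUtils.IUPACData.protein_letters_3to1['Ala']  # returns 'A'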
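To apply this to the whole file, here is a minimal sketch of the conversion. So that it runs without Biopython installed, it spells out its own mapping for the 20 standard amino acids; with Biopython you could look up each capitalized code in protein_letters_3to1 instead, and Bio.SeqUtils.seq1 may also do the job directly. The names THREE_TO_ONE, three_to_one and convert_fasta are made up for this example, and it assumes each sequence line holds a whole number of three-letter codes (no codon split across wrapped lines).

```python
# Hypothetical lookup table: three-letter to one-letter codes for the
# 20 standard amino acids (keys are uppercase to match the input file).
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def three_to_one(seq):
    """Convert a string of concatenated three-letter codes to one-letter codes."""
    seq = seq.strip().upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3: %r" % seq)
    # Walk the string in chunks of three characters and translate each chunk.
    # An unknown code raises KeyError rather than being silently skipped.
    return "".join(THREE_TO_ONE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def convert_fasta(lines):
    """Return FASTA lines with headers unchanged and sequences translated."""
    converted = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            converted.append(line)  # header line, keep as is
        else:
            converted.append(three_to_one(line))
    return converted


example = [">2ppo", "ARGHISLEULEULYS", ">3oot", "METHISARGARGMET"]
print("\n".join(convert_fasta(example)))
```

For a real file, pass the open file handle instead of the example list, e.g. convert_fasta(open("input.fasta")), where input.fasta is a placeholder file name.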