How do I convert three-letter amino acid codes to one-letter codes with Python or R?

user1725152 · Oct 6, 2012 · Viewed 15.8k times

I have a FASTA file as shown below. I would like to convert the three-letter amino acid codes to one-letter codes. How can I do this with Python or R?

>2ppo
ARGHISLEULEULYS
>3oot
METHISARGARGMET

Desired output:

>2ppo
RHLLK
>3oot
MHRRM

Your suggestions would be appreciated!

Answer

Henk Neefs · Jan 5, 2014

Biopython already ships with built-in dictionaries for such translations. The following commands list the available dictionaries:

# A bare "import Bio" does not load the SeqUtils submodule, so import it explicitly.
import Bio.SeqUtils
help(Bio.SeqUtils.IUPACData)

The predefined dictionary you are looking for:

Bio.SeqUtils.IUPACData.protein_letters_3to1['Ala']  # evaluates to 'A'
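
To tie this back to the FASTA file in the question, here is a minimal sketch of the whole conversion. It is not part of the original answer. It assumes each sequence line holds a whole number of three-letter codes (no code split across lines). To stay runnable without Biopython it uses a hand-written table for the 20 standard amino acids, whereas protein_letters_3to1 is keyed in the 'Ala' style rather than 'ALA'. The names THREE_TO_ONE, three_to_one and convert_fasta are made up for illustration.

```python
# Hand-written lookup table for the 20 standard amino acids (an assumption of
# this sketch; Biopython's protein_letters_3to1 is keyed as 'Ala', 'Arg', ...).
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def three_to_one(seq, unknown="X"):
    """Convert a run of three-letter codes (e.g. 'ARGHIS') to one-letter codes."""
    seq = seq.strip().upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3: %r" % seq)
    # Walk the string in chunks of three; unrecognised codes become `unknown`.
    return "".join(
        THREE_TO_ONE.get(seq[i:i + 3], unknown) for i in range(0, len(seq), 3)
    )


def convert_fasta(lines):
    """Return the lines with headers untouched and sequence lines converted."""
    converted = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">") or not line.strip():
            converted.append(line)  # header or blank line: keep as-is
        else:
            converted.append(three_to_one(line))
    return converted


fasta = """>2ppo
ARGHISLEULEULYS
>3oot
METHISARGARGMET"""

print("\n".join(convert_fasta(fasta.splitlines())))
```

For the example input this prints the desired output from the question (>2ppo, RHLLK, >3oot, MHRRM). To process a real file, pass an open file object to convert_fasta instead of the inline string. If Biopython is installed, Bio.SeqUtils.seq1 should cover the same conversion for a single sequence string; check the documentation for your version.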