Molecule Tutorials - Herong's Tutorial Examples - v1.26, by Herong Yang
What Is FASTA
This section provides a quick introduction to FASTA, or FastA, a universal file format for representing either a nucleotide sequence or a peptide (protein) sequence, in which nucleotides or amino acids are represented using single-letter codes.
What Is FASTA? - FASTA, or FastA, is a text-based file format for representing either a nucleotide sequence or a peptide (protein) sequence, in which nucleotides or amino acids are represented using single-letter codes.
The FASTA file format was introduced by the FASTA software, a DNA and protein sequence search and alignment tool developed by David J. Lipman and William R. Pearson in 1985.
FASTA has since become a near-universal standard in the field of bioinformatics, and it is supported by most bioinformatics tools.
A FASTA file may contain multiple sequences. Each sequence starts with one identifier line, which has ">" as its first character, followed by an identifier and other information. After the identifier line, the sequence data is provided on one or more lines.
As an example, the smallest protein sequence, Trp-Cage, can be written in a FASTA format file as:
>1L2Y_1|Chain A|TC5b|null
NLYIQWLKDGGPSSGRPPPS
Here is another example, a FASTA file with 2 protein sequences:
>gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
MNSERSDVTLYQPFLDYAIAYMRSRLDLEPYPIPTGFESNSAVVGKGKNQEEVVTTSYAFQTAKLRQIRA
AHVQGGNSLQVLNFVIFPHLNYDLPFFGADLVTLPGGHLIALDMQPLFRDDSAYQAKYTEPILPIFHAHQ
QHLSWGGDFPEEAQPFFSPAFLWTRPQETAVVETQVFAAFKDYLKAYLDFVEQAEAVTDSQNLVAIKQAQ
LRYLRYRAEKDPARGMFKRFYGAEWTEEYIHGFLFDLERKLTVVK
>1L2Y_1|Chain A|TC5b|null
NLYIQWLKDGGPSSGRPPPS
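The record structure described above, a ">" identifier line followed by one or more lines of sequence data, can be read with a few lines of Python. The following is only a minimal sketch, not a reference implementation: the function name parse_fasta is hypothetical, and the code assumes that every identifier line sits on its own line and that sequence lines carry no comments or line numbers.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text.

    Hypothetical helper for illustration only. The header is returned
    without its leading ">", and multi-line sequences are joined.
    """
    records = []
    header = None   # header of the record currently being read
    chunks = []     # sequence lines collected for that record
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new identifier line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Store the last record once the input is exhausted.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>1L2Y_1|Chain A|TC5b|null
NLYIQWLKDGGPSSGRPPPS
"""

for header, sequence in parse_fasta(example):
    identifier = header.split("|")[0]
    print(identifier, len(sequence))  # prints: 1L2Y_1 20
```

For real work, an established parser such as the one in Biopython is usually a safer choice, since it handles the edge cases this sketch ignores.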