Reference Genome Sequence Data File

This section provides a tutorial example on how to download the Reference Genome Sequence Data File provided by NCBI (National Center for Biotechnology Information).

What Is a Reference Genome Sequence Data File? A Reference Genome Sequence Data File is a data file in FASTA format that contains the reference human genome sequences provided by NCBI.
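In FASTA format, each sequence record starts with a header line beginning with ">". The header holds the sequence ID followed by a free-text description, and the lines after it hold the sequence letters. As a quick illustration, here is a minimal sketch, assuming a POSIX shell with "awk"; the helper name "fasta_ids" is hypothetical and not part of any NCBI tool. It prints the ID of every record in a FASTA file:

```shell
# fasta_ids FILE: hypothetical helper, not an NCBI tool.
# Prints the sequence ID of every record in a FASTA file, i.e. the first
# whitespace-separated word of each ">" header line with the ">" removed.
fasta_ids() {
  awk '/^>/ { print substr($1, 2) }' "$1"
}
```

Once the file is downloaded and unzipped as "genome" (steps 1 and 2), "fasta_ids genome" would start with NC_000001.11, the ID of chromosome 1.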

Here is what I did to download the Reference Human Genome data file.

1. Get the data file with the "curl" command:

herong$ curl ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/\
GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.fna.gz > genome.gz
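The same download can be wrapped in a small function that stops on errors and checks the archive before you unzip it. This is a sketch under a few assumptions: "fetch_fasta_gz" and "GRCH38_URL" are hypothetical names, and the check relies on "gzip -t", which tests a compressed file's integrity without extracting it.

```shell
# URL of the RefSeq GRCh38 genomic FASTA file used in this tutorial.
GRCH38_URL="ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.fna.gz"

# fetch_fasta_gz [URL] [OUTFILE]: hypothetical helper.
# Downloads URL (default: GRCH38_URL) to OUTFILE (default: genome.gz)
# and returns non-zero if the download or the gzip integrity test fails.
fetch_fasta_gz() {
  fetch_url="${1:-$GRCH38_URL}"
  fetch_out="${2:-genome.gz}"
  # -f: fail on server errors, -sS: hide the progress meter but still
  # show errors, -L: follow redirects, -o: write to the named file
  curl -fsSL -o "$fetch_out" "$fetch_url" || return 1
  # Test the compressed data without extracting it.
  gzip -t "$fetch_out"
}
```

For example, "fetch_fasta_gz && gunzip genome.gz" only unzips the file if the download completed and the archive tested clean.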

2. Unzip and verify the file.

herong$ gunzip genome.gz

herong$ head -100 genome
>NC_000001.11 Homo sapiens chromosome 1, GRCh38.p13 Primary Assembly
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
...
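Printing the first 100 lines only shows the start of chromosome 1, which is a run of "N" (unknown) bases. A more thorough check can scan the whole file with "awk". The sketch below assumes that sequence lines should contain only IUPAC nucleotide codes in upper or lower case; the helper name "check_fasta" is hypothetical.

```shell
# check_fasta FILE: hypothetical sanity check for a nucleotide FASTA file.
# Assumes sequence lines contain only IUPAC nucleotide codes (any case).
# Prints "<n> records" on success; prints the problem and returns 1 otherwise.
check_fasta() {
  awk '
    NR == 1 && !/^>/ { print "line 1 is not a FASTA header"; bad = 1; exit }
    /^>/ { records++; next }
    /[^ACGTRYSWKMBDHVNacgtryswkmbdhvn]/ {
      print "unexpected character on line " NR; bad = 1; exit
    }
    END {
      if (bad) exit 1
      printf "%d records\n", records
    }
  ' "$1"
}
```

Running "check_fasta genome" reads the entire decompressed file, so expect it to take a while on the full genome.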

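3. Count the number of chromosomes in the data file by listing the sequence header lines that contain ">NC_". The output shows 25 sequences: chromosomes 1 to 22, X and Y, plus the mitochondrial genome.

herong$ grep ">NC_" genome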
>NC_000001.11 Homo sapiens chromosome 1, GRCh38.p13 Primary Assembly
>NC_000002.12 Homo sapiens chromosome 2, GRCh38.p13 Primary Assembly
>NC_000003.12 Homo sapiens chromosome 3, GRCh38.p13 Primary Assembly
>NC_000004.12 Homo sapiens chromosome 4, GRCh38.p13 Primary Assembly
>NC_000005.10 Homo sapiens chromosome 5, GRCh38.p13 Primary Assembly
>NC_000006.12 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly
>NC_000007.14 Homo sapiens chromosome 7, GRCh38.p13 Primary Assembly
>NC_000008.11 Homo sapiens chromosome 8, GRCh38.p13 Primary Assembly
>NC_000009.12 Homo sapiens chromosome 9, GRCh38.p13 Primary Assembly
>NC_000010.11 Homo sapiens chromosome 10, GRCh38.p13 Primary Assembly
>NC_000011.10 Homo sapiens chromosome 11, GRCh38.p13 Primary Assembly
>NC_000012.12 Homo sapiens chromosome 12, GRCh38.p13 Primary Assembly
>NC_000013.11 Homo sapiens chromosome 13, GRCh38.p13 Primary Assembly
>NC_000014.9 Homo sapiens chromosome 14, GRCh38.p13 Primary Assembly
>NC_000015.10 Homo sapiens chromosome 15, GRCh38.p13 Primary Assembly
>NC_000016.10 Homo sapiens chromosome 16, GRCh38.p13 Primary Assembly
>NC_000017.11 Homo sapiens chromosome 17, GRCh38.p13 Primary Assembly
>NC_000018.10 Homo sapiens chromosome 18, GRCh38.p13 Primary Assembly
>NC_000019.10 Homo sapiens chromosome 19, GRCh38.p13 Primary Assembly
>NC_000020.11 Homo sapiens chromosome 20, GRCh38.p13 Primary Assembly
>NC_000021.9 Homo sapiens chromosome 21, GRCh38.p13 Primary Assembly
>NC_000022.11 Homo sapiens chromosome 22, GRCh38.p13 Primary Assembly
>NC_000023.11 Homo sapiens chromosome X, GRCh38.p13 Primary Assembly
>NC_000024.10 Homo sapiens chromosome Y, GRCh38.p13 Primary Assembly
>NC_012920.1 Homo sapiens mitochondrion, complete genome
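To get a number instead of a listing, "grep -c" prints the count of matching lines, and anchoring the pattern with "^" restricts it to lines that start with ">NC_". The sketch below assumes, based on the header lines listed above, that nuclear chromosome headers contain the word "chromosome" while the mitochondrial genome is described as "mitochondrion". The helper name "count_chromosomes" is hypothetical.

```shell
# count_chromosomes FILE: hypothetical helper.
# Assumes nuclear chromosome headers start with ">NC_" and contain the
# word "chromosome", as in the GRCh38 header lines listed above.
count_chromosomes() {
  grep '^>NC_' "$1" | grep -c ' chromosome '
}

# All ">NC_" sequences, including the mitochondrial genome:
#   grep -c '^>NC_' genome
# Nuclear chromosomes only:
#   count_chromosomes genome
```

For the GRCh38.p13 file listed above, these two commands should print 25 and 24; a newer "latest" release may differ.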
