Welcome

Bioinformatics is a wide-ranging term that covers many aspects of carrying out biological analyses in silico, i.e., on a computer.

Given the volume of sequence data being generated by the numerous genome sequencing projects (both eukaryote and prokaryote), it is useful to understand what analyses can be performed and how to access the information in the databases.

The National Center for Biotechnology Information (NCBI) hosts several related resources and databases, and the tools for mining those databases. There are also educational resources at NCBI which are worth investigating.

The purpose of this practical is to familiarise you with the available sequence databases and the tools available for interrogating those databases. This information can then be used for interactive searches to further characterise a gene or protein. The practical is organised as a series of tasks with associated questions for you to answer.

Basic Local Alignment Search Tool (BLAST)

The tools in the BLAST suite are among the most frequently used database query programmes, and are available at NCBI. The table below lists the different BLAST tools and what they do.

Program   Description
blastp    Compares an amino acid query sequence against a protein sequence database.
blastn    Compares a nucleotide query sequence against a nucleotide sequence database.
blastx    Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
tblastn   Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
tblastx   Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. Please note that the tblastx program cannot be used with the nr database on the BLAST Web page because it is computationally intensive.
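The table can also be read as a simple lookup from the type of query and the type of database to a program name. The sketch below is only an illustration of that reading; the function name `choose_blast_program` and the `translated` flag are made up for this example and are not part of any NCBI tool.

```python
# Lookup from (query type, database type) to the BLAST program in the table.
# Sequence types are given as the strings "nucleotide" or "protein".
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_blast_program(query_type, database_type, translated=False):
    """Return the program name for a query/database combination.

    `translated=True` (a hypothetical flag for this sketch) requests a
    comparison of six-frame translations on both sides, i.e. tblastx, which
    only makes sense for nucleotide against nucleotide.
    """
    pair = (query_type, database_type)
    if translated:
        if pair != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    if pair not in BLAST_PROGRAMS:
        raise ValueError(f"unknown combination: {pair}")
    return BLAST_PROGRAMS[pair]


# Example: a DNA query searched against a protein database.
print(choose_blast_program("nucleotide", "protein"))
```

Keeping the mapping in a dictionary mirrors the table row by row, so it is easy to check against the descriptions.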

Here are links to a number of Bioinformatics and genomics web sites:

BLAST Exercise

Before starting, call up the NCBI web site (link above).

Task 1: Accessing Databases using Accession Numbers

DNA Accession # NM_000539

Type the accession number into the search box provided.

Follow the links to call up the GenBank nucleotide entry. From the GenBank entry identify the following pieces of information:

  1. Name of the gene
  2. Species from which it is derived
  3. Size of the sequence
  4. Function of protein
  5. Any other information you think is useful
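The same lookup can be done without the browser. The following is a minimal sketch, assuming the NCBI E-utilities `efetch` endpoint and the `nuccore` database name; the helper names (`build_genbank_url`, `fetch_genbank_record`, `header_fields`) are invented for this example. The parser only reads the first line of each header field, so a long DEFINITION that wraps onto several lines is cut short.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: NCBI E-utilities efetch.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_genbank_url(accession):
    """Build a URL asking for the GenBank flat file of a nucleotide accession."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def fetch_genbank_record(accession):
    """Download the record as text (needs network access)."""
    with urlopen(build_genbank_url(accession)) as response:
        return response.read().decode("utf-8")


def header_fields(record_text, keys=("LOCUS", "DEFINITION", "SOURCE")):
    """Return the first line of selected top-level GenBank header fields."""
    found = {}
    for line in record_text.splitlines():
        for key in keys:
            if line.startswith(key + " ") and key not in found:
                found[key] = line[len(key):].strip()
    return found


# Usage (not run here because it contacts NCBI):
#   record = fetch_genbank_record("NM_000539")
#   for name, value in header_fields(record).items():
#       print(name, value)
```

The LOCUS line normally carries the sequence length and the SOURCE line the species, which covers questions 2 and 3; the gene name and protein function still need to be read from the full entry.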

Task 2: BLASTn Searching

DNA sequence CCAGACGCCTGGCTTGAAGATCAAGGAGGAGGAGGAAGGCGCGGATGCTGCTGTGCGCTC

A DNA sequence is provided. From the NCBI front page, follow the links to BLAST and then choose Nucleotide BLAST.

Type (or paste) the DNA sequence provided into the box, and select the nucleotide collection (nr/nt) database from the drop-down menu under “Database” in the “Choose Search Set” section.

  1. Look at the range of results from the BLAST search and list the accession numbers and scores of the top four hits. (Hint: the “Download” link provides the hit table as a text file, which you can open in Excel.) Using the “Edit and Resubmit” link in the top left, repeat the BLAST search, this time selecting the “Reference RNA sequences (refseq_rna)” database from the drop-down menu under “Database” in the “Choose Search Set” section.
  2. How do these results differ from the previous BLAST search results?
  3. Follow the links for the top four hits, and answer the same questions as in Task 1. Repeat the BLAST search, again selecting the “Reference RNA sequences (refseq_rna)” database. This time, either exclude mammals (type “mammals” in the Organism box and tick the exclude check box) or restrict the search to another vertebrate group (for example, aves or reptiles) by entering it in the Organism box.
  4. What do you now see?
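If you download the hit table rather than reading it on screen, the top four hits can be picked out with a few lines of Python. This is a sketch only: it assumes the usual twelve-column layout of the BLAST hit table (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score), and the column names and function names used here are labels chosen for this example. Comment lines starting with `#` are skipped, and no header row is expected.

```python
import csv
import io

# Assumed column order of the downloaded hit table; the names are our own labels.
HIT_TABLE_COLUMNS = [
    "query", "subject", "percent_identity", "alignment_length",
    "mismatches", "gap_opens", "q_start", "q_end",
    "s_start", "s_end", "evalue", "bit_score",
]


def read_hit_table(text, delimiter="\t"):
    """Parse hit-table text into a list of dicts, one per hit.

    Use delimiter="," for the CSV download and the default tab for the text file.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        # Skip blank lines, '#' comment lines and anything with too few columns.
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) < len(HIT_TABLE_COLUMNS):
            continue
        row = dict(zip(HIT_TABLE_COLUMNS, fields))
        row["evalue"] = float(row["evalue"])
        row["bit_score"] = float(row["bit_score"])
        rows.append(row)
    return rows


def top_hits(rows, n=4):
    """Return the n hits with the highest bit score."""
    return sorted(rows, key=lambda row: row["bit_score"], reverse=True)[:n]


# Usage, assuming the table was saved as "hit_table.txt" (a made-up file name):
#   with open("hit_table.txt") as handle:
#       for hit in top_hits(read_hit_table(handle.read())):
#           print(hit["subject"], hit["bit_score"])
```

Running the same function on the nr/nt table and on the refseq_rna table gives two short lists that are easy to compare for question 2.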

Task 3: Translated BLAST (Nucleotide > Protein)

Taking the same DNA sequence, go to the translated BLAST section on the BLAST page (or choose “Edit and Resubmit” from your previous search). Consult the table above and choose the appropriate program to compare the DNA sequence against the protein database.

  1. List the accession numbers and scores of the top four hits.
  2. Compare the results with those obtained for Task 2. Repeat the BLAST search either excluding mammals or selecting other vertebrates, as before.
  3. What do you now see?
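To see what "translated in all reading frames" means in practice, the sketch below translates a DNA sequence in the three forward frames and the three frames of its reverse complement, using the standard genetic code. It illustrates the idea only and is not how BLAST is implemented; the function names are chosen for this example, and any codon containing a character other than A, C, G or T is shown as X.

```python
from itertools import product

# Standard genetic code, with codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict of frame label ('+1'..'+3', '-1'..'-3') to protein string."""
    dna = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


# The DNA sequence from Task 2.
query = "CCAGACGCCTGGCTTGAAGATCAAGGAGGAGGAGGAAGGCGCGGATGCTGCTGTGCGCTC"
for frame, protein in six_frame_translations(query).items():
    print(frame, protein)
```

Each of the six protein strings is a candidate query against the protein database, which is why a translated search can find hits that a nucleotide search misses.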

Task 4: Protein BLAST

Peptide DTVTSPQRAGPLAGGVTTFV

Use BLASTP to compare the peptide sequence above against the nr database.

  1. List the accession numbers and scores of the top four hits.
  2. Copy the accession number of the top hit, and use it as the query for a second BLASTP search against the non-redundant (nr) database.
  3. From the results, give a summary of the range of organisms this protein is found in.
  4. Follow the conserved domains link (if available), and include information on the nature of the conserved domains.
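For question 3, one way to summarise the range of organisms is to tally the species named in the hit descriptions. The sketch below assumes that each protein hit title ends with the organism in square brackets, as is common for nr protein entries; titles without a trailing bracket are counted as "unknown". The function name and the example titles are made up for illustration and are not real BLAST output.

```python
import re
from collections import Counter

# Matches a trailing "[Organism name]" at the end of a hit title.
ORGANISM_PATTERN = re.compile(r"\[([^\[\]]+)\]\s*$")


def count_organisms(titles):
    """Count how many hit titles name each organism."""
    counts = Counter()
    for title in titles:
        match = ORGANISM_PATTERN.search(title)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


# Made-up titles, only to show the expected shape of the input.
example_titles = [
    "example protein isoform 1 [Homo sapiens]",
    "example protein [Mus musculus]",
    "example protein isoform 2 [Homo sapiens]",
    "unnamed protein product",
]
for organism, number in count_organisms(example_titles).most_common():
    print(organism, number)
```

Entries that cover several species may not follow this pattern, so treat the tally as a first pass and check unusual titles by hand.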

Note for Incorporation

Please see this answer on Biostars to understand the header issue when downloading a BLAST hit table.