Readme for NCBI blast ftp site
                       Last updated on February 15, 2004
This file lists the subdirectories and files found on the NCBI BLAST 
ftp site (ftp://ftp.ncbi.nlm.nih.gov/blast/).  It provides the basic 
information on file content, and on how the files should be used. 
1. Introduction
The NCBI BLAST ftp site provides standalone blast, client server blast, 
and wwwblast packages for different platforms.  It also provides 
commonly used blast databases in preformatted as well as FASTA format. 
Some documents on the blast executables and other related subjects are
also provided.
2. File list and content
Descriptions of the files are listed in the tables below, one table 
for each directory or subdirectory.
 
2.1 ftp://ftp.ncbi.nlm.nih.gov/blast/ directory content
The blast ftp directory contains several subdirectories, each for a 
specific set of files.  
+------------------+-------------------------------------------------+
|Name              |Content                                          |
+------------------+-------------------------------------------------+
blastftp.txt        this file
db                  subdirectory with databases, in preformatted or 
                      FASTA form
demo                demonstration programs and documents from blast 
                      developers
documents           documents for programs in standalone blast, 
                      netblast, and wwwblast programs
executables         archives for binary distribution of blast programs
matrices            protein and nucleotide score matrices, only a 
                      subset is supported by blast
temp                temporary directory for miscellaneous files
+------------------+-------------------------------------------------+
2.2 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/db/ subdirectory
Databases larger than two gigabytes (2 GB) are formatted in multiple 
volumes, which are named using the "database.##.tar.gz" convention. 
All relevant volumes are required. An alias file is provided so that 
the database can be called using the alias name without the extension 
(.nal or .pal). For example, to search the est database, simply use the 
"-d est" option on the command line (without the quotes). 
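The naming convention above can be sketched in Python. This is only an 
illustration: the helper names group_volumes and db_option are 
hypothetical and not part of any BLAST package. The sketch groups 
downloaded archives by database name so that a missing volume is easy to 
spot, and builds the "-d" option from the alias name.

```python
import re
from collections import defaultdict

# Matches "est.00.tar.gz" (one volume of a multi-volume database)
# as well as "gss.tar.gz" (a single-archive database).
ARCHIVE_RE = re.compile(r"^(?P<db>[A-Za-z0-9_]+)(?:\.(?P<vol>\d{2}))?\.tar\.gz$")


def group_volumes(filenames):
    """Map each database name to the sorted list of its archive files."""
    groups = defaultdict(list)
    for name in filenames:
        match = ARCHIVE_RE.match(name)
        if match:  # files such as blastdb.txt are ignored
            groups[match.group("db")].append(name)
    return {db: sorted(files) for db, files in groups.items()}


def db_option(alias_name):
    """Build the -d option from the alias name, without .nal or .pal."""
    return ["-d", alias_name]
```

For example, group_volumes applied to a directory listing shows at a 
glance whether est.00, est.01, and est.02 are all present before the 
est alias is used.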
Certain databases are subsets of a larger parental database. For those 
databases, mask files, rather than actual databases, are provided. The
mask file needs the parent database to function properly. The parent 
databases should be generated on the same day as the mask file. For 
example, to use swissprot preformatted database, swissprot.tar.gz, one 
will need to get the nr.tar.gz with the same date stamp.
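The mask and parent relationship can be written down as a small lookup. 
The sketch below is hypothetical Python, not part of BLAST: the table is 
taken from the database list in this section, and the date check assumes 
that the local file modification time reflects the date stamp on the ftp 
site, which depends on how the files were downloaded.

```python
import os
from datetime import date

# Mask archives and the parent database each one needs,
# as described in the /blast/db/ table.
MASK_PARENTS = {
    "swissprot": "nr",
    "pdbaa": "nr",
    "est_human": "est",
    "est_mouse": "est",
    "est_others": "est",
}


def parent_of(db_name):
    """Return the parent database a mask needs, or None for a full database."""
    return MASK_PARENTS.get(db_name)


def same_date_stamp(mask_path, parent_path):
    """True when both local files carry the same modification date.

    Assumption: the local modification time was preserved from the server.
    """
    mask_day = date.fromtimestamp(os.path.getmtime(mask_path))
    parent_day = date.fromtimestamp(os.path.getmtime(parent_path))
    return mask_day == parent_day
```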
To use the preformatted blast database file, first inflate the file 
using gzip (unix, linux), WinZip (Windows), or StuffIt Expander (Mac), 
then extract the component files from the resulting tar file using 
tar (unix, linux), WinZip (Windows), or StuffIt Expander (Mac). The 
resulting files are ready for BLAST. 
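Where none of the tools above is at hand, the same two steps can be done 
with the Python standard library. This is a minimal sketch, not an NCBI 
tool; the function name extract_database and the destination directory 
are assumptions.

```python
import tarfile
from pathlib import Path


def extract_database(archive_path, dest_dir):
    """Inflate and unpack one .tar.gz database archive into dest_dir.

    Returns the sorted names of the entries found in dest_dir afterwards.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:gz" handles the gzip and tar layers in a single pass.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    return sorted(entry.name for entry in dest.iterdir())
```

For a multi-volume database, the function would be called once per 
volume with the same destination directory.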
+---------------------+----------------------------------------------+
|Name                 |Content                                       |
+---------------------+----------------------------------------------+
FASTA                  subdirectory with databases in FASTA format
blastdb.txt            content list of the blast database
est.00.tar.gz          first volume of the est database
est.01.tar.gz          second volume of the est database
est.02.tar.gz          third volume of the est database
                       all volumes are needed to reconstitute 
                         complete est database 
est_human.tar.gz       human est database, a mask file that requires all 
                         volumes of est to work
est_mouse.tar.gz       mouse est database, a mask file that requires all 
                         volumes of est to work
est_others.tar.gz      est database without human/mouse entries, a
                         mask file that requires all volumes of est
gss.tar.gz             genomic survey sequence database
htgs.00.tar.gz         first volume of the htgs database
htgs.01.tar.gz         second volume of the htgs database
htgs.02.tar.gz         third volume of the htgs database
htgs.03.tar.gz         fourth volume of the htgs database
                       all volumes are needed to reconstitute
                         complete htgs database 
human_genomic.tar.gz   human chromosome database containing 
                         concatenated contigs with adjusted gaps 
                         represented by N's
nr.tar.gz              non-redundant protein database
nt.00.tar.gz           first volume of the nucleotide nr database
nt.01.tar.gz           second volume of the nucleotide nr database
nt.02.tar.gz           third volume of the nucleotide nr database
                       all volumes are needed to reconstitute
                         complete nt database 
other_genomic.tar.gz   chromosome database for organisms other than 
                         human
pataa.tar.gz           patent protein database
patnt.tar.gz           patent nucleotide database
pdbaa.tar.gz           protein sequence database for pdb entries. It
                         is a mask file and requires nr.tar.gz
pdbnt.tar.gz           nucleotide sequence database for pdb entries. 
                         They are not coding sequences for the 
                         corresponding protein structure entries!
sts.tar.gz             sequence tagged site database
swissprot.tar.gz       swissprot sequence database, last major
                         release. It is a mask file and requires 
                         nr.tar.gz to work properly
taxdb.tar.gz           taxonomy id database for use with new version 
                         of blast database (not fully implemented yet)
wgs.00.tar.gz          first volume of the wgs assembly database
wgs.01.tar.gz          second volume of the wgs assembly database
wgs.02.tar.gz          third volume of the wgs assembly database
wgs.03.tar.gz          fourth volume of the wgs assembly database
wgs.04.tar.gz          fifth volume of the wgs assembly database
wgs.05.tar.gz          sixth volume of the wgs assembly database
                         all volumes are needed
+--------------------+-----------------------------------------------+
2.2.1 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA 
subdirectory
The FASTA database files are now stored in this subdirectory. It also 
contains some additional databases that are not available via the NCBI 
BLAST pages. Due to file size issues, the full est database is not 
provided. One needs to get the three subsets (est_human, est_mouse, and 
est_others) and concatenate them to get the complete est database.
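Concatenating the three est subsets can be done with any tool that 
appends files. A minimal Python sketch follows; the function name 
concatenate_fasta and the output file name are assumptions, and the 
input names are the ones listed in the table below.

```python
import gzip
import shutil

# The three est subsets provided in the FASTA subdirectory.
EST_SUBSETS = ["est_human.gz", "est_mouse.gz", "est_others.gz"]


def concatenate_fasta(gz_paths, output_path):
    """Decompress each gzipped FASTA file and append it to output_path."""
    with open(output_path, "wb") as out:
        for path in gz_paths:
            with gzip.open(path, "rb") as part:
                # Stream in chunks so large databases do not fill memory.
                shutil.copyfileobj(part, out)
    return output_path
```

Calling concatenate_fasta(EST_SUBSETS, "est") would write one 
uncompressed FASTA file named est, ready to be formatted.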
These databases will need to be formatted using the formatdb program 
found in the standalone blast executable package.  The recommended 
command lines to use are:
        formatdb -i input_db -p F -o T          for nucleotide

        formatdb -i input_db -p T -o T          for protein
For additional information on formatdb, please see the formatdb.txt 
document under /blast/documents/ directory.
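When many databases have to be formatted, the recommended command lines 
can be generated from a script. The Python sketch below is only an 
illustration: the function names are hypothetical, and it assumes the 
formatdb executable is on the PATH. Only the -i, -p, and -o options 
shown above are used.

```python
import subprocess


def formatdb_command(input_db, is_protein):
    """Return the recommended formatdb command line as an argument list."""
    molecule_flag = "T" if is_protein else "F"  # -p T protein, -p F nucleotide
    return ["formatdb", "-i", input_db, "-p", molecule_flag, "-o", "T"]


def run_formatdb(input_db, is_protein):
    """Run formatdb; raises CalledProcessError if the program fails."""
    subprocess.run(formatdb_command(input_db, is_protein), check=True)
```

For example, formatdb_command("ecoli.nt", False) gives the nucleotide 
form of the command for the unpacked ecoli.nt file.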
+------------------+--------------------------------------------------+
|Name              |Content                                           |
+------------------+--------------------------------------------------+
 alu.a.gz             proteins translated from alu.n 
 alu.n.gz             alu repeat sequences
 
 drosoph.aa.gz        Drosophila protein from genome annotation 
 drosoph.nt.gz        Drosophila genome
 ecoli.aa.gz          E.coli K-12 proteins from genome annotation 
 ecoli.nt.gz          E.coli K-12 genomic contigs
 est_human.gz         human subset of the est database
 est_mouse.gz         mouse subset of the est database
 est_others.gz        subset of est other than human or mouse entries
 gss.gz               Genomic Survey Sequences (mostly BAC ends) 
 htgs.gz              High Throughput Genomic Sequences
 human_genomic.gz     Human chromosomes formed by concatenating genomic 
                        contig assemblies (NT_######) and adjusting the 
                        gaps with N’s
 igSeqNt.gz           Immunoglobulin nucleotide sequences
 igSeqProt.gz         Immunoglobulin protein sequences
 mito.aa.gz           protein from the annotated mitochondrial genomes
 mito.nt.gz           mitochondrial genomes
 month.aa.gz          protein sequences released or updated in the past 
                        30 days
 month.est_human.gz   human subset of EST released/updated in the past 
                        30 days
 month.est_mouse.gz   mouse subset of EST released/updated in the past 
                        30 days
 month.est_others.gz  EST, without entries from human or mouse, released
                        or updated in the past 30 days
 month.gss.gz         gss entries released/updated in the past 30 days 
 month.htgs.gz        htgs entries released/updated in the past 30 days
 month.nt.gz          subset of nt released/updated in the past 30 days
 nr.gz                non-redundant protein sequence database
 nt.gz                nucleotide database from GenBank excluding the
                        batch division htgs, est, gss, sts, pat divisions, 
                        and wgs entries.  Not non-redundant.
 other_genomic.gz     Chromosome entries other than human
 pataa.gz             Patent protein sequence database 
 patnt.gz             Patent nucleotide sequence database 
 pdbaa.gz             protein sequences for pdb entries
 pdbnt.gz             nucleotide entries for pdb entries.  They are NOT
                        the coding sequences for the corresponding
                        protein entries
 sts.gz               Sequence Tagged Sites database 
 swissprot.gz         swissprot database, last major release
 vector.gz            vector sequences from synthetic (syn) division
                        of GenBank
 wgs.gz               Whole Genome Shotgun sequence assembly
 yeast.aa.gz          protein translations from yeast genome annotation 
 yeast.nt.gz          yeast genomic sequence
+------------------+----------------------------------------------------+
2.3 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/demo/ directory
This directory contains some technical presentations from the BLAST 
developers along with some demo tools or documentation relevant to BLAST.
+------------------------+-----------------------------------------------+
|Name                    |Content                                        |
+------------------------+-----------------------------------------------+
 README.blast_demo         readme for blast_demo package
 README.first              readme for this directory
 README.parse_blast_xml    readme for parse_blast_xml package
 blast_demo.tar.gz         blast_demo package on blast db, blast object, 
                             and reformatting blast alignment from 
                             blastobj file
 blast_exercises.doc       blast exercise questions and answers
 blast_programming.ppt     PowerPoint presentation on BLAST programming
 blast_talk.ppt            PowerPoint presentation (O'Reilly conference)
 ieee_blast.final.ppt      PowerPoint presentation (IEEE conference)
 ieee_talk.pdf             Above IEEE presentation in PDF format
 parse_blast_xml.tar.gz    demo package on parsing xml styled blast output
 splitd.ppt                PowerPoint presentation on NCBI BLAST server’s 
                             splitd implementation
 test_suite.tar.gz         test package
+------------------------+-----------------------------------------------+
2.4 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/documents/ directory
This directory contains copies of the documentation on different BLAST 
programs distributed from this ftp site under the /blast/executables/ 
directory. blast.txt also contains detailed release history.
+------------------------+-----------------------------------------------+
|Name                    |Content                                        |
+------------------------+-----------------------------------------------+
 blast.txt                 readme for blastall and blastpgp
 blastclust.txt            readme for blastclust
 developer                 subdirectory with additional documentation
 blast_seqalign.txt        describing seqalign function
 readdb.txt                describing readdb function
 urlapi.txt                a short introduction on BLAST URL API which 
                             supersedes the blasturl
 formatdb.txt              readme for formatdb program
 impala.txt                readme for impala
 megablast.txt             readme for megablast
 netblast.txt              readme for netblast (blastcl3)
 rpsblast.txt              readme for rpsblast
 xml                       subdirectory with .dtd and .mod field 
                             description files for blast xml output
 xml/NCBI_BlastOutput.dtd  dtd file for blast xml output
 xml/NCBI_BlastOutput.mod  mod file for blast xml output
 xml/NCBI_Entity.mod       mod file for NCBI xml file
 xml/README.blxml          readme on blast xml output
+------------------------+-----------------------------------------------+
2.5 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/ 
directory
This directory contains several subdirectories, each for a specific 
subset of executable BLAST programs:
/LATEST-BLAST subdirectory contains the standalone blast binaries from 
        the latest major versioned release.
/LATEST-NETBLAST subdirectory contains the netblast binaries from the
        latest major versioned release. 
/LATEST-WWWBLAST subdirectory contains the wwwblast binaries from the
        latest major versioned release.
/release subdirectory contains the different releases, with the latest 
        one linked to the LATEST directories
/snapshot subdirectory contains patches or intermediate updates put up in 
        between major releases. For previous releases, go to the release 
        subdirectory, where the old major releases are archived back to 
        version 2.0.10.
2.5.1 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST-BLAST,
 /LATEST-NETBLAST, and /LATEST-WWWBLAST subdirectories
All three of these subdirectories link to the latest release directory, 
which contains the standalone BLAST executables package (archives whose 
names begin with blast), the blastcl3 client (archives whose names begin 
with netblast), and server blast (archives whose names begin with 
wwwblast).  
The standalone archive is needed to set up BLAST locally on the user's own 
machine. It also provides the tools necessary to prepare custom databases 
and retrieve sequences from these prepared databases.  Different archives 
for commonly used platforms are available. 
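The archive names in the table below follow a package-version-platform 
pattern, which can be sketched in Python as follows. The function name 
archive_name is hypothetical, and the platform list is copied from the 
table for version 2.2.8. A generated name does not guarantee that the 
file exists; for example, the table lists no wwwblast archive for 
ia32-win32, and the extra mips-irix-32-bit archive exists only for the 
standalone package.

```python
# Package prefixes and platform tags as they appear in the 2.2.8 listing.
PACKAGES = {"blast", "netblast", "wwwblast"}
PLATFORMS = {
    "alpha-osf1", "amd64-linux", "ia32-freebsd", "ia32-linux",
    "ia32-win32", "ia64-linux", "mips-irix", "powerpc-macosx",
    "sparc-solaris",
}


def archive_name(package, platform, version="2.2.8"):
    """Build an archive file name following the pattern used in the listing."""
    if package not in PACKAGES:
        raise ValueError(f"unknown package: {package}")
    if platform not in PLATFORMS:
        raise ValueError(f"unknown platform: {platform}")
    # Windows builds are distributed as .exe, all others as .tar.gz.
    extension = "exe" if platform.endswith("win32") else "tar.gz"
    return f"{package}-{version}-{platform}.{extension}"
```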
The blast client archive contains the blastcl3 program, which formulates 
the BLAST search locally and forwards it to the NCBI BLAST server for 
processing. The search results returned by the NCBI BLAST server are 
saved to a user-specified file on the local disk.  
The server blast archive contains web pages with embedded blast search 
forms similar to those of NCBI. It can process BLAST search requests against 
a local set of databases and return the results to a browser window. wwwblast 
is now in sync with the NCBI toolkit and the above two packages.
+------------------------------------+-------------------------------+
|Name                                |Content                        |
+------------------------------------+-------------------------------+
 MD5SUM.txt                           MD5 checksums for the archives
 blast-2.2.8-alpha-osf1.tar.gz        Standalone for COMPAQ/HP alpha 
                                        machine (OSF 5.1 and above)
 blast-2.2.8-amd64-linux.tar.gz       Standalone for AMD 64-bit PC 
                                        running Linux
 blast-2.2.8-ia32-freebsd.tar.gz      Standalone for Intel Pentium PC
                                        running FreeBSD
 blast-2.2.8-ia32-linux.tar.gz        Standalone for Intel Pentium PC 
                                        running Linux
 blast-2.2.8-ia32-win32.exe           Standalone for Intel Pentium PC 
                                        running Windows
 blast-2.2.8-ia64-linux.tar.gz        Standalone for Intel Itanium PC 
                                        running Linux
 blast-2.2.8-mips-irix-32-bit.tar.gz  Standalone for 32-bit SGI
 blast-2.2.8-mips-irix.tar.gz         Standalone for 64-bit SGI
 blast-2.2.8-powerpc-macosx.tar.gz    Standalone for MacOSX (terminal)
 blast-2.2.8-sparc-solaris.tar.gz     Standalone for Sun Sparc station 
                                        running Solaris
 netblast-2.2.8-alpha-osf1.tar.gz     netblast for COMPAQ/HP alpha
                                        machine (OSF 5.1 and above)
 netblast-2.2.8-amd64-linux.tar.gz    netblast for AMD 64-bit PC
                                        running Linux
 netblast-2.2.8-ia32-freebsd.tar.gz   netblast for Intel Pentium PC
                                        running FreeBSD
 netblast-2.2.8-ia32-linux.tar.gz     netblast for Intel Pentium PC 
                                        running Linux
 netblast-2.2.8-ia32-win32.exe        netblast for Intel Pentium PC
                                        running Windows
 netblast-2.2.8-ia64-linux.tar.gz     netblast for Intel Itanium PC
                                        running Linux
 netblast-2.2.8-mips-irix.tar.gz      netblast for SGI 32-bit system 
 netblast-2.2.8-powerpc-macosx.tar.gz netblast for MacOSX
 netblast-2.2.8-sparc-solaris.tar.gz  netblast for Sun Sparc station
                                        running Solaris
 wwwblast-2.2.8-alpha-osf1.tar.gz     wwwblast for COMPAQ/HP alpha
                                        machine (OSF 5.1 and above)
 wwwblast-2.2.8-amd64-linux.tar.gz    wwwblast for AMD 64-bit PC 
                                        running Linux
 wwwblast-2.2.8-ia32-freebsd.tar.gz   wwwblast for Intel Pentium PC 
                                        running FreeBSD
 wwwblast-2.2.8-ia32-linux.tar.gz     wwwblast for Intel Pentium PC
                                        running Linux
 wwwblast-2.2.8-ia64-linux.tar.gz     wwwblast for Intel Itanium PC
                                        running Linux
 wwwblast-2.2.8-mips-irix.tar.gz      wwwblast for SGI 32-bit system
 wwwblast-2.2.8-powerpc-macosx.tar.gz wwwblast for MacOSX
 wwwblast-2.2.8-sparc-solaris.tar.gz  wwwblast for Sun Sparc station 
                                        running Solaris
+------------------------------------+-------------------------------+
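A downloaded archive can be checked against MD5SUM.txt before it is 
unpacked. The layout of MD5SUM.txt is not described in this readme, so 
the Python sketch below assumes the common md5sum layout of one 
"digest  filename" pair per line. The function names are hypothetical.

```python
import hashlib


def parse_md5sums(text):
    """Parse 'digest  filename' lines (assumed md5sum layout) into a dict."""
    sums = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            digest, name = parts
            # md5sum marks binary mode with a leading '*' on the name.
            sums[name.lstrip("*")] = digest.lower()
    return sums


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def verify(path, name, sums):
    """True when the file at path matches the digest listed for name."""
    return sums.get(name) == md5_of_file(path)
```

If verify returns False, the archive is either incomplete or not the 
file that the checksum list refers to, and should be downloaded again.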
2.5.2 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release 
subdirectory
This directory contains past major releases of BLAST, as far back as 
version 2.0.10. Each release is in its own subdirectory. 
2.5.3 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/snapshot 
subdirectory
This subdirectory contains intermediate enhanced or patched archives 
released after the last major release.  They are organized according 
to the date and contain only the binaries for the affected platforms.
2.5.4 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/executables/special 
subdirectory
From time to time, we make binaries for some rare platforms under 
special circumstances.  Those files are archived here.
2.6 File content for ftp://ftp.ncbi.nlm.nih.gov/blast/matrices directory
This directory contains the scoring matrices that BLAST can use to 
score alignments.  The files are text files in a special format and 
can be viewed directly in a browser.
For valid statistical analysis, blastn uses only an identity matrix, and 
blastp supports only a limited subset of the BLOSUM and PAM matrices: 
BLOSUM 45, 62, 80, plus PAM30 and 70.
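A script that lets users pick a matrix can guard against unsupported 
choices with a simple lookup. The sketch below is hypothetical Python; 
the set contains only the matrices named above, written in the 
BLOSUM62-style form without a space, which is an assumption about how 
the name is passed.

```python
# Matrices that blastp supports for valid statistics, per the list above.
SUPPORTED_BLASTP_MATRICES = {"BLOSUM45", "BLOSUM62", "BLOSUM80", "PAM30", "PAM70"}


def is_supported_blastp_matrix(name):
    """True when the matrix name is in the supported subset (case-insensitive)."""
    return name.upper() in SUPPORTED_BLASTP_MATRICES
```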
2.7 File content of the ftp://ftp.ncbi.nlm.nih.gov/blast/temp 
subdirectory
A left-over subdirectory of miscellaneous files or tools. 
3. Technical Support
Additional questions/comments on this ftp site should be directed to 
the NCBI blast-help group at:
        blast-help@ncbi.nlm.nih.gov
Other questions on general NCBI resources should be directed to:
        info@ncbi.nlm.nih.gov