Seed Linkage
README
To run Seed Linkage on your local machine, you need to have the following programs installed:
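· MySQL server. It can be downloaded from: http://dev.mysql.com/downloads/mysql/5.1.html
· BLAST package. It can be downloaded from: ftp.ncbi.nih.gov/blast/executables/LATEST (Seed Linkage uses the blastall, formatdb and fastacmd programs, which are part of the legacy BLAST package rather than BLAST+.)
· PHP. It can be downloaded from: http://www.php.net/downloads.php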
For more information about how to set up these programs, please refer to their respective manuals.
Extracting Seed Linkage:
1. Download Seed Linkage from: http://biodados.icb.ufmg.br/seedlinkage/seedlinkage.1.0.tar.gz
2. After downloading Seed Linkage, unpack it:
a. Create a folder where you want to extract Seed Linkage.
b. Move the Seed Linkage package into the newly created folder.
c. Type the following command in the terminal: tar -xvzf seedlinkage.1.0.tar.gz
3. Several files and folders should now have been created inside the new folder.
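The unpacking steps above can be sketched as a small shell function. This is only an illustration: the function name extract_seedlinkage and the folder name in the usage comment are assumptions, not part of the Seed Linkage distribution.

```shell
#!/bin/sh
# Sketch: unpack a Seed Linkage archive into a dedicated folder.
# extract_seedlinkage is a hypothetical helper name.
extract_seedlinkage() {
    archive=$1   # path to the downloaded package, e.g. seedlinkage.1.0.tar.gz
    dest=$2      # folder where Seed Linkage should be extracted

    # Step a: create the folder (no error if it already exists).
    mkdir -p "$dest" || return 1
    # Step b: move the package into the new folder.
    mv "$archive" "$dest"/ || return 1
    # Step c: unpack inside the folder; the subshell keeps the caller's
    # working directory unchanged.
    ( cd "$dest" && tar -xzf "$(basename "$archive")" )
}

# Usage (assumes the package is in the current directory):
# extract_seedlinkage seedlinkage.1.0.tar.gz seedlinkage_install
```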
Setting up the MySQL table:
1. Use an existing MySQL database or create a new one. To create a new database, issue the following command from the MySQL terminal:
a. create database DATABASE_NAME;
2. Select the database with the following command:
a. use DATABASE_NAME;
3. Create a MySQL table with the fields id (text) and txi (text), using the following command:
a. create table TABLE_NAME (id text, txi text);
4. Populate the table with the sequence id and tax id of every sequence, in columns id and txi, respectively. ALL sequences used by Seed Linkage must be listed in the table. Make sure every sequence has its tax id listed.
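One possible way to populate the table is to generate insert statements from a two-column, tab-separated file (sequence id, then tax id). The sketch below is not part of Seed Linkage: make_inserts and the file name ids_taxids.tsv are hypothetical, and it assumes the ids contain no tabs or backslashes.

```shell
#!/bin/sh
# Sketch: turn "id<TAB>txi" lines into SQL insert statements.
# make_inserts is a hypothetical helper, not shipped with Seed Linkage.
make_inserts() {
    table=$1   # MySQL table name (TABLE_NAME in the steps above)
    tsv=$2     # tab-separated file: sequence id, tax id
    awk -F '\t' -v table="$table" '
        NF >= 2 {
            id = $1; txi = $2
            # \047 is a single quote; double it so SQL string literals stay valid.
            gsub(/\047/, "\047\047", id)
            gsub(/\047/, "\047\047", txi)
            printf "insert into %s (id, txi) values (\047%s\047, \047%s\047);\n", table, id, txi
        }' "$tsv"
}
```

The output can be reviewed first and then piped to the MySQL client, for example: make_inserts TABLE_NAME ids_taxids.tsv | mysql -u USERNAME -p DATABASE_NAME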
Setting up the BLAST database and input file:
1. A BLAST database must be formatted for Seed Linkage to run properly.
a. Make a FASTA file with all the protein sequences that will be used by Seed Linkage. Make sure ALL sequence ids listed in the MySQL table are in this FASTA file. The MySQL table ids and the FASTA file ids must be IDENTICAL.
b. Copy the FASTA file to the database folder, inside the Seed Linkage folder.
c. Inside the database folder, format the database using BLAST formatdb, with the following command:
formatdb -i FASTA_FILE_NAME -p T -o T
2. The input file is a simple text file listing the ids of all sequences that you want to use as seeds. Each id must be on its own line. DO NOT put more than one sequence id on a line. The input file must look like this:
ID_SEQUENCE_ONE
ID_SEQUENCE_TWO
ID_SEQUENCE_THREE
ID_SEQUENCE_............
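Because the ids in the MySQL table, the FASTA file and the input file must match exactly, it can help to check the files before running anything. The following is a minimal sketch, not part of Seed Linkage: both function names are hypothetical, and it assumes a sequence id is the first whitespace-delimited word after ">" in each FASTA header.

```shell
#!/bin/sh
# Sketch only: check_seed_file and missing_from_fasta are hypothetical helpers.

# Fail (non-zero exit) if any line of the input file does not hold exactly one id.
# Blank lines are reported too, since they hold zero ids.
check_seed_file() {
    awk '
        NF != 1 { printf "line %d: expected one id, found %d\n", NR, NF; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Print every id from a one-id-per-line list that has no header in the FASTA file.
# The list can be the input file or an export of the id column of the MySQL table.
missing_from_fasta() {
    id_list=$1   # text file with one sequence id per line
    fasta=$2     # FASTA file used to build the BLAST database
    awk '
        # First file (FASTA): remember the id of every header line.
        FILENAME == ARGV[1] {
            if (substr($0, 1, 1) == ">") seen[substr($1, 2)] = 1
            next
        }
        # Second file (id list): report ids never seen as a header.
        NF > 0 && !($1 in seen) { print $1 }
    ' "$fasta" "$id_list"
}

# Usage (file names are placeholders):
# check_seed_file INPUT_FILE
# missing_from_fasta INPUT_FILE FASTA_FILE_NAME
```

No output from missing_from_fasta means every listed id was found in the FASTA file.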
Editing config.php
1. You must edit the file named config.php, which is located inside the Seed Linkage program folder. Please DO NOT edit any field other than those described here; changes elsewhere can cause the program to malfunction. You can use any text editor, such as “vi”. Change the following fields if they differ from the default settings. Only change the contents inside the double quotes ("").
a. $mysql_server="localhost"; Only change this setting if the MySQL server runs on a machine other than your local machine. In that case, change “localhost” to your server name.
b. $user="username"; Change username to your MySQL user name.
c. $pwd="password"; Change password to your MySQL password, or leave this field blank if your MySQL user does not require a password.
d. $mysql_db="seed_linkage"; Change seed_linkage to your MySQL database name.
e. $table_mysql="proteomes"; Change proteomes to the name of your MySQL table.
f. $multifasta_path="./databases/proteomes.fasta"; If your database is located inside Seed Linkage’s database folder, change only proteomes.fasta to your FASTA file name and keep ./databases/. If your database is located elsewhere, change the whole path and name.
g. $blastall_path="/usr/local/genome/blast/bin/blastall"; Change the path to the blastall path on your machine.
h. $fastacmd_path="/usr/local/genome/blast/bin/fastacmd"; Change the path to the fastacmd path on your machine.
i. $formatdb_path="/usr/local/genome/blast/bin/formatdb"; Change the path to the formatdb path on your machine.
j. $blast_parameters=" -p blastp -Ff -e 1e-10 -m8"; Change this if you want to run Seed Linkage with different BLAST parameters.
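To fill in the three BLAST paths, you can ask the shell where the programs are installed. This is a minimal sketch that assumes the BLAST programs are on your PATH; find_tool is a hypothetical helper name, not part of Seed Linkage.

```shell
#!/bin/sh
# Sketch: print the path of a program found on PATH, or fail with a message.
# find_tool is a hypothetical helper.
find_tool() {
    command -v "$1" || { echo "$1 not found on PATH" >&2; return 1; }
}

# Usage: print the values to copy into config.php.
# for tool in blastall fastacmd formatdb; do find_tool "$tool"; done
```

If a program is reported as not found, locate its installation folder manually and use the full path in config.php.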
Running Seed Linkage
To run Seed Linkage, type the following command on the command line:
php batch_seed_linkage.php -i INPUT_FILE -d DATABASE -r VALUE -s VALUE -c VALUE -l VALUE
Parameter description:
a. r: recursion
b. s: similarity cutoff
c. c: coverage cutoff
d. l: inparalog distance (by default we use 0.3; for further explanation, please refer to the Seed Linkage paper).
Output
A file with the extension “.clusters” will be created. Identical clusters may have been created during this process. To disambiguate the clusters, another script must be executed. To run the evaluate_similarity.php script, type the following on the command line:
php evaluate_similarity.php NAME_OF_THE_.CLUSTERS_FILE
After evaluate_similarity.php has run, a “.disambiguated” file will be created. The disambiguated file should contain three columns:
CLUSTER NUMBER    TAX ID    SEQUENCE ID
Cluster number is the number of the cluster to which the sequence belongs, tax id is the taxon id of that sequence, and sequence id is the sequence id you provided.
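As a quick sanity check on the result, the three-column file can be summarized by counting how many sequences fall into each cluster. This is a sketch under assumptions: the columns are taken to be separated by whitespace, the file is assumed to have no header row, and cluster_sizes is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count sequences per cluster in a ".disambiguated" file.
# Assumed columns: cluster number, tax id, sequence id (whitespace-separated).
cluster_sizes() {
    awk '
        NF >= 3 { count[$1]++ }
        END { for (c in count) print c "\t" count[c] }
    ' "$1" | sort -n
}

# Usage (file name is a placeholder):
# cluster_sizes NAME_OF_THE_.DISAMBIGUATED_FILE
```

Each output line holds a cluster number and the number of sequences assigned to it, sorted by cluster number.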