Seed Linkage README

 

To run Seed Linkage on your local machine, you need to have the following programs installed:

·       MySQL server.  It can be downloaded from: http://dev.mysql.com/downloads/mysql/5.1.html.

·       BLAST package. It can be downloaded from: ftp.ncbi.nih.gov/blast/executables/LATEST

·       PHP. It can be downloaded from: http://www.php.net/downloads.php

For more information about how to set up these programs, please refer to their respective manuals.

 

Extracting Seed Linkage:

1.     Download Seed Linkage. It can be downloaded from: http://biodados.icb.ufmg.br/seedlinkage/seedlinkage.1.0.tar.gz

2.     After downloading Seed Linkage, unpack it:

a.     Create a folder where you want to extract Seed Linkage.

b.    Move the Seed Linkage package into the newly created folder.

c.     Type the following command in the terminal/bash: tar -xvzf seedlinkage.1.0.tar.gz

3.     Several files and folders should have been created inside the new folder.

 

Setting up the MySQL table:

1.     Use an existing MySQL database, or create a new one. To create a new database, issue the following command from the MySQL terminal:

a.     create database DATABASE_NAME;

2.     Select the database. This can be accomplished with the following command:

a.     use DATABASE_NAME;

3.     Create a MySQL table with the fields id (text) and txi (text). Use the following command to create the table:

a.     create table TABLE_NAME (id text, txi text);

4.     The table must be populated with ALL the sequence ids and their tax ids, in columns id and txi, respectively. Every sequence used by Seed Linkage must be listed in the table. Make sure every sequence has its tax id listed.
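
Step 4 does not say how to populate the table. As a minimal sketch, assuming you keep the ids in a tab-separated file with one sequence id and its tax id per line (the file layout, the function name build_inserts and the example ids are illustrative assumptions, not part of Seed Linkage), the following Python code produces INSERT statements that can be pasted into the MySQL terminal:

```python
def build_inserts(lines, table_name):
    """Turn 'id<TAB>txi' lines into MySQL INSERT statements.

    Assumption: each non-blank line holds a sequence id and a tax id
    separated by a single tab. This layout is not defined by Seed Linkage.
    """
    statements = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2 or not fields[0] or not fields[1]:
            raise ValueError(f"line {number}: expected 'id<TAB>txi', got {line!r}")
        # Escape backslashes and single quotes so the values are safe
        # inside single-quoted MySQL string literals.
        seq_id, tax_id = (
            field.replace("\\", "\\\\").replace("'", "''") for field in fields
        )
        statements.append(
            f"insert into {table_name} (id, txi) values ('{seq_id}', '{tax_id}');"
        )
    return statements


# Hypothetical example data, for illustration only.
example_lines = ["seq1\t9606\n", "seq2\t10090\n"]
for statement in build_inserts(example_lines, "TABLE_NAME"):
    print(statement)
```

For a large number of sequences, a bulk import from the MySQL side is likely to be faster than individual INSERT statements; the sketch above only illustrates the required id and txi pairing.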

 

Setting up BLAST database and Input file:

1.     A BLAST database must be formatted for Seed Linkage to run properly.

a.     Make a FASTA file with all protein sequences that will be used by Seed Linkage. Make sure ALL sequence ids listed in the MySQL table are in this FASTA file. The MySQL table ids and the FASTA file ids must be IDENTICAL.

b.    Copy the FASTA file to the database folder, inside the Seed Linkage folder.

c.     Inside the database folder, format the database using BLAST formatdb. This can be accomplished with the following command:

formatdb -i FASTA_FILE_NAME -p T -o T

2.     The input file is a simple text file that lists the ids of all sequences you want to use as seeds. Each id must be on its own line. DO NOT put more than one sequence id on a line. The input file must look like this:

ID_SEQUENCE_ONE

ID_SEQUENCE_TWO

ID_SEQUENCE_THREE

ID_SEQUENCE_............
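
Because the MySQL table ids, the FASTA ids and the seed ids all have to agree, it can help to check them before running anything. The following Python sketch is one way to do that. It assumes that the id of a FASTA record is the first whitespace-delimited token after ">", which may not match how your ids are written; the function names and the example ids are hypothetical.

```python
def fasta_ids(lines):
    """Return the set of ids found in FASTA header lines.

    Assumption: the id is the first whitespace-delimited token after '>'.
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def read_seed_ids(lines):
    """Return the seed ids from an input file, enforcing one id per line."""
    seeds = []
    for number, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:
            continue  # ignore blank lines
        if len(tokens) > 1:
            raise ValueError(f"line {number}: more than one id on the line")
        seeds.append(tokens[0])
    return seeds


def missing_ids(required, available):
    """Return the ids in 'required' that are absent from 'available', sorted."""
    return sorted(set(required) - set(available))


# Hypothetical example data, for illustration only.
fasta = [">seqA some description\n", "MKV\n", ">seqB\n", "MLL\n"]
table_ids = ["seqA", "seqB"]
seeds = read_seed_ids(["seqA\n", "seqC\n"])

print(missing_ids(table_ids, fasta_ids(fasta)))  # prints []
print(missing_ids(seeds, fasta_ids(fasta)))      # prints ['seqC']
```

An empty list means every required id was found. In practice the lines would come from open file handles, and the table ids from a query on the MySQL table.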

 

Editing config.php

1.     You must edit the file named config.php, which is located inside the Seed Linkage program folder. Please DO NOT edit any field other than those described here. Changes elsewhere can cause the program to malfunction. Editing can be done with any text editor, such as “vi”. Change the following fields wherever the default setting does not match your setup. Only change the contents inside the double quotes ("").

a.     $mysql_server="localhost";  Only change this setting if the MySQL server runs on a machine other than your local machine. In that case, change “localhost” to your server name.

b.    $user="username"; Change username to your MySQL user name.

c.     $pwd="password"; Change password to your MySQL password, or leave this field blank if your MySQL user does not require a password.

d.    $mysql_db="seed_linkage"; Change seed_linkage to your MySQL database name.

e.     $table_mysql="proteomes"; Change proteomes to the name of your MySQL table.

f.      $multifasta_path="./databases/proteomes.fasta"; If your database is located inside Seed Linkage’s database folder, change only proteomes.fasta to your FASTA file name and keep ./databases/. If your database is located elsewhere, change the whole path and file name.

g.     $blastall_path="/usr/local/genome/blast/bin/blastall"; Change the path to the blastall path on your machine.

h.    $fastacmd_path="/usr/local/genome/blast/bin/fastacmd"; Change the path to the fastacmd path on your machine.

i.      $formatdb_path="/usr/local/genome/blast/bin/formatdb"; Change the path to the formatdb path on your machine.

j.      $blast_parameters=" -p blastp -Ff -e 1e-10 -m8"; Change if you want to run Seed Linkage with different BLAST parameters.
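
A wrong path in config.php is an easy mistake to make. The Python sketch below is a hedged example of how the edited settings could be checked before a run. It assumes the settings appear in config.php as simple $name="value"; assignments, as in the list above; the helper names read_settings and missing_paths are hypothetical and not part of Seed Linkage.

```python
import os
import re

# Matches simple assignments such as: $user="username";
# Assumption: config.php writes each setting in this form.
ASSIGNMENT = re.compile(r'^\s*\$(\w+)\s*=\s*"([^"]*)"\s*;', re.MULTILINE)

PATH_KEYS = ("multifasta_path", "blastall_path", "fastacmd_path", "formatdb_path")


def read_settings(php_source):
    """Return a dict of setting name -> value from config.php text."""
    return dict(ASSIGNMENT.findall(php_source))


def missing_paths(settings, keys=PATH_KEYS):
    """Return the path settings that are unset or point to nothing on disk.

    Relative paths such as ./databases/... are resolved against the current
    working directory, so run this from the Seed Linkage folder.
    """
    return [key for key in keys if not os.path.exists(settings.get(key, ""))]


# Example with inline text; in practice, read the contents of config.php.
example = '''
$mysql_server="localhost";
$blastall_path="/usr/local/genome/blast/bin/blastall";
'''
settings = read_settings(example)
print(settings["mysql_server"])
print(missing_paths(settings))
```

The second print lists every path setting that could not be found on the machine where the check runs, so its output depends on your installation.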

 

Running Seed Linkage

To run Seed Linkage type the following command on the command line:

 

php batch_seed_linkage.php -i INPUT_FILE -d DATABASE -r VALUE -s VALUE -c VALUE -l VALUE

 

Parameters description:

a.     r: recursion

b.    s: similarity cutoff

c.     c: coverage cutoff

d.    l: inparalog distance (we use 0.3 by default; for further explanation, please refer to the Seed Linkage paper).
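
If you run Seed Linkage repeatedly with different settings, assembling the command in a script avoids typing mistakes. The Python sketch below only builds and prints the command line. It assumes the input file is passed with -i; the function name build_command is hypothetical, and the values used for -r, -s and -c are placeholders rather than recommended settings. Only the 0.3 inparalog distance comes from this README.

```python
import shlex


def build_command(input_file, database, recursion, similarity, coverage,
                  inparalog=0.3):
    """Return the Seed Linkage command line as a list of arguments.

    Assumption: the input file is given with the -i flag.
    """
    return [
        "php", "batch_seed_linkage.php",
        "-i", str(input_file),
        "-d", str(database),
        "-r", str(recursion),    # recursion
        "-s", str(similarity),   # similarity cutoff
        "-c", str(coverage),     # coverage cutoff
        "-l", str(inparalog),    # inparalog distance
    ]


# Placeholder values, for illustration only.
command = build_command("INPUT_FILE", "DATABASE", 1, 50, 50)
print(shlex.join(command))
```

To actually launch the run, the list can be passed to subprocess.run(command, check=True) from inside the Seed Linkage program folder.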

 

 

Output

 

A file with the extension “.clusters” will be created. Identical clusters may have been created during this process, so another script must be executed to disambiguate them. To run the evaluate_similarity.php script, type the following on the command line:

 

php evaluate_similarity.php NAME_OF_THE_.CLUSTERS_FILE

 

After evaluate_similarity.php has run, a “.disambiguated” file will be created. The disambiguated file should contain three columns:

 

CLUSTER NUMBER                TAX ID             SEQUENCE ID

 

Cluster number is the number of the cluster to which the sequence belongs, tax id is the taxon id of that sequence, and sequence id is the sequence id you provided.
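
For downstream analysis, the three columns can be grouped by cluster. The following Python sketch shows one way to do so. It assumes the columns are separated by whitespace (tabs or spaces) and that the file has no header line, neither of which is confirmed here; the function name read_clusters and the example rows are invented for illustration.

```python
from collections import defaultdict


def read_clusters(lines):
    """Group (tax id, sequence id) pairs by cluster number.

    Assumption: each non-blank line holds exactly three whitespace-separated
    fields: cluster number, tax id, sequence id.
    """
    clusters = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 columns, got {len(fields)}")
        cluster, tax_id, seq_id = fields
        clusters[cluster].append((tax_id, seq_id))
    return dict(clusters)


# Invented example rows, for illustration only.
rows = ["1\t9606\tseqA\n", "1\t10090\tseqB\n", "2\t9606\tseqC\n"]
for cluster, members in read_clusters(rows).items():
    print(cluster, len(members), members)
```

In practice the lines would come from the opened “.disambiguated” file instead of the inline list.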