###########################
##### Linkage Scripts #####
###########################

Copyright 2001,2004 Greenwood Genetic Center

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

************
**Contents**
************

Linkage.pm is the linkage module containing functions to process the
pedfile.dat and datafile.dat. It is required by all the programs included
in this package.

markerdist.pl is a simple script to output all the markers in a file and
the distances between them. It is also a good simple example of how to use
the Linkage module.

mlink.pl is a script for automating the process of running mlink. It
requires lsp, unknown, and mlink.

linkmap.pl is a script for automating the process of running linkmap. It
requires lsp, unknown, linkmap, and linklods.

****************
**INSTALLATION**
****************

1. Uncompress and untar this archive to /usr/local/linkage:

   # tar -C /usr/local -zxvf linkage_scripts-0.11.tgz

2. Install the following utilities for linkage analysis. Note that these
   programs are covered by their respective licenses.

   a) The FASTLINK package by Cottingham et al. In particular, the
      utilities unknown, mlink, and linkmap are required.
      Download: ftp://fastlink.nih.gov/pub/fastlink/

   b) The Linkage Auxiliary Programs by Peter Cartwright. In particular,
      the utility lsp is required.
      Download: ftp://linkage.rockefeller.edu/software/linkage/

   c) The Linklods program by Jurg Ott.
      Download: http://fog.bio.unipd.it/pub/fastlink/dos-binaries/

      Linklods is a DOS program; however, it can be run under Linux by
      using WINE (http://www.winehq.org/). Simply install WINE and copy
      linklods.exe to the /usr/local/linkage directory. The linklods.sh
      shell script runs linklods using WINE; edit this script if necessary.

   Also, the Linkage manual at http://linkage.rockefeller.edu/soft/linkage/
   is extremely helpful.

3. After downloading, uncompressing, and untarring the linkage scripts,
   create a symbolic link to the Linkage module in a Perl module directory.
   To list the Perl module directories for your system, one per line,
   execute the command

   # perl -le 'print for @INC'

   Assuming you would like to use the module directory
   /usr/lib/perl5/site_perl and the Linkage module is in
   /usr/local/linkage, execute the command

   # ln -s /usr/local/linkage/Linkage.pm /usr/lib/perl5/site_perl

4. Create symbolic links so that linklods, mlink.pl, and linkmap.pl can be
   run without typing the entire path to these programs:

   # ln -s /usr/local/linkage/linklods.sh /usr/local/bin/linklods
   # ln -s /usr/local/linkage/mlink.pl /usr/local/bin/mlink.pl
   # ln -s /usr/local/linkage/linkmap.pl /usr/local/bin/linkmap.pl

***********************
**Running the Scripts**
***********************

1. Create a pedfile from your ipedfile and call it "pedin.dat". For
   example, if your existing file is called "ipedfile2.dat", use the command

   $ makeped ipedfile2.dat pedin.dat
   Does your pedigree file contain any loops? (y/n) -> n
   Do you want probands selected automaticaly? (y/n) -> y

2. Make a copy of your parameter file called "datain.dat". For example, if
   your existing file is called "datafile2.dat", use the command

   $ cp datafile2.dat datain.dat

3. Run one (or both) of the scripts.

   $ mlink.pl
   $ linkmap.pl

   mlink.pl creates two files: mlink.txt, which contains the same text that
   is printed to the screen when mlink.pl finishes, and mlink.csv, which
   contains the same data formatted for a spreadsheet program. Since
   linkmap.pl typically produces too much data to display on the screen at
   one time, it produces only the file linkmap.csv.

4. If a script freezes, it is probably because one of the required
   utilities is displaying an error of some sort. Try using the debug
   option:

   $ mlink.pl DEBUG
   $ linkmap.pl DEBUG

*******************************
**Automating Linkage Analysis**
*******************************

The following sections will only be useful if you intend to write your own
scripts to do linkage analysis or to edit my scripts.

****************************************************
**General Strategy for Automating Linkage Analysis**
****************************************************

First, learn which programs need to be executed to get the results. One
good strategy is to run lcp to create a shell script that does what you
want done. By examining this shell script, it is fairly simple to determine
the syntax for the various programs. Examining the scripts I have written
is also a good starting point.

****************************
**Using the Linkage Module**
****************************

The Linkage module exports by default the functions parseDatafile() and
parsePedfile(). These functions place the data in datain.dat and pedin.dat
into the exported variables %datainfo and @pedinfo.

%datainfo is a hash containing the following elements from the datafile.
See the Linkage manual for the datafile to determine what each element
means.
1st Line of the Datafile
   $datainfo{"nlocus"}
   $datainfo{"risklocus"}
   $datainfo{"sexlink"}
   $datainfo{"nprogram"}

2nd Line of the Datafile
   $datainfo{"mutsys"}
   $datainfo{"mutmale"}
   $datainfo{"mutfem"}
   $datainfo{"disequil"}

3rd Line of the Datafile
   $datainfo{"order"} is a reference to an array of locus numbers. For
   example, if the first locus on the chromosome is 3 (the first value on
   line 3), then $datainfo{"order"}->[0] will be 3.

The Locus Data
   The data for each locus is loaded into nested anonymous arrays, stored
   under the locus number. Only Affection Status and Numbered Alleles are
   supported. For example, if the first locus is a Numbered Allele locus,
   the frequency of its first allele would be found by

   if ($datainfo{1}->[0]->[0] == 3) {   # verify the first locus is a numbered allele
       print $datainfo{1}->[1]->[0];    # frequency of its first allele
   }

The Line After The Loci In The Datafile
   $datainfo{"sexdiff"} must be 0 (other values are not supported)
   $datainfo{"sexintf"} must be 0 (other values are not supported)

   $datainfo{"recomb"} is a reference to an anonymous array containing the
   recombination values between adjacent loci. For example, to find the
   value between the first and second locus, use

   $datainfo{"recomb"}->[0]

Locus Names
   Placing "#NUM NAME" at the end of the comment for a locus causes NAME to
   be loaded into an anonymous array. To find the name of the first locus,
   use

   $datainfo{"names"}->[0]

@pedinfo contains all of the information from the pedfile. Each entry is a
reference to an array holding the information on one person. For example,
to find the first person's family number, use

   $pedinfo[0]->[0]

Also, the last two elements of each person's array are the original family
and person numbers, taken from the comments produced by makeped.
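The following is a minimal, self-contained sketch of how these structures might be walked, in the spirit of markerdist.pl. It is not the module's own code: %datainfo and @pedinfo are filled by hand with made-up values instead of by parseDatafile() and parsePedfile(), and it assumes that names->[N-1] holds the name of locus N and that recomb->[i] is the value between the i-th and (i+1)-th loci in map order. The helpers adjacent_pairs, allele_freqs, and original_ids are hypothetical names used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data in the layout described above. In a real script
# these structures would be filled by the Linkage module instead, e.g.
# (assumed to take no arguments and to read datain.dat / pedin.dat):
#   use Linkage;
#   parseDatafile();
#   parsePedfile();
my %datainfo = (
    nlocus => 3,
    order  => [ 1, 2, 3 ],    # map order (line 3 of the datafile)
    # locus N => [ [type, number of alleles], [allele frequencies] ]
    1 => [ [ 1, 2 ], [ 0.999, 0.001 ] ],              # affection status
    2 => [ [ 3, 4 ], [ 0.25, 0.25, 0.25, 0.25 ] ],    # numbered alleles
    3 => [ [ 3, 2 ], [ 0.6, 0.4 ] ],                  # numbered alleles
    recomb => [ 0.05, 0.12 ],    # values between adjacent loci in map order
    names  => [ 'DISEASE', 'D1S123', 'D1S456' ],    # assumed: names->[N-1] is locus N
);

# One hypothetical person: family, person, father, mother, first offspring,
# next paternal sib, next maternal sib, sex, proband, affection status,
# two alleles at each marker, then the original family and person numbers.
my @pedinfo = ( [ 1, 1, 0, 0, 0, 0, 0, 1, 1, 2, 1, 2, 2, 2, 101, 1 ] );

# Pair each locus with the next one in map order, together with the
# recombination value between them.
sub adjacent_pairs {
    my ($info) = @_;
    my @order = @{ $info->{order} };
    my @pairs;
    for my $i ( 0 .. $#order - 1 ) {
        my $left  = $info->{names}[ $order[$i] - 1 ];
        my $right = $info->{names}[ $order[ $i + 1 ] - 1 ];
        push @pairs, [ $left, $right, $info->{recomb}[$i] ];
    }
    return @pairs;
}

# Allele frequencies for a locus, or an empty list unless it is a
# numbered-allele locus (type 3).
sub allele_freqs {
    my ( $info, $locus ) = @_;
    return () unless $info->{$locus}[0][0] == 3;
    return @{ $info->{$locus}[1] };
}

# The last two fields of a person's record: original family and person numbers.
sub original_ids {
    my ($person) = @_;
    return @{$person}[ -2, -1 ];
}

for my $pair ( adjacent_pairs( \%datainfo ) ) {
    printf "%s - %s: %.2f\n", @$pair;
}
print "Locus 2 frequencies: ", join( ' ', allele_freqs( \%datainfo, 2 ) ), "\n";
my ( $orig_fam, $orig_per ) = original_ids( $pedinfo[0] );
print "Person 1 is in family $pedinfo[0][0] (original family $orig_fam, person $orig_per)\n";
```

Keeping the lookups in small subroutines that take a hash reference means the same code would work unchanged on the real %datainfo once the module has populated it, provided the assumed name indexing holds.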