Tuesday, 3 July 2012

Haploview is one of the most commonly used programs for Linkage Disequilibrium (LD) analysis and for its graphical visualization. LD analysis in Haploview requires two input files. The first is a ped file. Depending on the type of analysis, this file can be created in five different formats that are considered standard in the scientific community (linkage, HapMap, etc.). The second input file holds information on the genetic markers (name and position) that are analyzed in the study and therefore included in the first file.

LINKAGE is the most widely used format for the ped file. It includes the following information: Pedigree name, Individual ID, Father's ID, Mother's ID, Sex, Affection status, and Marker genotypes (two columns for each marker). Generating LINKAGE format files is cumbersome and requires a significant amount of manual file manipulation, especially when high-dimensional datasets have to be processed.

To automate this tedious process, people who use R for the analysis of genetic data can now take advantage of a function, makeHaploviewInputFile, from the HapEstXXR package. This function takes as input parameters: (i) the information required to create the ped file (Pedigree name, Individual ID, Father's ID, Mother's ID, Sex, Affection status, Marker genotypes); (ii) the information required to create the marker file (name and position); (iii) the names of the linkage file and of the marker information file that Haploview requires and that the function produces.

The next example shows an application of this function using two simulated datasets, one for the ped file and one for the marker file. We assume that both datasets are in csv format. A copy of the ped dataset is shown below:
This dataset contains data on 10 individuals with genotype information at 4 loci (SNP1-SNP4) in base-call format (SNPs in columns, samples in rows). As for the marker file, a copy of this dataset is shown below:
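For readers who want to follow along without the original files, the sketch below builds stand-in datasets with the layout described above. Every value is invented for illustration, and the column names (Family.ID, ID, Father.ID, Mother.ID, Sex, AffectionStatus, SNP1 to SNP4, Name, Position) as well as the out_dir variable are assumptions, not the contents of the original files.

```r
set.seed(1)            # reproducible toy data
n <- 10                # number of individuals
out_dir <- tempdir()   # use "." to write into the working directory instead

# Draw n unphased genotypes for a biallelic SNP with alleles a and b.
draw_genotypes <- function(a, b) {
  sample(c(paste0(a, a), paste0(a, b), paste0(b, b)), n, replace = TRUE)
}

# Hypothetical ped dataset: six pedigree columns followed by four SNP columns.
ped <- data.frame(
  Family.ID = seq_len(n),
  ID = 1,
  Father.ID = 0,       # 0 = founder, no parent in the dataset
  Mother.ID = 0,
  Sex = sample(1:2, n, replace = TRUE),
  AffectionStatus = sample(1:2, n, replace = TRUE),
  SNP1 = draw_genotypes("A", "G"),
  SNP2 = draw_genotypes("C", "T"),
  SNP3 = draw_genotypes("A", "C"),
  SNP4 = draw_genotypes("G", "T"),
  stringsAsFactors = FALSE
)

# Hypothetical marker dataset: one row per SNP, name and position.
markers <- data.frame(
  Name = paste0("SNP", 1:4),
  Position = c(1000L, 2500L, 4200L, 7800L)
)

write.csv(ped, file.path(out_dir, "myPedFile.csv"), row.names = FALSE)
write.csv(markers, file.path(out_dir, "myMarkersFile.csv"), row.names = FALSE)
```

With out_dir set to ".", the two files land in the working directory, where the import step expects them.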


The first thing to do is to import these datasets into R:

myPedfile <- read.csv("myPedFile.csv", header = TRUE)
myMarkers <- read.csv("myMarkersFile.csv", header = TRUE)


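Then, we convert the genotype data included in the first dataset into "snp" objects by using the setupSNP function of the SNPassoc package:

library(SNPassoc)
# Columns 7 to 10 hold the SNPs; sep = "" because genotypes are written as "AG", not "A/G"
myGeno <- setupSNP(myPedfile, 7:10, sep = "")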

Before using the makeHaploviewInputFile function we need to convert the genotypes into gene contents (the gene content is the number of copies of a particular allele in a genotype), because the genotype matrix used by makeHaploviewInputFile is in numeric format. The additive function of the SNPassoc package takes a "snp" object (a SNP column of myGeno) as its input argument and returns a numerical variable coded 0, 1 or 2 according to the number of copies of the minor allele. By executing this command on each locus (column), we obtain the numeric version of myGeno. An example of R code to obtain this numeric dataset is shown below:

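# setupSNP keeps the non-SNP columns, so restrict the loop to columns of class "snp"
snpCols <- which(sapply(myGeno, inherits, "snp"))
numericDataset <- data.frame(matrix(NA, nrow(myGeno), length(snpCols)))
for (i in seq_along(snpCols)) {
  numericDataset[, i] <- additive(myGeno[, snpCols[i]])
}
names(numericDataset) <- names(myGeno)[snpCols]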
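To make the idea of gene content concrete, here is a minimal base-R sketch that counts minor-allele copies for one SNP. The helper minor_allele_count is hypothetical and is not part of SNPassoc; it assumes a biallelic, polymorphic SNP, two-character genotypes such as "AG", and no missing values.

```r
# Hypothetical helper (not from SNPassoc): number of minor-allele copies
# per genotype, for a biallelic SNP written as two-character strings.
minor_allele_count <- function(genotypes) {
  per_sample <- strsplit(genotypes, "")      # "AG" -> c("A", "G")
  allele_freq <- table(unlist(per_sample))   # allele counts over all samples
  minor <- names(allele_freq)[which.min(allele_freq)]
  vapply(per_sample, function(g) sum(g == minor), integer(1))
}

snp <- c("AA", "AG", "GG", "AA", "AG", "AA")  # A occurs 8 times, G 4 times
counts <- minor_allele_count(snp)
print(counts)
```

Here G is the minor allele, so the genotypes AA, AG and GG map to 0, 1 and 2 respectively.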

The genotype matrix used by the makeHaploviewInputFile function codes subjects homozygous for the major (most frequent) allele as 1, those homozygous for the minor (less frequent) allele as 2, and heterozygous subjects as 3. We therefore convert the data frame to this coding scheme:


geno <- data.frame(matrix(NA, nrow(numericDataset), ncol(numericDataset)))
for (i in 1:ncol(numericDataset)) {
  geno[which(numericDataset[, i] == 0), i] <- 1  # no minor allele: homozygous major
  geno[which(numericDataset[, i] == 2), i] <- 2  # two minor alleles: homozygous minor
  geno[which(numericDataset[, i] == 1), i] <- 3  # one minor allele: heterozygous
}
names(geno) <- names(numericDataset)
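The same recoding can also be written without a loop by using a lookup vector. The sketch below follows the coding scheme stated in the text (1 = homozygous major, 2 = homozygous minor, 3 = heterozygous) and uses a small made-up numericDataset; the name lookup is an illustrative choice.

```r
# Toy gene-content data: number of minor-allele copies, NA = missing genotype.
numericDataset <- data.frame(SNP1 = c(0, 1, 2, NA), SNP2 = c(2, 0, 1, 1))

# Position k + 1 of the lookup holds the target code for k minor-allele copies:
# 0 copies -> 1, 1 copy -> 3, 2 copies -> 2.
lookup <- c(1, 3, 2)

# Indexing with x + 1 recodes a whole column at once; NA stays NA.
geno <- as.data.frame(lapply(numericDataset, function(x) lookup[x + 1]))
print(geno)
```

Because lapply keeps the column names, no separate names() assignment is needed in this variant.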


We can now define the input arguments for the makeHaploviewInputFile function:


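PN <- myPedfile$Family.ID           # pedigree name
ID <- myPedfile$ID                  # individual ID
FID <- myPedfile$Father.ID          # father's ID
MID <- myPedfile$Mother.ID          # mother's ID
Sex <- myPedfile$Sex
AS <- myPedfile$AffectionStatus     # affection status
SNPID <- myMarkers[, 1]             # marker names
Coordinate <- myMarkers[, 2]        # marker positions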
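Since all of these arguments must describe the same individuals and the same markers, a quick consistency check before the final call can catch mismatches early. The helper check_haploview_inputs below is hypothetical and not part of HapEstXXR; it only compares lengths and dimensions, and the toy values are invented.

```r
# Hypothetical helper (not from HapEstXXR): stop early if the pieces
# passed to the export step do not line up.
check_haploview_inputs <- function(ped_columns, geno, marker_names, marker_positions) {
  lens <- vapply(ped_columns, length, integer(1))
  stopifnot(
    all(lens == nrow(geno)),                           # one entry per individual
    ncol(geno) == length(marker_names),                # one genotype column per marker
    length(marker_names) == length(marker_positions)   # one position per marker
  )
  invisible(TRUE)
}

# Toy example with 3 individuals and 2 markers.
toy_geno <- data.frame(SNP1 = c(1, 3, 2), SNP2 = c(3, 3, 1))
toy_ped <- list(PN = 1:3, ID = c(1, 1, 1), FID = c(0, 0, 0),
                MID = c(0, 0, 0), Sex = c(1, 2, 1), AS = c(2, 1, 2))
ok <- check_haploview_inputs(toy_ped, toy_geno, c("SNP1", "SNP2"), c(1000, 2500))
```

In the workflow of this post, the call would take list(PN, ID, FID, MID, Sex, AS), geno, SNPID and Coordinate.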


Finally, using the makeHaploviewInputFile function we obtain the two required input files:

library(HapEstXXR)
makeHaploviewInputFile(PN, ID, FID, MID, Sex, AS, geno, SNPID, Coordinate, "pedfile", "markerfile")
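To give an idea of what the two files look like, the following base-R sketch writes a LINKAGE-style ped file and a marker information file by hand for a toy dataset. It is not the implementation used by makeHaploviewInputFile; the allele labels 1 and 2, the temporary file paths and all values are assumptions made for illustration.

```r
# Toy inputs: 3 individuals, 2 markers, genotypes coded 1/2/3 as in the text.
famid <- 1:3; id <- c(1, 1, 1); father <- c(0, 0, 0); mother <- c(0, 0, 0)
sex <- c(1, 2, 1); aff <- c(2, 1, 2)
geno <- matrix(c(1, 3, 2,
                 3, 3, 1), nrow = 3, dimnames = list(NULL, c("SNP1", "SNP2")))
marker_names <- c("SNP1", "SNP2")
marker_pos <- c(1000L, 2500L)

# Genotype code -> two allele columns (1 = major allele, 2 = minor allele).
allele_pairs <- c("1 1", "2 2", "1 2")   # indexed by codes 1, 2, 3
geno_text <- apply(geno, 1, function(g) paste(allele_pairs[g], collapse = " "))

# One line per individual: six pedigree fields, then two columns per marker.
ped_lines <- paste(famid, id, father, mother, sex, aff, geno_text)
ped_path <- file.path(tempdir(), "toy.ped")
info_path <- file.path(tempdir(), "toy.info")
writeLines(ped_lines, ped_path)

# Marker information file: one line per marker, name then position.
write.table(data.frame(marker_names, marker_pos), info_path,
            quote = FALSE, row.names = FALSE, col.names = FALSE)

cat(readLines(ped_path), sep = "\n")
cat(readLines(info_path), sep = "\n")
```

Each ped line therefore has 6 + 2 × (number of markers) whitespace-separated fields, which matches the LINKAGE layout described at the start of this post.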