Monday, May 27, 2013

Perl script for Bootstrap analysis of Microarray data

New Blog, First Post!

So I've written Perl code for bootstrap resampling and analysis of microarray data. I tested it on the classic ALL-AML dataset from the book by Dov Stekel, and it worked.

The data file should be in CSV format (see the attached file "allaml.csv"). The bootstrap analysis is done only on the first line of the file (you can modify that). The sample data I've attached is gene expression data from the ALL-AML dataset, provided as "allaml.xlsx" and "allaml.csv".

The script generates 1000 random samples (you can modify that too).

For randomization I've used the shuffle subroutine from the Perl module List::Util. I don't know how good a random sample it generates (so use the script at your own risk).
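To make the resampling step concrete, here is a rough sketch in Python. It is not the Perl code from the script, only an illustration under my own assumptions: the first `n_first` values of the row belong to one group and the rest to the other, and the function name and the example numbers are made up. Note that shuffling reassigns values to groups without replacement, which is usually called a permutation test; a bootstrap in the strict sense draws values with replacement (in Python, for example, with `random.choices`).

```python
import random

def mean(values):
    return sum(values) / len(values)

def shuffled_mean_differences(values, n_first, n_resamples=1000, seed=None):
    """Shuffle the pooled values n_resamples times and return the
    difference of the two group means after each shuffle.

    Assumption: after each shuffle the first n_first values form one
    group and the remaining values form the other group.
    """
    if not 0 < n_first < len(values):
        raise ValueError("n_first must leave at least one value in each group")
    rng = random.Random(seed)   # seeded generator so runs can be repeated
    pooled = list(values)       # copy, so the caller's list is not changed
    diffs = []
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diffs.append(mean(pooled[:n_first]) - mean(pooled[n_first:]))
    return diffs

# Hypothetical expression values: first 4 from one group, last 4 from the other.
row = [2.1, 2.4, 1.9, 2.2, 3.0, 3.3, 2.8, 3.1]
diffs = shuffled_mean_differences(row, n_first=4, n_resamples=1000, seed=1)
print(len(diffs), min(diffs), max(diffs))
```

Passing a seed is optional; it only makes a run repeatable, which helps when comparing results.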

The script creates a CSV file named "means.csv", which holds the mean difference for each of the random samples.

It also creates a second file, "meanint.csv", which counts how many times each mean difference value appears in those 1000 samples: the first column is the mean difference and the second is its count. This file can be used for plotting; open it in Excel and plot it there.
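The counting step can be sketched like this in Python (again, not the script's own Perl). Rounding to two decimals before counting is my assumption, added so that nearly equal values land in the same row; the function name and the sample values are hypothetical.

```python
from collections import Counter

def count_mean_differences(diffs, decimals=2):
    """Return (mean difference, count) pairs sorted by mean difference.

    Assumption: values are rounded to `decimals` places before counting.
    """
    counts = Counter(round(d, decimals) for d in diffs)
    return sorted(counts.items())

# Hypothetical mean differences from a handful of resamples.
sample_diffs = [0.10, -0.25, 0.10, 0.40, -0.25, 0.10]
for value, count in count_mean_differences(sample_diffs):
    print(f"{value},{count}")   # one "value,count" row per distinct value
```

Each printed row has the same two-column layout described above, so it can be plotted the same way.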

To check significance, calculate the mean difference of the original sample and check whether it falls within the confidence interval. If it does, the two samples are not significantly different; if it falls outside the confidence interval, the samples may be significantly different.

The confidence interval runs between a lower bound (a) and an upper bound (b); for example, the range from the 5% to the 95% point of the mean difference data is treated as not significantly different. Outside this range there is some significant difference, so we have to check where the mean difference of our original sample lies.
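Here is a small Python sketch of that check, under my own assumptions: the bounds are read straight off the sorted mean differences at the 5% and 95% positions (which gives a 90% interval), and all names and numbers are made up for illustration.

```python
def percentile_interval(diffs, lower_pct=5.0, upper_pct=95.0):
    """Return (low, high) taken from the sorted resampled mean differences."""
    ordered = sorted(diffs)

    def at(pct):
        # Pick the element whose index is closest to pct percent of the
        # way through the sorted list (a simple rule; others exist).
        index = round(pct / 100 * (len(ordered) - 1))
        return ordered[index]

    return at(lower_pct), at(upper_pct)

def falls_outside(observed, interval):
    """True if the observed mean difference lies outside the interval."""
    low, high = interval
    return observed < low or observed > high

# Hypothetical resampled mean differences: -0.50, -0.49, ..., 0.50.
resampled = [i / 100 for i in range(-50, 51)]
interval = percentile_interval(resampled)
print(interval)
print(falls_outside(0.9, interval))   # an observed difference far from zero
print(falls_outside(0.1, interval))   # an observed difference close to zero
```

An observed mean difference outside the interval is the case where the two samples may be significantly different.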

PS:-

  • I don't know much about bootstrap analysis, so if you find the script is wrong please let me know.

  • The script appends data to any existing "means.csv" and "meanint.csv", so if you plan to run the script a second time in the same folder, delete the existing files first.


Download script from MediaFire -
http://www.mediafire.com/download/ox1du1ua33iyn4p/bs.pl.zip