The Lasso Page is maintained by the inventor of the lasso and provides the most important references for it.
The book Elements of Statistical Learning (pdf) describes the lasso in detail.
Lasso in R: lars (Least Angle Regression, Lasso and Forward Stagewise) and glmnet (Lasso and elastic-net regularized generalized linear models). Note: the lars() function from the lars package is probably much slower than glmnet() from glmnet.
Graphical lasso in R (glasso: Graphical lasso, estimation of Gaussian graphical models)
Joint graphical lasso in R (JGL: Performs the Joint Graphical Lasso for sparse inverse covariance estimation on multiple classes)
I want to compute the P-value from the joint cumulative distribution of an n-dimensional order statistic.
One efficient way is to use the following recursive formula.
However, the facts are (or would be):
- I am too stupid to write a recursive function.
- I didn’t find the efficient formula at first.
- In other cases, the efficient formula has not been derived yet, or is too complicated to derive.
In statistics, Monte Carlo simulation is a “quick” way to evaluate complicated formulas. By “quick”, I mean I can see the results without knowing or deriving “ugly” math formulas. Computationally, it is actually a very “slow” method.
Anyway, the R function is here.
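As a rough illustration of the Monte Carlo idea, here is a minimal sketch that assumes the order statistics come from n independent Uniform(0, 1) draws and estimates P(U_(1) <= x[1], ..., U_(n) <= x[n]). The function name `mc_order_pvalue` and the argument `n_sim` are made up for this example; it is not necessarily the same as the function from the original post.

```r
# Monte Carlo estimate of the joint CDF of uniform order statistics:
#   P(U_(1) <= x[1], ..., U_(n) <= x[n])
# Assumption: the n underlying draws are iid Uniform(0, 1).
# mc_order_pvalue and n_sim are hypothetical names used only in this sketch.
mc_order_pvalue <- function(x, n_sim = 10000) {
  n <- length(x)
  # Each column holds one simulated sample of n uniforms
  sims <- matrix(runif(n * n_sim), nrow = n)
  # Sort each column so that row i holds the i-th smallest value;
  # matrix() keeps the n x n_sim shape even when n == 1
  sims <- matrix(apply(sims, 2, sort), nrow = n)
  # x is recycled down each column, so row i is compared with x[i];
  # a simulation counts as a hit when all n comparisons are TRUE
  hits <- colSums(sims <= x) == n
  # The proportion of hits estimates the joint probability
  mean(hits)
}

set.seed(1)
mc_order_pvalue(c(0.01, 0.2, 0.5))
```

The estimate gets more precise as `n_sim` grows, at the price of longer run time, which is exactly the “slow” part of the trade-off.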
I have been doing some research on co-expression networks. “Co-expression” means that genes have similar expression profiles across different conditions or tissues. In the network, genes are nodes, and a “co-expression” relationship between two genes can be represented as an edge. Co-expressed genes may be involved in similar pathways or biological processes.
In a small part of my research, I am testing some algorithms for detecting co-expression relationships. One way to test an algorithm is simulation. In an ideal (simple) case, the expression values of two co-expressed genes can be considered to follow a bivariate normal distribution. Generating expression values for such a gene pair, or for a group of genes, given a correlation coefficient then amounts to simulating from a multivariate normal distribution. The MASS library in R has a function, mvrnorm, to do that, but it requires a covariance matrix.
The function below first generates the covariance matrix so that the mvrnorm function can be used. Because we only know the correlation coefficient, i.e. the co-expression relationship (degree), the mean and variance of each gene’s expression profile are randomly generated in the function. Then the matrix can be calculated as follows.
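A minimal sketch of this idea, assuming every gene pair shares the same correlation coefficient rho, uses the identity Sigma[i, j] = rho * sigma[i] * sigma[j] for i != j and Sigma[i, i] = sigma[i]^2. The function name `simulate_coexpressed` and the ranges for the random means and standard deviations are illustrative assumptions, not necessarily what the original function used.

```r
library(MASS)  # provides mvrnorm()

# simulate_coexpressed is a hypothetical name used only in this sketch.
# Assumption: all gene pairs share one correlation coefficient rho, with
# -1 / (n_genes - 1) < rho < 1 so that the covariance matrix is positive definite.
simulate_coexpressed <- function(n_samples, n_genes, rho) {
  # Random mean and standard deviation for each gene (ranges are arbitrary)
  mu <- runif(n_genes, min = 5, max = 10)
  sigma <- runif(n_genes, min = 0.5, max = 2)
  # Correlation matrix: rho off the diagonal, 1 on the diagonal
  R <- matrix(rho, nrow = n_genes, ncol = n_genes)
  diag(R) <- 1
  # Covariance matrix: scale each correlation by the two standard deviations
  Sigma <- outer(sigma, sigma) * R
  # Rows are samples (conditions), columns are genes
  expr <- mvrnorm(n = n_samples, mu = mu, Sigma = Sigma)
  list(expr = expr, mu = mu, Sigma = Sigma)
}

set.seed(1)
sim <- simulate_coexpressed(n_samples = 50, n_genes = 4, rho = 0.8)
round(cor(sim$expr), 2)
```

The sample correlations will scatter around rho; with more samples they settle closer to it.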
A couple of months ago, I transferred the website engine from WordPress to Octopress. A lot of errors were found in previous posts due to format incompatibility, especially in some Perl scripts.
Please read carefully before using any code or script, and leave a comment if you find some “terrible” error.
[update-2013-08-08] now only perl howto related posts
[update-2015-01-01] All format issues are (putatively) resolved.
I had been using 000webhost.com to host my website until several days ago, when I noticed my website was suspended for “violating 20%+ CPU usage limit for more than 1000 times.”
The 000webhost server is good and stable. Most importantly, it’s free. Now I have to transfer to another good, free web hosting service. GitHub is a good choice, but GitHub does not support WordPress. I tried to transfer the website to GitHub before, but I am not comfortable writing blog posts in Markdown.
It’s difficult to find another free service that supports WordPress, and lots of people say static blogging engines are much better than WordPress. Looks like I will stay here for a while.
- Degree: PhD (=MS+“n” years experience)
- NGS data processing experience
- Biology + Statistics knowledge
- Programming: Statistics (R/Matlab/SAS), Scripting languages (Python/Perl), OOP (C++/Java), Database (SQL)
- Written and oral communication skills
Wiki: 23andMe is a privately held personal genomics and biotechnology company based in Mountain View, California that provides rapid genetic testing. The company is named for the 23 pairs of chromosomes in a normal human cell. Their personal genome test kit was named “Invention of the Year” by Time magazine in 2008.
Jobs:
Engineering:
- HPC Systems Administrator
- Senior Software Engineer
- Storage Systems Architect/Engineer
Science:
- Backend Software Engineer
- Health Content Scientist
- Statistical Geneticist focusing on Parkinson’s Disease
- User Interface Designer
- Senior Software Engineer
- Senior Computational Biologist
- Senior Data Scientist
- Senior Applications Support Scientist
- Senior Bioinformatics Scientist
(to be continued)
| Name | Machine cost | Read length (bases) | Cost per megabase |
|---|---|---|---|
| Illumina MiSeq | US$125,000 | 500 | 14–70 cents |
| Illumina HiSeq | US$690,000 | 300 | 4–5 cents |
| Ion Torrent PGM | US$49,000 | 400 | 60 cents–$5 |
| Ion Torrent Proton | US$224,000 | 200 | 1–9 cents |