[Paper]Full-length Transcriptome Assembly From RNA-Seq Data Without a Reference Genome

May 22nd, 2011 12:00 am | Comments

http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.1883.html

Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.

[Paper]De Novo Assembly and Analysis of RNA-seq Data

May 22nd, 2011 12:00 am | Comments

http://www.nature.com/nmeth/journal/v7/n11/full/nmeth.1517.html

We describe Trans-ABySS, a de novo short-read transcriptome assembly and analysis pipeline that addresses variation in local read densities by assembling read substrings with varying stringencies and then merging the resulting contigs before analysis. Analyzing 7.4 gigabases of 50-base-pair paired-end Illumina reads from an adult mouse liver poly(A) RNA library, we identified known, new and alternative structures in expressed transcripts, and achieved high sensitivity and specificity relative to reference-based assembly methods.

GATK: The Genome Analysis Toolkit

May 22nd, 2011 12:00 am | Comments

The GATK is two things. First, it is a structured software library that makes it easy to write efficient analysis tools for next-generation sequencing data. Second, it is a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include depth-of-coverage analyzers, a quality score recalibrator, a SNP/indel caller and a local realigner.

We aim to work well with both samtools and Picard by providing tools that complement those two packages. Our SNP calling pipeline (Q score recalibration -> multiple sequence realignment -> SNP/indel calling) is a particular area of focus, and we have been pushing to make these capabilities as general-purpose and powerful as possible. My group’s mandate is to ensure the success of the human medical resequencing projects we’ve undertaken at the Broad over the next 2-3 years, which involves providing a robust, production-quality development library that underlies tools for common analysis problems (like SNP calling) as well as enabling exploratory research on NGS data.

Take a look at File:CBBO 100709 v3.pptx.pdf to view a presentation that provides an introduction to some of the capabilities of the GATK and its application to the 1000 Genomes project.

Converting a Protein Alignment to a DNA Alignment with BioPerl

May 20th, 2011 12:00 am | Comments

aaln_to_daln.pl

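# Thread CDS sequences onto a protein alignment to get a DNA (codon) alignment.
# It needs two files as input:
# 1) the protein alignment (NEXUS format, pro.nxs)
# 2) the CDS sequences of the same proteins (FASTA format, cds.fa)
use strict;
use warnings;
use Bio::SeqIO;
use Bio::AlignIO;
use Bio::Align::Utilities qw(aa_to_dna_aln);

# read the protein alignment
my $alignio = Bio::AlignIO->new(-format => 'nexus',
                                -file   => 'pro.nxs');
my $aa_aln = $alignio->next_aln;

# index the CDS sequences by ID; the IDs must match those in the alignment
my $seqdata = 'cds.fa';
my $seqio = Bio::SeqIO->new(-file   => $seqdata,
                            -format => 'fasta');
my %seqs;
while ( my $seq = $seqio->next_seq ) {
    $seqs{$seq->display_id} = $seq;
}

# aa_to_dna_aln expects a reference to the hash of CDS sequences
my $dna_aln = aa_to_dna_aln($aa_aln, \%seqs);

# write the DNA alignment in PHYLIP format, matching the output file name
my $out = Bio::AlignIO->new(-file   => '>cds.phylip',
                            -format => 'phylip');
$out->write_aln($dna_aln);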
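
To make the step concrete: aa_to_dna_aln threads each ungapped CDS onto its aligned protein, emitting one codon per residue and three gap characters per protein gap. The shell sketch below imitates that idea for a single sequence. It is not BioPerl's implementation; the function name aa_to_codon_aln is made up, and it assumes "-" is the gap character and that the CDS has exactly three bases per residue (no stop codon handling).

```shell
# aa_to_codon_aln ALIGNED_PROTEIN CDS: print the CDS with "---" inserted
# wherever the aligned protein has a gap
aa_to_codon_aln() {
    awk -v aa="$1" -v cds="$2" 'BEGIN {
        pos = 1
        out = ""
        for (i = 1; i <= length(aa); i++) {
            if (substr(aa, i, 1) == "-") {
                out = out "---"                   # protein gap -> codon-sized gap
            } else {
                out = out substr(cds, pos, 3)     # next codon of the CDS
                pos += 3
            }
        }
        print out
    }'
}

aa_to_codon_aln 'M-K' 'ATGAAA'    # prints ATG---AAA
```

The gap in the protein alignment becomes a three-column gap in the DNA alignment, which keeps the reading frame intact for downstream codon-based analyses.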

KEGG: Kyoto Encyclopedia of Genes and Genomes

Apr 5th, 2011 12:00 am | Comments

The increasing amount of genome sequence data is the basis for understanding life as a molecular system and for developing medical, pharmaceutical, and other practical applications. Since 1995 we have been developing knowledge-based methods for uncovering higher-order systemic behaviors of the cell and the organism from genomic and molecular information. The reference knowledge is stored in KEGG, Kyoto Encyclopedia of Genes and Genomes, and associated bioinformatics technologies are being developed both for basic research and practical applications.

Example for searching KEGG with Perl:

See this for more information.

kegg.pl

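#!/usr/bin/perl
# Query the KEGG SOAP service for the best neighbors of a gene.
use strict;
use warnings;
use SOAP::Lite;

my $wsdl   = 'http://soap.genome.jp/KEGG.wsdl';
my $serv   = SOAP::Lite->service($wsdl);
my $offset = 1;    # position of the first hit to return
my $limit  = 5;    # number of hits to return
my $top5   = $serv->get_best_neighbors_by_gene('vvi:100261203', $offset, $limit);
foreach my $hit (@{$top5}) {
    print "$hit->{genes_id1}\t$hit->{genes_id2}\t$hit->{sw_score}\n";
}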

InterPro

Apr 5th, 2011 12:00 am | Comments

InterPro is an integrated database of predictive protein “signatures” used for the classification and automatic annotation of proteins and genomes. InterPro classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites. InterPro adds in-depth annotation, including GO terms, to the protein signatures.

Search InterPro with Perl: iprscan_soaplite.pl (SOAP::Lite) or iprscan_xmlcompile.pl (XML::Compile::SOAP).

Usage:

perl iprscan_soaplite.pl Seqfile --email mymai@mymail.com

See this for more information.

google.pl - Command Line Google Search in a Shell With Perl

Apr 5th, 2011 12:00 am | Comments

google.pl

#!/usr/bin/perl
# google.pl - command line tool to search google
# 2009 by Stefan Grothkopp, this code is public domain use it as you wish!

use strict;
use warnings;
use LWP::Simple;
use Term::ANSIColor;

# results may contain non-ASCII characters after unescaping
binmode(STDOUT, ':utf8');

# change this to 0 for b/w output
my $use_color = 1;
# result size: large=8, small=4
my $result_size = "large";

# unescape unicode characters (\uXXXX) in "content"
sub unescape {
    my ($str) = @_;
    $str =~ s/\\u([0-9a-fA-F]{4})/chr(hex($1))/eg;
    return $str;
}

# number of command line args
my $numArgs = $#ARGV + 1;

if ($numArgs == 0) {
    # print usage info if no argument is given
    print "Usage:\n";
    print "$0 <searchterm>\n";
} else {
    # use first argument as query string
    my $q = $ARGV[0];
    # url encode query string
    $q =~ s/([^A-Za-z0-9])/sprintf("%%%02X", ord($1))/seg;

    # get json encoded search result from google ajax api
    my $content = get("http://ajax.googleapis.com/ajax/services/search/web?v=1.0&start=0&rsz=$result_size&q=$q");
    die "get failed" if (!defined $content);

    # ugly result parsing (did not want to depend on a parser lib for this quick hack)
    while ($content =~ s/"unescapedUrl":"([^"]*)".*?"titleNoFormatting":"([^"]*)".*?"content":"([^"]*)"//) {

        # those three data items are extracted, there are more
        my $title = unescape($2);
        my $desc  = unescape($3);
        my $url   = unescape($1);

        # print result
        if ($use_color) {
            print colored(['blue'], "$title\n");
            print "$desc\n";
            print colored(['green'], "$url\n\n");
            print color('reset');
        }
        else {
            print "$title\n$desc\n$url\n\n";
        }
    }
}
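
The URL-encoding substitution in the script replaces every character outside A-Z, a-z and 0-9 with a percent sign and its two-digit hexadecimal code. The same step in the shell, as a minimal sketch: the function name urlencode is made up, and the lookup table only covers single-byte (ASCII) input.

```shell
# urlencode STRING: percent-encode every character that is not an ASCII
# letter or digit (ASCII input only)
urlencode() {
    printf '%s\n' "$1" | LC_ALL=C awk '
        BEGIN {
            # build a character -> code table for codes 1..127
            for (i = 1; i < 128; i++) ord[sprintf("%c", i)] = i
        }
        {
            out = ""
            for (i = 1; i <= length($0); i++) {
                c = substr($0, i, 1)
                if (c ~ /^[A-Za-z0-9]$/) out = out c
                else out = out sprintf("%%%02X", ord[c])
            }
            print out
        }'
}

urlencode 'de novo assembly & RNA-seq'
```

Encoding everything except letters and digits is stricter than URLs require, but it is simple and always safe for a query string.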

Configuring a VNC Server on Fedora

Apr 5th, 2011 12:00 am | Comments

1.

rpm -qa | grep vnc

This should list
vnc-1.0.0-2.fc11.i586
vnc-server-1.0.0-2.fc11.i586
If it does not, install vnc and vnc-server.

2.

env

Assume the output contains HOME=/home/user, the user's home directory.

3.

ifconfig

Assume ip=192.168.119.131

4.

vncserver :10

This command sets up VNC for the user and starts display number 10.

5.

vi /home/user/.vnc/xstartup

Change the last line from twm & to gnome-session &

6. As root

vi /etc/sysconfig/vncservers

Change:
VNCSERVERS="10:user"
VNCSERVERARGS[10]="-geometry 800x600"

7. As root

vi /etc/sysconfig/iptables

Add
-A INPUT -m state --state NEW -m tcp -p tcp --dport 5910 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp --dport 5910 -j ACCEPT
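
The port opened in step 7 follows from the display number chosen in step 4: a VNC server conventionally listens on TCP port 5900 plus the display number, so display 10 maps to 5910. A minimal sketch of that arithmetic (the function name vnc_port is only illustrative):

```shell
# vnc_port DISPLAY: print the TCP port a VNC server uses for that display,
# assuming the conventional base port 5900
vnc_port() {
    echo $((5900 + $1))
}

vnc_port 10    # display :10 from step 4, prints 5910
```

If a different display number is used in step 4, the firewall rule and the address typed into the viewer have to change with it.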

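8. Edit the SELinux configuration

vi /etc/selinux/config

Change:
SELINUX=disabled

9. As root

service xinetd restart
service iptables restart
chkconfig --level 345 vncserver on   # start at boot
service vncserver restart

The server is now running.
On another computer,
check that ping 192.168.119.131 works,
install vncviewer,
and enter 192.168.119.131:5910.
Congratulations, it works.

10. VNC black screen problem
Change the permissions of the affected user's xstartup file (usually /home/<user name>/.vnc/xstartup) to 755 (rwxr-xr-x).

chmod 755 /home/user/.vnc/xstartup

Then restart the vncserver service:

service vncserver restart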

Mysql Common Command

Mar 24th, 2011 12:00 am | Comments


Configuring the vi Editor

Nov 25th, 2011 12:00 am

http://bioops.info/2011/11/linux-vi-configure

In your home directory, create and open the .vimrc file with vi:

vi ~/.vimrc

" Put the following settings in the file

" show line numbers
set nu

" set the tab width to 4
set tabstop=4

" show the ruler in the status line
set ruler

" highlight the current line
set cursorline

" enable syntax highlighting
syntax on

Save and exit, and the settings take effect.

Shortcut Cheat Sheet

Nov 13th, 2011 12:00 am

http://bioops.info/2011/11/shortcut-cheat-sheet

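Linux Commands Commonly Used in Bioinformatics

Aug 27th, 2011 12:00 am

http://bioops.info/2011/08/linux-command-bioinformatics

1. Count the sequences in a sequence file (used well, grep handles many data tasks quickly and conveniently)

grep -c '^>' seqfile

2. View the first or last few lines of a large file

head seqfile
tail seqfile

3. Count the lines in a file

wc -l seqfile

4. Extract selected columns from a tabular file (for example, blast -m 8 output)

cut -f 1,2,11 seq.cblast > seq.abc

5. awk and sed

I learned Perl first and discovered awk and sed later; many tasks turn out to be much easier with awk and sed than with a Perl script.

For example, converting a FASTQ file to FASTA:

awk 'NR % 4 == 1 || NR % 4 == 2' myfile.fastq | sed -e 's/^@/>/' > myfile.fasta

6. screen manages remote jobs: they keep running in the background after the remote session disconnects. See this article for details.

7. vi/vim needs no introduction; it is essential for programming. (Emacs users can substitute emacs.)

8. That is all for now. I have not worked on anything for a while; I will add more as I use or think of them.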
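
The FASTQ-to-FASTA conversion in item 5 and the sequence count in item 1 can be combined into a small script. The sketch below assumes four-line FASTQ records (no wrapped sequence lines); the function names and the demo file are made up for illustration. It rewrites only the header line of each record, so a quality line that happens to begin with @ is never mistaken for a header.

```shell
# fastq_to_fasta FILE: print FILE (4-line FASTQ records) as FASTA
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header: swap leading @ for >
         NR % 4 == 2 { print }                      # sequence line, unchanged
        ' "$1"
}

# count_fasta FILE: number of records, i.e. lines starting with ">"
count_fasta() {
    grep -c '^>' "$1"
}

# demo on a two-read file; the second quality line starts with "@" on purpose
demo=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\n@III\n' > "$demo"
fastq_to_fasta "$demo" > "$demo.fasta"
count_fasta "$demo.fasta"    # prints 2
rm -f "$demo" "$demo.fasta"
```

Selecting lines by position rather than by their first character is what keeps the count at two reads here.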


Summary of Common MySQL Commands

Mar 24th, 2011 12:00 am

http://bioops.info/2011/03/mysql-common-command

I. Connecting to MySQL

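Format: mysql -h<host address> -u<user name> -p<password>

1. Example 1: connect to MySQL on the local machine.

Open a DOS window, change to the mysql\bin directory, and type mysql -uroot -p. You are prompted for a password after pressing Enter; on a fresh MySQL installation the superuser root has no password, so just press Enter to get into MySQL. The MySQL prompt is: mysql>

2. Example 2: connect to MySQL on a remote host. Assume the remote host's IP is 110.110.110.110, the user name is root and the password is abcd123. Type the following command:

mysql -h110.110.110.110 -uroot -pabcd123

(Note: no space is needed between -u and root, and the same applies to the other options.)

3. To leave MySQL: exit (then press Enter)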

Other:

Use yum to install both the mysql command line tool and the server:
yum -y install mysql mysql-server
Enable the MySQL service:
/sbin/chkconfig mysqld on
Start the MySQL server:
/sbin/service mysqld start
Set the MySQL root password:
mysqladmin -u root password 'new-password'
The quotes around the new password are required.
Log on to MySQL:
mysql -h 127.0.0.1 -u user_name -p
Start mysqld by hand with explicit options:
mysqld --basedir=/ --datadir=/usr/local/mysql/data --user=mysql --pid-file=/usr/local/mysql/data/RFSIM.pid --skip-external-locking --port=3306 --socket=/tmp/mysql.sock

What to do when "/var/lib/mysql/mysql.sock" does not exist

1.
Create or edit /etc/my.cnf and add or change at least this line:
[mysql]
[client]
socket = /tmp/mysql.sock
# put the actual location of your mysql.sock here; it is usually under /tmp/ or /var/lib/mysql/
2.
Specify an IP address so that mysql connects over TCP instead of the local socket:
# mysql -h127.0.0.1 -uuser -ppassword
3.
Create a link to mysql.sock. For example, if the real mysql.sock is under /tmp/:
# ln -s /tmp/mysql.sock /var/lib/mysql/mysql.sock
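
Before editing my.cnf or creating the link, it helps to know where the socket actually is. A minimal sketch that checks the two usual locations named above; the function name find_mysql_sock is hypothetical, and the candidate paths are passed as arguments so that other locations can be tried.

```shell
# find_mysql_sock PATH...: print the first PATH that exists and is a socket;
# return non-zero if none of them is
find_mysql_sock() {
    for candidate in "$@"; do
        if [ -S "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

find_mysql_sock /tmp/mysql.sock /var/lib/mysql/mysql.sock \
    || echo "no socket found; is mysqld running?"
```

Whichever path it prints is the one to put in the socket line of my.cnf or to use as the source of the link.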

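II. Changing the password

Format: mysqladmin -u<user name> -p<old password> password <new password>

1. Example 1: give root the password ab12. In DOS, change to the mysql\bin directory and type:

mysqladmin -uroot password ab12

Note: root has no password at first, so the -p<old password> part can be omitted.

2. Example 2: then change root's password to djg345.

mysqladmin -uroot -pab12 password djg345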

III. Adding a new user. (Note: unlike the commands above, the following are typed inside MySQL, so each ends with a semicolon as the command terminator.)

Format: grant select on <database>.* to <user name>@<login host> identified by "<password>"

Example 1: add a user test1 with password abc who can log in from any host and has select, insert, update and delete privileges on all databases. Connect to MySQL as root first, then type:

grant select,insert,update,delete on *.* to test1@"%" identified by "abc";

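However, the user added in Example 1 is very dangerous: anyone who learns test1's password can log in to your MySQL database from any computer on the internet and do whatever they like with your data. Example 2 shows the fix.

Example 2: add a user test2 with password abc who can log in only from localhost and can select, insert, update and delete in the database mydb (localhost is the local host, that is, the machine the MySQL database runs on). Even someone who knows test2's password cannot reach the database directly from the internet, only through web pages on the MySQL host.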

grant select,insert,update,delete on mydb.* to test2@localhost identified by "abc";

If you do not want test2 to have a password, issue one more command to clear it:

grant select,insert,update,delete on mydb.* to test2@localhost identified by "";

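The first part covered logging in, adding users and changing passwords. This second part looks at database operations in MySQL. Note: you must be logged in to MySQL first; everything below is typed at the MySQL prompt, and each command ends with a semicolon.

I. Tips

1. If you press Enter and then notice you forgot the semicolon, you do not have to retype the command; just type a semicolon and press Enter. In other words, a command can be split across several lines and is finished by the semicolon.

2. You can recall earlier commands with the up and down arrow keys, although an old MySQL version I once used did not support this. I am now using mysql-3.23.27-beta-win.

II. Display commands

1. List the databases.

show databases;

At first there are only two databases: mysql and test. The mysql database is important: it holds MySQL's system information, and changing passwords or adding users actually operates on this database.

2. List the tables in a database:

use mysql; -- open the database; anyone who has used FoxBASE will find this familiar

show tables;

3. Show the structure of a table:

describe <table name>;

4. Create a database:

create database <database name>;

5. Create a table:

use <database name>;

create table <table name> (<column definitions>);

6. Drop a database or a table:

drop database <database name>;

drop table <table name>;

7. Delete all rows from a table:

delete from <table name>;

8. Show the rows in a table:

select * from <table name>;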

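III. An example: creating a database and a table and inserting data

drop database if exists school; -- drop SCHOOL if it exists
create database school; -- create the database SCHOOL
use school; -- open the database SCHOOL
create table teacher -- create the table TEACHER
(
id int(3) auto_increment not null primary key,
name char(10) not null,
address varchar(50) default '深圳',
year date
); -- end of table definition
-- insert the rows
insert into teacher values(NULL,'glchengang','深圳一中','1976-10-10');
insert into teacher values(NULL,'jack','深圳一中','1975-12-23');

Notes on the table definition: (1) id is an integer column with display width 3, int(3); it increases by one for each new record (auto_increment), cannot be empty (not null), and is the primary key. (2) name is a character column of length 10. (3) address is a character column of length up to 50 whose default value is 深圳 (Shenzhen). The difference between varchar and char will have to wait for a later article. (4) year is a date column.

You could type the commands above at the mysql prompt, but that is inconvenient for debugging. Instead, save them as-is in a text file, say school.sql, copy it to c:\, change to the \mysql\bin directory in DOS, and type:

mysql -uroot -p<password> < c:\school.sql

On success it prints nothing apart from a blank line; if there is an error, a message is shown.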

IV. Loading text data into the database

1. Format the text data must follow: fields are separated by tabs, and NULL values are written as \N.

Example:

3 rose 深圳二中 1976-10-10

4 mike 深圳一中 1975-12-23

2. Load command: load data local infile "<file name>" into table <table name>;

Note: it is best to copy the file into the \mysql\bin directory, and you must first select the database that contains the table with the use command.
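
Data exported from elsewhere often has to be brought into this layout first. A minimal sketch, assuming a simple comma-separated input with no quoted or embedded commas; the function name csv_to_load_format and the file name teacher.csv are illustrative only. It turns commas into tabs and writes empty fields as \N so that load data reads them as NULL.

```shell
# csv_to_load_format FILE: print FILE with tab-separated fields and
# empty fields replaced by \N (naive CSV: no quoting is handled)
csv_to_load_format() {
    awk -F',' 'BEGIN { OFS = "\t" }
    {
        for (i = 1; i <= NF; i++)
            if ($i == "") $i = "\\N"
        $1 = $1        # force the record to be rebuilt with OFS
        print
    }' "$1"
}

# usage: csv_to_load_format teacher.csv > teacher.txt
```

The resulting teacher.txt can then be passed as the file name in the load data command.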

V. Backing up a database (run the command from the \mysql\bin directory in DOS)

mysqldump --opt school > school.bbb

Note: this backs up the database school to the file school.bbb. school.bbb is a plain text file and its name is arbitrary; open it and have a look, you will find it interesting.

Postscript: MySQL's database operations are much the same as those of other SQL databases, so it is best to find a book on SQL and read it. I have covered only the basics here; in truth, that is all I know. The best MySQL tutorial is still the "MySQL Chinese Reference Manual" translated by "Yanzi": it is free, available for download from every related site, and the most authoritative. Unfortunately it is not in chm format like the "PHP4 Chinese Manual", so looking up functions and commands is less convenient.

Linux Commands for Viewing System Information

Mar 24th, 2011 12:00 am | Comments

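System

uname -a               # show kernel/OS/CPU information
head -n 1 /etc/issue   # show the OS version
cat /proc/cpuinfo      # show CPU information
hostname               # show the host name
lspci -tv              # list all PCI devices
lsusb -tv              # list all USB devices
lsmod                  # list loaded kernel modules
env                    # show environment variables

Resources

free -m                # show memory and swap usage
df -h                  # show usage of each partition
du -sh <directory>     # show the size of a directory
grep MemTotal /proc/meminfo   # show total memory
grep MemFree /proc/meminfo    # show free memory
uptime                 # show uptime, number of users and load
cat /proc/loadavg      # show system load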
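
The two /proc/meminfo lookups above can be combined into a single figure. A rough sketch: the function name mem_used_pct is made up, and "used" here simply means MemTotal minus MemFree, which counts buffers and page cache as used. The file to read is a parameter and defaults to /proc/meminfo.

```shell
# mem_used_pct [FILE]: print (MemTotal - MemFree) as a whole-number
# percentage of MemTotal, reading FILE (default /proc/meminfo)
mem_used_pct() {
    awk '$1 == "MemTotal:" { total = $2 }
         $1 == "MemFree:"  { free = $2 }
         END { if (total > 0) printf "%d\n", (total - free) * 100 / total }
        ' "${1:-/proc/meminfo}"
}

# usage: mem_used_pct
```

Taking the file as an argument also makes it possible to run the function on a saved copy of meminfo from another machine.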

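Disks and partitions

mount | column -t      # show mounted partitions
fdisk -l               # list all partitions
swapon -s              # list all swap partitions
hdparm -i /dev/hda     # show disk parameters (IDE devices only)
dmesg | grep IDE       # show IDE devices detected at boot

Network

ifconfig               # show properties of all network interfaces
iptables -L            # show firewall rules
route -n               # show the routing table
netstat -lntp          # show all listening TCP ports
netstat -antp          # show all TCP connections (listening and established)
netstat -s             # show network statistics

Processes

ps -ef                 # list all processes
top                    # show process status in real time

Users

w                      # show active users
id <user name>         # show information about a user
last                   # show the login history
cut -d: -f1 /etc/passwd   # list all users on the system
cut -d: -f1 /etc/group    # list all groups on the system
crontab -l             # show the current user's scheduled jobs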
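
The cut command on /etc/passwd lists every account, including system accounts. A small sketch that narrows the list to regular login users by filtering on the UID field; the function name list_login_users and the cutoff of 1000 are assumptions (older Fedora releases started regular UIDs at 500), so both the cutoff and the passwd-format file are parameters.

```shell
# list_login_users [MIN_UID] [FILE]: print names of accounts in FILE
# (passwd format, default /etc/passwd) whose UID is at least MIN_UID
# (default 1000) and whose shell is not nologin or false
list_login_users() {
    awk -F: -v min="${1:-1000}" \
        '$3 >= min && $7 !~ /(nologin|false)$/ { print $1 }' \
        "${2:-/etc/passwd}"
}

# usage: list_login_users 500
```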

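Services

chkconfig --list       # list all system services
chkconfig --list | grep on    # list all enabled system services

Packages

rpm -qa                # list all installed packages

Many more commands are listed here and can serve as a reference.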
