Bioops

Bioinformatics=(ACGAAG->AK)+(#!/bin/sh)+(P(A|B)=P(B|A)*P(A)/P(B))

Contig

A contig is a contiguous sequence of bases that has been constructed by aligning reads and building a consensus. Contigs are strung together, with gaps in between, to create a supercontig or scaffold.

Keep in mind: a supercontig or scaffold typically comprises hundreds of contigs, and a contig typically consists of thousands of reads.
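
To make the contig/scaffold relationship concrete, here is a minimal Python sketch that strings contigs together with gaps. The function name build_scaffold, the use of N as the gap character, and the example gap sizes are illustrative assumptions, not the behaviour of any particular assembler.

```python
def build_scaffold(contigs, gap_sizes, gap_char="N"):
    """Join contigs into one scaffold, padding each gap with gap_char."""
    if not contigs:
        return ""
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append(gap_char * gap)  # bases of unknown identity between contigs
        parts.append(contig)
    return "".join(parts)


if __name__ == "__main__":
    # Three toy contigs separated by estimated gaps of 3 and 5 bases.
    print(build_scaffold(["ACGTAC", "GGCTA", "TTAG"], [3, 5]))
```

Real scaffolders estimate the gap sizes from paired-end or mate-pair distances; here they are simply passed in.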

Software Packages for Whole Genome Alignment

More software and information will be added. (Last updated on 28/02/2011.)

Several tools are available for aligning whole genomes.

WABA (Kent and Zahler 2000) Wobble Aware Bulk Aligner for cross-species whole genome alignment

LASTZ or BLASTZ * recommended (Schwartz et al. 2003) LASTZ is a program for aligning DNA sequences, a pairwise aligner. Originally designed to handle sequences the size of human chromosomes and from different species, it is also useful for sequences produced by NGS sequencing technologies such as Roche 454.

LAGAN (Brudno et al. 2003) The LAGAN Toolkit is a set of alignment programs for comparative genomics. The three main components are a pairwise aligner (LAGAN), a multiple aligner (M-LAGAN), and a glocal aligner (Shuffle-LAGAN). All three are based on the CHAOS local alignment tool and combine speed (regions up to several megabases can be aligned in minutes) with high accuracy. The results of the alignment can be visualized using the VISTA visualization tool.

MUMMER *recommended (3 papers for 3 versions: 1.0, 2.1, 3.0) MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. For example, MUMmer 3.0 can find all 20-basepair or longer exact matches between a pair of 5-megabase genomes in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop computer. MUMmer can also align incomplete genomes; it can easily handle the 100s or 1000s of contigs from a shotgun sequencing project, and will align them to another set of contigs or a genome using the NUCmer program included with the system. If the species are too divergent for a DNA sequence alignment to detect similarity, then the PROmer program can generate alignments based upon the six-frame translations of both input sequences.

AVID (Bray et al. 2003) AVID is designed to be fast, memory efficient, and practical for sequence alignments of large genomic regions up to megabases long.

Cgaln (Nakato and Gotoh 2010) Cgaln (Coarse grained alignment) is a program designed to align a pair of whole genomic sequences, not only of bacteria but also of entire vertebrate chromosomes, on a nominal desktop computer. Cgaln performs an alignment job in two steps, at the block level and then at the nucleotide level. The former “coarse-grained” alignment can explore genomic rearrangements and reduce the regions to be analyzed in the next step. The latter is devoted to detailed alignment within the limited regions found in the first stage. The output of Cgaln is ‘glocal’ in the sense that rearrangements are taken into consideration while each alignable region is extended as long as possible. Thus, Cgaln is not only fast and memory-efficient, but can also filter noisy outputs without missing the most important homologous segment pairs.

LAST **highly recommended (Kiełbasa et al. 2011)

LAST can:

  • Handle big sequence data, e.g.:
    • Compare two vertebrate genomes
    • Align billions of DNA reads to a genome
  • Indicate the reliability of each aligned column.
  • Use sequence quality data properly.
  • Compare DNA to proteins, with frameshifts.
  • Compare PSSMs to sequences
  • Calculate the likelihood of chance similarities between random sequences.

Alfresco (Dalca and Brudno 2008) A key feature of the program is to use available analysis programs relevant to comparative genome sequence analysis, combine the results of these, and graphically present them in an intuitive way, thereby facilitating the analysis of large genomic regions.

Software for quickly finding nearly identical regions in whole genomes

BLAT (Kent 2002) BLAT (the BLAST-Like Alignment Tool) is a software program developed by Jim Kent at UCSC to identify similarities between DNA sequences and protein sequences. BLAT is much faster than older tools such as BLAST for nucleotide and protein alignments, and it can also perform spliced alignments of RNA to DNA. More about it can be found on Wikipedia and in the UCSC Genome Browser FAQ.

BLAST (megablast) Everyone knows it. Please keep in mind that this means the legacy BLAST package, not BLAST+.

SSAHA2 (Ning et al. 2001, the paper about SSAHA) SSAHA2 (Sequence Search and Alignment by Hashing Algorithm) is a pairwise sequence alignment program designed for the efficient mapping of sequencing reads onto genomic reference sequences. Reads from most sequencing platforms (ABI-Sanger, Roche 454, Illumina-Solexa) and a range of output formats (SAM, CIGAR, PSL etc.) are supported. A pile-up pipeline for analysis and genotype calling is available as a separate package.
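
The hashing idea behind tools of this kind can be sketched in a few lines of Python: index every k-mer of the reference in a hash table, then look up the k-mers of a query to find exact seed hits. This is only a toy illustration of the general technique; the function names and the choice to index every overlapping k-mer are assumptions, and the real programs differ in many details.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map each k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def find_seed_hits(query, index, k):
    """Return (query_pos, reference_pos) pairs where a k-mer matches exactly."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for rpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, rpos))
    return hits


if __name__ == "__main__":
    idx = build_kmer_index("ACGTACGT", 4)
    print(find_seed_hits("TACG", idx, 4))
```

Seed hits that fall on the same diagonal (reference_pos minus query_pos) are the natural candidates for extension into a full alignment.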

Software Packages for Discovering Structural Variation With Next-generation Sequencing

More software and information will be added. (Last updated on 28/02/2011.)

I highly recommend reading this paper: Mapping copy number variation by population-scale genome sequencing (Mills et al. 2011)

PEMer (Korbel et al. 2009) It comprises an analysis pipeline, compatible with several next-generation sequencing platforms; simulation-based error models, yielding confidence-values for each structural variant; and a back-end database.

SegSeq (Chiang et al. 2009)  Detect and localize copy-number alterations from massively parallel sequence data. A simple approach would be to partition the genome into windows of fixed size, estimate the tumor-normal ratios for each window and use standard segmentation algorithms to decompose the genome into regions of equivalent copy number.
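
The first two steps of that simple approach (fixed-size windows, then a tumor-normal ratio per window) can be sketched as follows. This is not SegSeq's own code; the function names, the 0-based read start positions, and the pseudocount that avoids division by zero are assumptions, and the segmentation step is left out.

```python
import math


def window_counts(read_starts, genome_length, window_size):
    """Count reads whose 0-based start falls in each fixed-size window."""
    n_windows = (genome_length + window_size - 1) // window_size
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window_size] += 1
    return counts


def window_log2_ratios(tumor_starts, normal_starts, genome_length,
                       window_size, pseudocount=1.0):
    """Return one log2(tumor/normal) value per window."""
    tumor = window_counts(tumor_starts, genome_length, window_size)
    normal = window_counts(normal_starts, genome_length, window_size)
    return [math.log2((t + pseudocount) / (n + pseudocount))
            for t, n in zip(tumor, normal)]


if __name__ == "__main__":
    print(window_log2_ratios([5, 5, 5, 15], [5, 15], 20, 10))
```

In practice the counts would also be normalized for the total number of reads in each sample before the ratios are segmented.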

VariationHunter (Hormozdiari et al. 2009) Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes, based on maximum parsimony.

MoDIL (Lee et al. 2009) MoDIL, or Mixture of Distributions Indel Locator, is a novel method for finding medium-sized indels from high-throughput sequencing datasets. The MoDIL algorithm compares the distribution of insert sizes in the sequenced library to the distribution of the observed mapped distances at a particular genomic location.
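
As a rough illustration of comparing the two distributions, the sketch below swaps in a much simpler technique than MoDIL's mixture model: it measures how far the mean mapped distance at one locus is shifted from the library mean. The function name and the z-score summary are assumptions made for this example only.

```python
import math
from statistics import mean, stdev


def indel_signal(library_inserts, local_distances):
    """Return (shift, z_score) of the local mean mapped distance vs. the library."""
    mu = mean(library_inserts)
    sigma = stdev(library_inserts)
    shift = mean(local_distances) - mu
    # Standard error of the local mean under the library distribution.
    z_score = shift / (sigma / math.sqrt(len(local_distances)))
    return shift, z_score


if __name__ == "__main__":
    shift, z = indel_signal([190, 200, 210], [250, 250, 250, 250])
    print(shift, z)
```

A clearly positive shift means the pairs map farther apart on the reference than the library predicts, which is the signature of a deletion in the sample; a negative shift points to an insertion.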

Pindel (Ye et al. 2009) A pattern growth approach, to detect breakpoints of large deletions and medium-sized insertions from paired-end short reads.

BreakDancer (Chen et al. 2009) BreakDancerMax predicts five types of structural variants: insertions, deletions, inversions, inter- and intra-chromosomal translocations from next-generation short paired-end sequencing reads using read pairs that are mapped with unexpected separation distances or orientation.
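
The idea of using unexpected separation or orientation can be sketched as a small classifier. This is a simplified illustration, not BreakDancer's algorithm: the function name, the three-standard-deviation cutoff, and the assumption that a normal pair maps to opposite strands of the same chromosome are all choices made for this example, and intra-chromosomal translocations are not handled.

```python
def classify_pair(chrom1, pos1, strand1, chrom2, pos2, strand2,
                  mean_insert, sd_insert, n_sd=3):
    """Label one mapped read pair by the anomaly it suggests, if any."""
    if chrom1 != chrom2:
        return "inter-chromosomal translocation"
    if strand1 == strand2:
        return "inversion"  # both mates on the same strand
    span = abs(pos2 - pos1)
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"   # mates map farther apart than expected
    if span < mean_insert - n_sd * sd_insert:
        return "insertion"  # mates map closer together than expected
    return "concordant"


if __name__ == "__main__":
    print(classify_pair("chr1", 100, "+", "chr1", 1100, "-", 200, 10))
```

A real caller would then cluster pairs that support the same event and score each cluster, rather than trusting single pairs.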

cnvHMM Copy number analysis using a hidden Markov model.

Geometric Analysis of Structural Variants (GASV) (Sindi et al 2009) A geometric approach for identification, classification and comparison of structural variants. The software is for analysis of structural variation from paired-end sequencing and/or array-CGH data.

Sequence Variant Analyzer (SVA) SVA is a computer software project designed to annotate, visualize, and analyze the genetic variants identified through next-generation sequencing studies, including whole-genome sequencing (WGS) and exome sequencing studies.

SWT A collection of R functions for statistical analysis of genome-wide data, by Qunyuan Zhang (qunyuan@wustl.edu), DSG.

VarScan (Koboldt et al. 2009) VarScan is a platform-independent, technology-independent software tool for identifying SNPs and indels in massively parallel sequencing of individual and pooled samples. (not actually structural variation)

CNV-seq (Xie and Tammi 2009) The method is based on a robust statistical model that describes the complete analysis procedure and allows the computation of essential confidence values for detection of CNV.

BreakSeq (Lam et al. 2010) a pipeline for annotation, classification and analysis of SVs at single nucleotide resolution.

CopyMap (Zöllner 2010) The program package CopyMap identifies copy number variation from oligo-hybridization and CGH data. Using a time-dependent hidden Markov model to combine evidence of copy number variants (CNVs) across multiple carriers, CopyMap is substantially more accurate than standard hidden Markov methods in identifying CNVs and calling CNV-carriers. Moreover, CopyMap provides more precise estimates of CNV-boundaries.

SLOPE (Abel et al. 2010) A quick and accurate method for locating non-SNP structural variation from targeted next-generation sequence data.
By focusing on a small (a few kb to a few Mb) target reference sequence, SLOPE can perform fast and flexible split-read alignments and determine ‘chimeric’ sequences with single-base resolution.
SLOPE aims to detect sequence breakpoints from only one side of a split read, and therefore does not rely on the insert size for detection.

HYDRA (Quinlan et al. 2010) Hydra detects structural variation (SV) breakpoints by clustering discordant paired-end alignments whose “signatures” corroborate the same putative breakpoint. Hydra can detect breakpoints caused by all classes of structural variation. Moreover, it was designed to detect variation in both unique and duplicated genomic regions; therefore, it will examine paired-end reads having multiple discordant alignments.

CnD (Simpson et al. 2010) A copy number variant caller for inbred strains.
The target organism is assumed to be inbred, and therefore homozygous, so regions of apparent heterozygous SNPs (as called by MAQ) can be used to detect copy number gains. CnD uses both the rate of these paralogous sequence variants and the raw sequence depth to call copy number gains and losses using a hidden Markov model.

AGE (Abyzov and Gerstein 2011) defining breakpoints of genomic structural variants at single-nucleotide resolution, through optimal alignments with gap excision.

CNVnator (Abyzov et al. 2011) An approach to discover, genotype and characterize typical and atypical CNVs from family and population genome sequencing.

Other relevant software

SAMtools SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.

Maq Maq stands for Mapping and Assembly with Quality. It builds assemblies by mapping short reads to reference sequences.

Words of Food: English translations of almost all foods (几乎所有食物的英文翻译)

水果类(fruits):

西红柿tomato 菠萝pineapple 西瓜watermelon 香蕉banana 柚子shaddock(pomelo) 橙子orange 苹果apple 柠檬lemon 樱桃cherry 桃子peach 梨pear 枣Chinese date (去核枣 pitted date) 椰子coconut 草莓strawberry 树莓raspberry 蓝莓blueberry 黑莓blackberry 葡萄grape 甘蔗sugar cane 芒果mango 木瓜pawpaw or papaya 杏子apricot 油桃nectarine 柿子persimmon 石榴pomegranate 榴莲durian 槟榔果areca nut 苦橙(Spanish-grown)bitter orange 猕猴桃kiwi fruit or Chinese gooseberry 金橘kumquat 蟠桃flat peach 荔枝litchi 青梅greengage 山楂果haw 水蜜桃honey peach 香瓜,甜瓜musk melon 李子plum 杨梅waxberry red bayberry 桂圆longan 沙果crab apple 杨桃starfruit 枇杷loquat 柑橘tangerine 莲雾wax-apple 番石榴guava

肉、蔬菜类(meat and vegetables):

南瓜(倭瓜)pumpkin cushaw 甜玉米Sweet corn 牛肉beef 猪肉pork 羊肉mutton 羔羊肉lamb 鸡肉chicken 生菜 莴苣lettuce 白菜Chinese cabbage (celery cabbage) (甘蓝)卷心菜cabbage 萝卜radish 胡萝卜carrot 韭菜leek 木耳agarics 豌豆pea 马铃薯(土豆)potato 黄瓜cucumber 苦瓜balsam pear 秋葵okra 洋葱onion 芹菜celery 芹菜杆celery sticks 地瓜sweet potato 蘑菇mushroom 橄榄olive 菠菜spinach 冬瓜(Chinese)wax gourd 莲藕lotus root 紫菜laver 油菜cole rape 茄子eggplant 香菜coriander 枇杷loquat 青椒green pepper 四季豆 青刀豆garden bean 银耳silvery fungi 腱子肉shank 肘子pork joint 茴香fennel (茴香油fennel oil, medicinal) 鲤鱼carp 咸猪肉bacon 金针蘑needle mushroom 扁豆lentil 槟榔areca 牛蒡great burdock 水萝卜summer radish 竹笋bamboo shoot 艾蒿Chinese mugwort 绿豆mung bean 毛豆green soy bean 瘦肉lean meat 肥肉speck 黄花菜day lily (day lily bud) 豆芽菜bean sprout 丝瓜towel gourd (note: in the US, towel gourd is mostly made into loofah sponges for bathing rather than eaten)

海鲜类(sea food):

虾仁Peeled Prawns 龙虾lobster 小龙虾crayfish (the English word can also mean a quitter) 蟹crab 蟹足crab claws 小虾(虾米)shrimp 对虾、大虾prawn (烤)鱿鱼(roast)squid 海参sea cucumber 扇贝scallop 鲍鱼sea-ear abalone 小贝肉cockles 牡蛎oyster 鱼鳞scale 海蜇jellyfish 鳖 海龟turtle 蚬蛤clam 鲅鱼culter 鲳鱼butterfish 虾籽shrimp egg 鲢鱼 银鲤鱼chub silver carp 黄花鱼yellow croaker

调料类(seasonings):

醋vinegar 酱油soy sauce 盐salt 加碘盐iodized salt 糖sugar 白糖refined sugar 酱sauce, paste 沙拉salad 辣椒hot(red)pepper 胡椒 (black)pepper 花椒wild pepper Chinese prickly ash powder 色拉油salad oil 调料fixing sauce seasoning 砂糖granulated sugar 红糖brown sugar 冰糖Rock Sugar 芝麻Sesame 芝麻酱Sesame paste 芝麻油Sesame oil 咖喱粉curry 番茄酱(汁)ketchup redeye 辣根horseradish 葱shallot (Spring onions) 姜ginger 蒜garlic 料酒cooking wine 蚝油oyster sauce 枸杞wolfberry (goji berry) 八角aniseed 酵母粉yeast barm 黄椒yellow pepper 肉桂cinnamon (very popular in the US, where many foods are flavored with it) 黄油butter 香草精vanilla extract (a dessert essential) 面粉flour 洋葱onion

主食类(staple food):

三文治sandwich 米饭rice 粥congee (rice soup) 汤soup 饺子dumpling 面条noodle 比萨饼pizza 方便面instant noodle 香肠sausage 面包bread 黄油 (白塔油)butter 茶叶蛋Tea eggs 油菜rape 饼干cookies 咸菜(泡菜)pickle 馒头steamed bread 饼(蛋糕)cake 汉堡hamburger 火腿ham 奶酪cheese 馄饨皮wonton skin 高筋面粉Strong flour 小麦wheat 大麦barley 青稞highland barley 高粱broomcorn (kaoliang )春卷Spring rolls 芋头Taro 山药yam 鱼翅shark fin 黄花daylily 松花蛋 皮蛋preserved eggs 春卷spring roll 肉馅饼minced pie 糙米Brown rice 玉米corn 馅儿stuffing 开胃菜appetizer 面粉flour 燕麦oat 白薯 甘薯sweet potato 牛排steak 里脊肉fillet 凉粉bean jelly 糯米 江米sticky rice 燕窝bird’s nest 粟Chinese corn 肉丸子meat balls 枳橙citrange 点心(中式)dim sum 淀粉starch 蛋挞egg tart

干果类(dry fruits) :

腰果Cashew nuts 花生peanut 无花果fig 榛子filbert hazel 栗子chestnut 核桃walnut 杏仁almond 果脯preserved fruit 芋头taro 葡萄干raisin cordial 开心果pistachio 巴西果brazil nut 菱角,荸荠water chestnut (eaten as a nut, unlike in China)

酒水类(beverage):

红酒red wine 白葡萄酒white wine 白兰地brandy 雪利酒sherry 汽水(软饮料)soda (盐)汽水sparkling water 果汁juice 冰棒Ice-lolly 啤酒beer 酸奶yoghurt 伏特加酒vodka 鸡尾酒cocktail 豆奶soy milk 豆浆soybean milk 七喜 7 UP 麒麟Kirin (Japanese beer) 凉开水cold boiled water 汉斯啤酒Hans beer 浓缩果汁concentrated juice 冰镇啤酒iced (chilled) beer 札幌Sapporo (Japanese beer) 爱尔啤酒ale (US) A级牛奶grade A milk 班图酒bantu beer 半干雪利dry sack 掺水牛奶blue milk 日本粗茶bancha 生啤酒draft beer 白啤酒white beer 大麦酒barley-bree 咖啡伴侣coffee mate

零食类(snack):

薄荷糖mint 薄脆饼干cracker 饼干biscuit 棒棒糖bonbon 茶tea(沏茶 make the tea) 话梅prune candied plum 锅巴rice crust 瓜子melon seed 冰棒(冰果)ice(frozen) sucker 冰淇凌ice cream 防腐剂preservative 圣代冰淇淋sundae 巧克力豆marble chocolate barley 布丁pudding

与食品有关的词语(some words about food):

炸fried 炝quick boiled 烩braise (烩牛舌 braised ox tongue) 烤roast 饱嗝burp 饱了 饱的full stuffed 解渴quench thirst 食物变坏spoil, spoilage 防腐剂preservative 产品有效期expiration date 绝味酿a good strong brew (said of a fine drink)

应各位要求补充的中式西式食物 (Chinese and Western foods added by request)

中式早點(Chinese breakfast):

烧饼Clay oven rolls 油条Fried bread stick 韭菜盒Fried leek dumplings 水饺Boiled dumplings 蒸饺Steamed dumplings 馒头Steamed buns 割包Steamed sandwich 饭团Rice and vegetable roll 蛋饼Egg cakes 皮蛋100-year egg 咸鸭蛋Salted duck egg 豆浆Soybean milk

饭类(rice dishes):

稀饭Rice porridge 白饭Plain white rice 油饭Glutinous oil rice 糯米饭Glutinous rice 卤肉饭Braised pork rice 蛋炒饭Fried rice with egg 地瓜粥Sweet potato congee

面类(noodles):

馄饨面Wonton noodles 刀削面Sliced noodles 麻辣面Spicy hot noodles 麻酱面Sesame paste noodles 鴨肉面Duck with noodles 鱔魚面Eel noodles 乌龙面Seafood noodles 榨菜肉丝面Pork , pickled mustard green noodles 牡蛎细面Oyster thin noodles 板条Flat noodles 米粉Rice noodles 炒米粉Fried rice noodles 冬粉Green bean noodle

汤类(soups):

鱼丸汤Fish ball soup 貢丸汤Meat ball soup 蛋花汤Egg & vegetable soup 蛤蜊汤Clam soup 牡蛎汤Oyster soup 紫菜汤Seaweed soup 酸辣汤Hot and sour soup 馄饨汤Wonton soup 猪肠汤Pork intestine soup 肉羹汤Pork thick soup 鱿鱼汤Squid soup 花枝羹Squid thick soup

中餐(Chinese food):

bear’s paw 熊掌 * of deer 鹿脯 beche-de-mer; sea cucumber 海参 sea sturgeon 海鳝 salted jellyfish 海蜇皮 kelp, seaweed 海带 abalone 鲍鱼 shark fin 鱼翅 scallops 干贝 lobster 龙虾 bird’s nest 燕窝 roast suckling pig 烤乳猪 pig’s knuckle 猪脚 boiled salted duck 盐水鸭 preserved meat 腊肉 barbecued pork 叉烧 sausage 香肠 fried pork flakes 肉松 BAR-B-Q 烤肉 meat diet 荤菜 vegetables 素菜 meat broth 肉羹 local dish 地方菜 Cantonese cuisine 广东菜 set meal 客饭 curry rice 咖喱饭 fried rice 炒饭 plain rice 白饭 crispy rice 锅巴 gruel, soft rice, porridge 粥 noodles with gravy 打卤面 plain noodle 阳春面 casserole 砂锅 chafing dish, fire pot 火锅 meat bun 肉包子 shao-mai 烧麦 preserved bean curd 腐乳 bean curd 豆腐 fermented black bean 豆豉 pickled cucumbers 酱瓜 preserved egg 皮蛋 salted duck egg 咸鸭蛋 dried turnip 萝卜干

西餐与日本料理(Western and Japanese food):

menu 菜单 French cuisine 法国菜 today’s special 今日特餐 chef’s special 主厨特餐 buffet 自助餐 fast food 快餐 specialty 招牌菜 continental cuisine 欧式西餐 aperitif 饭前酒 dim sum 点心 French fries 炸薯条 baked potato 烘马铃薯 mashed potatoes 马铃薯泥 omelette 煎蛋卷 pudding 布丁 pastries 甜点 pickled vegetables 泡菜 kimchi 韩国泡菜 crab meat 蟹肉 prawn 明虾 conch 海螺 escargots 田螺 braised beef 炖牛肉 bacon 熏肉 poached egg 荷包蛋 sunny side up 煎一面荷包蛋 over 煎两面荷包蛋 fried egg 煎蛋 over easy 煎半熟蛋 over hard 煎全熟蛋 scrambled eggs 炒蛋 boiled egg 煮蛋 stone fire pot 石头火锅 hashi 日本竹筷 sake 日本米酒 miso shiru 味噌汤 roast meat 铁板烤肉 sashimi 生鱼片 butter 奶油

Local BLAST

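1. Download and install BLAST. Go to ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/LATEST, choose the BLAST package you need from that directory, and download it.
Windows users: double-click the downloaded file; a set of files will be generated in the directory that contains it. Then create a configuration file named NCBI.ini under C:\Windows and use Notepad to write into it:

[NCBI]
Data="path\data"

(Note: path stands for the BLAST installation directory on your computer.)

Linux users: simply extract the archive.

2. Download a database from ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/ . You can also build your own database from a FASTA file.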

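3. Format the database. Windows users: open cmd and use the cd /d command to change to the folder that contains the BLAST programs.
Linux users also need to change to the /blast/bin/ folder in a terminal, or can run export PATH=$PATH:/path/to/blast/bin/ so that changing into the BLAST folder is no longer necessary.
Run: formatdb -i databasename -p F -o T
databasename is the database you have chosen (preferably given as an absolute path).
-i (input file) specifies the database to be formatted.
-p (type of file) specifies the sequence type: T for protein, F for nucleotide; the default is T.
-o (parse options) specifies whether to parse the sequence IDs and build an index: T to build it, F not to; the default is F. Without T you will get the warning [NULL_Caption] WARNING: “inputseq”: Could not find index files for database “databasename”
Run formatdb --help to see an explanation of the parameters.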
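
If you prefer to assemble this call from a script, a minimal Python sketch could look like the following. It only uses the three flags described above; the function name formatdb_command and the example path are hypothetical, and nothing is executed until you hand the list to subprocess.run yourself.

```python
import shlex


def formatdb_command(database, is_protein=False, parse_ids=True,
                     formatdb="formatdb"):
    """Build the formatdb argument list from the -i, -p and -o options."""
    return [
        formatdb,
        "-i", database,                      # FASTA file to format
        "-p", "T" if is_protein else "F",    # T = protein, F = nucleotide
        "-o", "T" if parse_ids else "F",     # T = parse IDs and build index
    ]


if __name__ == "__main__":
    cmd = formatdb_command("/data/mydb.fasta")  # hypothetical path
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids quoting problems when a path contains spaces.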

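4. Run blastall. Windows users: open cmd and use the cd /d command to change to the folder that contains the BLAST programs.
Linux users also need to change to the /blast/bin/ folder in a terminal.
Run: blastall -p blastn -d databasename -i inputfile -o outputfile
-p (program name) is the program to use:
blastn searches a nucleotide query against a nucleotide database
blastp searches a protein query against a protein database
blastx searches a translated nucleotide query against a protein database
tblastn searches a protein query against a translated nucleotide database
tblastx searches a translated nucleotide query against a translated nucleotide database
-d databasename specifies the database to use
-i inputfile is the file of query sequences (preferably an absolute path)
-o outputfile specifies the file in which to save the results (preferably an absolute path)
Run blastall - to see an explanation of the parameters.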
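
The blastall call can be assembled the same way from Python. This sketch uses only the -p, -d, -i and -o options listed above and rejects program names other than the five mentioned; the function name blastall_command and the example file names are hypothetical.

```python
import shlex

PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def blastall_command(program, database, query, output, blastall="blastall"):
    """Build the blastall argument list for one search."""
    if program not in PROGRAMS:
        raise ValueError("unknown BLAST program: " + program)
    return [
        blastall,
        "-p", program,   # which search to run
        "-d", database,  # formatted database
        "-i", query,     # file of query sequences
        "-o", output,    # where to save the results
    ]


if __name__ == "__main__":
    cmd = blastall_command("blastn", "/data/mydb.fasta",
                           "query.fasta", "hits.txt")  # hypothetical files
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```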

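Note: running BLAST locally through a Perl script

local_blast.pl
#!/usr/bin/perl
use strict;
use warnings;

# Replace the placeholder paths below with real ones.
my $formatdb = "blastpath/formatdb";
my $blastall = "blastpath/blastall";
my $database = "database path";
my $input    = "inputfile path";
my $output   = "outputfile path";

# Format the nucleotide database, then search it with blastn.
system("$formatdb -i $database -p F -o T") == 0
    or die "formatdb failed: $?\n";
system("$blastall -p blastn -d $database -i $input -o $output") == 0
    or die "blastall failed: $?\n";
# Windows users: directories in the paths above must be separated with "/",
# not the "\" that Windows normally uses; the command-line paths mentioned
# earlier all use "\". Linux users can ignore this.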

Synteny Relevant Links

Copied from http://www.symapdb.org/docs/links.html


Contents:

  • Information
  • On-line comparative sequenced genome databases
  • Synteny downloadable programs
  • A few multiple alignment programs

Information

Information | Comments
Homology | Wiki
Useful definitions | Nature Reviews Genetics
Alignment software | Wiki
Plant database | USDA
Plant genomes | GenBank
Angiosperm Phylogeny | Missouri Botanical Garden
Tree of Life | Collaborative effort to provide information about biodiversity

On-line comparative sequenced genomes databases

Database | Species; graphics; computation | Publication
Phytozome | Plants; Genome browsers; | Hartmann et al. 2006
PLAZA | Plants; Gene region alignments, Dotplot, Chromosome alignment, Multiplicon; blastp & OrthoMCL method | Proost et al. 2009
Plant Genome Duplication | Plants; Dotplots and detail; blastp & MCSCAN | Tang et al. 2008
Gramene | Grasses; Ensembl; | Liang et al. 2008
Legume Information System | Legumes; CMap; |
JGI genome projects | Various; Vista and links to Phytozome; |
Vista | Various; Java display of regions; Shuffle-LAGAN & BLAT in pipeline, allows user input. | Frazer et al. 2004
Narcisse | Various; Dotplots, circos, chromosome-based gene alignment; blastp & custom script. | Courcelle et al. 2008
Cinteny | Various; Genome/chromosome view; custom algorithm | Sinha & Meller 2007
CoGe | Various; Dotplot (SynMap) and ajax Genome Browser; DAGchainer, Blast | Lyons & Freeling 2008
OrthoClusterDB | No plants; Genome chromosome view, link to Gbrowse; OrthoCluster, allows user input. | Ng et al. 2009
The Synteny Database | No plants; Various views; Blastp & custom pipeline, emphasis on duplications and ohnologs. | Catchen et al. 2009
Ensembl | No plants; ComparaMart, DAS visualization; TreeBeST, Pecan. | Hubbard et al. 2009
UCSC Genome Browser | No plants; | Miller et al. 2007
Inparanoid | Various, produce lists of orthologs | Ostlund et al. 2009
OrthoMCL | Various, produce lists of orthologs | Li et al. 2003

Synteny Computation and/or Display Software

Key: Perfect - conserved order in block, Imperfect - allows micro-rearrangements in block, Duplications - allows duplicated regions, Multiplicon - mutually homologous segment between >2 genomes.
Software Input Synteny Graphics Publication
i-ADHoRe N gene lists Perfect, duplications, multiplicon Generates plots using Perl GD Vandepoele et al. 2002; Simillion et al. 2007
AXTCHAIN BLASTZ alignments Kent et al. 2003
DiagHunter Similarity file Imperfect GenoPix Tcl/Tk Cannon et al. 2003
GRIMM-synteny Similarity file Imperfect, breakpoints Web Tessler 2002 and Pevzner and Tessler 2003
FISH Ordered marker file per contig and pairs files Imperfect no Calabrese et al. 2003
DAGchainer Gene pairs with chromosome positions Perfect Dot plot Haas et al. 2004
ColinearScan Protein sequences with positions and pairs file Perfect, estimates gap parameters. no Wang et al. 2006
OSfinder Homologous markers with locations (provide scripts to generate this file from different sources) Perfect, estimates all parameters, no duplication PNG file of blocks and of dotplot Hachiya et al. 2009
OrthoCluster Imperfect Vergara & Chen 2009
Cyntenator guided tree & similarity files Conserved synteny between n genomes no Rodelsperger & Dieterich 2010
Mauve Multiple sequences (small genomes or regions) Conserved, aligns multiple sequences, computes synteny & breakpoints Java graphics & manager Darling et al. 2004
MCMuSeC Homologs for multiple genomes & phylogenetic tree Detects gene clusters no Ling et al. 2009
MCscan blast & gff file DAGchainer + consensus sequence builder no Tang et al. 2008
LineUp Genetic maps no Hampson et al. 2003
GenomeMatcher BLAST, MUMmer no Dotplots and closeup (see gallery) Ohtsubo et al. 2008
CMap Maps (genetic, FPC, sequence) no Perl/CGI (see screenshots) Youens-Clark et al. 2009
GBrowse_syn Alignments no Perl/CGI GBrowse-based -
Circos GFF files no Perl/CGI Krzywinski et al. 2009

A few alignment programs

Software | Input | Computation | Graphics | Publications
Mummer | Genome sequence | Anchors | Java graphical interface | Kurtz et al. 2004
LAGAN | Genome sequence | | Vista | Brudno et al. 2007
AVID | Genome sequence | Aligns and orders | Vista | Bray et al. 2003
Vista | Alignments | LAGAN or AVID | Java (see screenshot) | Frazer et al. 2004

Software Packages for Next Generation Sequence Analysis From Seqanswers Com

Thanks to the folks at seqanswers.com.

A reasonably thorough table of next-gen-seq software available in the commercial and public domain

Integrated solutions
CLCbio Genomics Workbench - de novo and reference assembly of Sanger, Roche FLX, Illumina, Helicos, and SOLiD data. Commercial next-gen-seq software that extends the CLCbio Main Workbench software. Includes SNP detection, ChIP-seq, browser and other features. Commercial. Windows, Mac OS X and Linux.
Galaxy - Galaxy = interactive and reproducible genomics. A job webportal.
Genomatix - Integrated Solutions for Next Generation Sequencing data analysis.
JMP Genomics - Next gen visualization and statistics tool from SAS. They are working with NCGR to refine this tool and produce others.
NextGENe - de novo and reference assembly of Illumina, SOLiD and Roche FLX data. Uses a novel Condensation Assembly Tool approach where reads are joined via “anchors” into mini-contigs before assembly. Includes SNP detection, ChIP-seq, browser and other features. Commercial. Win or MacOS.
SeqMan Genome Analyser - Software for Next Generation sequence assembly of Illumina, Roche FLX and Sanger data integrating with Lasergene Sequence Analysis software for additional analysis and visualization capabilities. Can use a hybrid templated/de novo approach. Commercial. Win or Mac OS X.
SHORE - SHORE, for Short Read, is a mapping and analysis pipeline for short DNA sequences produced on a Illumina Genome Analyzer. A suite created by the 1001 Genomes project. Source for POSIX.
SlimSearch - Fledgling commercial product.

Align/Assemble to a reference
BFAST - Blat-like Fast Accurate Search Tool. Written by Nils Homer, Stanley F. Nelson and Barry Merriman at UCLA.
Bowtie - Ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of 25 million reads per hour on a typical workstation with 2 gigabytes of memory. Uses a Burrows-Wheeler-Transformed (BWT) index. Written by Ben Langmead and Cole Trapnell. Linux, Windows, and Mac OS X.
BWA - Heng Li’s BWT Alignment program - a progression from Maq. BWA is a fast, lightweight tool that aligns short sequences to a sequence database, such as the human reference genome. By default, BWA finds an alignment within edit distance 2 to the query sequence. C++ source.
ELAND - Efficient Large-Scale Alignment of Nucleotide Databases. Whole genome alignments to a reference genome. Written by Illumina author Anthony J. Cox for the Solexa 1G machine.
Exonerate - Various forms of pairwise alignment (including Smith-Waterman-Gotoh) of DNA/protein against a reference. Authors are Guy St C Slater and Ewan Birney from EMBL. C for POSIX.
GenomeMapper - GenomeMapper is a short read mapping tool designed for accurate read alignments. It quickly aligns millions of reads either with ungapped or gapped alignments. A tool created by the 1001 Genomes project. Source for POSIX.
GMAP - GMAP (Genomic Mapping and Alignment Program) for mRNA and EST Sequences. Developed by Thomas Wu and Colin Watanabe at Genentech. C/Perl for Unix.
gnumap - The Genomic Next-generation Universal MAPper (gnumap) is a program designed to accurately map sequence data obtained from next-generation sequencing machines (specifically that of Solexa/Illumina) back to a genome of any size. It seeks to align reads from nonunique repeats using statistics. From authors at Brigham Young University. C source/Unix.
MAQ - Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly designed for Illumina with preliminary functions to handle ABI SOLiD data. Written by Heng Li from the Sanger Centre. Features extensive supporting tools for DIP/SNP detection, etc. C++ source
MOSAIK - MOSAIK produces gapped alignments using the Smith-Waterman algorithm. Features a number of support tools. Support for Roche FLX, Illumina, SOLiD, and Helicos. Written by Michael Strömberg at Boston College. Win/Linux/MacOSX
MrFAST and MrsFAST - mrFAST & mrsFAST are designed to map short reads generated with the Illumina platform to reference genome assemblies; in a fast and memory-efficient manner. Robust to INDELs and MrsFAST has a bisulphite mode. Authors are from the University of Washington. C as source.
MUMmer - MUMmer is a modular system for the rapid whole genome alignment of finished or draft sequence. Released as a package providing an efficient suffix tree library, seed-and-extend alignment, SNP detection, repeat detection, and visualization tools. Version 3.0 was developed by Stefan Kurtz, Adam Phillippy, Arthur L Delcher, Michael Smoot, Martin Shumway, Corina Antonescu and Steven L Salzberg - most of whom are at The Institute for Genomic Research in Maryland, USA. POSIX OS required.
Novocraft - Tools for reference alignment of paired-end and single-end Illumina reads. Uses a Needleman-Wunsch algorithm. Can support Bis-Seq. Commercial. Available free for evaluation, educational use and for use on open not-for-profit projects. Requires Linux or Mac OS X.
PASS - It supports Illumina, SOLiD and Roche-FLX data formats and allows the user to modulate very finely the sensitivity of the alignments. Spaced seed initial filter, then NW dynamic algorithm to a SW(like) local alignment. Authors are from CRIBI in Italy. Win/Linux.
RMAP - Assembles 20 - 64 bp Illumina reads to a FASTA reference genome. By Andrew D. Smith and Zhenyu Xuan at CSHL. (published in BMC Bioinformatics). POSIX OS required.
SeqMap - Supports up to 5 or more bp mismatches/INDELs. Highly tunable. Written by Hui Jiang from the Wong lab at Stanford. Builds available for most OS’s.
SHRiMP - Assembles to a reference sequence. Developed with Applied Biosystem’s colourspace genomic representation in mind. Authors are Michael Brudno and Stephen Rumble at the University of Toronto. POSIX.
Slider - An application for the Illumina Sequence Analyzer output that uses the probability files instead of the sequence files as an input for alignment to a reference sequence or a set of reference sequences. Authors are from BCGSC.
SOAP - SOAP (Short Oligonucleotide Alignment Program). A program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The updated version uses a BWT. Can call SNPs and INDELs. Author is Ruiqiang Li at the Beijing Genomics Institute. C++, POSIX.
SSAHA - SSAHA (Sequence Search and Alignment by Hashing Algorithm) is a tool for rapidly finding near exact matches in DNA or protein databases using a hash table. Developed at the Sanger Centre by Zemin Ning, Anthony Cox and James Mullikin. C++ for Linux/Alpha.
SOCS - Aligns SOLiD data. SOCS is built on an iterative variation of the Rabin-Karp string search algorithm, which uses hashing to reduce the set of possible matches, drastically increasing search speed. Authors are Ondov B, Varadarajan A, Passalacqua KD and Bergman NH.
SWIFT - The SWIFT suite is a software collection for fast index-based sequence comparison. It contains: SWIFT - fast local alignment search, guaranteeing to find epsilon-matches between two sequences. SWIFT BALSAM - a very fast program to find semiglobal non-gapped alignments based on k-mer seeds. Authors are Kim Rasmussen (SWIFT) and Wolfgang Gerlach (SWIFT BALSAM).
SXOligoSearch - SXOligoSearch is a commercial platform offered by the Malaysian based Synamatix. Will align Illumina reads against a range of Refseq RNA or NCBI genome builds for a number of organisms. Web Portal. OS independent.
Vmatch - A versatile software tool for efficiently solving large scale sequence matching tasks. Vmatch subsumes the software tool REPuter, but is much more general, with a very flexible user interface, and improved space and time requirements. Essentially a large string matching toolbox. POSIX.
Zoom - ZOOM (Zillions Of Oligos Mapped) is designed to map millions of short reads, produced by next-generation sequencing technology, back to the reference genomes, and to carry out post-analysis. ZOOM is developed to be highly accurate, flexible, and user-friendly, with speed being a critical priority. Commercial. Supports Illumina and SOLiD data.
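
Several of the aligners above (SSAHA and SOCS, for example) are described as using hashing to narrow down candidate match positions. The following is only a minimal sketch of that general idea, not the implementation of any listed tool: every k-mer of the reference is stored in a hash, and the k-mers of a read then vote for the offset at which the read might align. The function names build_kmer_index and candidate_offsets, the sequences, and the choice of k=4 are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Map every k-mer in the reference to the list of its 0-based start positions.
sub build_kmer_index {
    my ($ref, $k) = @_;
    my %index;
    for my $pos (0 .. length($ref) - $k) {
        push @{ $index{ substr($ref, $pos, $k) } }, $pos;
    }
    return \%index;
}

# Look up each k-mer of the read and count how many k-mers support
# each implied alignment offset (reference position minus read position).
sub candidate_offsets {
    my ($index, $read, $k) = @_;
    my %votes;
    for my $i (0 .. length($read) - $k) {
        my $hits = $index->{ substr($read, $i, $k) } or next;
        $votes{ $_ - $i }++ for @$hits;
    }
    return \%votes;
}

# Hypothetical reference and read; the read is an exact substring of the reference.
my $index = build_kmer_index('ACGTTGCAACGTAGGC', 4);
my $votes = candidate_offsets($index, 'GCAACGTA', 4);
for my $offset (sort { $votes->{$b} <=> $votes->{$a} || $a <=> $b } keys %$votes) {
    print "offset $offset supported by $votes->{$offset} k-mers\n";
}
```

The offset with the most votes is the best candidate for a full alignment; a real aligner would then verify it and tolerate mismatches, which this sketch does not attempt.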

De novo Align/Assemble

ABySS - Assembly By Short Sequences. ABySS is a de novo sequence assembler designed for very short reads. The single-processor version is useful for assembling genomes up to 40-50 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes. By Simpson JT and others at Canada’s Michael Smith Genome Sciences Centre. C++ as source.
ALLPATHS - ALLPATHS: De novo assembly of whole-genome shotgun microreads. ALLPATHS is a whole genome shotgun assembler that can generate high quality assemblies from short reads. Assemblies are presented in a graph form that retains ambiguities, such as those arising from polymorphism, thereby providing information that has been absent from previous genome assemblies. Broad Institute.
Edena - Edena (Exact DE Novo Assembler) is an assembler dedicated to processing the millions of very short reads produced by the Illumina Genome Analyzer. Edena is based on the traditional overlap layout paradigm. By D. Hernandez, P. François, L. Farinelli, M. Osteras, and J. Schrenzel. Linux/Win.
EULER-SR - Short read de novo assembly. By Mark J. Chaisson and Pavel A. Pevzner from UCSD (published in Genome Research). Uses a de Bruijn graph approach.
MIRA2 - MIRA (Mimicking Intelligent Read Assembly) is able to perform true hybrid de-novo assemblies using reads gathered through 454 sequencing technology (GS20 or GS FLX). Compatible with 454, Solexa and Sanger data. Linux OS required.
SEQAN - A Consistency-based Consensus Algorithm for De Novo and Reference-guided Sequence Assembly of Short Reads. By Tobias Rausch and others. C++, Linux/Win.
SHARCGS - De novo assembly of short reads. Authors are Dohm JC, Lottaz C, Borodina T and Himmelbauer H. from the Max-Planck-Institute for Molecular Genetics.
SSAKE - The Short Sequence Assembly by K-mer search and 3’ read Extension (SSAKE) is a genomics application for aggressively assembling millions of short nucleotide sequences by progressively searching for perfect 3’-most k-mers using a DNA prefix tree. Authors are René Warren, Granger Sutton, Steven Jones and Robert Holt from Canada’s Michael Smith Genome Sciences Centre. Perl/Linux.
SOAPdenovo - Part of the SOAP suite. See above.
VCAKE - De novo assembly of short reads with robust error correction. An improvement on early versions of SSAKE.
Velvet - Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454. Needs about 20-25X coverage and paired reads. Developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI).
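
The EULER-SR entry above mentions a de Bruijn graph approach. As a rough illustration of what such a graph looks like, and not the code of EULER-SR or any other assembler listed, the sketch below builds one from a set of reads: each k-mer contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function name build_de_bruijn, the two reads, and k=4 are assumptions for the example; error correction and reverse complements are ignored.

```perl
use strict;
use warnings;

# Build a de Bruijn graph as a hash of hashes:
#   $graph->{prefix}{suffix} = number of times that k-mer edge was seen.
sub build_de_bruijn {
    my ($reads, $k) = @_;
    my %graph;
    for my $read (@$reads) {
        for my $i (0 .. length($read) - $k) {
            my $kmer = substr($read, $i, $k);
            $graph{ substr($kmer, 0, $k - 1) }{ substr($kmer, 1) }++;
        }
    }
    return \%graph;
}

# Two hypothetical overlapping reads.
my $graph = build_de_bruijn([ 'ACGTAC', 'GTACGG' ], 4);
for my $from (sort keys %$graph) {
    for my $to (sort keys %{ $graph->{$from} }) {
        print "$from -> $to (x$graph->{$from}{$to})\n";
    }
}
```

An assembler would go on to walk paths through this graph to spell out contigs; nodes with more than one outgoing edge (ACG here) mark the ambiguities it has to resolve.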

SNP/Indel Discovery

ssahaSNP - ssahaSNP is a polymorphism detection tool. It detects homozygous SNPs and indels by aligning shotgun reads to the finished genome sequence. Highly repetitive elements are filtered out by ignoring those k-mer words with high occurrence numbers. More tuned for ABI Sanger reads. Developers are Adam Spargo and Zemin Ning from the Sanger Centre. Compaq Alpha, Linux-64, Linux-32, Solaris and Mac.
PolyBayesShort - A re-incarnation of the PolyBayes SNP discovery tool developed by Gabor Marth at Washington University. This version is specifically optimized for the analysis of large numbers (millions) of high-throughput next-generation sequencer reads, aligned to whole chromosomes of model organism or mammalian genomes. Developers at Boston College. Linux-64 and Linux-32.
PyroBayes - PyroBayes is a novel base caller for pyrosequences from the 454 Life Sciences sequencing machines. It was designed to assign more accurate base quality estimates to the 454 pyrosequences. Developers at Boston College.

Genome Annotation/Genome Browser/Alignment Viewer/Assembly Database

EagleView - An information-rich genome assembly viewer. EagleView can display a dozen different types of information including base quality and flowgram signal. Developers at Boston College.
LookSeq - LookSeq is a web-based application for alignment visualization, browsing and analysis of genome sequence data. LookSeq supports multiple sequencing technologies, alignment sources, and viewing modes; low or high-depth read pileups; and easy visualization of putative single nucleotide and structural variation. From the Sanger Centre.
MapView - MapView: visualization of short reads alignment on desktop computer. From the Evolutionary Genomics Lab at Sun-Yat Sen University, China. Linux.
SAM - Sequence Assembly Manager. Whole Genome Assembly (WGA) Management and Visualization Tool. It provides a generic platform for manipulating, analyzing and viewing WGA data, regardless of input type. Developers are Rene Warren, Yaron Butterfield, Asim Siddiqui and Steven Jones at Canada’s Michael Smith Genome Sciences Centre. MySQL backend and Perl-CGI web-based frontend/Linux.
STADEN - Includes GAP4. GAP5, once completed, will handle next-gen sequencing data. A partially implemented test version is available.
XMatchView - A visual tool for analyzing cross_match alignments. Developed by Rene Warren and Steven Jones at Canada’s Michael Smith Genome Sciences Centre. Python/Win or Linux.

Counting e.g. ChIP-Seq, Bis-Seq, CNV-Seq

BS-Seq - The source code and data for the “Shotgun Bisulphite Sequencing of the Arabidopsis Genome Reveals DNA Methylation Patterning” Nature paper by Cokus et al. (Steve Jacobsen’s lab at UCLA). POSIX.
CHiPSeq - Program used by Johnson et al. (2007) in their Science publication.
CNV-Seq - CNV-seq, a new method to detect copy number variation using high-throughput sequencing. Chao Xie and Martti T Tammi at the National University of Singapore. Perl/R.
FindPeaks - Performs analysis of ChIP-Seq experiments. It uses a naive algorithm for identifying regions of high coverage, which represent Chromatin Immunoprecipitation enrichment of sequence fragments, indicating the location of a bound protein of interest. Original algorithm by Matthew Bainbridge, in collaboration with Gordon Robertson. Current code and implementation by Anthony Fejes. Authors are from Canada’s Michael Smith Genome Sciences Centre. JAVA/OS independent. Latest versions available as part of the Vancouver Short Read Analysis Package.
MACS - Model-based Analysis for ChIP-Seq. MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. Written by Yong Zhang and Tao Liu from Xiaole Shirley Liu’s Lab.
PeakSeq - Systematic Scoring of ChIP-Seq Experiments Relative to Controls. A two-pass approach for scoring ChIP-Seq data relative to controls. The first pass identifies putative binding sites and compensates for variation in the mappability of sequences across the genome. The second pass filters out sites that are not significantly enriched compared to the normalized input DNA and computes a precise enrichment and significance. By Rozowsky J et al. C/Perl.
QuEST - Quantitative Enrichment of Sequence Tags. Sidow and Myers Labs at Stanford. From the 2008 publication Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. (C++)
SISSRs - Site Identification from Short Sequence Reads. BED file input. Raja Jothi @ NIH. Perl.
**See also this thread for ChIP-Seq, until I get time to update this list.
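
The FindPeaks entry above describes a naive algorithm that identifies regions of high coverage. The sketch below shows one way such a scan could look; it is not the FindPeaks implementation. It assumes reads are given as 0-based inclusive [start, end] pairs that lie inside the sequence, and the function name coverage_peaks, the read coordinates, and the depth threshold of 2 are all made up for illustration.

```perl
use strict;
use warnings;

# Report maximal runs of positions whose read depth is at least $min_depth.
# Reads are [start, end] pairs, 0-based and inclusive.
sub coverage_peaks {
    my ($reads, $genome_len, $min_depth) = @_;
    my @depth = (0) x $genome_len;
    for my $read (@$reads) {
        my ($start, $end) = @$read;
        $depth[$_]++ for $start .. $end;
    }
    my (@peaks, $peak_start);
    # Scan one position past the end so a peak touching the last base is closed.
    for my $pos (0 .. $genome_len) {
        my $covered = $pos < $genome_len && $depth[$pos] >= $min_depth;
        if ($covered && !defined $peak_start) {
            $peak_start = $pos;                      # a peak opens here
        }
        elsif (!$covered && defined $peak_start) {
            push @peaks, [ $peak_start, $pos - 1 ];  # the peak ended at the previous base
            undef $peak_start;
        }
    }
    return \@peaks;
}

# Hypothetical reads on a 50-base sequence.
my $peaks = coverage_peaks([ [0, 9], [5, 14], [7, 12], [30, 39] ], 50, 2);
print "peak: $_->[0]-$_->[1]\n" for @$peaks;
```

The model-based tools in the list (MACS, PeakSeq) go further by comparing against a control and estimating significance; a fixed depth cutoff like this one does neither.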

Alternate Base Calling

Rolexa - R-based framework for base calling of Solexa data. Described in the project publication.
Alta-cyclic - “a novel Illumina Genome-Analyzer (Solexa) base caller”

Transcriptomics

ERANGE - Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq. Supports Bowtie, BLAT and ELAND. From the Wold lab.
G-Mo.R-Se - G-Mo.R-Se is a method aimed at using RNA-Seq short reads to build de novo gene models. First, candidate exons are built directly from the positions of the reads mapped on the genome (without any ab initio assembly of the reads), and all the possible splice junctions between those exons are tested against unmapped reads. From CNS in France.
MapNext - MapNext: A software tool for spliced and unspliced alignments and SNP detection of short sequence reads. From the Evolutionary Genomics Lab at Sun-Yat Sen University, China.
QPalma - Optimal Spliced Alignments of Short Sequence Reads. Authors are Fabio De Bona, Stephan Ossowski, Korbinian Schneeberger, and Gunnar Rätsch. A paper is available.
RSAT - RSAT: RNA-Seq Analysis Tools. RNASAT is developed and maintained by Hui Jiang at Stanford University.
TopHat - TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. TopHat is a collaborative effort between the University of Maryland and the University of California, Berkeley


Copyright © 2016 - Bioops - Powered by Octopress | Themed with Whitespace


Perl: Sorting a Two-Dimensional Array

| Comments

Copied from http://www.devx.com/tips/Tip/5283

The Perl sort function is useful for alphabetically sorting lists. By default, however, it does nothing useful with a list of lists, because the elements of the outer list are references rather than lists, and the default comparison compares the references themselves. Supplying a comparison block that dereferences each row lets you sort arrays within arrays, which gives relational database-like control over data grids.
For example, let’s say I have a list of lists called @biglist. To print all of its unsorted contents, I would write:

for my $list_ref ( @biglist ) {
    print "@$list_ref\n";
}

To sort @biglist numerically by the first element in each list, I would write:

for my $list_ref ( sort { $a->[0] <=> $b->[0] } @biglist ) {
    print "@$list_ref\n";
}

If the element you wish to sort by is not numeric, change ‘<=>’ to ‘cmp’ to sort asciibetically:

for my $list_ref ( sort { $a->[0] cmp $b->[0] } @biglist ) {
    print "@$list_ref\n";
}

To sort case-insensitively, lowercase both sides before comparing:

for my $list_ref ( sort { lc($a->[0]) cmp lc($b->[0]) } @biglist ) {
    print "@$list_ref\n";
}
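
The snippets above assume @biglist already exists. The following self-contained sketch puts the pieces together and also shows a two-key sort, which the original tip does not cover: rows are ordered by the second column numerically (largest first), with ties broken by the first column, compared case-insensitively. The sample rows and the helper name sort_rows are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical sample data: each row is [name, score].
my @biglist = (
    [ 'gamma', 10 ],
    [ 'Alpha', 2 ],
    [ 'beta',  10 ],
);

# Sort numerically by column 1 (descending); when two scores are equal,
# <=> returns 0 and the case-insensitive comparison of column 0 decides.
sub sort_rows {
    my @rows = @_;
    return sort { $b->[1] <=> $a->[1] or lc($a->[0]) cmp lc($b->[0]) } @rows;
}

for my $list_ref ( sort_rows(@biglist) ) {
    print "@$list_ref\n";
}
```

Chaining comparisons with or works because each comparison returns 0 for equal keys, so the next key is only consulted on a tie.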

Using Perl to Run ClustalW for Multiple Sequence Alignment

| Comments

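#multiple sequence alignment (using ClustalW)

This post shows how to use Perl on Windows to run ClustalW for multiple sequence alignment, and then read and analyze the results.
First, download ClustalW from ftp://ftp.ebi.ac.uk/pub/software/clustalw2/2.0.11/clustalw-2.0.11-win.msi. Once installed, it can be run directly: multiple sequence alignments are performed by passing command-line options, as described in its help file.
The focus here is on how to drive ClustalW from Perl on Windows.
The code is as follows: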

run_clustalw.pl
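use strict;
use warnings;
my $aln = 's1.fa';                        # input sequence file; all sequences must be in one file
my $clustalw = "d:/ClustalW2/clustalw2";  # ClustalW installation path
my $tempoutput = 'aa.aln';                # output file
# Run the alignment; for parameter settings, see "help 9" in the ClustalW help file.
my $system_check = system("$clustalw -KTUPLE=2 -INFILE=$aln -OUTFILE=$tempoutput");
die "clustalw2 failed with status $system_check\n" if $system_check != 0;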
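
The script above stops once the alignment file has been written. As a possible next step, here is a minimal sketch of reading the result back in plain Perl. It assumes the output is in the default CLUSTAL text format: a header line starting with "CLUSTAL", then blocks of "name sequence" lines, each block followed by a conservation line that starts with whitespace. The function name parse_clustal and the sample alignment are made up for illustration; BioPerl's Bio::AlignIO is an alternative if it is installed.

```perl
use strict;
use warnings;

# Parse CLUSTAL-format alignment text into { name => aligned_sequence }.
sub parse_clustal {
    my ($text) = @_;
    my %seqs;
    for my $line (split /\n/, $text) {
        next if $line =~ /^CLUSTAL/;    # header line
        next if $line =~ /^\s*$/;       # blank separator between blocks
        next if $line =~ /^\s/;         # conservation line (*, :, .)
        my ($name, $chunk) = split ' ', $line;
        $seqs{$name} .= $chunk;         # concatenate the blocks of each sequence
    }
    return \%seqs;
}

# Hypothetical two-block alignment, shaped like a clustalw2 .aln file.
my $aln_text = <<'END';
CLUSTAL 2.0.11 multiple sequence alignment


seq1            ACGT-ACGT
seq2            ACGTTACGT
                **** ****

seq1            GGA
seq2            GGC
                **
END

my $seqs = parse_clustal($aln_text);
print "$_\t$seqs->{$_}\n" for sort keys %$seqs;
```

To use it on the real output, read the contents of the file named in $tempoutput into a string and pass that to parse_clustal.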

Using Perl to Run bl2seq for Pairwise Sequence Alignment on Windows

| Comments

run_bl2seq.pl
# bl2seq is a program in the local BLAST package, used to align two sequences.
# Two sequence alignment (using bl2seq)
use strict;
use warnings;
use Bio::SearchIO;
#############################
# the two input files
my $seq1="m16.fa";
my $seq2="s11.fa";
# output file
my $tempoutput="bl2seq.txt";
# path to the bl2seq installation
my $bl2seq="D:/blast/bin/bl2seq";
#############################
system("$bl2seq -i $seq1 -j $seq2 -p blastn -F F -o $tempoutput -g T");
# for parameter settings, see the BLAST help file
my ($result,$hit,$hsp);
# parse the alignment results and extract the fields of interest
my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => "$tempoutput");
while( $result = $in->next_result ) {
  while( $hit = $result->next_hit ) {
    while( $hsp = $hit->next_hsp ) {
      if( $hsp->length('total') > 1500 ) {
        # many other analyses are possible here; see the Bio::SearchIO documentation
        if ( $hsp->percent_identity >= 80 ) {
          print "Hit= ",       $hit->name,
                ",Length=",     $hsp->length('total'),
                ",Percent_id=", $hsp->percent_identity, "\n";
        }
      }
    }
  }
}