博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
怎么检测自己fastq的Phred类型 | phred33 phred64
阅读量:6257 次
发布时间:2019-06-22

本文共 5519 字,大约阅读时间需要 18 分钟。

 http://wiki.bits.vib.be/index.php/Identify_the_Phred_scale_of_quality_scores_used_in_fastQ

#  S - Sanger        Phred+33,  raw reads typically (0, 40)#  X - Solexa        Solexa+64, raw reads typically (-5, 40)#  I - Illumina 1.3+ Phred+64,  raw reads typically (0, 40)#  J - Illumina 1.5+ Phred+64,  raw reads typically (3, 40) with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator (bold)#  L - Illumina 1.8+ Phred+33,  raw reads typically (0, 41)

  

一个Perl脚本

#!/usr/bin/perl -w# http://wiki.bits.vib.be/index.php/Identify_the_Phred_scale_of_quality_scores_used_in_fastQuse strict;use File::Basename;use List::MoreUtils qw( minmax );# fastq_detect.pl fastq.file sample-size# detect fastQ format from quality scores in fastQ input file# Version 3## Stephane Plaisance - VIB-BITS - July-04-2012# Joachim Jacob - Aug-02-2012 - joachim.jacob@gmail.com# - changed the maximum value of Sanger to 73# - changed reading the file with a file handle#   (was a file handle !! supporting several archive formats. SP)# - changed the diagnosing algoritm# Stephane Plaisance - VIB-BITS - April-08-2013# - merged both versions and corrected flaw in min/max# thanks to Sergey Mitrfanov for perl reformatting###################################################################### diagnose#   SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.....................................................#   ..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX......................#   ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................#   .................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ......................#   LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL....................................................#   !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~#   |                         |    |        |                              |                     |#  33                        59   64       73                            104                   126# S 0........................26...31.......40# X                          -5....0........9.............................40# I                                0........9.............................40# J                                   3.....9.............................40# L 0.2......................26...31........41##  S - Sanger        Phred+33,  raw reads typically (0, 40)#  X - Solexa        Solexa+64, raw reads typically (-5, 40)#  I - Illumina 1.3+ Phred+64,  raw reads typically (0, 40)#  J - Illumina 1.5+ Phred+64,  raw reads typically (3, 40) with 0=unused, 1=unused, 2=Read Segment Quality Control Indicator (bold)#  L - Illumina 1.8+ Phred+33,  raw reads typically (0, 41)#####################################################################my $script = basename($0);@ARGV gt 0 or die "usage: $script 
\n";my ($inputfile, $limit) = @ARGV;if (! defined $limit) { $limit = 100}; # check first 100 recordsmy $cnt=0;my ($min, $max); # global min and max values# print STDERR "\n## Analysing ".$limit." records from $inputfile ... \n";my $z = ReadFile ($inputfile) || die "Error: cannot read from variant file $inputfile: $!\n";## parsewhile (my $id = <$z>) { $id =~ m/^@/ || die "expected @ not found in line 1!\n"; my $seq = <$z>; my $sep = <$z>; $sep =~ m/^\+/ || die "expected + not found in line 3!\n"; my $qual = <$z>; chomp($qual); $cnt++; $cnt>=$limit && last; # char to ascii my @chars = split("", $qual); my @nums = sort { $a <=> $b } (map { unpack("C*", $_ )} @chars); if ($cnt==1) { ($min, $max) = minmax @nums; } else { my ($lmin, $lmax) = minmax @nums; # local values for this read $lmin<$min ? $min=$lmin : $min=$min; $lmax>$max ? $max=$lmax : $max=$max; }}undef $z;## diagnosemy %diag=( 'Sanger' => '.', 'Solexa' => '.', 'Illumina 1.3+' => '.', 'Illumina 1.5+' => '.', 'Illumina 1.8+' => '.', );my %comment=( 'Sanger' => 'Phred+33, Q[33; 73], (0, 40)', 'Solexa' => 'Solexa+64, Q[59; 104], (-5, 40)', 'Illumina 1.3+' => 'Phred+64, Q[64; 104], (0, 40)', 'Illumina 1.5+' => 'Phred+64, Q[66; 104], (3, 40), with 0=N/A, 1=N/A, 2=Read Segment Quality Control Indicator', 'Illumina 1.8+' => 'Phred+33, Q[33; 74], (0, 41)', );if ($min<33 || $max>104) { die "Quality values corrupt. found [$min; $max] where [33; 104] was expected\n"; }if ($min>=33 && $max<=73) {$diag{'Sanger'}='x';}if ($min>=59 && $max<=104) {$diag{'Solexa'}='x';}if ($min>=64 && $max<=104) {$diag{'Illumina 1.3+'}='x';}if ($min>=66 && $max<=104) {$diag{'Illumina 1.5+'}='x';}if ($min>=33 && $max<=74) {$diag{'Illumina 1.8+'}='x';}## report# print STDERR "# sampled raw quality values are in the range of [".$min."; ".$max."]\n";# print STDERR "# format(s) marked below with 'x' agree with this range\n";foreach my $format (sort keys %diag) { #print STDERR sprintf(" %-13s : %2s [%-30s] \n", $format, $diag{$format}, $comment{$format}); if ($diag{$format} eq "x") {print "$format\n"}}################## Subs ##### reads from uncompressed, gzipped and bgzip fastQ filessub ReadFile { my $infile = shift; my $FH; if ($infile =~ /.bz2$/) { open ($FH, "bzcat $infile |") or die ("$!: can't open file $infile"); } elsif ($infile =~ /.gz$/) { open ($FH, "zcat $infile |") or die ("$!: can't open file $infile"); } elsif ($infile =~ /.fq|.fastq|.txt$/) { open ($FH, "cat $infile |") or die ("$!: can't open file $infile"); } else { die ("$!: do not recognise file type $infile"); } return $FH;}

  

转载地址:http://inxsa.baihongyu.com/

你可能感兴趣的文章
JDBC操作数据库
查看>>
Android中RelativeLayout的字符水平(垂直居中)对齐
查看>>
--@angularJS--独立作用域scope绑定策略之&符策略
查看>>
乾坤合一~Linux设备驱动之USB主机和设备驱动
查看>>
Python IDLE快捷键【转载合集】
查看>>
Bootstrap中glyphicons-halflings-regular.woff字体报404错notfound
查看>>
[C++] string与int, float, double相互转换
查看>>
ubuntu14.04安装chrome
查看>>
oracle中查询含字母的数据[正则表达式]
查看>>
1002. 写这个号码 (20)(数学啊 ZJU_PAT)
查看>>
【LeetCode】224. Basic Calculator
查看>>
Keil V4.72升级到V5.1X之后
查看>>
Google CFO 辞职信
查看>>
POJ2771_Guardian of Decency(二分图/最大独立集=N-最大匹配)
查看>>
Scala深入浅出实战经典之 List伴生对象操作方法代码实战.
查看>>
php 批量处理post数据
查看>>
RESTful架构详解(转)
查看>>
xcode 在哪里新建category、protocol等文件
查看>>
flash flex 程序出现错误 Error #2032
查看>>
【数据库】主键、外键、索引
查看>>