Autofinish and Consed Improvements

---------------------------------------------------------------------

Autofinish Improvements

More Universal Primer Reads Rather than Oligo Walks

Autofinish will first try to fix every problem using universal primer reads. Only if it can't fix a problem using universal primer reads will it use a walk. Thus the number of walks is reduced.

Model Read:

There is no longer a "high quality" region of potential autofinish reads. Instead, Autofinish samples all of the reads currently in the assembly (mainly shotgun reads) and constructs a "model" read based on those. Such a read will typically have quality that is poor up to base 50 or so, then increases, and then decreases again after base 400 or so. Autofinish will assume that every finishing read it suggests will have a quality profile exactly like this model read.

Gap closing is much improved:

There is a minimum overlap that any gap-closing read must have with the high quality segment of the consensus. This prevents gap-closing reads from sticking so far out into the gap that phrap doesn't assemble them with the contig and thus they actually do little good in closing the gap.

Optionally, Autofinish will call all reverses close to a gap in the hope that they will close the gap.

Optionally, Autofinish will call minilibraries to try to close the gap.

Gap closing and solving weak regions have now been combined. This decreases the number of reads necessary to both close the gap and fix the low quality region at the end of the contig.

Resequencing reads can be used to fix weak regions, but they are not used to close gaps (customizable).

Redundancy:

Autofinish will (optionally) call up to twice as many universal primer reads for each problem region.

Filename Convention:

We originally tried to impose a filename convention on you, but many of you didn't follow it.
So we gave you determineReadTypes.perl, which allows you to use any filenaming convention you want, but you must modify determineReadTypes.perl in order to use it. Then many of you said you couldn't program in perl. So now we give you a choice: you either follow the St Louis naming convention or else you program in perl.

Most of the St Louis naming convention (but not all) is explained in the phrap documentation. In addition, you must never use an underscore in the name if the read is a universal primer forward or universal primer reverse read. If the read is a walk, then an underscore (_) must follow the template name, followed by a number (the oligo number).

Examples of reads in the St Louis naming convention:

    read eeq03a01.g1.phd.1    is univ rev   template: eeq03a01   library: eeq03
    read eeq03a02.b1.phd.1    is univ fwd   template: eeq03a02   library: eeq03
    read eeq03a02.g1.phd.1    is univ rev   template: eeq03a02   library: eeq03
    read eeq03a03.b1.phd.1    is univ fwd   template: eeq03a03   library: eeq03
    read eej45h07_2.i1.phd.1  is walk       template: eej45h07   library: eej45
    read eej46c12_1.i1.phd.1  is walk       template: eej46c12   library: eej46

Which Libraries Cannot be Used for Finishing?

In addition to the badTemplates.txt file, you can use a badLibraries.txt file which contains a list of all libraries that are off-limits to Autofinish (e.g., you threw away all subclone templates from this library).

Autofinish for cDNA Finishing

We have made Autofinish so that it will work for cDNA finishing. See README.txt for more information.

Libraries with Different Insert Sizes

If you are finishing with large insert and small insert libraries, clearly Autofinish must know which is which in order to determine the location of reverses and which templates can be used for a particular oligo walk. Autofinish can now distinguish different libraries with different insert sizes.

Improved Subclone Template Picking

Templates used to be chosen by quality.
Now they are picked by position (this is customizable). If a template has a fwd/rev pair (and will make a long-enough read), it is chosen first, followed by templates whose vector/insert junction is closest to the primer. When Autofinish outputs these templates, it now indicates which strand of the template to use.

---------------------------------------------------------------------

Consed:

Picks PCR Primer Pairs

Inexact Search For String

(Based on University of Arizona agrep algorithm.)

Integrating Consed with Other Programs and Databases

You can configure particular keys in Consed so that when the user types those keys in the Traces Window, your external program is run. The arguments passed to your program include the read name, read position, contig name, and contig position. Thus you can easily configure Consed so that, for example, any time the user finds a polymorphic site and types control-N, that site is recorded in your Oracle database. See README.txt for more information.

Reads ESD (Megabase Sequencer) files

Any tag can now have a comment

Showing tags automatically

As you move the pointer over a tag, its type/comment is automatically shown in the box. (Thanks to Pat Minx for suggesting this. I believe he got the idea from GAP.)

Naming Contigs

You know how contigs have the names Contig1, Contig2, Contig3... and each time you re-assemble, the names change? Well, now you can assign custom names (that's right--you can call your favorite contig "George") and the name will stay with the contig, even after you re-assemble. You do this by applying a contigName consensus tag.

Faster Startup

You can choose to have Consed start up faster at the expense of more disk space. In many situations, this will greatly speed up Consed startup. The amount of speedup depends on which operating system is used: on Linux, the time to read phd files dropped from 75 seconds to 8 seconds, and thus the total time to start up Consed dropped from 86 seconds to 17 seconds.
I saw similar speedups on Solaris where the phd files are on an NFS-mounted disk. However, there was another situation in which the startup time was the same. The additional disk space needed is the total size of all the phd files.

Many small user-friendly features

E.g., a button to clear the read name box; if you click on a read name, you can paste that name into another X application; etc.

Oligo tags show orientation (a red "arrow")

This is true *even* after re-assembling.

Exporting the Consensus in Phd Format

You used to be able to export the consensus only in fasta format. Now you can also export it in phd format so you can easily use it as a fake read in a different assembly.
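
---------------------------------------------------------------------

Example: Reading the St Louis Naming Convention

To illustrate the underscore rule from the Filename Convention notes, here is a minimal Python sketch that classifies read names like the six examples given there. It is not part of Autofinish and is not determineReadTypes.perl. The names classify_read and ReadInfo are hypothetical. The letter mapping (b = univ fwd, g = univ rev) covers only the letters that appear in those examples; the fuller convention is in the phrap documentation. Taking the library to be the template name minus its last three characters is an assumption inferred from the examples.

```python
import re
from typing import NamedTuple, Optional


class ReadInfo(NamedTuple):
    template: str
    library: str
    read_type: str               # "univ fwd", "univ rev", "walk", or "unknown"
    oligo_number: Optional[int]  # set only for walks


# Only the extension letters seen in the examples are mapped here (assumption);
# any other letter on a read without an underscore is reported as "unknown".
_UNIVERSAL_TYPES = {"b": "univ fwd", "g": "univ rev"}

# <template>[_<oligo number>].<letter><digits>[.phd.<version>]
_READ_NAME = re.compile(
    r"^(?P<template>[^_.]+)"
    r"(?:_(?P<oligo>\d+))?"
    r"\.(?P<letter>[a-z])\d+"
    r"(?:\.phd\.\d+)?$"
)


def classify_read(name: str) -> ReadInfo:
    """Classify a read name using the underscore rule described above."""
    match = _READ_NAME.match(name)
    if match is None:
        raise ValueError(f"name does not fit the expected pattern: {name!r}")

    template = match.group("template")
    # Assumption: the library is the template minus a 3-character well
    # suffix such as "a01", as in the examples (eeq03a01 -> eeq03).
    library = template[:-3]

    oligo = match.group("oligo")
    if oligo is not None:
        # Underscore plus oligo number after the template name: a walk.
        return ReadInfo(template, library, "walk", int(oligo))

    # No underscore: a universal primer read; direction comes from the letter.
    read_type = _UNIVERSAL_TYPES.get(match.group("letter"), "unknown")
    return ReadInfo(template, library, read_type, None)
```

For instance, classify_read("eej45h07_2.i1.phd.1") reports a walk on template eej45h07 with oligo number 2, while classify_read("eeq03a02.b1.phd.1") reports a universal forward read on template eeq03a02. A site with a different convention would still need to adapt determineReadTypes.perl as described above.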