Assignment: GS 540 HW2
Name: Conor Camplisson
Email: concamp@uw.edu
Language: Python
Runtime: 33.1 sec

Fasta 1: CP003913.fna
Non-alphabetic characters: 681
>gi|440453185|gb|CP003913.1| Mycoplasma pneumoniae M129-B7, complete genome
*=816373 A=249201 C=162924 G=163697 T=240551 N=0
Nucleotide Frequencies: A=0.3053 C=0.1996 G=0.2005 T=0.2947
Dinucleotide Count Matrix:
A=98512 50763 47914 52012
C=53047 36681 26746 46450
G=40870 37148 36764 48915
T=56772 38332 52273 93173
Dinucleotide Frequency Matrix:
A=0.1207 0.0622 0.0587 0.0637
C=0.0650 0.0449 0.0328 0.0569
G=0.0501 0.0455 0.0450 0.0599
T=0.0695 0.0470 0.0640 0.1141
Conditional Frequency Matrix:
A=0.3953 0.2037 0.1923 0.2087
C=0.3256 0.2251 0.1642 0.2851
G=0.2497 0.2269 0.2246 0.2988
T=0.2360 0.1594 0.2173 0.3873

Fasta 2: simulated_equal_freq.fa
Non-alphabetic characters: 0
>simulated_equal_freq
*=816373 A=204496 C=203794 G=204159 T=203924 N=0
Nucleotide Frequencies: A=0.2505 C=0.2496 G=0.2501 T=0.2498
Dinucleotide Count Matrix:
A=51122 51212 51114 51048
C=50998 50966 50784 51045
G=51138 50840 51232 50949
T=51238 50775 51029 50882
Dinucleotide Frequency Matrix:
A=0.0626 0.0627 0.0626 0.0625
C=0.0625 0.0624 0.0622 0.0625
G=0.0626 0.0623 0.0628 0.0624
T=0.0628 0.0622 0.0625 0.0623
Conditional Frequency Matrix:
A=0.2500 0.2504 0.2500 0.2496
C=0.2502 0.2501 0.2492 0.2505
G=0.2505 0.2490 0.2509 0.2496
T=0.2513 0.2490 0.2502 0.2495

Fasta 3: simulated_markov_0.fa
Non-alphabetic characters: 0
>simulated_markov_0
*=816373 A=248748 C=162830 G=164026 T=240769 N=0
Nucleotide Frequencies: A=0.3047 C=0.1995 G=0.2009 T=0.2949
Dinucleotide Count Matrix:
A=75320 49910 50241 73276
C=49626 32300 32789 48115
G=50322 32668 32614 48422
T=73480 47952 48382 70955
Dinucleotide Frequency Matrix:
A=0.0923 0.0611 0.0615 0.0898
C=0.0608 0.0396 0.0402 0.0589
G=0.0616 0.0400 0.0399 0.0593
T=0.0900 0.0587 0.0593 0.0869
Conditional Frequency Matrix:
A=0.3028 0.2006 0.2020 0.2946
C=0.3048 0.1984 0.2014 0.2955
G=0.3068 0.1992 0.1988 0.2952
T=0.3052 0.1992 0.2009 0.2947

Fasta 4: simulated_markov_1.fa
Non-alphabetic characters: 0
>simulated_markov_1
*=816373 A=249747 C=162576 G=163642 T=240408 N=0
Nucleotide Frequencies: A=0.3059 C=0.1991 G=0.2005 T=0.2945
Dinucleotide Count Matrix:
A=99263 50746 47775 51963
C=52971 36546 26807 46252
G=40731 37256 36898 48757
T=56782 38028 52161 93436
Dinucleotide Frequency Matrix:
A=0.1216 0.0622 0.0585 0.0637
C=0.0649 0.0448 0.0328 0.0567
G=0.0499 0.0456 0.0452 0.0597
T=0.0696 0.0466 0.0639 0.1145
Conditional Frequency Matrix:
A=0.3975 0.2032 0.1913 0.2081
C=0.3258 0.2248 0.1649 0.2845
G=0.2489 0.2277 0.2255 0.2979
T=0.2362 0.1582 0.2170 0.3887

Run 1:
Fasta 1: CP001872.fna
Non-alphabetic characters: 851
>gi|284930242|gb|CP001872.1| Mycoplasma gallisepticum str. R(high), complete genome
*=1012027 A=349322 C=159094 G=159365 T=344246 N=0
Fasta 2: simulated_equal_freq.fa
Non-alphabetic characters: 0
>simulated_equal_freq
*=816373 A=204496 C=203794 G=204159 T=203924 N=0
Match Length Histogram:
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 2096
9 208383
10 473117
11 233873
12 69924
13 18424
14 4674
15 1174
16 277
17 57
18 13
19 6
20 1
The longest match length: 20
Number of match strings: 1
Match string: TAATTTTCTAAATAAAATTA
Fasta: CP001872.fna Position: 18193 Strand: forward
Fasta: simulated_equal_freq.fa Position: 704510 Strand: forward

Run 2:
Fasta 1: CP001872.fna
Non-alphabetic characters: 851
>gi|284930242|gb|CP001872.1| Mycoplasma gallisepticum str. R(high), complete genome
*=1012027 A=349322 C=159094 G=159365 T=344246 N=0
Fasta 2: simulated_markov_0.fa
Non-alphabetic characters: 0
>simulated_markov_0
*=816373 A=249008 C=163032 G=163723 T=240610 N=0
Match Length Histogram:
1 1
2 1
3 1
4 1
5 1
6 1
7 12
8 3443
9 105638
10 374008
11 330929
12 138605
13 42853
14 12099
15 3257
16 865
17 227
18 62
19 22
20 1
The longest match length: 20
Number of match strings: 1
Match string: AAATTACAATAACTACACAA
Fasta: simulated_markov_0.fa Position: 193796 Strand: forward
Fasta: CP001872.fna Position: 903456 Strand: forward

Run 3:
Fasta 1: CP001872.fna
Non-alphabetic characters: 851
>gi|284930242|gb|CP001872.1| Mycoplasma gallisepticum str. R(high), complete genome
*=1012027 A=349322 C=159094 G=159365 T=344246 N=0
Fasta 2: simulated_markov_1.fa
Non-alphabetic characters: 0
>simulated_markov_1
*=816373 A=248913 C=162513 G=163996 T=240951 N=0
Match Length Histogram:
1 1
2 1
3 1
4 1
5 1
6 1
7 29
8 10637
9 136998
10 336808
11 302824
12 149037
13 53277
14 16046
15 4604
16 1274
17 351
18 100
19 28
20 6
21 1
22 1
The longest match length: 22
Number of match strings: 1
Match string: TATCATTAAACTCAAACAACTC
Fasta: CP001872.fna Position: 173650 Strand: forward
Fasta: simulated_markov_1.fa Position: 384424 Strand: forward

Question 4-A: [ TODO ]
Question 4-B: [ TODO ]
Question 4-C: [ TODO ]

Program:
int main() {
    do_analysis();
    return 0;
}
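The three matrices reported for each FASTA file are related in a simple way. The frequency matrix divides each dinucleotide count by the total number of overlapping dinucleotides, and the conditional matrix divides each count by its row sum, giving P(next base | current base). For CP003913.fna, for example, the A row of the count matrix sums to 249201, so P(A | A) = 98512 / 249201 ≈ 0.3953, which matches the reported value. The Python sketch below reproduces these quantities for a DNA string. It assumes rows and columns are in A, C, G, T order and that only those four letters are tallied. The function name nucleotide_stats is hypothetical, and this is not the submitted program.

```python
from collections import Counter

BASES = "ACGT"

def nucleotide_stats(seq):
    """Single-base and dinucleotide statistics for a DNA string.

    Returns (counts, freqs, di_counts, di_freqs, cond) where cond[(x, y)] is the
    conditional frequency P(next = y | current = x). Only A, C, G, T are tallied.
    """
    seq = seq.upper()
    counts = Counter(c for c in seq if c in BASES)
    total = sum(counts.values())
    freqs = {b: counts[b] / total for b in BASES}

    # Overlapping dinucleotides: a length-n ACGT sequence has n - 1 of them.
    di_counts = Counter(
        (x, y) for x, y in zip(seq, seq[1:]) if x in BASES and y in BASES
    )
    di_total = sum(di_counts.values())
    di_freqs = {(x, y): di_counts[(x, y)] / di_total for x in BASES for y in BASES}

    # Conditional frequencies: divide each row of the count matrix by its row sum.
    cond = {}
    for x in BASES:
        row_sum = sum(di_counts[(x, y)] for y in BASES)
        for y in BASES:
            cond[(x, y)] = di_counts[(x, y)] / row_sum if row_sum else 0.0
    return counts, freqs, di_counts, di_freqs, cond
```

Because the last base of a sequence has no successor, each row sum equals that base's count except for the final base's row, which is one short. This is why the T row of the CP003913.fna count matrix sums to 240550 rather than 240551.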
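Judging from their statistics, the three simulated genomes appear to come from three models. simulated_equal_freq.fa looks like an order-0 model with all bases at 0.25. simulated_markov_0.fa looks like an order-0 model using the CP003913.fna base frequencies, since its conditional rows all sit near those frequencies. simulated_markov_1.fa looks like a first-order Markov chain driven by the CP003913.fna conditional frequency matrix. The Python sketch below implements both samplers. It assumes the first base of the order-1 chain is drawn from the single-base frequencies. The function names and the seed parameter are hypothetical.

```python
import random

BASES = "ACGT"

def simulate_order0(length, freqs, seed=None):
    """Draw each base independently with probability freqs[base].

    With all freqs equal to 0.25 this gives an equal-frequency genome.
    """
    rng = random.Random(seed)
    weights = [freqs[b] for b in BASES]
    return "".join(rng.choices(BASES, weights=weights, k=length))

def simulate_order1(length, start_freqs, cond, seed=None):
    """Draw a first-order Markov chain with P(next | prev) = cond[(prev, next)].

    The first base is drawn from start_freqs (an assumption; the chain's
    stationary distribution would be another reasonable choice).
    """
    if length <= 0:
        return ""
    rng = random.Random(seed)
    # Precompute the weight vector for each predecessor base.
    rows = {x: [cond[(x, y)] for y in BASES] for x in BASES}
    seq = [rng.choices(BASES, weights=[start_freqs[b] for b in BASES])[0]]
    for _ in range(length - 1):
        seq.append(rng.choices(BASES, weights=rows[seq[-1]])[0])
    return "".join(seq)
```

Calling simulate_order1(816373, freqs, cond) with freqs and cond estimated from CP003913.fna would yield a sequence of the same length as the simulated files above. Its dinucleotide statistics should approximate the source genome's, up to sampling noise.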
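Each Run compares CP001872.fna against one simulated genome and reports the longest exact match between them. The Python sketch below assumes the Match Length Histogram works as follows: the two sequences are concatenated and all suffixes sorted, and for each adjacent pair of suffixes that start in different sequences, the length of their common prefix is tallied. Any longest common substring is guaranteed to appear as the common prefix of some such adjacent pair. The naive suffix sort is only suitable for small inputs, not megabase genomes. The sketch searches the forward strand only and reports 0-based positions. The function name and the '$' and '#' separators are hypothetical.

```python
from collections import Counter

def match_length_histogram(s1, s2):
    """Longest common substrings of s1 and s2 via a naive suffix array.

    Returns (hist, best, matches): hist maps common-prefix length -> number of
    adjacent cross-sequence suffix pairs, best is the longest length, and
    matches lists (substring, pos_in_s1, pos_in_s2) with 0-based positions.
    Assumes s1 and s2 contain neither '$' nor '#'.
    """
    text = s1 + "$" + s2 + "#"
    n1 = len(s1)
    last = len(text) - 1  # index of the terminal '#'

    def source(i):
        # 1 = suffix starts in s1, 2 = starts in s2, 0 = starts at a separator
        if i < n1:
            return 1
        if n1 < i < last:
            return 2
        return 0

    def lcp(i, j):
        # Length of the common prefix of the suffixes starting at i and j.
        k = 0
        while i + k < len(text) and j + k < len(text) and text[i + k] == text[j + k]:
            k += 1
        return k

    # Naive suffix sort: O(n^2 log n) time and heavy memory; small inputs only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])

    hist = Counter()
    best, matches = 0, []
    for a, b in zip(sa, sa[1:]):
        ca, cb = source(a), source(b)
        if {ca, cb} != {1, 2}:
            continue  # skip same-sequence pairs and separator suffixes
        k = lcp(a, b)
        hist[k] += 1
        p1, p2 = (a, b - n1 - 1) if ca == 1 else (b, a - n1 - 1)
        if k > best:
            best, matches = k, [(text[a:a + k], p1, p2)]
        elif k == best and k > 0:
            matches.append((text[a:a + k], p1, p2))
    return hist, best, matches
```

For example, match_length_histogram("GATTACA", "TTACG") finds best = 4 with the single match ("TTAC", 2, 0). At genome scale, a dedicated suffix array construction algorithm plus a Kasai-style LCP array would replace the sorted() call and the pairwise lcp() scans. Reverse-strand matches could be found by repeating the search against the reverse complement of the second sequence.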