Help

Transgenes typically have all introns, or all but the first, removed. Still, splicing at cryptic splice sites occurs, which has been linked to remnant exonic splice enhancer (ESE) motifs. Removing these motifs along with other unwanted functionality is therefore expected to improve transgene efficiency.

How it works

Generating variants and selecting the best

The SequenceOptimizer software will generate 1,000 gene variants and select the best one by its GC3 content, i.e. the GC content at third codon positions. By default, the target GC3 content is the average GC3 of human one- or two-exon genes (option mean). Alternatively, you may set the target GC3 to high or low, which selects the variant with the highest or lowest GC3 among those generated.

Select target GC3
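The selection step can be sketched in a few lines of Python. This is only an illustration of the idea, not the SequenceOptimizer code: the function names are made up, and the value passed as mean_gc3 stands in for the average GC3 of human one- or two-exon genes, which is not stated on this page.

```python
def gc3(cds):
    """Fraction of complete codons whose third base is G or C."""
    thirds = cds.upper()[2::3]  # third base of every complete codon
    if not thirds:
        raise ValueError("sequence holds no complete codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def select_variant(variants, option="mean", mean_gc3=None):
    """Pick one variant by GC3 (hypothetical helper, options as named above)."""
    if option == "high":
        return max(variants, key=gc3)
    if option == "low":
        return min(variants, key=gc3)
    if option == "mean":
        if mean_gc3 is None:
            # Assumption: the caller supplies the reference average.
            raise ValueError("mean_gc3 is required for option 'mean'")
        return min(variants, key=lambda v: abs(gc3(v) - mean_gc3))
    raise ValueError(f"unknown option: {option}")


# Three toy coding sequences with GC3 of 1.0, 0.5 and 0.0.
variants = ["ATGGCC", "ATGGCA", "ATAGCA"]
best_high = select_variant(variants, "high")
best_mean = select_variant(variants, "mean", mean_gc3=0.5)  # 0.5 is a placeholder
```

With real data, variants would be the 1,000 generated sequences rather than three toy strings.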

Removing introns determines which sites are checked for ESE resemblance

The first step when generating the gene variants is to remove all introns but the first or, if requested, all introns including the first. This determines the exonic sites in the vicinity of the now-deleted introns. ESE resemblance will be adjusted at those sites only and at no other position in the gene.

Remove introns
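As a rough sketch of this step, the Python below removes introns from a sequence written like the FASTA sample on this page, where exons are upper case and introns are lower case. The function names and the window parameter are hypothetical; how many bases count as "in the vicinity" of a deleted intron is not specified here.

```python
import re


def remove_introns(seq, keep_first=True):
    """Return (spliced sequence, junction positions).

    Assumes exons are upper case and introns lower case. A junction is the
    index, in the returned sequence, of the first base after a deleted intron.
    """
    pieces, junctions = [], []
    length = 0
    introns_seen = 0
    for segment in re.findall(r"[A-Z]+|[a-z]+", seq):
        if segment.islower():
            introns_seen += 1
            if not (keep_first and introns_seen == 1):
                junctions.append(length)  # intron dropped here
                continue
        pieces.append(segment)
        length += len(segment)
    return "".join(pieces), junctions


def near_deleted_intron(pos, junctions, window):
    """True if pos lies within `window` bases of any junction (window is an assumption)."""
    return any(j - window <= pos < j + window for j in junctions)


toy = "ATGGCCgtaagCTGAAAgtcagGGATAA"
kept_first, junctions_kept = remove_introns(toy, keep_first=True)
no_introns, junctions_all = remove_introns(toy, keep_first=False)
```

Only positions for which near_deleted_intron returns True would then be candidates for ESE adjustment.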

By default, ESE motifs will be depleted (option deplete); select option enrich to enrich them instead.

Select ESE resemblance

Scoring synonymous codons

For each site, every synonymous codon is assigned a score and is selected with a probability equal to that score. By default, scores reflect how well the codon matches human codon usage (option humanize). Alternative strategies include maximizing GC3 content (option max-gc) and matching the position-dependent GC3 content of human one- or two-exon genes (option gc). If ESE motifs have been provided, you may also choose to score by ESE resemblance only (option raw). Please note: this affects only sites near deleted introns; at all other sites the sequence remains unchanged.

Select scoring strategy
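Selecting a codon "with a probability equal to its score" amounts to weighted random sampling. The sketch below shows this with Python's standard library. The scores for the alanine codons are illustrative placeholders rather than measured usage frequencies, and the function names are invented for this example.

```python
import random

# Placeholder scores for the four alanine codons (NOT real usage data).
ALA_SCORES = {"GCC": 4.0, "GCT": 3.0, "GCA": 2.0, "GCG": 1.0}


def normalize(scores):
    """Scale scores so they sum to 1 and can be read as probabilities."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one codon needs a positive score")
    return {codon: score / total for codon, score in scores.items()}


def pick_codon(scores, rng):
    """Draw one codon with probability proportional to its score."""
    probs = normalize(scores)
    codons = list(probs)
    return rng.choices(codons, weights=[probs[c] for c in codons], k=1)[0]


rng = random.Random(0)  # seeded so that runs are repeatable
picks = [pick_codon(ALA_SCORES, rng) for _ in range(1000)]
```

Over many draws, codons with higher scores appear more often, but lower-scoring codons are still chosen occasionally, which is what makes the generated variants differ from one another.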

In the vicinity of deleted introns, the codon score is a mixture of the strategy score and the ESE resemblance score. You may choose to adjust ESE resemblance at all sites instead of only at sites near deleted introns. This is not recommended, as it runs counter to our current understanding of ESEs, but it may prove useful at times, e.g. when tweaking natural one-exon genes.

Select ESE scoring strategy
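How the two scores are mixed is not described on this page, so the following Python is only one possible reading. It assumes that ESE resemblance is measured by counting hexamer motif matches in the sequence around a candidate codon, that depletion rewards fewer matches, and that the mixture is a weighted average. The weight of 0.5 and all function names are assumptions.

```python
def ese_hits(context, motifs, k=6):
    """Count (possibly overlapping) k-mers in `context` that are listed motifs."""
    context = context.upper()
    return sum(context[i:i + k] in motifs for i in range(len(context) - k + 1))


def ese_score(context, motifs, mode="deplete"):
    """Map the hit count to a score in [0, 1] (assumed formula)."""
    depleted = 1.0 / (1.0 + ese_hits(context, motifs))
    if mode == "deplete":
        return depleted          # fewer hits, higher score
    if mode == "enrich":
        return 1.0 - depleted    # more hits, higher score
    raise ValueError(f"unknown mode: {mode}")


def mixed_score(strategy_score, ese, weight=0.5):
    """Weighted average of strategy score and ESE score (weight is an assumption)."""
    return (1.0 - weight) * strategy_score + weight * ese


motifs = {"CTGAAG", "GGAGAA"}   # two hexamers taken from the sample motif list
context = "CCTGAAGGAGAAT"       # flanking bases plus a candidate codon
score = mixed_score(0.4, ese_score(context, motifs, "deplete"))
```

At sites away from deleted introns, only the strategy score would be used unless ESE adjustment at all sites is switched on.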

Synonymous codons at 6-fold degenerate sites

At 6-fold degenerate sites (leucine, serine or arginine positions), all six synonymous codons are scored by default. Alternatively, you can restrict the candidates to the codons of the respective 2- or 4-codon sub-box.

Sixfold degenerate sites
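In the standard genetic code, the six codons of leucine, serine and arginine each split into a 4-codon box and a 2-codon box that differ in their first two bases. The Python sketch below uses that fact to show what restricting to the sub-box means. The function name and its parameter are hypothetical.

```python
# Six-fold degenerate amino acids in the standard genetic code.
SIXFOLD = {
    "L": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
}


def candidate_codons(codon, restrict_to_subbox=False):
    """Synonymous candidates for a codon at a six-fold degenerate site."""
    codon = codon.upper()
    for family in SIXFOLD.values():
        if codon in family:
            if restrict_to_subbox:
                # Codons of one sub-box share their first two bases.
                return [c for c in family if c[:2] == codon[:2]]
            return list(family)
    raise ValueError(f"{codon} is not at a six-fold degenerate site")


all_leu = candidate_codons("TTA")
two_box = candidate_codons("TTA", restrict_to_subbox=True)
four_box = candidate_codons("CTG", restrict_to_subbox=True)
```

Restricting to the sub-box means a change never needs more than a third-position substitution, whereas moving between sub-boxes alters the first or second base as well.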

Dealing with restriction sites

To preserve restriction sites already present in the sequence, please provide the corresponding recognition sequence(s) in the keep intact input tab. The SequenceOptimizer software will leave those sites intact when tweaking the gene.

Similarly, you may specify recognition sequences that are to be avoided. Please note: this will not remove restriction sites that are already present in the gene.

You may provide sites to keep intact and sites to avoid together or individually.
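Both rules can be expressed as simple string searches. The Python sketch below marks the positions covered by sites to keep intact and checks whether a variant creates an avoided site that the original did not have. It is an illustration only: the function names are invented, only the given strand is searched (the two sample sites are palindromic, so this makes no difference for them), and original and variant are assumed to have equal length.

```python
def find_sites(seq, site):
    """Start indices of all (possibly overlapping) occurrences of site."""
    seq, site = seq.upper(), site.upper()
    starts = []
    i = seq.find(site)
    while i != -1:
        starts.append(i)
        i = seq.find(site, i + 1)
    return starts


def protected_positions(seq, keep_sites):
    """Indices that must stay unchanged to keep the listed sites intact."""
    protected = set()
    for site in keep_sites:
        for start in find_sites(seq, site):
            protected.update(range(start, start + len(site)))
    return protected


def introduces_avoided_site(original, variant, avoid_sites):
    """True if the variant has an avoided site at a position the original lacks.

    Sites already present in the original are ignored, matching the note
    above that existing sites are not removed.
    """
    for site in avoid_sites:
        if set(find_sites(variant, site)) - set(find_sites(original, site)):
            return True
    return False


gene = "ATGGGATCCAAAGAATTCTAA"
locked = protected_positions(gene, ["GGATCC"])
```

A variant generator could skip any codon overlapping locked and discard candidates for which introduces_avoided_site returns True.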

Diagrammatic representation of the workflow

Options and their effects on the generated variant

Sample data

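The sample input data below are, in order: a gene sequence in FASTA format (exons in upper case, introns in lower case), the same gene as a GenBank record, a list of ESE hexamer motifs, and two restriction enzyme recognition sequences.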

>hg38_refGene_NM_000518 range=chr11:5225598-5227021 5'pad=0 3'pad=0 strand=- repeatMasking=none
ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGG
CAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGgttggtat
caaggttacaagacaggtttaaggagaccaatagaaactgggcatgtgga
gacagagaagactcttgggtttctgataggcactgactctctctgcctat
tggtctattttcccacccttagGCTGCTGGTGGTCTACCCTTGGACCCAG
AGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGG
CAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTG
ATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGT
GAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGgtgag
tctatgggacgcttgatgttttctttccccttcttttctatggttaagtt
catgtcataggaaggggataagtaacagggtacagtttagaatgggaaac
agacgaatgattgcatcagtgtggaagtctcaggatcgttttagtttctt
ttatttgctgttcataacaattgttttcttttgtttaattcttgctttct
ttttttttcttctccgcaatttttactattatacttaatgccttaacatt
gtgtataacaaaaggaaatatctctgagatacattaagtaacttaaaaaa
aaactttacacagtctgcctagtacattactatttggaatatatgtgtgc
ttatttgcatattcataatctccctactttattttcttttatttttaatt
gatacataatcattatacatatttatgggttaaagtgtaatgttttaata
tgtgtacacatattgaccaaatcagggtaattttgcatttgtaattttaa
aaaatgctttcttcttttaatatacttttttgtttatcttatttctaata
ctttccctaatctctttctttcagggcaataatgatacaatgtatcatgc
ctctttgcaccattctaaagaataacagtgataatttctgggttaaggca
atagcaatatctctgcatataaatatttctgcatataaattgtaactgat
gtaagaggtttcatattgctaatagcagctacaatccagctaccattctg
cttttattttatggttgggataaggctggattattctgagtccaagctag
gcccttttgctaatcatgttcatacctcttatcttcctcccacagCTCCT
GGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAAT
GCCCTGGCCCACAAGTATCACTAA

LOCUS       NC_000011               1606 bp    DNA     linear   CON 12-MAR-2015
DEFINITION  Homo sapiens chromosome 11, GRCh38.p2 Primary Assembly.
ACCESSION   NC_000011 REGION: complement(5225466..5227071) GPC_000001303
VERSION     NC_000011.10  GI:568815587
DBLINK      BioProject: PRJNA168
            Assembly: GCF_000001405.28
KEYWORDS    RefSeq.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1606)
  AUTHORS   Taylor,T.D., Noguchi,H., Totoki,Y., Toyoda,A., Kuroki,Y.,
            Dewar,K., Lloyd,C., Itoh,T., Takeda,T., Kim,D.W., She,X.,
            Barlow,K.F., Bloom,T., Bruford,E., Chang,J.L., Cuomo,C.A.,
            Eichler,E., FitzGerald,M.G., Jaffe,D.B., LaButti,K., Nicol,R.,
            Park,H.S., Seaman,C., Sougnez,C., Yang,X., Zimmer,A.R., Zody,M.C.,
            Birren,B.W., Nusbaum,C., Fujiyama,A., Hattori,M., Rogers,J.,
            Lander,E.S. and Sakaki,Y.
  TITLE     Human chromosome 11 DNA sequence and analysis including novel gene
            identification
  JOURNAL   Nature 440 (7083), 497-500 (2006)
   PUBMED   16554811
REFERENCE   2  (bases 1 to 1606)
  CONSRTM   International Human Genome Sequencing Consortium
  TITLE     Finishing the euchromatic sequence of the human genome
  JOURNAL   Nature 431 (7011), 931-945 (2004)
   PUBMED   15496913
REFERENCE   3  (bases 1 to 1606)
  AUTHORS   Lander,E.S. et. al.
  CONSRTM   International Human Genome Sequencing Consortium
  TITLE     Initial sequencing and analysis of the human genome
  JOURNAL   Nature 409 (6822), 860-921 (2001)
   PUBMED   11237011
  REMARK    Erratum:[Nature 2001 Aug 2;412(6846):565]
COMMENT     REFSEQ INFORMATION: The reference sequence is identical to
            CM000673.2.
            On Feb 3, 2014 this sequence version replaced gi:224589802.
            Assembly Name: GRCh38.p2 Primary Assembly
            The DNA sequence is composed of genomic sequence, primarily
            finished clones that were sequenced as part of the Human Genome
            Project. PCR products and WGS shotgun sequence have been added
            where necessary to fill gaps or correct errors. All such additions
            are manually curated by GRC staff. For more information see:
            http://genomereference.org.

            ##Genome-Annotation-Data-START##
            Annotation Provider         :: NCBI
            Annotation Status           :: Full annotation
            Annotation Version          :: Homo sapiens Annotation Release 107
            Annotation Pipeline         :: NCBI eukaryotic genome annotation
                                           pipeline
            Annotation Software Version :: 6.2
            Annotation Method           :: Best-placed RefSeq; Gnomon
            Features Annotated          :: Gene; mRNA; CDS; ncRNA
            ##Genome-Annotation-Data-END##
FEATURES             Location/Qualifiers
     source          1..1606
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="11"
     gene            1..1606
                     /gene="HBB"
                     /gene_synonym="beta-globin; CD113t-C"
                     /note="hemoglobin, beta; Derived by automated
                     computational analysis using gene prediction method:
                     Curated Genomic."
                     /db_xref="GeneID:3043"
                     /db_xref="HGNC:HGNC:4827"
                     /db_xref="MIM:141900"
     mRNA            join(1..142,273..495,1346..1606)
                     /gene="HBB"
                     /gene_synonym="beta-globin; CD113t-C"
                     /product="hemoglobin, beta"
                     /note="Derived by automated computational analysis using
                     gene prediction method: Curated Genomic."
                     /transcript_id="NM_000518.4"
                     /db_xref="GI:28302128"
                     /db_xref="GeneID:3043"
                     /db_xref="HGNC:HGNC:4827"
                     /db_xref="MIM:141900"
     CDS             join(51..142,273..495,1346..1474)
                     /gene="HBB"
                     /gene_synonym="beta-globin; CD113t-C"
                     /note="beta globin chain; hemoglobin beta chain; Derived
                     by automated computational analysis using gene prediction
                     method: Curated Genomic."
                     /codon_start=1
                     /product="hemoglobin subunit beta"
                     /protein_id="NP_000509.1"
                     /db_xref="GI:4504349"
                     /db_xref="CCDS:CCDS7753.1"
                     /db_xref="GeneID:3043"
                     /db_xref="HGNC:HGNC:4827"
                     /db_xref="MIM:141900"
                     /translation="MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFE
                     SFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPE
                     NFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"
ORIGIN
        1 acatttgctt ctgacacaac tgtgttcact agcaacctca aacagacacc atggtgcatc
       61 tgactcctga ggagaagtct gccgttactg ccctgtgggg caaggtgaac gtggatgaag
      121 ttggtggtga ggccctgggc aggttggtat caaggttaca agacaggttt aaggagacca
      181 atagaaactg ggcatgtgga gacagagaag actcttgggt ttctgatagg cactgactct
      241 ctctgcctat tggtctattt tcccaccctt aggctgctgg tggtctaccc ttggacccag
      301 aggttctttg agtcctttgg ggatctgtcc actcctgatg ctgttatggg caaccctaag
      361 gtgaaggctc atggcaagaa agtgctcggt gcctttagtg atggcctggc tcacctggac
      421 aacctcaagg gcacctttgc cacactgagt gagctgcact gtgacaagct gcacgtggat
      481 cctgagaact tcagggtgag tctatgggac gcttgatgtt ttctttcccc ttcttttcta
      541 tggttaagtt catgtcatag gaaggggata agtaacaggg tacagtttag aatgggaaac
      601 agacgaatga ttgcatcagt gtggaagtct caggatcgtt ttagtttctt ttatttgctg
      661 ttcataacaa ttgttttctt ttgtttaatt cttgctttct ttttttttct tctccgcaat
      721 ttttactatt atacttaatg ccttaacatt gtgtataaca aaaggaaata tctctgagat
      781 acattaagta acttaaaaaa aaactttaca cagtctgcct agtacattac tatttggaat
      841 atatgtgtgc ttatttgcat attcataatc tccctacttt attttctttt atttttaatt
      901 gatacataat cattatacat atttatgggt taaagtgtaa tgttttaata tgtgtacaca
      961 tattgaccaa atcagggtaa ttttgcattt gtaattttaa aaaatgcttt cttcttttaa
     1021 tatacttttt tgtttatctt atttctaata ctttccctaa tctctttctt tcagggcaat
     1081 aatgatacaa tgtatcatgc ctctttgcac cattctaaag aataacagtg ataatttctg
     1141 ggttaaggca atagcaatat ctctgcatat aaatatttct gcatataaat tgtaactgat
     1201 gtaagaggtt tcatattgct aatagcagct acaatccagc taccattctg cttttatttt
     1261 atggttggga taaggctgga ttattctgag tccaagctag gcccttttgc taatcatgtt
     1321 catacctctt atcttcctcc cacagctcct gggcaacgtg ctggtctgtg tgctggccca
     1381 tcactttggc aaagaattca ccccaccagt gcaggctgcc tatcagaaag tggtggctgg
     1441 tgtggctaat gccctggccc acaagtatca ctaagctcgc tttcttgctg tccaatttct
     1501 attaaaggtt cctttgttcc ctaagtccaa ctactaaact gggggatatt atgaagggcc
     1561 ttgagcatct ggattctgcc taataaaaaa catttatttt cattgc
//

CCTGGA
CGAGGA
CGAAGA
CTGAAG
CAGAAG
CAAGGA
CAAGAT
CAAGAA
CAAAGA
GCAGAA
GCAAGA
GGAGCA
GGAGGA
GGAGAT
GGAGAA
GGAAGA
GTGAAG
GTTGGA
GACCTG
GACCAG

GGATCC
GAATTC