
Help

Transgenes typically have all introns, or all but the first, removed. Nevertheless, splicing at cryptic splice sites still occurs, and this is linked to remnant exonic splice enhancer (ESE) motifs. Removing these motifs, along with other unwanted functionality, will thus improve transgene efficiency.

How it works

Generating variants and selecting the best

The SequenceOptimizer software generates 1,000 gene variants and selects the best one by its GC3 content (the GC content at third codon positions). By default, the target GC3 content is the average GC3 of human one- or two-exon genes (option mean). Alternatively, you may set the target GC3 to high or low. This selects the variant with the highest or lowest GC3 among the generated variants.
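The selection step can be sketched in Python as follows. This is a minimal illustration, not the SequenceOptimizer code, and the function names are hypothetical. It makes three assumptions:

- GC3 is computed over all complete codons.
- "Best" under option mean means the variant whose GC3 is closest to the target.
- The numeric target is supplied by the caller. The actual human one-/two-exon average is not given here.

```python
def gc3(cds: str) -> float:
    """Fraction of complete codons whose third base is G or C."""
    thirds = cds.upper()[2::3]  # third base of every complete codon
    if not thirds:
        return 0.0
    return sum(base in "GC" for base in thirds) / len(thirds)


def select_variant(variants, mode="mean", target=None):
    """Pick one variant by GC3 (hypothetical helper mirroring options mean/high/low)."""
    if mode == "high":
        return max(variants, key=gc3)
    if mode == "low":
        return min(variants, key=gc3)
    if mode == "mean":
        if target is None:
            raise ValueError("mode 'mean' needs a target GC3 value")
        # Assumption: the variant with GC3 closest to the target wins.
        return min(variants, key=lambda v: abs(gc3(v) - target))
    raise ValueError(f"unknown mode: {mode!r}")


candidates = ["GCAGCA", "GCAGCC", "GCCGCC"]  # toy variants with GC3 = 0.0, 0.5, 1.0
print(select_variant(candidates, "high"))               # GCCGCC
print(select_variant(candidates, "mean", target=0.45))  # GCAGCC
```

In the tool, the candidate list would hold the 1,000 generated variants rather than three toy sequences.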

Removing introns determines which sites are checked for ESE resemblance

The first step in generating the gene variants is to remove all introns but the first or, if requested, all introns including the first. This determines which exonic sites lie in the vicinity of the now-deleted introns. ESE resemblance is adjusted at those sites only and at no other position in the gene.
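Below is a sketch of the intron-removal step. It assumes, as in the FASTA sample data, that exons are written in uppercase and introns in lowercase. The `window` parameter sets how many nucleotides on either side of a deleted intron count as "in its vicinity". Its value is hypothetical, because the tool's actual distance is not stated here.

```python
import re


def remove_introns(seq: str, keep_first_intron: bool = True):
    """Delete lowercase (intron) blocks; return the new sequence and junction positions.

    A junction is the index in the new sequence where a deleted intron used to be.
    """
    blocks = re.findall(r"[A-Z]+|[a-z]+", seq)  # newlines and whitespace are skipped
    parts, junctions = [], []
    length = 0
    seen_intron = False
    for block in blocks:
        if block.islower():
            is_first = not seen_intron
            seen_intron = True
            if keep_first_intron and is_first:
                parts.append(block)  # the first intron stays in the transgene
                length += len(block)
            else:
                junctions.append(length)  # intron deleted at this position
            continue
        parts.append(block)
        length += len(block)
    return "".join(parts), junctions


def sites_near_junctions(junctions, seq_len, window=10):
    """Positions within `window` nt of any junction (window size is an assumption)."""
    near = set()
    for j in junctions:
        near.update(range(max(0, j - window), min(seq_len, j + window)))
    return near


spliced, junctions = remove_introns("AAAcccGGGtttCCC", keep_first_intron=False)
print(spliced, junctions)  # AAAGGGCCC [3, 6]
```

Applied to the HBB sample with the default setting, this sketch would keep intron 1 and report a single junction where intron 2 was removed.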

By default, ESE motifs are depleted (option deplete); select option enrich to enrich them instead.

Scoring synonymous codons

For each site, synonymous codons are assigned a score, and a codon is selected with a probability equal to its score. By default, scores reflect how well a codon matches human codon usage (option humanize). Alternative strategies include maximizing GC3 content (option max-gc) and matching the position-dependent GC3 content of human one- or two-exon genes (option gc). If ESE motifs have been provided, you may also choose to score by ESE resemblance only (option raw). Please note that this option affects only sites near deleted introns; at all other sites the sequence remains unchanged.
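The weighted draw described above can be sketched as below. The two scoring functions are simplified stand-ins for options max-gc and humanize, not the tool's actual formulas. No real human codon-usage values are included: `usage` must be supplied by the caller, and the numbers in the example are made up.

```python
import random


def normalise(raw):
    """Scale scores so they sum to 1; fall back to uniform if all are zero."""
    total = sum(raw.values())
    if total == 0:
        return {c: 1 / len(raw) for c in raw}
    return {c: v / total for c, v in raw.items()}


def max_gc_scores(codons):
    """Stand-in for option max-gc: only codons ending in G or C get weight."""
    return normalise({c: 1.0 if c[2] in "GC" else 0.0 for c in codons})


def usage_scores(codons, usage):
    """Stand-in for option humanize: relative frequency from a caller-supplied table."""
    return normalise({c: usage[c] for c in codons})


def pick_codon(scores, rng=random):
    """Draw one codon; each codon's probability equals its (normalised) score."""
    codons = list(scores)
    return rng.choices(codons, weights=[scores[c] for c in codons], k=1)[0]


alanine = ["GCT", "GCC", "GCA", "GCG"]
print(pick_codon(max_gc_scores(alanine), random.Random(1)))  # GCC or GCG
toy_usage = {"GCT": 1, "GCC": 3, "GCA": 0, "GCG": 0}  # illustrative numbers only
print(usage_scores(alanine, toy_usage)["GCC"])  # 0.75
```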

At sites in the vicinity of deleted introns, the codon score is a mixture of the strategy score and the ESE resemblance score. You may choose to adjust ESE resemblance at all sites instead of only at sites near deleted introns. This is not recommended, as it runs counter to our current understanding of ESEs, but it may occasionally prove useful, e.g. when tweaking natural one-exon genes.
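The help text does not specify how ESE resemblance is measured or how the two scores are weighted. The sketch below therefore makes three illustrative assumptions:

- Resemblance is the number of listed ESE hexamers that overlap the candidate codon in its sequence context.
- Option deplete favours fewer hits, using 1 / (1 + hits). Option enrich favours more hits, using 1 + hits.
- The strategy and ESE scores are mixed linearly with a hypothetical weight `w`.

```python
def ese_hits(left: str, codon: str, right: str, motifs, k: int = 6) -> int:
    """Count k-mers from `motifs` that overlap `codon` in its flanking context."""
    window = left[max(0, len(left) - (k - 1)):] + codon + right[: k - 1]
    return sum(window[i:i + k] in motifs for i in range(len(window) - k + 1))


def ese_scores(codons, left, right, motifs, mode="deplete"):
    """Assumed ESE resemblance score per codon, normalised to sum to 1."""
    if mode not in ("deplete", "enrich"):
        raise ValueError(f"unknown mode: {mode!r}")
    raw = {}
    for c in codons:
        hits = ese_hits(left, c, right, motifs)
        raw[c] = 1 / (1 + hits) if mode == "deplete" else 1 + hits
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}


def mixed_scores(strategy, ese, w=0.5):
    """Linear mixture of strategy and ESE scores (w is a hypothetical weight)."""
    combined = {c: w * strategy[c] + (1 - w) * ese[c] for c in strategy}
    total = sum(combined.values())
    return {c: v / total for c, v in combined.items()}


motifs = {"GGAGAA", "GGAAGA"}  # two hexamers taken from the sample ESE list
glu = ["GAA", "GAG"]
ese = ese_scores(glu, "GGA", "", motifs)  # GAA completes GGAGAA, GAG does not
print(mixed_scores({"GAA": 0.5, "GAG": 0.5}, ese))
```

With option deplete, GAG ends up with the higher combined score because it avoids creating the motif.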

Synonymous codons at 6-fold degenerate sites

At 6-fold degenerate sites (leucine, serine or arginine positions), all six synonymous codons are scored by default. Alternatively, you can restrict the choice to the codons of the respective 2- or 4-codon sub-box.
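The difference between the two settings can be illustrated with the standard genetic code. The helper name and flag below are hypothetical and are not SequenceOptimizer options.

```python
# Sub-boxes of the three six-fold degenerate amino acids (standard genetic code).
SIX_FOLD = {
    "Leu": (("CTT", "CTC", "CTA", "CTG"), ("TTA", "TTG")),
    "Ser": (("TCT", "TCC", "TCA", "TCG"), ("AGT", "AGC")),
    "Arg": (("CGT", "CGC", "CGA", "CGG"), ("AGA", "AGG")),
}


def six_fold_candidates(codon: str, restrict_to_subbox: bool = False) -> list:
    """Synonymous candidates for a Leu/Ser/Arg codon, optionally limited to its own sub-box."""
    codon = codon.upper()
    for boxes in SIX_FOLD.values():
        for box in boxes:
            if codon in box:
                if restrict_to_subbox:
                    return list(box)
                return [c for b in boxes for c in b]
    raise ValueError(f"{codon} does not encode a six-fold degenerate amino acid")


print(six_fold_candidates("TTA"))        # all six leucine codons
print(six_fold_candidates("TTA", True))  # ['TTA', 'TTG']
```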

Dealing with restriction sites

To preserve restriction sites already present in the sequence, please provide the corresponding recognition sequence(s) in the keep intact input tab. The SequenceOptimizer software will leave those sites intact when tweaking the gene.
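One way to honour the keep intact list is to freeze every codon that overlaps an existing recognition site, as in the sketch below. The helper is hypothetical. It assumes an in-frame coding sequence and also searches the reverse complement. That search only matters for non-palindromic sites; the sample sites GGATCC and GAATTC are palindromic.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def protected_codons(cds: str, sites) -> set:
    """Indices of codons overlapping an existing recognition site on either strand."""
    cds = cds.upper()
    frozen = set()
    for site in sites:
        for motif in {site.upper(), revcomp(site.upper())}:
            start = cds.find(motif)
            while start != -1:
                # every codon touched by bases start .. start + len(motif) - 1
                frozen.update(range(start // 3, (start + len(motif) - 1) // 3 + 1))
                start = cds.find(motif, start + 1)
    return frozen


print(protected_codons("ATGGGATCCTAA", ["GGATCC"]))  # {1, 2}
```

Codons whose index is in the returned set would simply be copied unchanged into every variant.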

Similarly, you may specify recognition sequences that are to be avoided. Please note: this will not remove restriction sites that are already present in the gene.
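Because pre-existing sites are left alone, a check for the avoid list only needs to flag sites that a variant newly creates. The minimal sketch below assumes that synonymous variants have the same length as the original, so positions can be compared directly. How the tool itself enforces this, for example by rejecting a single codon or a whole variant, is not described here.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def positions(seq: str, motif: str) -> set:
    """Start indices of all (possibly overlapping) occurrences of motif in seq."""
    found, start = set(), seq.find(motif)
    while start != -1:
        found.add(start)
        start = seq.find(motif, start + 1)
    return found


def creates_new_site(original: str, variant: str, avoid) -> bool:
    """True if `variant` has an avoided site where `original` had none."""
    original, variant = original.upper(), variant.upper()
    for site in avoid:
        for motif in {site.upper(), revcomp(site.upper())}:
            if positions(variant, motif) - positions(original, motif):
                return True
    return False


print(creates_new_site("AAAGAGTTC", "AAAGAATTC", ["GAATTC"]))  # True
```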

You may provide sites to keep intact and sites to avoid, either individually or in combination.

Diagrammatic representation of the workflow

Options and their effects on the generated variant

Sample data


>hg38_refGene_NM_000518 range=chr11:5225598-5227021 5'pad=0 3'pad=0 strand=- repeatMasking=none
ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCCCTGTGGGG
CAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGgttggtat
caaggttacaagacaggtttaaggagaccaatagaaactgggcatgtgga
gacagagaagactcttgggtttctgataggcactgactctctctgcctat
tggtctattttcccacccttagGCTGCTGGTGGTCTACCCTTGGACCCAG
AGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGG
CAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTG
ATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGT
GAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGgtgag
tctatgggacgcttgatgttttctttccccttcttttctatggttaagtt
catgtcataggaaggggataagtaacagggtacagtttagaatgggaaac
agacgaatgattgcatcagtgtggaagtctcaggatcgttttagtttctt
ttatttgctgttcataacaattgttttcttttgtttaattcttgctttct
ttttttttcttctccgcaatttttactattatacttaatgccttaacatt
gtgtataacaaaaggaaatatctctgagatacattaagtaacttaaaaaa
aaactttacacagtctgcctagtacattactatttggaatatatgtgtgc
ttatttgcatattcataatctccctactttattttcttttatttttaatt
gatacataatcattatacatatttatgggttaaagtgtaatgttttaata
tgtgtacacatattgaccaaatcagggtaattttgcatttgtaattttaa
aaaatgctttcttcttttaatatacttttttgtttatcttatttctaata
ctttccctaatctctttctttcagggcaataatgatacaatgtatcatgc
ctctttgcaccattctaaagaataacagtgataatttctgggttaaggca
atagcaatatctctgcatataaatatttctgcatataaattgtaactgat
gtaagaggtttcatattgctaatagcagctacaatccagctaccattctg
cttttattttatggttgggataaggctggattattctgagtccaagctag
gcccttttgctaatcatgttcatacctcttatcttcctcccacagCTCCT
GGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAAT
GCCCTGGCCCACAAGTATCACTAA

LOCUS       NC_000011               1606 bp    DNA     linear   CON 12-MAR-2015
DEFINITION  Homo sapiens chromosome 11, GRCh38.p2 Primary Assembly.
ACCESSION   NC_000011 REGION: complement(5225466..5227071) GPC_000001303
VERSION     NC_000011.10  GI:568815587
DBLINK      BioProject: PRJNA168
            Assembly: GCF_000001405.28
KEYWORDS    RefSeq.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1606)
  AUTHORS   Taylor,T.D., Noguchi,H., Totoki,Y., Toyoda,A., Kuroki,Y., Dewar,K.,
            Lloyd,C., Itoh,T., Takeda,T., Kim,D.W., She,X., Barlow,K.F.,
            Bloom,T., Bruford,E., Chang,J.L., Cuomo,C.A., Eichler,E.,
            FitzGerald,M.G., Jaffe,D.B., LaButti,K., Nicol,R., Park,H.S.,
            Seaman,C., Sougnez,C., Yang,X., Zimmer,A.R., Zody,M.C.,
            Birren,B.W., Nusbaum,C., Fujiyama,A., Hattori,M., Rogers,J.,
            Lander,E.S. and Sakaki,Y.
  TITLE     Human chromosome 11 DNA sequence and analysis including novel gene
            identification
  JOURNAL   Nature 440 (7083), 497-500 (2006)
   PUBMED   16554811
REFERENCE   2  (bases 1 to 1606)
  CONSRTM   International Human Genome Sequencing Consortium
  TITLE     Finishing the euchromatic sequence of the human genome
  JOURNAL   Nature 431 (7011), 931-945 (2004)
   PUBMED   15496913
REFERENCE   3  (bases 1 to 1606)
  AUTHORS   Lander,E.S. et. al.
  CONSRTM   International Human Genome Sequencing Consortium
  TITLE     Initial sequencing and analysis of the human genome
  JOURNAL   Nature 409 (6822), 860-921 (2001)
   PUBMED   11237011
  REMARK    Erratum:[Nature 2001 Aug 2;412(6846):565]
COMMENT     REFSEQ INFORMATION: The reference sequence is identical to
            CM000673.2.
            On Feb 3, 2014 this sequence version replaced gi:224589802.
            Assembly Name: GRCh38.p2 Primary Assembly
            The DNA sequence is composed of genomic sequence, primarily
            finished clones that were sequenced as part of the Human Genome
            Project. PCR products and WGS shotgun sequence have been added
            where necessary to fill gaps or correct errors. All such additions
            are manually curated by GRC staff. For more information see:
            http://genomereference.org.

            ##Genome-Annotation-Data-START##
            Annotation Provider         :: NCBI
            Annotation Status           :: Full annotation
            Annotation Version          :: Homo sapiens Annotation Release 107
            Annotation Pipeline         :: NCBI eukaryotic genome annotation
                                           pipeline
            Annotation Software Version :: 6.2
            Annotation Method           :: Best-placed RefSeq; Gnomon
            Features Annotated          :: Gene; mRNA; CDS; ncRNA
            ##Genome-Annotation-Data-END##
FEATURES             Location/Qualifiers
     source          1..1606
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="11"
     gene            1..1606
                     /gene="HBB"
                     /gene_synonym="beta-globin; CD113t-C"
                     /note="hemoglobin, beta; Derived by automated
                     computational analysis using gene prediction method:
                     Curated Genomic."
                     /db_xref="GeneID:3043"
                     /db_xref="HGNC:HGNC:4827"
                     /db_xref="MIM:141900"
     mRNA            join(1..142,273..495,1346..1606)
                     /gene="HBB"
                     /gene_synonym="beta-globin; CD113t-C"
                     /product="hemoglobin, beta"
                     /note="Derived by automated computational analysis using
                     gene prediction method: Curated Genomic."
                     /transcript_id="NM_000518.4"
                     /db_xref="GI:28302128"
                     /db_xref="GeneID:3043"
                     /db_xref="HGNC:HGNC:4827"
                     /db_xref="MIM:141900"
     CDS             join(51..142,273..495,1346..1474)
                     /gene="HBB"
                     /gene_synonym="beta-globin; CD113t-C"
                     /note="beta globin chain; hemoglobin beta chain; Derived
                     by automated computational analysis using gene prediction
                     method: Curated Genomic."
                     /codon_start=1
                     /product="hemoglobin subunit beta"
                     /protein_id="NP_000509.1"
                     /db_xref="GI:4504349"
                     /db_xref="CCDS:CCDS7753.1"
                     /db_xref="GeneID:3043"
                     /db_xref="HGNC:HGNC:4827"
                     /db_xref="MIM:141900"
                     /translation="MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFE
                     SFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPE
                     NFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"
ORIGIN
        1 acatttgctt ctgacacaac tgtgttcact agcaacctca aacagacacc atggtgcatc
       61 tgactcctga ggagaagtct gccgttactg ccctgtgggg caaggtgaac gtggatgaag
      121 ttggtggtga ggccctgggc aggttggtat caaggttaca agacaggttt aaggagacca
      181 atagaaactg ggcatgtgga gacagagaag actcttgggt ttctgatagg cactgactct
      241 ctctgcctat tggtctattt tcccaccctt aggctgctgg tggtctaccc ttggacccag
      301 aggttctttg agtcctttgg ggatctgtcc actcctgatg ctgttatggg caaccctaag
      361 gtgaaggctc atggcaagaa agtgctcggt gcctttagtg atggcctggc tcacctggac
      421 aacctcaagg gcacctttgc cacactgagt gagctgcact gtgacaagct gcacgtggat
      481 cctgagaact tcagggtgag tctatgggac gcttgatgtt ttctttcccc ttcttttcta
      541 tggttaagtt catgtcatag gaaggggata agtaacaggg tacagtttag aatgggaaac
      601 agacgaatga ttgcatcagt gtggaagtct caggatcgtt ttagtttctt ttatttgctg
      661 ttcataacaa ttgttttctt ttgtttaatt cttgctttct ttttttttct tctccgcaat
      721 ttttactatt atacttaatg ccttaacatt gtgtataaca aaaggaaata tctctgagat
      781 acattaagta acttaaaaaa aaactttaca cagtctgcct agtacattac tatttggaat
      841 atatgtgtgc ttatttgcat attcataatc tccctacttt attttctttt atttttaatt
      901 gatacataat cattatacat atttatgggt taaagtgtaa tgttttaata tgtgtacaca
      961 tattgaccaa atcagggtaa ttttgcattt gtaattttaa aaaatgcttt cttcttttaa
     1021 tatacttttt tgtttatctt atttctaata ctttccctaa tctctttctt tcagggcaat
     1081 aatgatacaa tgtatcatgc ctctttgcac cattctaaag aataacagtg ataatttctg
     1141 ggttaaggca atagcaatat ctctgcatat aaatatttct gcatataaat tgtaactgat
     1201 gtaagaggtt tcatattgct aatagcagct acaatccagc taccattctg cttttatttt
     1261 atggttggga taaggctgga ttattctgag tccaagctag gcccttttgc taatcatgtt
     1321 catacctctt atcttcctcc cacagctcct gggcaacgtg ctggtctgtg tgctggccca
     1381 tcactttggc aaagaattca ccccaccagt gcaggctgcc tatcagaaag tggtggctgg
     1441 tgtggctaat gccctggccc acaagtatca ctaagctcgc tttcttgctg tccaatttct
     1501 attaaaggtt cctttgttcc ctaagtccaa ctactaaact gggggatatt atgaagggcc
     1561 ttgagcatct ggattctgcc taataaaaaa catttatttt cattgc
//

ESE motifs:
CCTGGA
CGAGGA
CGAAGA
CTGAAG
CAGAAG
CAAGGA
CAAGAT
CAAGAA
CAAAGA
GCAGAA
GCAAGA
GGAGCA
GGAGGA
GGAGAT
GGAGAA
GGAAGA
GTGAAG
GTTGGA
GACCTG
GACCAG

Restriction site recognition sequences:
GGATCC
GAATTC