ID   POLG_WNV                Reviewed;        3430 AA.
AC   P06935;
DT   01-JAN-1988, integrated into UniProtKB/Swiss-Prot.
DT   24-OCT-2003, sequence version 2.
DT   24-JUL-2007, entry version 82.
DE   Genome polyprotein [Contains: Capsid protein C (Core protein);
DE   Envelope protein M (Matrix protein); Major envelope protein E; Non-
DE   structural protein 1 (NS1); Non-structural protein 2A (NS2A);
DE   Flavivirin protease NS2B regulatory subunit; Flavivirin protease NS3
DE   catalytic subunit (EC 3.4.21.91); Non-structural protein 4A (NS4A);
DE   Non-structural protein 4B (NS4B); RNA-directed RNA polymerase
DE   (EC 2.7.7.48) (NS5)].
OS   West Nile virus (WNV).
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
OC   Flavivirus; Japanese encephalitis virus group.
OX   NCBI_TaxID=11082;
OH   NCBI_TaxID=7158; Aedes.
OH   NCBI_TaxID=34610; Amblyomma variegatum.
OH   NCBI_TaxID=8782; Aves.
OH   NCBI_TaxID=53527; Culex.
OH   NCBI_TaxID=9606; Homo sapiens (Human).
OH   NCBI_TaxID=34627; Hyalomma marginatum.
OH   NCBI_TaxID=308735; Mansonia uniformis.
OH   NCBI_TaxID=308737; Mimomyia.
OH   NCBI_TaxID=34630; Rhipicephalus.
RN   [1]
RP   NUCLEOTIDE SEQUENCE [GENOMIC RNA].
RX   MEDLINE=86124703; PubMed=3753811; DOI=10.1016/0042-6822(86)90082-6;
RA   Castle E., Leidner U., Nowak T., Wengler G., Wengler G.;
RT   "Primary structure of the West Nile flavivirus genome region coding
RT   for all nonstructural proteins.";
RL   Virology 149:10-26(1986).
RN   [2]
RP   SEQUENCE REVISION TO 1908; 2018-2036; 2242 AND 2859-2860.
RX   MEDLINE=21176376; PubMed=11277701; DOI=10.1006/viro.2000.0795;
RA   Yamshchikov V.F., Wengler G., Perelygin A.A., Brinton M.A.,
RA   Compans R.W.;
RT   "An infectious clone of the West Nile flavivirus.";
RL   Virology 281:294-304(2001).
RN   [3]
RP   NUCLEOTIDE SEQUENCE [GENOMIC RNA] OF 1-291.
RX   MEDLINE=85274372; PubMed=2992152; DOI=10.1016/0042-6822(85)90156-4;
RA   Castle E., Nowak T., Leidner U., Wengler G., Wengler G.;
RT   "Sequence analysis of the viral core protein and the membrane-
RT   associated proteins V1 and NV2 of the flavivirus West Nile virus and
RT   of the genome sequence for these proteins.";
RL   Virology 145:227-236(1985).
RN   [4]
RP   NUCLEOTIDE SEQUENCE [GENOMIC RNA] OF 255-854.
RX   MEDLINE=86072082; PubMed=3855247; DOI=10.1016/0042-6822(85)90129-1;
RA   Wengler G., Castle E., Leidner U., Nowak T., Wengler G.;
RT   "Sequence analysis of the membrane protein V3 of the flavivirus West
RT   Nile virus and of its gene.";
RL   Virology 147:264-274(1985).
RN   [5]
RP   DISULFIDE BONDS IN E PROTEIN.
RX   MEDLINE=87122143; PubMed=3811228; DOI=10.1016/0042-6822(87)90443-0;
RA   Nowak T., Wengler G.;
RT   "Analysis of disulfides present in the membrane proteins of the West
RT   Nile flavivirus.";
RL   Virology 156:127-137(1987).
CC   -!- FUNCTION: The small proteins NS2A, NS4A and NS4B are hydrophobic,
CC       suggesting a possible membrane-related function. NS5 may play a
CC       role in viral RNA replication. The NS2B/NS3 protease complex
CC       processes the viral polyprotein.
CC   -!- CATALYTIC ACTIVITY: Selective hydrolysis of -Xaa-Xaa-|-Yaa- bonds
CC       in which each of the Xaa can be either Arg or Lys and Yaa can be
CC       either Ser or Ala.
CC   -!- CATALYTIC ACTIVITY: Nucleoside triphosphate + RNA(n) = diphosphate
CC       + RNA(n+1).
CC   -!- SUBUNIT: NS3 and NS2B form a heterodimer. NS3 is the catalytic
CC       subunit, whereas NS2B strongly stimulates its activity (By
CC       similarity).
CC   -!- INTERACTION:
CC       P05106:ITGB3 (xeno); NbExp=2; IntAct=EBI-981051, EBI-702847;
CC   -!- PTM: Specific enzymatic cleavages in vivo yield mature proteins
CC       (By similarity).
CC   -!- MISCELLANEOUS: The virion of this virus is a nucleocapsid covered
CC       by a lipoprotein envelope. The envelope contains two proteins: the
CC       protein M and glycoprotein E. The nucleocapsid is a complex of
CC       protein C and mRNA. In immature particles, there are 60
CC       icosahedrally organized trimeric spikes on the surface. Each spike
CC       consists of three heterodimers of envelope protein M precursor
CC       (prM) and envelope protein E (By similarity).
CC   -!- SIMILARITY: Contains 1 helicase ATP-binding domain.
CC   -!- SIMILARITY: Contains 1 helicase C-terminal domain.
CC   -!- SIMILARITY: Contains 1 peptidase S7 domain.
CC   -!- SIMILARITY: Contains 1 RdRp catalytic domain.
CC   ---------------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution (CC BY 4.0) License
CC   ---------------------------------------------------------------------------
DR   EMBL; M12294; AAA48498.2; -; Genomic_RNA.
DR   PIR; A25256; GNWVWV.
DR   PDB; 2FP7; X-ray; A=1420-1466, B=1517-1688.
DR   SMR; P06935; 25-97.
DR   IntAct; P06935; -.
DR   MEROPS; S07.001; -.
DR   GO; GO:0005515; F:protein binding; IPI:IntAct.
DR   InterPro; IPR014001; DEAD-like_N.
DR   InterPro; IPR002464; DEAH_box.
DR   InterPro; IPR013756; Flav_glyE_cen_2.
DR   InterPro; IPR011999; Flav_glyE_cen_dm.
DR   InterPro; IPR013754; Flav_glyE_dim.
DR   InterPro; IPR001122; Flavi_capsidC.
DR   InterPro; IPR011492; Flavi_DEAD.
DR   InterPro; IPR000069; Flavi_M.
DR   InterPro; IPR001157; Flavi_NS1.
DR   InterPro; IPR000752; Flavi_NS2A.
DR   InterPro; IPR000487; Flavi_NS2B.
DR   InterPro; IPR000404; Flavi_NS4A.
DR   InterPro; IPR001528; Flavi_NS4B.
DR   InterPro; IPR002535; Flavi_propep.
DR   InterPro; IPR000336; Flv_glyE_Ig-like.
DR   InterPro; IPR014412; Gen_Poly_FLV.
DR   InterPro; IPR014021; Helic_SF1/SF2_ATP_bd.
DR   InterPro; IPR001650; Helicase_C.
DR   InterPro; IPR001850; Peptidase_S7.
DR   InterPro; IPR000208; RNA_pol_flaviviral.
DR   InterPro; IPR007094; RNA_pol_PSvir.
DR   InterPro; IPR002877; RrmJFtsJ_mtfrase.
DR   Gene3D; G3DSA:3.30.67.10; Flav_glyE_cen_2; 1.
DR   Gene3D; G3DSA:2.60.98.10; Flav_glyE_dim; 1.
DR   Gene3D; G3DSA:2.60.40.350; Flv_glyE_Ig-like; 1.
DR   Pfam; PF01003; Flavi_capsid; 1.
DR   Pfam; PF07652; Flavi_DEAD; 1.
DR   Pfam; PF02832; Flavi_glycop_C; 1.
DR   Pfam; PF00869; Flavi_glycoprot; 1.
DR   Pfam; PF01004; Flavi_M; 1.
DR   Pfam; PF00948; Flavi_NS1; 1.
DR   Pfam; PF01005; Flavi_NS2A; 1.
DR   Pfam; PF01002; Flavi_NS2B; 1.
DR   Pfam; PF01350; Flavi_NS4A; 1.
DR   Pfam; PF01349; Flavi_NS4B; 1.
DR   Pfam; PF00972; Flavi_NS5; 1.
DR   Pfam; PF01570; Flavi_propep; 1.
DR   Pfam; PF01728; FtsJ; 1.
DR   Pfam; PF00271; Helicase_C; 1.
DR   Pfam; PF00949; Peptidase_S7; 1.
DR   PIRSF; PIRSF003817; Gen_Poly_FLV; 1.
DR   ProDom; PD001496; Flavi_NS1; 1.
DR   SMART; SM00487; DEXDc; 1.
DR   SMART; SM00490; HELICc; 1.
DR   PROSITE; PS00690; DEAH_ATP_HELICASE; FALSE_NEG.
DR   PROSITE; PS51192; HELICASE_ATP_BIND_1; 1.
DR   PROSITE; PS51194; HELICASE_CTER; 1.
DR   PROSITE; PS50507; RDRP_SSRNA_POS; 1.
PE   1: Evidence at protein level;
KW   3D-structure; ATP-binding; Capsid protein;
KW   Cleavage on pair of basic residues; Core protein; Envelope protein;
KW   Glycoprotein; Helicase; Hydrolase; Membrane; Nucleotide-binding;
KW   Nucleotidyltransferase; RNA replication; RNA-directed RNA polymerase;
KW   Transferase; Transmembrane; Virion.
FT   CHAIN         1    123       Capsid protein C.
FT                                /FTId=PRO_0000037743.
FT   INIT_MET      1      1       Removed; by host.
FT   PROPEP      124    215
FT                                /FTId=PRO_0000037744.
FT   CHAIN       216    290       Envelope protein M.
FT                                /FTId=PRO_0000037745.
FT   CHAIN       291    787       Major envelope protein E.
FT                                /FTId=PRO_0000037746.
FT   CHAIN       788   1139       Non-structural protein 1.
FT                                /FTId=PRO_0000037747.
FT   CHAIN      1140   1370       Non-structural protein 2A.
FT                                /FTId=PRO_0000037748.
FT   CHAIN      1371   1501       Flavivirin protease NS2B regulatory
FT                                subunit.
FT                                /FTId=PRO_0000037749.
FT   CHAIN      1502   2120       Flavivirin protease NS3 catalytic
FT                                subunit.
FT                                /FTId=PRO_0000037750.
FT   CHAIN      2121   2269       Non-structural protein 4A.
FT                                /FTId=PRO_0000037751.
FT   CHAIN      2270   2525       Non-structural protein 4B.
FT                                /FTId=PRO_0000037752.
FT   CHAIN      2526   3430       RNA-directed RNA polymerase.
FT                                /FTId=PRO_0000037753.
FT   DOMAIN     1508   1679       Peptidase S7.
FT   DOMAIN     1682   1838       Helicase ATP-binding.
FT   DOMAIN     1849   2014       Helicase C-terminal.
FT   DOMAIN     3055   3207       RdRp catalytic.
FT   NP_BIND    1695   1702       ATP (Potential).
FT   REGION      388    401       Involved in fusion.
FT   MOTIF      1786   1789       DEAH box.
FT   ACT_SITE   1552   1552       Charge relay system (By similarity).
FT   ACT_SITE   1576   1576       Charge relay system (By similarity).
FT   ACT_SITE   1636   1636       Charge relay system (By similarity).
FT   CARBOHYD    138    138       N-linked (GlcNAc...) (Potential).
FT   CARBOHYD    917    917       N-linked (GlcNAc...) (Potential).
FT   CARBOHYD    962    962       N-linked (GlcNAc...) (Potential).
FT   CARBOHYD    994    994       N-linked (GlcNAc...) (Potential).
FT   CARBOHYD   1289   1289       N-linked (GlcNAc...) (Potential).
FT   CARBOHYD   2336   2336       N-linked (GlcNAc...) (Potential).
FT   CARBOHYD   2489   2489       N-linked (GlcNAc...) (Potential).
FT   DISULFID    293    320
FT   DISULFID    350    406
FT   DISULFID    364    395
FT   DISULFID    382    411
FT   DISULFID    476    574
FT   DISULFID    591    622
FT   STRAND     1423   1428
FT   STRAND     1444   1449
FT   STRAND     1455   1457
FT   STRAND     1522   1527
FT   STRAND     1536   1543
FT   STRAND     1546   1550
FT   HELIX      1551   1554
FT   STRAND     1559   1561
FT   STRAND     1564   1566
FT   STRAND     1568   1572
FT   TURN       1573   1576
FT   STRAND     1577   1583
FT   STRAND     1592   1594
FT   STRAND     1596   1600
FT   STRAND     1608   1612
FT   STRAND     1615   1619
FT   STRAND     1622   1627
FT   HELIX      1633   1635
FT   STRAND     1639   1641
FT   STRAND     1647   1651
FT   STRAND     1654   1656
FT   STRAND     1662   1665
FT   STRAND     1690   1694
FT   TURN       1701   1704
FT   HELIX      1705   1715
FT   STRAND     1720   1726
FT   HELIX      1727   1736
FT   STRAND     1741   1744
FT   STRAND     1759   1763
FT   HELIX      1764   1772
FT   STRAND     1773   1775
FT   STRAND     1781   1786
FT   TURN       1787   1789
FT   HELIX      1793   1808
FT   STRAND     1812   1819
FT   STRAND     1835   1838
FT   HELIX      1851   1855
FT   STRAND     1860   1863
FT   HELIX      1867   1878
FT   TURN       1879   1881
FT   STRAND     1884   1887
FT   TURN       1889   1891
FT   HELIX      1892   1900
FT   STRAND     1905   1909
FT   HELIX      1911   1913
FT   STRAND     1921   1926
FT   STRAND     1929   1936
FT   STRAND     1938   1940
FT   STRAND     1942   1950
FT   HELIX      1953   1960
FT   STRAND     1973   1976
FT   HELIX      1989   1997
FT   HELIX      2012   2014
FT   TURN       2015   2017
FT   HELIX      2029   2041
FT   HELIX      2046   2054
FT   HELIX      2063   2065
FT   HELIX      2070   2072
FT   STRAND     2082   2084
FT   STRAND     2090   2092
FT   STRAND     2096   2099
FT   HELIX      2100   2102
FT   HELIX      2106   2116
SQ   SEQUENCE   3430 AA;  380110 MW;  42D71B7CB12DC45B CRC64;
     MSKKPGGPGK NRAVNMLKRG MPRGLSLIGL KRAMLSLIDG KGPIRFVLAL LAFFRFTAIA
     PTRAVLDRWR GVNKQTAMKH LLSFKKELGT LTSAINRRST KQKKRGGTAG FTILLGLIAC
     AGAVTLSNFQ GKVMMTVNAT DVTDVITIPT AAGKNLCIVR AMDVGYLCED TITYECPVLA
     AGNDPEDIDC WCTKSSVYVR YGRCTKTRHS RRSRRSLTVQ THGESTLANK KGAWLDSTKA
     TRYLVKTESW ILRNPGYALV AAVIGWMLGS NTMQRVVFAI LLLLVAPAYS FNCLGMSNRD
     FLEGVSGATW VDLVLEGDSC VTIMSKDKPT IDVKMMNMEA ANLADVRSYC YLASVSDLST
     RAACPTMGEA HNEKRADPAF VCKQGVVDRG WGNGCGLFGK GSIDTCAKFA CTTKATGWII
     QKENIKYEVA IFVHGPTTVE SHGKIGATQA GRFSITPSAP SYTLKLGEYG EVTVDCEPRS
     GIDTSAYYVM SVGEKSFLVH REWFMDLNLP WSSAGSTTWR NRETLMEFEE PHATKQSVVA
     LGSQEGALHQ ALAGAIPVEF SSNTVKLTSG HLKCRVKMEK LQLKGTTYGV CSKAFKFART
     PADTGHGTVV LELQYTGTDG PCKVPISSVA SLNDLTPVGR LVTVNPFVSV ATANSKVLIE
     LEPPFGDSYI VVGRGEQQIN HHWHKSGSSI GKAFTTTLRG AQRLAALGDT AWDFGSVGGV
     FTSVGKAIHQ VFGGAFRSLF GGMSWITQGL LGALLLWMGI NARDRSIAMT FLAVGGVLLF
     LSVNVHADTG CAIDIGRQEL RCGSGVFIHN DVEAWMDRYK FYPETPQGLA KIIQKAHAEG
     VCGLRSVSRL EHQMWEAIKD ELNTLLKENG VDLSVVVEKQ NGMYKAAPKR LAATTEKLEM
     GWKAWGKSII FAPELANNTF VIDGPETEEC PTANRAWNSM EVEDFGFGLT STRMFLRIRE
     TNTTECDSKI IGTAVKNNMA VHSDLSYWIE SGLNDTWKLE RAVLGEVKSC TWPETHTLWG
     DGVLESDLII PITLAGPRSN HNRRPGYKTQ NQGPWDEGRV EIDFDYCPGT TVTISDSCEH
     RGPAARTTTE SGKLITDWCC RSCTLPPLRF QTENGCWYGM EIRPTRHDEK TLVQSRVNAY
     NADMIDPFQL GLMVVFLATQ EVLRKRWTAK ISIPAIMLAL LVLVFGGITY TDVLRYVILV
     GAAFAEANSG GDVVHLALMA TFKIQPVFLV ASFLKARWTN QESILLMLAA AFFQMAYYDA
     KNVLSWEVPD VLNSLSVAWM ILRAISFTNT SNVVVPLLAL LTPGLKCLNL DVYRILLLMV
     GVGSLIKEKR SSAAKKKGAC LICLALASTG VFNPMILAAG LMACDPNRKR GWPATEVMTA
     VGLMFAIVGG LAELDIDSMA IPMTIAGLMF AAFVISGKST DMWIERTADI TWESDAEITG
     SSERVDVRLD DDGNFQLMND PGAPWKIWML RMACLAISAY TPWAILPSVI GFWITLQYTK
     RGGVLWDTPS PKEYKKGDTT TGVYRIMTRG LLGSYQAGAG VMVEGVFHTL WHTTKGAALM
     SGEGRLDPYW GSVKEDRLCY GGPWKLQHKW NGHDEVQMIV VEPGKNVKNV QTKPGVFKTP
     EGEIGAVTLD YPTGTSGSPI VDKNGDVIGL YGNGVIMPNG SYISAIVQGE RMEEPAPAGF
     EPEMLRKKQI TVLDLHPGAG KTRKILPQII KEAINKRLRT AVLAPTRVVA AEMSEALRGL
     PIRYQTSAVH REHSGNEIVD VMCHATLTHR LMSPHRVPNY NLFIMDEAHF TDPASIAARG
     YIATKVELGE AAAIFMTATP PGTSDPFPES NAPISDMQTE IPDRAWNTGY EWITEYVGKT
     VWFVPSVKMG NEIALCLQRA GKKVIQLNRK SYETEYPKCK NDDWDFVITT DISEMGANFK
     ASRVIDSRKS VKPTIIEEGD GRVILGEPSA ITAASAAQRR GRIGRNPSQV GDEYCYGGHT
     NEDDSNFAHW TEARIMLDNI NMPNGLVAQL YQPEREKVYT MDGEYRLRGE ERKNFLEFLR
     TADLPVWLAY KVAAAGISYH DRKWCFDGPR TNTILEDNNE VEVITKLGER KILRPRWADA
     RVYSDHQALK SFKDFASGKR SQIGLVEVLG RMPEHFMVKT WEALDTMYVV ATAEKGGRAH
     RMALEELPDA LQTIVLIALL SVMSLGVFFL LMQRKGIGKI GLGGVILGAA TFFCWMAEVP
     GTKIAGMLLL SLLLMIVLIP EPEKQRSQTD NQLAVFLICV LTLVGAVAAN EMGWLDKTKN
     DIGSLLGHRP EARETTLGVE SFLLDLRPAT AWSLYAVTTA VLTPLLKHLI TSDYINTSLT
     SINVQASALF TLARGFPFVD VGVSALLLAV GCWGQVTLTV TVTAAALLFC HYAYMVPGWQ
     AEAMRSAQRR TAAGIMKNVV VDGIVATDVP ELERTTPVMQ KKVGQIILIL VSMAAVVVNP
     SVRTVREAGI LTTAAAVTLW ENGASSVWNA TTAIGLCHIM RGGWLSCLSI MWTLIKNMEK
     PGLKRGGAKG RTLGEVWKER LNHMTKEEFT RYRKEAITEV DRSAAKHARR EGNITGGHPV
     SRGTAKLRWL VERRFLEPVG KVVDLGCGRG GWCYYMATQK RVQEVKGYTK GGPGHEEPQL
     VQSYGWNIVT MKSGVDVFYR PSEASDTLLC DIGESSSSAE VEEHRTVRVL EMVEDWLHRG
     PKEFCIKVLC PYMPKVIEKM ETLQRRYGGG LIRNPLSRNS THEMYWVSHA SGNIVHSVNM
     TSQVLLGRME KKTWKGPQFE EDVNLGSGTR AVGKPLLNSD TSKIKNRIER LKKEYSSTWH
     QDANHPYRTW NYHGSYEVKP TGSASSLVNG VVRLLSKPWD TITNVTTMAM TDTTPFGQQR
     VFKEKVDTKA PEPPEGVKYV LNETTNWLWA FLARDKKPRM CSREEFIGKV NSNAALGAMF
     EEQNQWKNAR EAVEDPKFWE MVDEEREAHL RGECNTCIYN MMGKREKKPG EFGKAKGSRA
     IWFMWLGARF LEFEALGFLN EDHWLGRKNS GGGVEGLGLQ KLGYILKEVG TKPGGKVYAD
     DTAGWDTRIT KADLENEAKV LELLDGEHRR LARSIIELTY RHKVVKVMRP AADGKTVMDV
     ISREDQRGSG QVVTYALNTF TNLAVQLVRM MEGEGVIGPD DVEKLGKGKG PKVRTWLFEN
     GEERLSRMAV SGDDCVVKPL DDRFATSLHF LNAMSKVRKD IQEWKPSTGW YDWQQVPFCS
     NHFTELIMKD GRTLVVPCRG QDELIGRARI SPGAGWNVRD TACLAKSYAQ MWLLLYFHRR
     DLRLMANAIC SAVPANWVPT GRTTWSIHAK GEWMTTEDML AVWNRVWIEE NEWMEDKTPV
     ERWSDVPYSG KREDIWCGSL IGTRTRATWA ENIHVAINQV RSVIGEEKYV DYMSSLRRYE
     DTIVVEDTVL
//