ID   POLG_WNV       STANDARD;      PRT;  3430 AA.
AC   P06935;
DT   01-JAN-1988, integrated into UniProtKB/Swiss-Prot.
DT   24-OCT-2003, sequence version 2.
DT   07-MAR-2006, entry version 64.
DE   Genome polyprotein [Contains: Capsid protein C (Core protein);
DE   Envelope protein M (Matrix protein); Major envelope protein E;
DE   Nonstructural protein 1 (NS1); Nonstructural protein 2A (NS2A);
DE   Flavivirin protease NS2B regulatory subunit; Flavivirin protease NS3
DE   catalytic subunit (EC 3.4.21.91); Nonstructural protein 4A (NS4A);
DE   Nonstructural protein 4B (NS4B); RNA-directed RNA polymerase
DE   (EC 2.7.7.48) (NS5)].
OS   West Nile virus (WN).
OC   Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
OC   Flavivirus; Japanese encephalitis virus group.
OX   NCBI_TaxID=11082;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [GENOMIC RNA].
RX   MEDLINE=86124703; PubMed=3753811;
RA   Castle E., Leidner U., Nowak T., Wengler G., Wengler G.;
RT   "Primary structure of the West Nile flavivirus genome region coding
RT   for all nonstructural proteins.";
RL   Virology 149:10-26(1986).
RN   [2]
RP   SEQUENCE REVISION TO 1908; 2018-2036; 2242 AND 2859-2860.
RX   MEDLINE=21176376; PubMed=11277701; DOI=10.1006/viro.2000.0795;
RA   Yamshchikov V.F., Wengler G., Perelygin A.A., Brinton M.A.,
RA   Compans R.W.;
RT   "An infectious clone of the West Nile flavivirus.";
RL   Virology 281:294-304(2001).
RN   [3]
RP   NUCLEOTIDE SEQUENCE [GENOMIC RNA] OF 1-291.
RX   MEDLINE=85274372; PubMed=2992152;
RA   Castle E., Nowak T., Leidner U., Wengler G., Wengler G.;
RT   "Sequence analysis of the viral core protein and the membrane-
RT   associated proteins V1 and NV2 of the flavivirus West Nile virus and
RT   of the genome sequence for these proteins.";
RL   Virology 145:227-236(1985).
RN   [4]
RP   NUCLEOTIDE SEQUENCE [GENOMIC RNA] OF 255-854.
RX   MEDLINE=86072082; PubMed=3855247;
RA   Wengler G., Castle E., Leidner U., Nowak T., Wengler G.;
RT   "Sequence analysis of the membrane protein V3 of the flavivirus West
RT   Nile virus and of its gene.";
RL   Virology 147:264-274(1985).
RN   [5]
RP   DISULFIDE BONDS IN E PROTEIN.
RX   MEDLINE=87122143; PubMed=3811228;
RA   Nowak T., Wengler G.;
RT   "Analysis of disulfides present in the membrane proteins of the West
RT   Nile flavivirus.";
RL   Virology 156:127-137(1987).
CC   -!- FUNCTION: The small proteins NS2A, NS4A and NS4B are hydrophobic,
CC       suggesting a possible membrane-related function. NS5 may play a
CC       role in viral RNA replication. The NS2B/NS3 protease complex
CC       processes the viral polyprotein.
CC   -!- CATALYTIC ACTIVITY: Selective hydrolysis of -Xaa-Xaa-|-Yaa- bonds
CC       in which each of the Xaa can be either Arg or Lys and Yaa can be
CC       either Ser or Ala.
CC   -!- CATALYTIC ACTIVITY: Nucleoside triphosphate + RNA(n) = diphosphate
CC       + RNA(n+1).
CC   -!- SUBUNIT: NS3 and NS2B form a heterodimer. NS3 is the catalytic
CC       subunit, and NS2B strongly stimulates its activity (By
CC       similarity).
CC   -!- PTM: Specific enzymatic cleavages in vivo yield mature proteins
CC       (By similarity).
CC   -!- MISCELLANEOUS: The virion of this virus is a nucleocapsid covered
CC       by a lipoprotein envelope. The envelope contains two proteins: the
CC       protein M and glycoprotein E. The nucleocapsid is a complex of
CC       protein C and mRNA. In immature particles, there are 60
CC       icosahedrally organized trimeric spikes on the surface. Each spike
CC       consists of three heterodimers of envelope protein M precursor
CC       (prM) and envelope protein E (By similarity).
CC   -!- SIMILARITY: Contains 1 peptidase S7 domain.
CC   -!- SIMILARITY: Contains 1 RdRp catalytic domain.
CC   ---------------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution (CC BY 4.0) License
CC   ---------------------------------------------------------------------------
DR   EMBL; M12294; AAA48498.2; -; Genomic_RNA.
DR   PIR; A25256; GNWVWV.
DR   HSSP; Q88653; 1L9K.
DR   SMR; P06935; 25-97.
DR   MEROPS; S07.001; -.
DR   InterPro; IPR001410; DEAD.
DR   InterPro; IPR011545; DEAD/DEAH_N.
DR   InterPro; IPR002464; DEAH_box.
DR   InterPro; IPR011999; Flav_glyE_cen_dm.
DR   InterPro; IPR001122; Flavi_capsidC.
DR   InterPro; IPR011492; Flavi_DEAD.
DR   InterPro; IPR000069; Flavi_M.
DR   InterPro; IPR001157; Flavi_NS1.
DR   InterPro; IPR000752; Flavi_NS2A.
DR   InterPro; IPR000487; Flavi_NS2B.
DR   InterPro; IPR000404; Flavi_NS4A.
DR   InterPro; IPR001528; Flavi_NS4B.
DR   InterPro; IPR000208; Flavi_NS5.
DR   InterPro; IPR002535; Flavi_propep.
DR   InterPro; IPR000336; Flv_glyE_Ig-like.
DR   InterPro; IPR001650; Helicase_C.
DR   InterPro; IPR001850; Peptidase_S7.
DR   InterPro; IPR007095; RNA_pol_DS_PS.
DR   InterPro; IPR007094; RNA_pol_PSvir.
DR   InterPro; IPR002877; RrmJFtsJ_mtfrase.
DR   InterPro; IPR011998; Vrl_glyE_cen_dim.
DR   Pfam; PF01003; Flavi_capsid; 1.
DR   Pfam; PF07652; Flavi_DEAD; 1.
DR   Pfam; PF02832; Flavi_glycop_C; 1.
DR   Pfam; PF00869; Flavi_glycoprot; 1.
DR   Pfam; PF01004; Flavi_M; 1.
DR   Pfam; PF00948; Flavi_NS1; 1.
DR   Pfam; PF01005; Flavi_NS2A; 1.
DR   Pfam; PF01002; Flavi_NS2B; 1.
DR   Pfam; PF01350; Flavi_NS4A; 1.
DR   Pfam; PF01349; Flavi_NS4B; 1.
DR   Pfam; PF00972; Flavi_NS5; 1.
DR   Pfam; PF01570; Flavi_propep; 1.
DR   Pfam; PF01728; FtsJ; 1.
DR   Pfam; PF00271; Helicase_C; 1.
DR   Pfam; PF00949; Peptidase_S7; 1.
DR   ProDom; PD001496; Flavi_NS1; 1.
DR   SMART; SM00487; DEXDc; 1.
DR   SMART; SM00490; HELICc; 1.
DR   PROSITE; PS00690; DEAH_ATP_HELICASE; FALSE_NEG.
DR   PROSITE; PS50507; RDRP_SSRNA_POS; 1.
KW   ATP-binding; Capsid protein; Core protein; Envelope protein;
KW   Glycoprotein; Helicase; Hydrolase; Membrane; Nucleotide-binding;
KW   Nucleotidyltransferase; Polyprotein; RNA-directed RNA polymerase;
KW   Structural protein; Transferase; Transmembrane.
FT   CHAIN         1    123       Capsid protein C.
FT                                /FTId=PRO_0000037743.
FT   INIT_MET      1      1       Removed from capsid protein C by the
FT                                cellular aminopeptidase.
FT   PROPEP      124    215
FT                                /FTId=PRO_0000037744.
FT   CHAIN       216    290       Envelope protein M.
FT                                /FTId=PRO_0000037745.
FT   CHAIN       291    787       Major envelope protein E.
FT                                /FTId=PRO_0000037746.
FT   CHAIN       788   1139       Nonstructural protein 1.
FT                                /FTId=PRO_0000037747.
FT   CHAIN      1140   1370       Nonstructural protein 2A.
FT                                /FTId=PRO_0000037748.
FT   CHAIN      1371   1501       Flavivirin protease NS2B regulatory
FT                                subunit.
FT                                /FTId=PRO_0000037749.
FT   CHAIN      1502   2120       Flavivirin protease NS3 catalytic
FT                                subunit.
FT                                /FTId=PRO_0000037750.
FT   CHAIN      2121   2269       Nonstructural protein 4A.
FT                                /FTId=PRO_0000037751.
FT   CHAIN      2270   2525       Nonstructural protein 4B.
FT                                /FTId=PRO_0000037752.
FT   CHAIN      2526   3430       RNA-directed RNA polymerase.
FT                                /FTId=PRO_0000037753.
FT   DOMAIN     1508   1679       Peptidase S7.
FT   DOMAIN     3055   3207       RdRp catalytic.
FT   NP_BIND    1695   1702       ATP (Potential).
FT   REGION      388    401       Involved in fusion.
FT   MOTIF      1786   1789       DEAH box.
FT   ACT_SITE   1552   1552       Charge relay system (By similarity).
FT   ACT_SITE   1576   1576       Charge relay system (By similarity).
FT   ACT_SITE   1636   1636       Charge relay system (By similarity).
FT   CARBOHYD    138    138       N-linked (GlcNAc...) (Potential).
FT   CARBOHYD    917    917       N-linked (GlcNAc...) (Potential).
FT   CARBOHYD    962    962       N-linked (GlcNAc...) (Potential).
FT   CARBOHYD    994    994       N-linked (GlcNAc...) (Potential).
FT   CARBOHYD   1289   1289       N-linked (GlcNAc...) (Potential).
FT   CARBOHYD   2336   2336       N-linked (GlcNAc...) (Potential).
FT   CARBOHYD   2489   2489       N-linked (GlcNAc...) (Potential).
FT   DISULFID    293    320
FT   DISULFID    350    406
FT   DISULFID    364    395
FT   DISULFID    382    411
FT   DISULFID    476    574
FT   DISULFID    591    622
SQ   SEQUENCE   3430 AA;  380110 MW;  42D71B7CB12DC45B CRC64;
     MSKKPGGPGK NRAVNMLKRG MPRGLSLIGL KRAMLSLIDG KGPIRFVLAL LAFFRFTAIA
     PTRAVLDRWR GVNKQTAMKH LLSFKKELGT LTSAINRRST KQKKRGGTAG FTILLGLIAC
     AGAVTLSNFQ GKVMMTVNAT DVTDVITIPT AAGKNLCIVR AMDVGYLCED TITYECPVLA
     AGNDPEDIDC WCTKSSVYVR YGRCTKTRHS RRSRRSLTVQ THGESTLANK KGAWLDSTKA
     TRYLVKTESW ILRNPGYALV AAVIGWMLGS NTMQRVVFAI LLLLVAPAYS FNCLGMSNRD
     FLEGVSGATW VDLVLEGDSC VTIMSKDKPT IDVKMMNMEA ANLADVRSYC YLASVSDLST
     RAACPTMGEA HNEKRADPAF VCKQGVVDRG WGNGCGLFGK GSIDTCAKFA CTTKATGWII
     QKENIKYEVA IFVHGPTTVE SHGKIGATQA GRFSITPSAP SYTLKLGEYG EVTVDCEPRS
     GIDTSAYYVM SVGEKSFLVH REWFMDLNLP WSSAGSTTWR NRETLMEFEE PHATKQSVVA
     LGSQEGALHQ ALAGAIPVEF SSNTVKLTSG HLKCRVKMEK LQLKGTTYGV CSKAFKFART
     PADTGHGTVV LELQYTGTDG PCKVPISSVA SLNDLTPVGR LVTVNPFVSV ATANSKVLIE
     LEPPFGDSYI VVGRGEQQIN HHWHKSGSSI GKAFTTTLRG AQRLAALGDT AWDFGSVGGV
     FTSVGKAIHQ VFGGAFRSLF GGMSWITQGL LGALLLWMGI NARDRSIAMT FLAVGGVLLF
     LSVNVHADTG CAIDIGRQEL RCGSGVFIHN DVEAWMDRYK FYPETPQGLA KIIQKAHAEG
     VCGLRSVSRL EHQMWEAIKD ELNTLLKENG VDLSVVVEKQ NGMYKAAPKR LAATTEKLEM
     GWKAWGKSII FAPELANNTF VIDGPETEEC PTANRAWNSM EVEDFGFGLT STRMFLRIRE
     TNTTECDSKI IGTAVKNNMA VHSDLSYWIE SGLNDTWKLE RAVLGEVKSC TWPETHTLWG
     DGVLESDLII PITLAGPRSN HNRRPGYKTQ NQGPWDEGRV EIDFDYCPGT TVTISDSCEH
     RGPAARTTTE SGKLITDWCC RSCTLPPLRF QTENGCWYGM EIRPTRHDEK TLVQSRVNAY
     NADMIDPFQL GLMVVFLATQ EVLRKRWTAK ISIPAIMLAL LVLVFGGITY TDVLRYVILV
     GAAFAEANSG GDVVHLALMA TFKIQPVFLV ASFLKARWTN QESILLMLAA AFFQMAYYDA
     KNVLSWEVPD VLNSLSVAWM ILRAISFTNT SNVVVPLLAL LTPGLKCLNL DVYRILLLMV
     GVGSLIKEKR SSAAKKKGAC LICLALASTG VFNPMILAAG LMACDPNRKR GWPATEVMTA
     VGLMFAIVGG LAELDIDSMA IPMTIAGLMF AAFVISGKST DMWIERTADI TWESDAEITG
     SSERVDVRLD DDGNFQLMND PGAPWKIWML RMACLAISAY TPWAILPSVI GFWITLQYTK
     RGGVLWDTPS PKEYKKGDTT TGVYRIMTRG LLGSYQAGAG VMVEGVFHTL WHTTKGAALM
     SGEGRLDPYW GSVKEDRLCY GGPWKLQHKW NGHDEVQMIV VEPGKNVKNV QTKPGVFKTP
     EGEIGAVTLD YPTGTSGSPI VDKNGDVIGL YGNGVIMPNG SYISAIVQGE RMEEPAPAGF
     EPEMLRKKQI TVLDLHPGAG KTRKILPQII KEAINKRLRT AVLAPTRVVA AEMSEALRGL
     PIRYQTSAVH REHSGNEIVD VMCHATLTHR LMSPHRVPNY NLFIMDEAHF TDPASIAARG
     YIATKVELGE AAAIFMTATP PGTSDPFPES NAPISDMQTE IPDRAWNTGY EWITEYVGKT
     VWFVPSVKMG NEIALCLQRA GKKVIQLNRK SYETEYPKCK NDDWDFVITT DISEMGANFK
     ASRVIDSRKS VKPTIIEEGD GRVILGEPSA ITAASAAQRR GRIGRNPSQV GDEYCYGGHT
     NEDDSNFAHW TEARIMLDNI NMPNGLVAQL YQPEREKVYT MDGEYRLRGE ERKNFLEFLR
     TADLPVWLAY KVAAAGISYH DRKWCFDGPR TNTILEDNNE VEVITKLGER KILRPRWADA
     RVYSDHQALK SFKDFASGKR SQIGLVEVLG RMPEHFMVKT WEALDTMYVV ATAEKGGRAH
     RMALEELPDA LQTIVLIALL SVMSLGVFFL LMQRKGIGKI GLGGVILGAA TFFCWMAEVP
     GTKIAGMLLL SLLLMIVLIP EPEKQRSQTD NQLAVFLICV LTLVGAVAAN EMGWLDKTKN
     DIGSLLGHRP EARETTLGVE SFLLDLRPAT AWSLYAVTTA VLTPLLKHLI TSDYINTSLT
     SINVQASALF TLARGFPFVD VGVSALLLAV GCWGQVTLTV TVTAAALLFC HYAYMVPGWQ
     AEAMRSAQRR TAAGIMKNVV VDGIVATDVP ELERTTPVMQ KKVGQIILIL VSMAAVVVNP
     SVRTVREAGI LTTAAAVTLW ENGASSVWNA TTAIGLCHIM RGGWLSCLSI MWTLIKNMEK
     PGLKRGGAKG RTLGEVWKER LNHMTKEEFT RYRKEAITEV DRSAAKHARR EGNITGGHPV
     SRGTAKLRWL VERRFLEPVG KVVDLGCGRG GWCYYMATQK RVQEVKGYTK GGPGHEEPQL
     VQSYGWNIVT MKSGVDVFYR PSEASDTLLC DIGESSSSAE VEEHRTVRVL EMVEDWLHRG
     PKEFCIKVLC PYMPKVIEKM ETLQRRYGGG LIRNPLSRNS THEMYWVSHA SGNIVHSVNM
     TSQVLLGRME KKTWKGPQFE EDVNLGSGTR AVGKPLLNSD TSKIKNRIER LKKEYSSTWH
     QDANHPYRTW NYHGSYEVKP TGSASSLVNG VVRLLSKPWD TITNVTTMAM TDTTPFGQQR
     VFKEKVDTKA PEPPEGVKYV LNETTNWLWA FLARDKKPRM CSREEFIGKV NSNAALGAMF
     EEQNQWKNAR EAVEDPKFWE MVDEEREAHL RGECNTCIYN MMGKREKKPG EFGKAKGSRA
     IWFMWLGARF LEFEALGFLN EDHWLGRKNS GGGVEGLGLQ KLGYILKEVG TKPGGKVYAD
     DTAGWDTRIT KADLENEAKV LELLDGEHRR LARSIIELTY RHKVVKVMRP AADGKTVMDV
     ISREDQRGSG QVVTYALNTF TNLAVQLVRM MEGEGVIGPD DVEKLGKGKG PKVRTWLFEN
     GEERLSRMAV SGDDCVVKPL DDRFATSLHF LNAMSKVRKD IQEWKPSTGW YDWQQVPFCS
     NHFTELIMKD GRTLVVPCRG QDELIGRARI SPGAGWNVRD TACLAKSYAQ MWLLLYFHRR
     DLRLMANAIC SAVPANWVPT GRTTWSIHAK GEWMTTEDML AVWNRVWIEE NEWMEDKTPV
     ERWSDVPYSG KREDIWCGSL IGTRTRATWA ENIHVAINQV RSVIGEEKYV DYMSSLRRYE
     DTIVVEDTVL
//