ID   POLG_WNV       STANDARD;      PRT;  3430 AA.
AC   P06935;
DT   01-JAN-1988 (REL. 06, CREATED)
DT   01-JAN-1988 (REL. 06, LAST SEQUENCE UPDATE)
DT   01-AUG-1992 (REL. 23, LAST ANNOTATION UPDATE)
DE   GENOME POLYPROTEIN (CONTAINS: CAPSID PROTEIN C (CORE PROTEIN); MATRIX
DE   PROTEIN (ENVELOPE PROTEIN M); MAJOR ENVELOPE PROTEIN E; NONSTRUCTURAL
DE   PROTEINS NS1, NS2A, NS2B, NS4A AND NS4B; HELICASE (NS3); RNA-DIRECTED
DE   RNA POLYMERASE (EC 2.7.7.48) (NS5)).
OS   WEST NILE VIRUS.
OC   VIRIDAE; SS-RNA ENVELOPED VIRUSES; POSITIVE-STRAND; FLAVIVIRIDAE;
OC   FLAVIVIRUSES.
RN   [1]
RP   SEQUENCE FROM N.A.
RM   86124703
RA   CASTLE E., LEIDNER U., NOWAK T., WENGLER G., WENGLER G.;
RL   VIROLOGY 149:10-26(1986).
RN   [2]
RP   SEQUENCE OF 1-291 FROM N.A.
RM   85274372
RA   CASTLE E., NOWAK T., LEIDNER U., WENGLER G., WENGLER G.;
RL   VIROLOGY 145:227-236(1985).
RN   [3]
RP   SEQUENCE OF 255-854 FROM N.A.
RM   86072082
RA   WENGLER G., CASTLE E., LEIDNER U., NOWAK T., WENGLER G.;
RL   VIROLOGY 147:264-274(1985).
CC   -!- FUNCTION: THE SMALL PROTEINS NS2A, NS2B, NS4A AND NS4B ARE
CC       HYDROPHOBIC, SUGGESTING A POSSIBLE MEMBRANE-RELATED FUNCTION.
CC       NS3 AND NS5 MAY PLAY A ROLE IN THE VIRAL RNA REPLICATION.
CC   -!- SUBUNIT: THE VIRION OF THIS VIRUS IS A NUCLEOCAPSID COVERED BY A
CC       LIPOPROTEIN ENVELOPE. THE ENVELOPE CONSISTS OF TWO PROTEINS:
CC       PROTEIN M AND GLYCOPROTEIN E. THE NUCLEOCAPSID IS A COMPLEX OF
CC       PROTEIN C AND MRNA.
CC   ---------------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution (CC BY 4.0) License
CC   ---------------------------------------------------------------------------
DR   EMBL; M10103; FLWNVSP.
DR   PIR; A25256; GNWVWV.
KW   POLYPROTEIN; GLYCOPROTEIN; RNA-DIRECTED RNA POLYMERASE; CORE PROTEIN;
KW   COAT PROTEIN; ENVELOPE PROTEIN; HELICASE; ATP-BINDING; TRANSMEMBRANE;
KW   NONSTRUCTURAL PROTEIN.
FT   INIT_MET      1      1       REMOVED FROM CAPSID PROTEIN C BY THE
FT                                CELLULAR AMINOPEPTIDASE.
FT   CHAIN         2    123       CAPSID PROTEIN C.
FT   PROPEP      124    215
FT   CHAIN       216    290       ENVELOPE GLYCOPROTEIN M.
FT   CHAIN       291    787       MAJOR ENVELOPE PROTEIN E.
FT   CHAIN       788   1139       NONSTRUCTURAL PROTEIN NS1.
FT   CHAIN      1140   1370       NONSTRUCTURAL PROTEIN NS2A.
FT   CHAIN      1371   1501       NONSTRUCTURAL PROTEIN NS2B.
FT   CHAIN      1502   2120       HELICASE (NS3).
FT   CHAIN      2121   2269       NONSTRUCTURAL PROTEIN NS4A.
FT   CHAIN      2270   2525       NONSTRUCTURAL PROTEIN NS4B.
FT   CHAIN      2526   3430       RNA-DIRECTED RNA POLYMERASE (NS5).
FT   CARBOHYD    138    138       POTENTIAL.
FT   CARBOHYD    917    917       POTENTIAL.
FT   CARBOHYD    962    962       POTENTIAL.
FT   CARBOHYD    994    994       POTENTIAL.
FT   CARBOHYD   1289   1289       POTENTIAL.
FT   CARBOHYD   2336   2336       POTENTIAL.
FT   CARBOHYD   2489   2489       POTENTIAL.
SQ   SEQUENCE  3430 AA;  379624 MW;  2.098737E+07 CN;
     MSKKPGGPGK NRAVNMLKRG MPRGLSLIGL KRAMLSLIDG KGPIRFVLAL LAFFRFTAIA
     PTRAVLDRWR GVNKQTAMKH LLSFKKELGT LTSAINRRST KQKKRGGTAG FTILLGLIAC
     AGAVTLSNFQ GKVMMTVNAT DVTDVITIPT AAGKNLCIVR AMDVGYLCED TITYECPVLA
     AGNDPEDIDC WCTKSSVYVR YGRCTKTRHS RRSRRSLTVQ THGESTLANK KGAWLDSTKA
     TRYLVKTESW ILRNPGYALV AAVIGWMLGS NTMQRVVFAI LLLLVAPAYS FNCLGMSNRD
     FLEGVSGATW VDLVLEGDSC VTIMSKDKPT IDVKMMNMEA ANLADVRSYC YLASVSDLST
     RAACPTMGEA HNEKRADPAF VCKQGVVDRG WGNGCGLFGK GSIDTCAKFA CTTKATGWII
     QKENIKYEVA IFVHGPTTVE SHGKIGATQA GRFSITPSAP SYTLKLGEYG EVTVDCEPRS
     GIDTSAYYVM SVGEKSFLVH REWFMDLNLP WSSAGSTTWR NRETLMEFEE PHATKQSVVA
     LGSQEGALHQ ALAGAIPVEF SSNTVKLTSG HLKCRVKMEK LQLKGTTYGV CSKAFKFART
     PADTGHGTVV LELQYTGTDG PCKVPISSVA SLNDLTPVGR LVTVNPFVSV ATANSKVLIE
     LEPPFGDSYI VVGRGEQQIN HHWHKSGSSI GKAFTTTLRG AQRLAALGDT AWDFGSVGGV
     FTSVGKAIHQ VFGGAFRSLF GGMSWITQGL LGALLLWMGI NARDRSIAMT FLAVGGVLLF
     LSVNVHADTG CAIDIGRQEL RCGSGVFIHN DVEAWMDRYK FYPETPQGLA KIIQKAHAEG
     VCGLRSVSRL EHQMWEAIKD ELNTLLKENG VDLSVVVEKQ NGMYKAAPKR LAATTEKLEM
     GWKAWGKSII FAPELANNTF VIDGPETEEC PTANRAWNSM EVEDFGFGLT STRMFLRIRE
     TNTTECDSKI IGTAVKNNMA VHSDLSYWIE SGLNDTWKLE RAVLGEVKSC TWPETHTLWG
     DGVLESDLII PITLAGPRSN HNRRPGYKTQ NQGPWDEGRV EIDFDYCPGT TVTISDSCEH
     RGPAARTTTE SGKLITDWCC RSCTLPPLRF QTENGCWYGM EIRPTRHDEK TLVQSRVNAY
     NADMIDPFQL GLMVVFLATQ EVLRKRWTAK ISIPAIMLAL LVLVFGGITY TDVLRYVILV
     GAAFAEANSG GDVVHLALMA TFKIQPVFLV ASFLKARWTN QESILLMLAA AFFQMAYYDA
     KNVLSWEVPD VLNSLSVAWM ILRAISFTNT SNVVVPLLAL LTPGLKCLNL DVYRILLLMV
     GVGSLIKEKR SSAAKKKGAC LICLALASTG VFNPMILAAG LMACDPNRKR GWPATEVMTA
     VGLMFAIVGG LAELDIDSMA IPMTIAGLMF AAFVISGKST DMWIERTADI TWESDAEITG
     SSERVDVRLD DDGNFQLMND PGAPWKIWML RMACLAISAY TPWAILPSVI GFWITLQYTK
     RGGVLWDTPS PKEYKKGDTT TGVYRIMTRG LLGSYQAGAG VMVEGVFHTL WHTTKGAALM
     SGEGRLDPYW GSVKEDRLCY GGPWKLQHKW NGHDEVQMIV VEPGKNVKNV QTKPGVFKTP
     EGEIGAVTLD YPTGTSGSPI VDKNGDVIGL YGNGVIMPNG SYISAIVQGE RMEEPAPAGF
     EPEMLRKKQI TVLDLHPGAG KTRKILPQII KEAINKRLRT AVLAPTRVVA AEMSEALRGL
     PIRYQTSAVH REHSGNEIVD VMCHATLTHR LMSPHRVPNY NLFIMDEAHF TDPASIAARG
     YIATKVELGE AAAIFMTATP PGTSDPFPES NAPISDMQTE IPDRAWNTGY EWITEYVGKT
     VWFVPSVKMG NEIALCLQRA GKKVIQLNRK SYETEYPKCK NDDWDFVYTT DISEMGANFK
     ASRVIDSRKS VKPTIIEEGD GRVILGEPSA ITAASAAQRR GRIGRNPSQV GDEYCYGGHT
     NEDDSNFAHW TEARIMLDNI NMPNGLVAQL YQPEREKCTP RTGNTGSEGK NGRTSFEFLR
     TADLPVWLAY KVAAAGISYH DRKWCFDGPR TNTILEDNNE VEVITKLGER KILRPRWADA
     RVYSDHQALK SFKDFASGKR SQIGLVEVLG RMPEHFMVKT WEALDTMYVV ATAEKGGRAH
     RMALEELPDA LQTIVLIALL SVMSLGVFFL LMQRKGIGKI GLGGVILGAA TFFCWMAEVP
     GTKIAGMLLL SLLLMIVLIP ESEKQRSQTD NQLAVFLICV LTLVGAVAAN EMGWLDKTKN
     DIGSLLGHRP EARETTLGVE SFLLDLRPAT AWSLYAVTTA VLTPLLKHLI TSDYINTSLT
     SINVQASALF TLARGFPFVD VGVSALLLAV GCWGQVTLTV TVTAAALLFC HYAYMVPGWQ
     AEAMRSAQRR TAAGIMKNVV VDGIVATDVP ELERTTPVMQ KKVGQIILIL VSMAAVVVNP
     SVRTVREAGI LTTAAAVTLW ENGASSVWNA TTAIGLCHIM RGGWLSCLSI MWTLIKNMEK
     PGLKRGGAKG RTLGEVWKER LNHMTKEEFT RYRKEAITEV DRSAAKHARR EGNITGGHPV
     SRGTAKLRWL VERRFLEPVG KVVDLGCGRG GWCYYMATQK RVQEVKGYTK GGPGHEEPQL
     VQSYGWNIVT MKSGVDVFYR PSEASDTLLC DIGESSSSAE VEEHRTVRVL EMVEDWLHRG
     PKEFCIKVLC PYMPKVIEKM ETLQRRYGGG LIRNPLSRNS THEMYWVSHA SGNIVHSVNM
     TSQVLLGRME KKTWKGPQFE EDVNLGSGTR AVGKPLLNSD TSKIKNRIER LKKEYSSTWH
     QDANHPYRTW NYHGSYEVKP TGSASSLVNG VVRLLSKPMG TITNVTTMAM TDTTPFGQQR
     VFKEKVDTKA PEPPEGVKYV LNETTNWLWA FLARDKKPRM CSREEFIGKV NSNAALGAMF
     EEQNQWKNAR EAVEDPKFWE MVDEEREAHL RGECNTCIYN MMGKREKKPG EFGKAKGSRA
     IWFMWLGARF LEFEALGFLN EDHWLGRKNS GGGVEGLGLQ KLGYILKEVG TKPGGKVYAD
     DTAGWDTRIT KADLENEAKV LELLDGEHRR LARSIIELTY RHKVVKVMRP AADGKTVMDV
     ISREDQRGSG QVVTYALNTF TNLAVQLVRM MEGEGVIGPD DVEKLGKGKG PKVRTWLFEN
     GEERLSRMAV SGDDCVVKPL DDRFATSLHF LNAMSKVRKD IQEWKPSTGW YDWQQVPFCS
     NHFTELIMKD GRTLVVPCRG QDELIGRARI SPGAGWNVRD TACLAKSYAQ MWLLLYFHRR
     DLRLMANAIC SAVPANWVPT GRTTWSIHAK GEWMTTEDML AVWNRVWIEE NEWMEDKTPV
     ERWSDVPYSG KREDIWCGSL IGTRTRATWA ENIHVAINQV RSVIGEEKYV DYMSSLRRYE
     DTIVVEDTVL
//