Spark Experiment Report
Jin Hang  1510122526

I. Environment Setup

1. Download
Download the Scala 2.11.4 release.

2. Extract and install
Extract: tar -xvf scala-2.11.4.tgz
Install: mv scala-2.11.4 /opt/

3. Edit ~/.bash_profile and add the SCALA_HOME environment variable:
export SCALA_HOME=/opt/scala-2.11.4
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$SCALA_HOME/bin
Apply the changes immediately:
source ~/.bash_profile

4. Verify Scala:
scala -version

5. Copy the profile to the slave machines:
scp ~/.bash_profile <slave host>:~/.bash_profile

6. Download Spark with wget (the package used here is spark-1.2.0-bin-hadoop2.4.tgz).
7. Configure Spark on the master host
Extract the downloaded spark-1.2.0-bin-hadoop2.4.tgz into /opt/ and add the SPARK_HOME environment variable to ~/.bash_profile:
# set java env
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$HADOOP_HOME/bin
After editing, run source ~/.bash_profile to make the configuration take effect, then go into Spark's conf directory:
[spark@S1PA11 opt]$ cd spark-1.2.0-bin-hadoop2.4
[spark@S1PA11 spark-1.2.0-bin-hadoop2.4]$ ls
bin  conf  data  ec2  examples  lib  LICENSE  logs  NOTICE  python  README.md  RELEASE  sbin  work
[spark@S1PA11 spark-1.2.0-bin-hadoop2.4]$ cd conf
[spark@S1PA11 conf]$ ls
log4j.properties.template  slaves  slaves.template  spark-env.sh  spark-env.sh.template  ...
First, modify the slaves file and add the two slave nodes S1PA11 and S1PA222:
[spark@S1PA11 conf]$ vi slaves
S1PA11
S1PA222
Second, configure spark-env.sh. Copy spark-env.sh.template to spark-env.sh, open it with vi and add the following lines at the end:
export SPARK_MASTER_IP=<master IP address>
export SPARK_WORKER_MEMORY=2g
export HADOOP_CONF_DIR=<Hadoop configuration directory>
Here HADOOP_CONF_DIR is the Hadoop configuration directory, SPARK_MASTER_IP is the IP address of the master host, and SPARK_WORKER_MEMORY is the maximum amount of memory a worker may use.
When the configuration is done, copy the Spark directory to the slave machines:
scp -r /opt/spark-1.2.0-bin-hadoop2.4 <slave host>:/opt/

8. Start the Spark distributed cluster and check its status
[spark@S1PA11 sbin]$ ./start-all.sh
Check the processes on the master:
[spark@S1PA11 sbin]$ jps
31233 ResourceManager
27201 Jps
30498 NameNode
30733 SecondaryNameNode
5648 Worker
5399 Master
15888 JobHistoryServer
If HDFS is not running, start it first.
On the slave node:
[spark@S1PA222 scala]$ jps
20352 Bootstrap
30737 NodeManager
7219 Jps
30482 DataNode
29500 Bootstrap
757 Worker
9. Check the cluster in the web page
Open the Spark cluster's web management page. The page header reads "Spark Master at spark://<master IP>:7077" and lists the cluster's workers; we see two worker nodes because both the master and the slave run a Worker.
Next we go into Spark's bin directory and start the spark-shell console. The console prints the Spark 1.2.0 startup banner and initializes the Spark context.
The Executors tab of the web UI shows a single executor:
Executors (1)
Memory: 0.0 B Used (265.0 MB Total)    Disk: 0.0 B Used
Executor ID <driver>, Address localhost:57401, RDD Blocks 0, Memory Used 0.0 B / 265.0 MB, Disk Used 0.0 B, Active/Failed/Complete/Total Tasks 0/0/0/0
The Spark cluster environment has been set up successfully.
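Before moving on to the HDFS test, a quick sanity check in spark-shell confirms that the shell can actually run jobs on the cluster. This is only an illustrative sketch that uses the sc context spark-shell already provides; the numbers are arbitrary and do not come from the original report.
scala> // distribute a small local range and run two trivial jobs on it
scala> val sanity = sc.parallelize(1 to 10000)
scala> sanity.count()        // expected result: 10000
scala> sanity.reduce(_ + _)  // expected result: 50005000
If both actions return without errors, the shell, the master and the workers are communicating correctly.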
10. Run a spark-shell test
Earlier we uploaded a README.txt file to the /tmp directory in HDFS, so now we read that file from HDFS with Spark. The HDFS file browser confirms the /tmp listing (permission, owner, group), with the entries owned by spark:supergroup.
Fetch the HDFS file:
scala> val readmeFile = sc.textFile("hdfs://S1PA11:9000/tmp/README.txt")
readmeFile: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12
Running readmeFile.count returns the total number of lines in README.txt.
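Before doing any counting it can help to peek at the first few lines and confirm the file was read correctly. This check is not in the original report; it simply reuses the readmeFile RDD, and take brings the lines back to the driver as a local array.
scala> // print the first three lines of the file on the driver
scala> readmeFile.take(3).foreach(println)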
Next we filter README.txt for the lines that contain "The":
scala> val theCount = readmeFile.filter(line => line.contains("The"))
theCount: org.apache.spark.rdd.RDD[String] = FilteredRDD[3] at filter at <console>:14
scala> theCount.count
The result is 4, i.e. four lines contain "The". The slave's local hadoop-2.6.0 directory holds the same README.txt (listed next to NOTICE.txt), so we can cross-check with grep:
[spark@S1PA222 hadoop-2.6.0]$ grep The README.txt | wc
The first column of the wc output is 4 as well, confirming four matching lines.
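Note that filter(line => line.contains("The")) counts matching lines, not individual occurrences of the word. As a small illustrative variation that is not part of the original experiment, the occurrences of the exact word "The" can be counted by splitting each line on whitespace first:
scala> // count occurrences of the word "The" itself rather than matching lines
scala> val theWordCount = readmeFile.flatMap(line => line.split("\\s+")).filter(_ == "The").count()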
We then reproduce Hadoop's WordCount functionality. First run the following command on the readmeFile RDD we already loaded:
scala> val wordCount = readmeFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
wordCount: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[6] at reduceByKey at <console>:14
Then use collect to submit and execute the job:
scala> wordCount.collect
The console prints the scheduler log for the stages of the job, followed by the result: an Array[(String, Int)] of (word, count) pairs for every word in README.txt.
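Collecting the full array is fine for a small file like README.txt, but usually only the most frequent words are of interest. The following sketch is not in the original report; it uses only standard RDD operations available in Spark 1.2 to flip each (word, count) pair, sort by count in descending order and take the top ten.
scala> // ten most frequent words: flip (word, count) to (count, word), sort descending, take 10
scala> val top10 = wordCount.map { case (w, c) => (c, w) }.sortByKey(false).take(10)
scala> top10.foreach(println)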
We can also inspect the execution in the web UI, which shows the details for each stage of the job (for example "Details for Stage 2").
The resulting RDD, here the (word, count) pairs produced by .map(word => (word, 1)).reduceByKey(_ + _), can be saved back to HDFS. Calling the RDD's saveAsTextFile method writes the dataset into an HDFS directory, using Hadoop's TextOutputFormat by default, so each record is printed as "(key,value)". You can also save the data in SequenceFile format with the saveAsSequenceFile method, for example:
result.saveAsSequenceFile(args(2))
Of course, when we write a standalone Spark program we normally need the following two imports:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
Note that input and output files must be given as full HDFS URIs. For example, the input directory could be hdfs://hadoop-test/tmp/input and the output directory hdfs://hadoop-test/tmp/output, where the "hdfs://hadoop-test" prefix is determined by the fs.default.name parameter in Hadoop's core-site.xml; replace it with whatever your own configuration specifies.
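To tie the pieces together, here is a minimal standalone sketch of the same word count. It targets the Spark 1.x API used in this report; the object name WordCountApp and the use of args(0)/args(1) for the input and output HDFS paths are placeholders chosen for illustration, not something taken from the original report.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

// Minimal word count: args(0) = input path, args(1) = output path,
// both given as full HDFS URIs such as hdfs://hadoop-test/tmp/input.
object WordCountApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCountApp")
    val sc = new SparkContext(conf)

    val counts = sc.textFile(args(0))
      .flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // saveAsTextFile writes each (key, value) pair as a line of text;
    // counts.saveAsSequenceFile(args(1)) would produce a SequenceFile instead.
    counts.saveAsTextFile(args(1))

    sc.stop()
  }
}

Such a program would be packaged as a jar and submitted with spark-submit, pointing --master at the cluster's spark://<master IP>:7077 URL.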