Speech Processing Project 1: Linear Predictive Coding using a Voice-Excited Vocoder
ECE 5525, Fall 2005
Osama Saraireh
Dr. Veton Kepuska

The basic form of a pitch-excited LPC vocoder is shown below. The speech signal is filtered to no more than one half of the system sampling frequency and then A/D conversion is performed. The speech is processed on a frame-by-frame basis, where the analysis frame length can be variable. For each frame a pitch period estimate is made along with a voicing decision. A linear predictive coefficient analysis is performed to obtain an inverse model of the speech spectrum, A(z). In addition, a gain parameter G, representing some function of the speech energy, is computed. An encoding procedure is then applied to transform the analyzed parameters into an efficient set of transmission parameters, with the goal of minimizing the degradation of the synthesized speech for a specified number of bits. Knowing the transmission frame rate and the number of bits used for each transmission parameter, one can compute a noise-free channel transmission bit rate.

[Figure: pitch-excited LPC vocoder block diagram]

At the receiver, the transmitted parameters are decoded into quantized versions of the coefficient analysis and pitch estimation parameters. An excitation signal for synthesis is then constructed from the transmitted pitch and voicing parameters. The excitation signal then drives a synthesis filter 1/A(z) corresponding to the analysis model A(z). The digital samples s(n) are then passed through a D/A converter and low-pass filtered to generate the synthetic speech s(t). Either before or after synthesis, the gain is used to match the synthetic speech energy to the actual speech energy. The digital samples are thus converted to an analog signal and passed through a filter similar to the one at the input of the system.

Linear Predictive Coding (LPC) of Speech

The linear predictive coding (LPC) method for speech analysis and synthesis is based on modeling the vocal tract as a linear all-pole (IIR) filter having the system transfer function

    H(z) = \frac{G}{1 + \sum_{k=1}^{p} a_p(k) z^{-k}}

where p is the number of poles, G is the filter gain, and a_p(k) are the parameters that determine the poles.

[Figure: simple speech production model. The innovations u(n), either an impulse train with period T equal to the pitch period (voiced) or white noise (unvoiced), selected by a U/V switch, drive the LPC filter to produce the speech signal.]
7、eechproductionWherepisthenumberofpoles,GisthefilterGain,andakaretheparametersthatdeterminethepoles.Therearetwomutuallyexclusivewaysexcitationfunctionstomodelvoicedandunvoicedspeechsounds.Forashorttime-basisanalysis,voicedspeechisconsideredperiodicwithafundamentalfrequencyofFo,andapitchperiodoflFo,wh
8、ichdependsonthespeaker.Hence,Voicedspeechisgeneratedbyexcitingtheallpolefiltermodelbyaperiodicimpulsetrain.Ontheotherhand,unvoicedsoundsaregeneratedbyexcitingtheall-polefilterbytheoutputofarandomnoisegenerator.Thefundamentaldifferencebetweenthesetwotypesofspeechsoundscomesfromthewaytheyareproduced.T
9、hevibrationsofthevocalcordsproducevoicedsounds.Therateatwhichthevocalcordsvibratedictatesthepitchofthesound.Ontheotherhand,unvoicedsoundsdonotrelyonthevibrationofthevocalcords.Theunvoicedsoundsarecreatedbytheconstrictionofthevocaltract.Thevocalcordsremainopenandtheconstrictionsofthevocaltractforceai
10、routtoproducetheunvoicedsoundsGivenashortsegmentofaspeechsignal,letssayabout20msor160samplesatasamplingrate8KHz,thespeechencoderatthetransmittermustdeterminetheproperexcitationfunction,thepitchperiodforvoicedspeech,thegain,andthecoefficients3pk.Theblockdiagrambelowdescribestheencoder/decoderfortheLi
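As an illustration of these two excitation models (this sketch is not part of the original project code), the following MATLAB fragment builds an impulse-train excitation for a voiced frame and a white-noise excitation for an unvoiced frame, and passes each through an assumed all-pole synthesis filter G/A(z). The coefficient vector a and the gain G are placeholder values.

% Minimal sketch: voiced vs. unvoiced excitation driving an all-pole filter.
% The filter coefficients and gain are assumptions, not values from the report.
Fs = 8000;                        % sampling rate in Hz
N  = 160;                         % one 20 ms frame
a  = [1 -0.9];                    % A(z) = 1 + a_p(1) z^-1 with a_p(1) = -0.9 (p = 1)
G  = 0.5;                         % assumed gain

% voiced frame: periodic impulse train with pitch period T samples
T = 80;                           % 100 Hz pitch at Fs = 8 kHz
x_voiced = zeros(N,1);
x_voiced(1:T:N) = 1;

% unvoiced frame: white-noise excitation, normalized to unit energy
x_unvoiced = randn(N,1);
x_unvoiced = x_unvoiced / norm(x_unvoiced);

% synthesis: s(n) = -sum_k a_p(k) s(n-k) + G x(n), i.e. the filter G/A(z)
s_voiced   = filter(G, a, x_voiced);
s_unvoiced = filter(G, a, x_unvoiced);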
Given a short segment of a speech signal, say about 20 ms or 160 samples at a sampling rate of 8 kHz, the speech encoder at the transmitter must determine the proper excitation function, the pitch period for voiced speech, the gain, and the coefficients a_p(k). The block diagram below describes the encoder/decoder for Linear Predictive Coding. The parameters of the model are determined adaptively from the data, encoded into a binary sequence, and transmitted to the receiver. At the receiver, the speech signal is then synthesized from the model and the excitation signal.

[Figure: LPC encoder/decoder block diagram]
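Concretely, the encode/decode round trip described here reduces to two calls to the routines listed later in this report, proclpc for the analysis and synlpc for the pitch-excited synthesis. The sampling rate and model order below are example values only.

% Minimal sketch: one analysis/synthesis round trip with the project's routines
Fs = 8000;  Order = 10;                      % example sampling rate and LPC order
[aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);
outspeech = synlpc(aCoeff, pitch, Fs, G);    % resynthesize from model + excitation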
The parameters of the all-pole filter model are determined from the speech samples by means of linear prediction. To be specific, the output of the linear prediction filter is

    \hat{s}(n) = -\sum_{k=1}^{p} a_p(k) s(n-k)

and the corresponding error between the observed sample s(n) and the predicted value \hat{s}(n) is

    e(n) = s(n) - \hat{s}(n)

By minimizing the sum of squared errors we can determine the pole parameters a_p(k) of the model. Differentiating the sum with respect to each of the parameters and equating the result to zero yields a set of p linear equations

    \sum_{k=1}^{p} a_p(k) r_{ss}(m-k) = -r_{ss}(m), \quad m = 1, 2, \ldots, p

where r_{ss}(m) denotes the autocorrelation of the sequence s(n), defined as

    r_{ss}(m) = \sum_{n=0}^{N} s(n) s(n+m)

The equations above can be expressed in matrix form as

    R_{ss} a = -r_{ss}

where R_{ss} is a p x p autocorrelation matrix, r_{ss} is a p x 1 autocorrelation vector, and a is a p x 1 vector of model parameters.
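For illustration (this is not part of the project code), the normal equations can be solved directly in MATLAB by forming the Toeplitz autocorrelation matrix. Here s stands for one frame of speech samples as a column vector and p for the prediction order; both names are assumptions of this sketch.

% Minimal sketch: solve R_ss * a = -r_ss for one frame of speech s
p   = 10;                           % prediction order
r   = xcorr(s, p, 'biased');        % autocorrelation lags -p..p
r   = r(p+1:end);                   % keep lags 0..p, so r(1) = r_ss(0)
Rss = toeplitz(r(1:p));             % p x p autocorrelation matrix
a   = -Rss \ r(2:p+1);              % model parameters a_p(1..p)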
The following excerpt from the analysis function proclpc (listed in full at the end of this report) performs the pre-emphasis, framing, and autocorrelation computation:

[row col] = size(data);
if col==1, data = data'; end

nframe = 0;
msfr = round(sr/1000*fr);                 % convert frame increment from ms to samples
msfs = round(sr/1000*fs);                 % convert frame size from ms to samples
duration = length(data);
speech = filter([1 -preemp], 1, data)';   % preemphasize speech
msoverlap = msfs - msfr;
ramp = [0:1/(msoverlap-1):1]';            % compute part of the overlap window

for frameIndex = 1:msfr:duration-msfs+1   % frame rate = 20 ms
    frameData = speech(frameIndex:(frameIndex+msfs-1));   % frame size = 30 ms
    nframe = nframe + 1;
    autoCor = xcorr(frameData);           % compute the autocorrelation
    autoCorVec = autoCor(msfs+[0:L]);     % keep lags 0..L

These equations can be solved in MATLAB by using the Levinson-Durbin algorithm:
    % Levinson's method
    err(1) = autoCorVec(1);
    k(1) = 0;
    A = [];
    for index = 1:L
        numerator   = [1 A.'] * autoCorVec(index+1:-1:2);
        denominator = -1*err(index);
        k(index) = numerator/denominator;            % PARCOR coeffs
        A = [A + k(index)*flipud(A); k(index)];
        err(index+1) = (1 - k(index)^2)*err(index);
    end
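As a quick cross-check (an addition, not part of the report), the coefficients produced by this hand-coded recursion should agree with MATLAB's built-in levinson applied to the same autocorrelation vector:

    % Minimal sketch: compare the recursion above with levinson()
    Aref = levinson(autoCorVec, L);        % returns [1 a_p(1) ... a_p(L)]
    max(abs(Aref(2:end).' - A))            % should be near machine precision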
The gain parameter of the filter can be obtained from the input-output relationship

    s(n) = -\sum_{k=1}^{p} a_p(k) s(n-k) + G x(n)

where x(n) represents the input sequence. Rewriting this in terms of the error sequence gives

    G x(n) = s(n) + \sum_{k=1}^{p} a_p(k) s(n-k) = e(n)

and therefore

    G^2 \sum_{n=0}^{N-1} x^2(n) = \sum_{n=0}^{N-1} e^2(n)

If the input excitation is normalized to unit energy by design, then

    G^2 = \sum_{n=0}^{N-1} e^2(n) = r_{ss}(0) + \sum_{k=1}^{p} a_p(k) r_{ss}(k)

so G^2 is set equal to the residual energy resulting from the least-squares optimization.
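For illustration (not from the original code), the gain can be computed directly from the autocorrelation lags and the LPC coefficients according to the last equation above; in proclpc this is equivalent to taking G = sqrt(err(L+1)) from the Levinson recursion.

    % Minimal sketch: gain from the residual energy (variable names as in the excerpts above)
    % autoCorVec : autocorrelation lags 0..L of the frame
    % A          : LPC coefficients a_p(1..L) from the recursion
    G = sqrt(autoCorVec(1) + A.'*autoCorVec(2:L+1));   % G^2 = r_ss(0) + sum_k a_p(k) r_ss(k)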
    % filter response (optional check by evaluating the z-transform of the model)
    if 0
        gain = 0;
        cft = 0:(1/255):1;
        for index = 1:L
            gain = gain + aCoeff(index,nframe)*exp(-i*2*pi*cft).^index;
        end
        gain = abs(1./gain);
        spec(:,nframe) = 20*log10(gain(1:128))';
        plot(20*log10(gain));
        title(nframe);
        drawnow;
    end

    if 0
        impulseResponse = filter(1, aCoeff(:,nframe), [1 zeros(1,255)]);
        freqResp = 20*log10(abs(fft(impulseResponse)));
        plot(freqResp);
    end
Once the LPC coefficients are computed, we can determine whether the input speech frame is voiced and, if it is indeed a voiced sound, what the pitch is. The pitch can be determined by computing the sequence

    r_e(n) = \sum_{k=1}^{p} r_a(k) r_{ss}(n-k)

where r_a(k) is defined as

    r_a(k) = \sum_{i=1}^{p-k} a_p(i) a_p(i+k)

which is the autocorrelation sequence of the prediction coefficients. The pitch is detected by finding the peak of the normalized sequence r_e(n)/r_e(0) in the time interval that corresponds to 3 to 15 ms in the 20 ms sampling frame. If the value of this peak is at least 0.25, the frame of speech is considered voiced with a pitch period equal to the value of n = N_p at which r_e(N_p)/r_e(0) is a maximum. If the peak value is less than 0.25, the frame of speech is considered unvoiced and the pitch is set to zero. In the project code, the pitch and voicing decision are made from the autocorrelation of the prediction residual:

    errSig = filter([1 A'], 1, frameData);   % find excitation noise
    G(nframe) = sqrt(err(L+1));              % gain
    autoCorErr = xcorr(errSig);              % calculate pitch & voicing information
    [B,I] = sort(autoCorErr);
    num = length(I);
    if B(num-1) > .01*B(num)
        pitch(nframe) = abs(I(num) - I(num-1));
    else
        pitch(nframe) = 0;
    end
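For comparison (this sketch is an addition, not part of the project code), a direct implementation of the criterion described in the text, namely the peak of r_e(n)/r_e(0) searched over the lags corresponding to 3 to 15 ms with a 0.25 voicing threshold, might look as follows. A (the coefficients a_p(1..p)), frameData, and the sampling rate sr are assumed to come from the analysis loop above.

    % Minimal sketch: voicing decision and pitch from r_e(n) = sum_k r_a(k) r_ss(n-k)
    p  = length(A);
    ra = zeros(p,1);
    for k = 1:p                                  % autocorrelation of the prediction coeffs
        ra(k) = sum(A(1:p-k) .* A(1+k:p));
    end
    maxLag = round(0.015*sr);                    % need r_ss out to 15 ms
    rss = xcorr(frameData, maxLag+p, 'biased');
    rss = rss(maxLag+p+1:end);                   % lags 0..maxLag+p, rss(1) = r_ss(0)
    re  = zeros(maxLag+1,1);
    for n = 0:maxLag                             % r_e(n) for lags 0..maxLag
        re(n+1) = sum(ra .* rss(abs(n-(1:p)') + 1));
    end
    lagRange = round(0.003*sr):maxLag;           % 3 to 15 ms
    [peakVal, idx] = max(re(lagRange+1) / re(1));
    if peakVal >= 0.25
        pitchPeriod = lagRange(idx);             % voiced: pitch period in samples
    else
        pitchPeriod = 0;                         % unvoiced
    end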
The values of the LPC coefficients, the pitch period, and the type of excitation are then transmitted to the receiver. The decoder synthesizes the speech signal by passing the proper excitation through the all-pole filter model of the vocal tract.
Typically, the pitch period requires 6 bits, the gain parameter is represented with 5 bits after its dynamic range is compressed logarithmically, and the prediction coefficients normally require 8 to 10 bits each for accuracy reasons. This accuracy is very important in LPC because small changes in the prediction coefficients result in large changes in the pole positions of the filter model, which can cause instability in the model. This is overcome by using the PARCOR (partial correlation) coefficients.
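As a brief illustration of that last point (an addition, not from the report), the PARCOR coefficients are the reflection coefficients k(index) produced as a by-product of the Levinson recursion above, and they are bounded by 1 in magnitude, which makes them much better suited to quantization than the a_p(k) themselves. MATLAB's poly2rc/rc2poly pair converts between the two representations:

    % Minimal sketch: PARCOR (reflection) coefficients from the LPC polynomial
    parcor = poly2rc([1 A']);    % k(1..p), matching k() from the recursion above
    Arec   = rc2poly(parcor);    % convert back: Arec = [1 a_p(1) ... a_p(p)]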
Is the speech frame voiced or unvoiced?

Once the LPC coefficients are computed, we can determine whether the input speech frame is voiced and, if so, what the pitch is. If the speech frame is decided to be voiced, an impulse train is employed to represent it, with nonzero taps occurring every pitch period. A pitch-detection algorithm is used in order to determine the correct pitch period/frequency; the autocorrelation function is used to estimate the pitch period as described above. However, if the frame is unvoiced, then white noise is used to represent it and a pitch period of T = 0 is transmitted. Therefore, either white noise or an impulse train becomes the excitation of the LPC synthesis filter.

Two types of LPC vocoders were implemented in MATLAB. The plain LPC vocoder diagram is shown below:

[Figure: plain LPC vocoder block diagram]

% LPC vocoder
function outspeech = speechcoder1(inspeech)

% Parameters:
%  inspeech : wave data with sampling rate Fs
%             (Fs can be changed underneath if necessary)
% Returns:
%  outspeech : wave data with sampling rate Fs
%              (coded and resynthesized)

if (nargin ~= 1)
    error('argument check failed');
end;

Fs = 16000;     % sampling rate in Hertz (Hz)
Order = 10;     % order of the model used by LPC

% encode the speech using LPC
[aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);
% decode/synthesize speech using LPC and impulse trains as excitation
outspeech = synlpc(aCoeff, pitch, Fs, G);

Results: [figure]
Residual plot: [figure]

Voice-excited LPC vocoder (utilizing the DCT for a high compression rate / low bits)

The input speech signal in each frame is filtered with the estimated transfer function of the LPC analyzer. This filtered signal is called the residual. To achieve a high compression rate, the discrete cosine transform (DCT) of the residual signal can be employed. The DCT concentrates most of the energy of the signal in the first few coefficients. Thus, one way to compress the signal is to transmit only the coefficients that contain most of the energy.
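As a quick illustration of this energy compaction (not from the original code), one can measure how much of a residual frame's energy the first 50 DCT coefficients retain; resid here is assumed to be the residual matrix returned by proclpc, with one column per frame:

    % Minimal sketch: DCT energy compaction on one residual frame
    c = dct(resid(:,1));                  % DCT of the first frame's residual
    sum(c(1:50).^2) / sum(c.^2)           % fraction of the energy kept by 50 coefficients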
The voice-excited vocoder is implemented in speechcoder2:

function outspeech = speechcoder2(inspeech)

% Parameters:
%  inspeech : wave data with sampling rate Fs
%             (Fs can be changed underneath if necessary)
% Returns:
%  outspeech : wave data with sampling rate Fs
%              (coded and resynthesized)

if (nargin ~= 1)
    error('argument check failed');
end;

Fs = 16000;     % sampling rate in Hertz (Hz)
Order = 10;     % order of the model used by LPC

% encode the speech using LPC
[aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);

% perform a discrete cosine transform on the residual
resid = dct(resid);
[a,b] = size(resid);
% only use the first 50 DCT coefficients; this can be done
% because most of the energy of the signal is conserved in these coeffs
resid = [resid(1:50,:); zeros(430,b)];

% quantize the data
resid = uencode(resid,4);
resid = udecode(resid,4);

% perform an inverse DCT
resid = idct(resid);

% add some noise to the signal to make it sound better
noise = [zeros(50,b); 0.01*randn(430,b)];
resid = resid + noise;

% decode/synthesize speech using LPC and the compressed residual as excitation
outspeech = synlpc2(aCoeff, resid, Fs, G);

Results: [figure]
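The 4-bit quantization step above relies on MATLAB's uencode/udecode pair, which uniformly quantizes values assumed to lie in [-1, 1] into 2^4 levels and maps the codes back to amplitudes. A small standalone illustration (the sample values are arbitrary):

    % Minimal sketch: the uencode/udecode round trip used for the residual above
    x  = linspace(-1, 1, 9);     % arbitrary sample values in [-1, 1]
    q  = uencode(x, 4);          % 4-bit integer codes (2^4 = 16 levels)
    xq = udecode(q, 4);          % mapped back to quantized amplitudes
    disp(max(abs(x - xq)));      % worst-case error introduced by the quantizer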
MATLAB files:

clear all;
% Osama Saraireh
% Speech Processing
% Dr. Veton Kepuska
% FIT Fall 2005

a = input('please load the speech signal as a .wav file ', 's');
Inputsoundfile = a;
[inspeech, Fs, bits] = wavread(Inputsoundfile);   % read the wave file

outspeech1 = speechcoder1(inspeech);   % plain LPC vocoder
outspeech2 = speechcoder2(inspeech);   % voice-excited LPC vocoder

% plot results
figure(1);
subplot(3,1,1); plot(inspeech);   grid;
subplot(3,1,2); plot(outspeech1); grid;
subplot(3,1,3); plot(outspeech2); grid;

disp('Press any key to play the original sound file!');
pause;
soundsc(inspeech, Fs);
disp('Press any key to play the LPC compressed file!');
pause;
soundsc(outspeech1, Fs);
disp('Press a key to play the voice-excited LPC compressed sound!');
pause;
soundsc(outspeech2, Fs);
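Note that wavread was removed from MATLAB in R2015b; on recent releases the call above can be replaced with audioread (an environment note, not part of the original project):

    % drop-in replacement for the wavread call on MATLAB R2015b and later
    [inspeech, Fs] = audioread(Inputsoundfile);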
function [aCoeff, resid, pitch, G, parcor, stream] = proclpc(data, sr, L, fr, fs, preemp)

% L      - The order of the analysis.
% fr     - Frame time increment, in ms. Defaults to 20 ms.
% fs     - Frame size, in ms.
% aCoeff - The LPC analysis results.
% resid  - The LPC residual.
% pitch  - Calculated by finding the peak in the residual's autocorrelation
%          for each frame.
% G      - The LPC gain for each frame.
% parcor - The parcor coefficients.
% stream - The LPC analysis residual or excitation signal as one long vector.

if (nargin < 3), L = 10; end
if (nargin < 4), fr = 20; end
if (nargin < 5), fs = 30; end
% ... (pre-emphasis, framing, the Levinson recursion, and the pitch/voicing
%      detection follow here, as shown in the excerpts earlier in this report) ...
    if B(num-1) > .01*B(num)
        pitch(nframe) = abs(I(num) - I(num-1));
    else
        pitch(nframe) = 0;
    end
    % improve the compressed sound quality
    resid(:,nframe) = errSig/G(nframe);
    if (frameIndex == 1)   % add residual frames using a trapezoidal window
        stream = resid(1:msfr,nframe);
    else
        stream = [stream;
                  overlap + resid(1:msoverlap,nframe).*ramp;
                  resid(msoverlap+1:msfr,nframe)];
    end
    if (frameIndex + msfr + msfs - 1 > duration)
        stream = [stream; resid(msfr+1:msfs,nframe)];
    else
        overlap = resid(msfr+1:msfs,nframe).*flipud(ramp);
    end
end
stream = filter(1, [1 -preemp], stream)';

Speech model one: LPC vocoder

function outspeech = speechcoder1(inspeech)

% Parameters:
%  inspeech : wave data with sampling rate Fs
% Outputs:
%  outspeech : wave data with sampling rate Fs
%              (coded and resynthesized)
if (nargin ~= 1)
    error('argument check failed');
end;

Fs = 8000;      % sampling rate in Hertz (Hz)
Order = 10;     % order of the model used by LPC

% encode the speech using LPC
[aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);
% decode/synthesize speech using LPC and impulse trains as excitation
outspeech = synlpc(aCoeff, pitch, Fs, G);
% Voice-excited LPC vocoder
function outspeech = speechcoder2(inspeech)

% Parameters:
%  inspeech : wave data with sampling rate Fs
%             (Fs can be changed underneath if necessary)
% Output:
%  outspeech : wave data with sampling rate Fs
%              (coded and resynthesized)

if (nargin ~= 1)
    error('argument check failed');
end;

Fs = 16000;     % sampling rate in Hertz (Hz)
Order = 10;     % order of the model used by LPC

% encode the speech using LPC
[aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);

% perform a discrete cosine transform on the residual
resid = dct(resid);
[a,b] = size(resid);
% only use the first 50 DCT coefficients; this can be done
% because most of the energy of the signal is conserved in these coeffs
resid = [resid(1:50,:); zeros(430,b)];

% quantize the data
resid = uencode(resid,4);
resid = udecode(resid,4);

% perform an inverse DCT
resid = idct(resid);

% add some noise to the signal to make it sound better
noise = [zeros(50,b); 0.01*randn(430,b)];
resid = resid + noise;

% decode/synthesize speech using LPC and the compressed residual as excitation
outspeech = synlpc2(aCoeff, resid, Fs, G);