Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach

Irina Jurenka, Markus Kunesch, Kevin McKee, Daniel Gillick, Shaojian Zhu, Sara Wiltberger, Shubham Milind Phal, Katherine Hermann, Daniel Kasenberg, Avishkar Bhoopchand, Ankit Anand, Miruna Pîslar, Stephanie Chan, Lisa Wang, Jennifer She, Parsa Mahmoudieh, Aliya Rysbek, Wei-Jen Ko, Andrea Huber, Brett Wiltshire, Gal Elidan, Roni Rabin, Jasmin Rubinovitz, Amit Pitaru, Mac McAllister, Julia Wilkowski, David Choi, Roee Engelberg, Lidan Hackmon, Adva Levin, Rachel Griffin, Michael Sears, Filip Bar, Mia Mesar, Mana Jabbour, Arslan Chaudhry, James Cohan, Sridhar Thiagarajan, Nir Levine, Ben Brown, Dilan Gorur, Svetlana Grant, Rachel Hashimshoni, Laura Weidinger, Jieru Hu, Dawn Chen, Kuba Dolecki, Canfer Akbulut, Maxwell Bileschi, Laura Culp, Wen-Xin Dong, Nahema Marchal, Kelsie Van Deman, Hema Bajaj Misra, Michael Duah, Moran Ambar, Avi Caciularu, Sandra Lefdal, Chris Summerfield, James An, Pierre-Alexandre Kamienny, Abhinit Mohdi, Theofilos Strinopoulous, Annie Hale, Wayne Anderson, Luis C. Cobo, Niv Efron, Muktha Ananda, Shakir Mohamed, Maureen Heymans, Zoubin Ghahramani, Yossi Matias, Ben Gomes and Lila Ibrahim
[Figure 1 panels: Evaluation (teacher preferences); Deployment (ASU Study Hall); Participation (learner feedback), with the quote "I would describe it as a helpful friend that knows a lot about one subject that can help you learn the class." (LearnLM-Tutor Study Hall user)]

Figure 1 | LearnLM-Tutor Development: overview of our approach to responsible development of genAI for education. Bold arrows show the development flow, dotted arrows the information flow. Our approach starts and ends with participation. We start by answering the questions "who are we trying to help?", "what do they care about?", and "who are all the relevant stakeholders?", and bring them into our development process. This informs the prioritisation of our model improvements work and the development of our […] iteration loop. Finally, we use the deployment of our models to real users to further inform our research and development work, and to feed back into the participation stage. We use this approach to develop LearnLM-Tutor, a conversational AI tutor. Evaluation (teacher preferences): one of seven evaluation benchmarks introduced in this report. It shows that educators prefer LearnLM-Tutor over prompted base Gemini 1.0 on the majority of measured pedagogical attributes. Deployment (ASU Study Hall): example conversation between LearnLM-Tutor and an ASU Study Hall student enrolled in the Introduction to Programming course. Participation (learner feedback): an interview quote from an ASU Study Hall student who has used LearnLM-Tutor during their course. We use interviews to get qualitative feedback on the efficacy and safety of the tutor.

One of the key challenges facing the world is the lack of universal and equitable access to quality education [2]. Education is a key economic driver [3] and a facilitator of upward social mobility [4]; however, even before the COVID-19 pandemic, 53% of all ten-year-old children in low- to middle-income countries were experiencing learning poverty [5], and 40% of US school district leaders described their teacher shortages as "severe" or "very severe" [6]. The long-standing problems with educational attainment and teacher retention have been further exacerbated by the pandemic, disproportionately affecting those from less privileged backgrounds [5, 6].

The rise in genAI that followed the pandemic has been met with mixed reactions. On the one hand, it appears to hold some promise to democratise access to knowledge and education: students are early adopters and top users of the technology [7], and genAI is dominating the EdTech landscape [8]. On the other hand, several concerns have been raised about the misuse of this technology in educational settings [7, 9].
For example, the genAI models that power most of the latest EdTech systems are not explicitly optimised for pedagogy. Instead, models are trained to be "helpful" [10-14], but this specific definition of helpfulness may often be at odds with pedagogy and learning. For instance, students can easily get direct answers to homework assignments instead of working through them for themselves to get the intended practice. The availability of what appears to be expert information, obtained simply by prompting a genAI model for an answer, also gives students an illusion of mastery before it has been achieved, which may eventually lead to problems in the workplace [9, 15].

This report describes our first steps towards optimising genAI for educational use cases. In particular, we focus on 1:1 conversational tutoring, and propose a comprehensive evaluation protocol for this use case. We focus on conversational tutoring because we believe that it is one of the most impactful and general use cases, and because it requires the integration of many important educational capabilities into a single system. An excellent conversational AI tutor has the potential to enhance the educational experience of both learners (by providing them with instant feedback and adapting to their individual needs) and teachers (by multiplying their impact and lightening their workload). We focus on evaluation because a shared framework across (and even within) learning science (see Section 3.1), EdTech (see Section 3.2), and AI for Education (see Section 4.2) is clearly lacking, and such a framework would likely enable progress more than any single product. Furthermore, effective measures of pedagogical success are a prerequisite for optimising AI solutions, which need such signals for "hill-climbing".

Our main contributions are the following:
1. We describe our approach to responsible development of AI for education (Figure 1), which is informed by the ethics and policy literature [16-26]. We emphasise a participatory (Section 2) and multidisciplinary approach to research, bringing together experts in pedagogy, cognitive science, AI, engineering, ethics, and policy, as well as the ultimate stakeholders (students and teachers), to translate insights from learning science into pragmatic and useful pedagogical improvements of Gemini 1.0 [10] for education.
2. We introduce LearnLM-Tutor, a new text-based genAI tutor based on Gemini 1.0 and further fine-tuned for 1:1 conversational tutoring (Section 3), and show that we improve its education-related capabilities over a prompt-tuned Gemini 1.0.
3. We develop a comprehensive suite of seven pedagogical benchmarks (quantitative and qualitative, using both human and automatic evaluations; Figure 2) intended for assessing the performance of conversational AI tutors from various angles. As a case study, we apply these evaluations to a prompt-tuned Gemini 1.0 and to LearnLM-Tutor, providing a portfolio of evidence for pedagogical progress. We also discuss examples of more targeted evaluations and describe how we use them to develop specific educational capabilities for LearnLM-Tutor, such as evaluative practice (Section 8.1) and feedback on procedural homework problems (Section 8.2).

Our comprehensive approach goes beyond addressing the more common question "Does it work?" (quantitative research) to also include "How and why does it work?" (qualitative research) and "Will it work for everyone?" (participatory research), in line with the recommendations in […].
Figure 2 | Overview of the evaluation taxonomy introduced in Section 4.3.2 that underpins the seven pedagogical benchmarks […]

[…] if you have any immediate suggestions or feedback, or via this form for a more formal research collaboration.

2. Participatory approach
This section details the participatory elements that helped shape this project, including the design of our evaluative approach and our goals in developing LearnLM-Tutor. We firmly believe that responsible development of educational AI systems requires engaging learners, educators, policymakers, and academic researchers [27] to ensure that the resulting systems align with their needs, values, and aspirations [28, 29]. We utilise diverse participatory research methods, including workshops, co-design exercises, semi-structured interviews, and user studies, in a collaborative and iterative development process. This report describes previously unpublished work; see Tombazzi et al. [30] for a three-part article series on AI and the Future of Learning by The RSA and Google DeepMind.

In this report each participant is assigned a numerical identifier (P1 through P116). This includes participants from our workshops (P1-P94), initial interviews (P95-P97), co-design activities (P98-P106), and the user studies described in Section 7 (P107-P116).

2.1. Participatory workshops: Imagining and critiquing the future of education and AI
We conducted two participatory workshops in the UK: one with learners, primarily university students from diverse academic backgrounds (n=60), and another with educators, mainly high school teachers specialising in STEM subjects (n=34). The choice of participant demographics was dictated by practical considerations. We realise that future work is needed to expand our reach to broader communities, since learners in the UK and other WEIRD (Western, Educated, Industrialised, Rich, Democratic) countries likely encounter fewer barriers to accessing genAI tools, and perspectives on AI in education likely differ substantially across cultural contexts. WEIRD countries are often over-represented in psychological studies, despite not being representative of the global population [31].

Following established best practices for participatory workshops [32], we employed structured activities to foster interaction, collaborative learning, and group cohesion (see Section B.1 for more details). Participants were divided into small groups of five to eight individuals and engaged in two key exercises:
• Grounding exercise: This activity explored participants' educational experiences, revealing current needs, challenges, and potential areas for improvement regarding genAI tools.
• Speculative design: This exercise encouraged participants to envision a scenario involving a learner facing various challenges. Through collaborative brainstorming, they explored how AI and social factors could exacerbate or mitigate these challenges.

These workshops highlighted current challenges in education: learners struggle with time management, cognitive overload, and demotivation when they perceive their learning materials as irrelevant, while educators struggle to provide personalised attention and feedback in classroom settings.

Personalised tutoring, by AI or humans, was valued by both learners and educators. Tutors are especially effective when they have knowledge of the learner and can adapt their approach accordingly. Learners felt more comfortable seeking clarifications from AI tutors than from human tutors, perceiving AI tutors as less formal and less likely to induce fears of judgement. A limitation shared by both human and AI tutors was their lack of familiarity with the nuances of particular syllabi or exam board requirements.

Learners in the workshop were often strong adopters of genAI. While aware of its limitations, they tended to be happy to work around them. Educators were more sceptical, citing worries about hallucinations, the potential for cheating, and the lack of adaptation to the learner's level and cognitive load in genAI's "wall-of-text" responses. Both groups saw immediate benefits of genAI tools, such as generating practice questions, critiquing and generating ideas, and summarising content.

A shared vision for the future of education emerged, emphasising the role of personalised AI tutors in enabling flexible, cross-disciplinary, and relevant learning opportunities. Additionally, virtual and augmented reality technologies were seen as beneficial through enhanced immersion. Educators desired real-time feedback and actionable insights from AI tools to improve teaching. They also cautioned against a future where learners become dependent on AI and lose their autonomy. When asked if they felt threatened by AI, educators expressed confidence that there would always be a role for humans in the process of teaching, and viewed genAI as a positive tool to assist them, freeing up more time for meaningful interactions with their students.

2.2. Understanding learning experiences: Initial interviews and Wizard-of-Oz sessions
To initiate our iterative participatory design process for LearnLM-Tutor, we co