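Lecture 7.1: Massively Parallel Processor Systems (MPP)
Gu Zhimin (古志民)

Petaflop Supercomputer: Tianhe-1 (2009)

Features of Tianhe-1
China's first petaflop supercomputer system, Tianhe-1, was developed by the National University of Defense Technology. In the 2009 ranking of China's top 100 high-performance computers, published by the China HPC TOP100 organization, Tianhe-1 ranked first. Experts regard Tianhe-1 as another major innovation in China's development of strategic high technology and large-scale scientific infrastructure. It marks the leap in China's ability to independently develop supercomputers from hundreds of teraflops to petaflops, making China the second country, after the United States, able to build petaflop supercomputer systems. The system has a peak performance of 1206 trillion (1.206 × 10^15) double-precision floating-point operations per second, 98 TB of total memory, 40 Gb/s point-to-point communication bandwidth, and 1 PB of shared disk storage. It offers high performance, high energy efficiency, high security, and ease of use, and its overall technical level ranks among the world's best.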
IBM Petaflop Supercomputer
The system IBM built for the U.S. Los Alamos National Laboratory became the world's first supercomputer to exceed one quadrillion (10^15) calculations per second. Five of the top 10 systems came from IBM, as did 17 of the top 50 and 35 of the top 100. In addition, the Dawning 5000A at the Shanghai Supercomputer Center ranked 15th. In the TOP500 list as a whole, 188 supercomputers came from IBM, while 212 came from HP.

1 MPP (Massively Parallel Processing)
MPP (massively parallel processing) is the coordinated processing of a program by multiple processors that work on different parts of the program, with each processor using its own operating system and memory. Typically, MPP processors communicate using some messaging interface. In some implementations, 200 or more processors can work on the same application. An interconnect arrangement of data paths allows messages to be sent between processors. Setting up an MPP system is typically more complicated, requiring thought about how to partition a common database among the processors and how to assign work among them. An MPP system is also known as a loosely coupled or shared-nothing system. An MPP system is considered better than a symmetric multiprocessing (SMP) system for applications that allow a number of databases to be searched in parallel, such as decision support systems and data warehouse applications.
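The shared-nothing idea above (each node owns its memory and its slice of the database, and results travel only as messages) can be sketched as a sequential Python simulation. This is a minimal illustration, not how any real MPP runtime works: the names `Node`, `partition`, and `parallel_search` are hypothetical, modulo partitioning stands in for whatever data placement scheme a real system would use, and a `deque` stands in for the interconnect.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Tuple

Row = Tuple[int, str]  # (key, value) record in a hypothetical database


@dataclass
class Node:
    """One MPP node; its rows live only in its own private list."""
    node_id: int
    local_rows: List[Row] = field(default_factory=list)

    def search(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # A node scans only its own partition; it never reads another node's rows.
        return [row for row in self.local_rows if predicate(row)]


def partition(rows: List[Row], n_nodes: int) -> List[Node]:
    """Place each row on exactly one node, chosen by key modulo node count."""
    nodes = [Node(i) for i in range(n_nodes)]
    for key, value in rows:
        nodes[key % n_nodes].local_rows.append((key, value))
    return nodes


def parallel_search(nodes: List[Node], predicate: Callable[[Row], bool]) -> List[Row]:
    """Each node searches locally and sends its partial result as a message."""
    network: Deque[Tuple[int, List[Row]]] = deque()  # stands in for the interconnect
    for node in nodes:  # on a real MPP these searches would run concurrently
        network.append((node.node_id, node.search(predicate)))
    results: List[Row] = []
    while network:  # the coordinator receives and merges the messages
        _source, partial = network.popleft()
        results.extend(partial)
    return sorted(results)
```

For example, partitioning 20 rows across 4 nodes and searching for keys divisible by 5 returns the rows with keys 0, 5, 10, and 15, even though no single node holds all of them. The design choice to keep the search and the merge separate mirrors the slide's point that an MPP application must decide both how to partition the data and how to assign work.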
2 MPP Architecture
[Figure: MPP architecture. Each node is an SMP or a single processor, with processor/cache units (P/C) and memory (M) on a local interconnect network, a NIC, and disks and other I/O; the nodes are connected by a high-speed network (HSN).]

MPP with/without SMP
SMP: 2 to 64 processors today; shared-everything architecture; all processors share all available global resources; a single copy of the OS runs on the system.
MPP: a large parallel processing system with a shared-nothing architecture; consists of several hundred nodes connected by a high-speed interconnection network/switch; each node consists of main memory and one or more processors; each node runs a separate copy of the OS.
3 Scalability
If an application needs more MIPS or megabytes, additional processors can be added to help solve the problem. The prerequisites are:
use a physically distributed main-memory structure (distributed memory system);
balance processing power against memory and I/O capability, so that data can be fed to the processors quickly;
balance computing power against parallelism and interaction capability, so that the overheads of process/thread management, communication, and synchronization are kept minimal.
Scalability is achieved on the basis of these conditions.
In a massively parallel processing system, current levels of technology allow for: thousands of processors per system; tens to hundreds of megabytes of RAM per processor; gigabytes of disk storage per processor; tens of megabytes per second of global communication bandwidth per processor; hundreds of MIPS/MFLOPS per processor.

4 System Cost
The cost of every component in an MPP system must be controlled. Measures taken:
1. Exploit Moore's law (performance doubles every 18 to 24 months) by choosing commodity microprocessors designed for PCs, small systems, or workstations;
2. Use a shell architecture (with the shell approach, the rest of the system does not need to change), which supports scalability across generations of (microprocessor) components.
However, this also creates problems: the physical address space is too small; the TLB (Translation Look-aside Buffer) is too small; single-word-stride access is very inefficient.
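Two of the problems listed for commodity microprocessors can be quantified with simple models. The Python sketch below is illustrative only: it assumes TLB reach is entries × page size, and that memory is fetched in whole cache lines while a strided loop uses one word per access. The entry count, page size, word size, and line size used are hypothetical values, not figures for any particular processor.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes the TLB can map at once: entries x page size."""
    return entries * page_bytes


def tlb_coverage(entries: int, page_bytes: int, local_memory_bytes: int) -> float:
    """Fraction of a node's local memory mappable without a TLB miss (capped at 1)."""
    return min(1.0, tlb_reach(entries, page_bytes) / local_memory_bytes)


def strided_line_utilization(stride_words: int, word_bytes: int = 8,
                             line_bytes: int = 64) -> float:
    """Fraction of fetched cache-line bytes actually used by a strided word access.

    Simplified model (an assumption): each access loads one word, memory is
    fetched in whole lines, and the stride in bytes either divides the line
    size or exceeds it.
    """
    if stride_words < 1:
        raise ValueError("stride must be at least one word")
    stride_bytes = stride_words * word_bytes
    return word_bytes / min(stride_bytes, line_bytes)
```

With these hypothetical numbers, a 32-entry TLB with 8 KiB pages maps 256 KiB, which is 1/256 of a 64 MiB node memory, and a 16-word stride uses only 1/8 of every 64-byte line fetched. Both ratios shrink further as node memory grows, which is why these limits matter for large MPP nodes.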
5 Generality and Availability
Support for MIMD; support for PVM, MPI, and HPF; support for node partitioning; high availability; others: support for communication requirements and for scalable I/O performance.

Some Difficulties
Poor real-world performance: Rmax is much lower than Rpeak. Parallel programs are difficult to write, so new programming tools are needed.
If the system is designed intelligently, its overall performance (global communication bandwidth, MIPS, MFLOPS, etc.) will scale up linearly with system size. It should be noted, though, that the degree to which performance can be extracted from an MPP system is highly algorithm-dependent. The computing power available in a large MPP system will undoubtedly increase dramatically over time: processor speeds and memory sizes are doubling approximately every eighteen months, and MPP manufacturers quickly adopt these improvements. The age of the teraflop/terabyte computer is therefore not far off, and extremely large amounts of data will be able to be analyzed with this much processing power.

7 Example 1: Cray T3E Architecture (NCC-NUMA + DSM)
[Figure: a T3E node with an Alpha 21164 processor, main memory, control and registers, a router, and the shell; nodes are linked by a bidirectional 3-D torus, and I/O devices attach through a GigaRing channel.]

8 Cray T3E Performance
300 MHz processor
Rpeak per processor = 600 Mflops
6 to 2048 processors
System Rpeak = 3.6 to 1228 Gflops
Memory size = 1 to 4096 GB
Memory Rpeak = 7.2 to 2450 Gb/s
Network Rpeak = 600 MB/s

9 T3E System Software and Price
UNICOS/mk (64-bit UNIX); PVM; MPI; HPF; C/C++; TotalView parallel debugger; MPP Apprentice parallel performance analysis tool.
Price: US$1 million; delivered in 1995.
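The Cray T3E peak figures above follow from a simple product: processor count × clock rate × floating-point operations per cycle. The Python sketch below reproduces the slide's ranges under one assumption that is not stated on the slide: 2 flops per cycle, inferred from 600 Mflops at 300 MHz. The `efficiency` helper expresses the Rmax/Rpeak ratio mentioned under "Some Difficulties"; any Rmax passed to it would have to come from a real benchmark run.

```python
CLOCK_MHZ = 300        # Cray T3E processor clock, from the slide
FLOPS_PER_CYCLE = 2    # assumption: inferred from 600 Mflops / 300 MHz


def processor_rpeak_mflops(clock_mhz: float = CLOCK_MHZ,
                           flops_per_cycle: int = FLOPS_PER_CYCLE) -> float:
    """Peak Mflops of one processor: clock rate x flops issued per cycle."""
    return clock_mhz * flops_per_cycle


def system_rpeak_gflops(n_procs: int, clock_mhz: float = CLOCK_MHZ,
                        flops_per_cycle: int = FLOPS_PER_CYCLE) -> float:
    """System peak in Gflops: processor count x per-processor peak."""
    return n_procs * processor_rpeak_mflops(clock_mhz, flops_per_cycle) / 1000.0


def efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of peak actually achieved on a benchmark (Rmax / Rpeak)."""
    return rmax / rpeak
```

The slide's 3.6 and 1228 Gflops endpoints correspond to 6 and 2048 processors (the latter is 1228.8 Gflops before truncation). Rpeak only counts hardware; as the difficulties slide notes, the achievable Rmax depends heavily on the algorithm.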
10 Example 2: Intel/Sandia ASCI Option Red (delivered in 1997, NORMA architecture)
4608 nodes: 4536 compute nodes, 32 service nodes, 24 I/O nodes, 2 system nodes, and 14 backup nodes; 1540 POWER; 616 MAINBOARD; 640 disks; two 200 MHz Pentium Pro processors per node; 594 GB of memory.
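The node categories on this slide add up to the stated total, which is easy to check. The small Python sketch below does that and also derives the processor count and the average memory per node; the dictionary keys are hypothetical labels, and treating 1 GB as 1024 MB is an assumption made only for the average.

```python
NODE_COUNTS = {          # node breakdown from the ASCI Option Red slide
    "compute": 4536,
    "service": 32,
    "io": 24,
    "system": 2,
    "backup": 14,
}
CPUS_PER_NODE = 2        # two 200 MHz Pentium Pro processors per node
TOTAL_MEMORY_GB = 594


def summarize(counts: dict = NODE_COUNTS) -> dict:
    """Derive totals and ratios from the per-category node counts."""
    nodes = sum(counts.values())
    return {
        "nodes": nodes,
        "cpus": nodes * CPUS_PER_NODE,
        "compute_fraction": counts["compute"] / nodes,
        # assumption: 1 GB = 1024 MB when averaging memory per node
        "avg_memory_mb_per_node": TOTAL_MEMORY_GB * 1024 / nodes,
    }
```

It reports 4608 nodes (matching the slide), 9216 processors, a compute-node share of 63/64 (about 98.4%), and about 132 MB of memory per node on average.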
11 Intel/Sandia ASCI Option Red (Architecture of the Mesh Routing Component)
[Figure: a mesh of Mesh Routing Components (MRC), to which the NIC of a mainboard is attached.]

12 Dual-Node (4-CPU) Mainboard Structure
[Figure: four P6 processors with L2 caches, two NICs, boot support, memory control, SIMMs, I/O bridges, and expansion connectors, on a 64-bit, 66 MHz local bus; labels also show ICF and the PCI bus.]

13 Single-Node (2-CPU) Mainboard Structure
[Figure: two P6 processors with L2 caches, a NIC, boot support, memory control, SIMMs, I/O bridges, and expansion connectors, on a 64-bit, 66 MHz local bus; labels also show ICF and the PCI bus.]

14 ASCI Option Red System Diagram
[Figure: disks, PCI nodes, compute nodes, service nodes, an Ethernet node, a node station (SSI), and a boot node; partition labels: I/O, compute, service, system.]
What Is a Single System Image (SSI)?
A single system image is the illusion, created by software or hardware, that presents a collection of resources as one, more powerful resource. SSI makes the MPP/cluster appear like a single machine to the user, to applications, and to the network.

15 System Software
Paragon (based on OSF UNIX); compute nodes run Cougar (a lightweight kernel); MPI; NX message library; C/C++.

MPP Network Review
Multithreading
[Figure: execution without multithreading support compared with execution with multithreading support.]

A model related to SIMD is vector processing. Goodyear MPP, 1983.

MIMD: IBM RS/6000 SP2 with 256 processors. This distributed-memory machine is built using boards from desktop computers, largely unchanged, plus a custom switch as the interconnect. Photo courtesy of the Lawrence Livermore National Laboratory.

Scalability vs. Single System Image
[Figure; the only recoverable label is "UP".]

16 Introduction to Cluster Systems
Computer Cluster (Cluster of Computers)
A cluster is a type of parallel or distributed processing system that consists of a collection of interconnected stand-alone computers working together as a single, integrated computing resource. It is a collection of workstations or PCs that:
are interconnected by a high-speed network;
work as an integrated collection of resources;
have a single system image spanning all their nodes.

Architecture of a Computer Cluster
[Figure: nodes, each running its own OS (OS/NODE), joined by a high-speed interconnection network (HSN); a cluster middleware layer (SSI and availability infrastructure); parallel programming environments (PVM, MPI, Java); sequential and parallel applications.]

Computer Cluster by Using Network

Cluster Connection Type 1 (Shared Nothing)
[Figure: two nodes, each with disk (D), processor/cache (P/C), memory (M), MIO, and NIC, connected only through a LAN.]
Cluster Connection Type 2 (Shared Disk)
[Figure: two nodes, each with D, P/C, M, MIO, and NIC, that also share a common disk.]

19 Connection Type (Shared Memory)
[Figure: two nodes, each with D, P/C, M, MIO, and NIC, connected through SCI.]

21 Design Points
Availability: make full use of redundant resources so that the system can serve users for as much of the time as possible;
Single system image (SSI): provide unified access to system resources by combining the OSes of the individual nodes;
Job management;
PFS (parallel file system);
an efficient communication system is required.

Checkpointing for Availability: CHECKPOINT (a, b, c)
Checkpointing can take place at three levels: kernel, library, and application.
[Figure: timelines of processes P, Q, and R with checkpoints a, b, c, and d; other labels: x, y, z.]

Checkpoint Consistency Snapshot (a: consistent; b: not consistent)
If no process has, at its checkpoint, received a message that another process has not yet sent at its checkpoint, the related checkpoints are called a consistent snapshot.
[Figure: processes P, Q, and R with checkpoint lines a and b; other labels: x, y, z, "C?".]

Homework
1. What are the differences between MPP and SMP?
Answer:
MPP: a large parallel processing system with a shared-nothing architecture; consists of several hundred nodes connected by a high-speed interconnection network/switch; each node consists of main memory and one or more processors; each node runs a separate copy of the OS.
SMP: 2 to 64 processors today; shared-everything architecture; all processors share all available global resources; a single copy of the OS runs on the system.
2. What is SSI?

Homework
1. Which of a, b, c, and d is a consistent snapshot?
[Figure: processes P, Q, and R with candidate checkpoint lines a, b, c, and d; other labels: x, y, z, m.]
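The consistency rule stated on the snapshot slide can be checked mechanically. The Python sketch below is one possible encoding, assuming each checkpoint is recorded as a count of events on its process's timeline and each message as (sender, send event, receiver, receive event); `Message`, `orphan_messages`, and `is_consistent` are hypothetical names. Under this definition, a message sent before the cut but received after it (in transit) does not break consistency; only a message recorded as received but not yet sent does.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Message:
    """A message from one process timeline to another."""
    sender: str
    send_event: int    # index of the send event on the sender's timeline
    receiver: str
    recv_event: int    # index of the receive event on the receiver's timeline


def orphan_messages(cut: Dict[str, int], messages: Iterable[Message]) -> List[Message]:
    """Messages the cut records as received but not as sent.

    cut[p] = k means process p's checkpoint covers its events 0 .. k-1.
    """
    orphans = []
    for m in messages:
        received = m.recv_event < cut[m.receiver]
        sent = m.send_event < cut[m.sender]
        if received and not sent:
            orphans.append(m)
    return orphans


def is_consistent(cut: Dict[str, int], messages: Iterable[Message]) -> bool:
    """A cut is a consistent snapshot iff it contains no orphan message."""
    return not orphan_messages(cut, messages)
```

For instance, if P sends a message at its event 1 and Q receives it at its event 2, a cut with P at 2 and Q at 3 is consistent, while moving P's checkpoint back to 1 makes the message an orphan and the cut inconsistent.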