Simulation infrastructure for the next kilo-x86-64 Data-Flow Processor
Antoni Portero, Alberto Scionti, Marco Solinas, Ho Nam, Roberto Giorgi
Dipartimento di Ingegneria dell’Informazione
Via Roma, 56, 53100 Siena, Italy
Abstract:
The paper proposes a simulation framework based on the COTSon infrastructure, able to create thousands of virtual x86-64 cores. The framework offers a full-system architectural simulator and a well-balanced trade-off between simulation speed and accuracy. Experimental results demonstrate that our framework can correctly simulate a large manycore machine.
KEYWORDS: Performance Analysis and Design Aids, Simulation, Verification, Worst-case analysis.
Figure 1. Host system versus virtual system
SimNow node architecture: 8, 16 and 32 cores
Simulated platform
The host machine is an HP ProLiant DL585 G7 with four AMD Opteron™ 6200 Series processors and 16 cores per processor, for a total of 64 cores coupled to 1 TB of DRAM main memory.
A virtualized machine instance consists of 64 nodes of 16 x86-64 cores each, based on the AMD Opteron-L1_JH-F0 (800 MHz) architecture, with 256 MB of DRAM per core. Figure 1 depicts the host and guest systems.
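The sizing of the guest relative to the host can be checked with a few lines of arithmetic. The following is only a sketch: the class and constant names are hypothetical, "256M" and "1 TB" are assumed to be binary units (MiB, TiB), and the memory overhead of the simulator itself is ignored, so the real host memory utilization will be higher than the ratio computed here.

```python
from dataclasses import dataclass

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4


@dataclass(frozen=True)
class GuestConfig:
    """Hypothetical description of one virtualized machine instance."""
    nodes: int
    cores_per_node: int
    dram_per_core_bytes: int

    @property
    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node

    @property
    def total_dram_bytes(self) -> int:
        return self.total_cores * self.dram_per_core_bytes


# Host figures taken from the text: 4 processors x 16 cores, 1 TB DRAM
# (assumed here to mean 1 TiB).
HOST_CORES = 4 * 16
HOST_DRAM_BYTES = 1 * TIB

# Guest figures taken from the text: 64 nodes x 16 cores, 256M DRAM per core
# (assumed here to mean 256 MiB).
guest = GuestConfig(nodes=64, cores_per_node=16, dram_per_core_bytes=256 * MIB)

virtual_cores_per_host_core = guest.total_cores / HOST_CORES
guest_dram_fraction_of_host = guest.total_dram_bytes / HOST_DRAM_BYTES

print(f"virtual cores:               {guest.total_cores}")
print(f"guest DRAM (GiB):            {guest.total_dram_bytes / GIB:.0f}")
print(f"virtual cores per host core: {virtual_cores_per_host_core:.0f}")
print(f"guest DRAM / host DRAM:      {guest_dram_fraction_of_host:.2f}")
```

Under these assumptions the instance has 1024 virtual cores, 16 per physical host core, and its guest DRAM alone occupies a quarter of the host memory.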
Conclusions
The paper presents a simulation framework based on the x86-64 instruction set. It has been modified to support ISA extensions (DF-Threads execution) [12]. With the proposed simulation framework we are able to simulate a system composed of more than 7000 x86-64 cores and their corresponding communication infrastructure. The proposed framework serves to find the bottlenecks of the target system, and allow
Booting a thousand cores with SimNow+COTSon, an instance of the TERAFLUX TBM (32 nodes x 32 cores)
ACKNOWLEDGEMENTS
This work was partly funded by the European FP7 projects TERAFLUX id. 249013 http://www.teraflux.eu, ERA (Embedded Reconfigurable Architectures) id. 249059 (FP7) http://era-
Number of virtual cores vs. memory utilization in the HP ProLiant DL585 G7 server (1 TB memory, 64 x86-64 cores).
References
[1] AMD SimNow Simulator 4.6.1 User’s Manual, November 2009.
[2] F. Bellard. QEMU, a fast and portable dynamic translator. In Proceedings of the 2005 USENIX Annual Technical Conference, 2005.
[3] E. Argollo et al. COTSon: Infrastructure for full system simulation. Operating Systems Review, 43:52–61, 2009.
[4] S. Li et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In Proceedings of the 42nd Annual International Symposium on Microarchitecture, pages 469–480. IEEE/ACM, December 2009.
[5] Exploiting Dataflow Parallelism in Teradevice Computing. http://www.teraflux.eu, 2010-2014.
[6] Antoni Portero, Alberto Scionti, Zhibin Yu, Paolo Faraboschi, Caroline Concatto, Luigi Carro, Arne Garbade, Sebastian Weis, Theo Ungerer, Roberto Giorgi, Simulating the Future kilo-x86-64 core Processor and their Infrastructure, 45th Annual Simulation Symposium (ANSS), March 2012, Orlando, Florida.
[7] Antoni Portero, Zhibin Yu, and Roberto Giorgi. T-star (t*): An x86-64 ISA extension to support thread execution on many cores. ACACES Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems, 1:277–280, 2011.
[8] Roberto Giorgi, Alberto Scionti, Antoni Portero, Paolo Faraboschi, "Architectural Simulation in the Kilo-core Era", Architectural Support for Programming Languages and Operating Systems (ASPLOS 2012), poster presentation, London, UK, ACM Association for Computing Machinery.
[9] Antoni Portero, Zhibin Yu, Roberto Giorgi, TERAFLUX: Exploiting Tera-device Computing Challenges, Procedia Computer Science 7:146-147 (2011).
[10] Roberto Giorgi et al., "Public Report, D7.2 – Definition of ISA extensions, custom devices and External COTSon API extensions", FET proactive 1: Concurrent Tera-Device Computing (ICT-2009.8.1), PROJECT NUMBER: 249013.
[11] Krishna M. Kavi, Roberto Giorgi, Joseph Arul, "Scheduled Dataflow: Execution Paradigm, Architecture, and Performance Evaluation", IEEE Trans. Computers, Los Alamitos, CA, USA, vol. 50, no. 8, Aug. 2001, pp. 834-846
[12] R. Giorgi, Z. Popovic, N. Puzovic, "DTA-C: A Decoupled multi-Threaded Architecture for CMP Systems", Proc. IEEE SBAC-PAD, Gramado, Brasil, Oct. 2007, pp. 263-270
[13] R. Giorgi, Z. Popovic, N. Puzovic, "Exploiting DMA to enable non-blocking execution in Decoupled Threaded Architecture", Proc. IEEE Int.l Symp. on Parallel and Distributed Processing – MTAAP Multi-Threading Architectures and Applications, Rome, Italy, May 2009, pp. 1-8
[14] A. Portero, R. Pla, J. Carrabina, "SystemC implementation of a NoC", Industrial Technology, 2005. ICIT 2005. IEEE International Conference on. Pages 1132-1135