TY - GEN
T1 - A coherent and managed runtime for ML on the SCC
AU - Sivaramakrishnan, K. C.
AU - Ziarek, Lukasz
AU - Jagannathan, Suresh
N1 - Publisher Copyright:
© Proceedings of the Many-Core Applications Research Community Symposium, MARC 2012 at RWTH Aachen University. All rights reserved.
PY - 2012
Y1 - 2012
N2 - Intel's Single-Chip Cloud Computer (SCC) is a many-core architecture which stands out due to its complete lack of cache-coherence and the presence of a fast, on-die interconnect for inter-core messaging. Cache-coherence, if required, must be implemented in software. Moreover, the amount of shared memory available on the SCC is very limited, requiring stringent management of resources even in the presence of software cache-coherence. In this paper, we present a series of techniques to provide the ML programmer a cache-coherent view of memory, while effectively utilizing both private and shared memory. To that end, we introduce a new, type-guided garbage collection scheme that effectively exploits the SCC's memory hierarchy, attempts to reduce the use of shared memory in favor of message-passing buffers, and provides an efficient, coherent global address space. Experimental results over a variety of benchmarks show that more than 99% of the memory requests can potentially be cached. These techniques are realized in MultiMLton, a scalable extension of the MLton Standard ML compiler and runtime system, on the SCC.
AB - Intel's Single-Chip Cloud Computer (SCC) is a many-core architecture which stands out due to its complete lack of cache-coherence and the presence of a fast, on-die interconnect for inter-core messaging. Cache-coherence, if required, must be implemented in software. Moreover, the amount of shared memory available on the SCC is very limited, requiring stringent management of resources even in the presence of software cache-coherence. In this paper, we present a series of techniques to provide the ML programmer a cache-coherent view of memory, while effectively utilizing both private and shared memory. To that end, we introduce a new, type-guided garbage collection scheme that effectively exploits the SCC's memory hierarchy, attempts to reduce the use of shared memory in favor of message-passing buffers, and provides an efficient, coherent global address space. Experimental results over a variety of benchmarks show that more than 99% of the memory requests can potentially be cached. These techniques are realized in MultiMLton, a scalable extension of the MLton Standard ML compiler and runtime system, on the SCC.
UR - https://www.scopus.com/pages/publications/84899715187
M3 - Conference contribution
AN - SCOPUS:84899715187
T3 - Proceedings of the Many-Core Applications Research Community Symposium, MARC 2012 at RWTH Aachen University
SP - 20
EP - 25
BT - Proceedings of the Many-Core Applications Research Community Symposium, MARC 2012 at RWTH Aachen University
A2 - Lankes, Stefan
A2 - Clauss, Carsten
PB - Chair for Operating Systems, RWTH Aachen University
T2 - 7th Many-Core Applications Research Community Symposium, MARC 2012
Y2 - 29 November 2012 through 30 November 2012
ER -