Signature Codes for Energy-Efficient Data Movement in On-chip Networks

Document Type: Original Article


Computer Engineering Department, K. N. Toosi University of Technology, Tehran, Iran.



On-chip networks (NoCs) provide a scalable infrastructure for moving data among cores in many-core systems. In future technologies, a significant amount of dynamic energy is dissipated by data movement on on-chip links, and we show that link energy dominates in these technologies. This paper proposes Sig-NoC, a predictable approach that combines signature encoding with transition signaling to reduce switching activity on links. By using transition signaling, Sig-NoC makes the switching activity proportional to the number of 1s in the data sent from the source. We therefore estimate the energy of each packet at its source node and, for high-energy packets, reduce the number of 1s through signature coding at the source. Because Sig-NoC encodes each packet only once at the source and decodes it only at the destination, it has virtually no impact on performance. Simulation results on the NAS and Phoenix benchmark suites on a 4×4 NoC indicate that Sig-NoC reduces the overall NoC energy by 28% on average.
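The two mechanisms in the abstract can be illustrated with a small sketch. Under transition signaling, each 1 bit toggles its wire and each 0 bit leaves it unchanged, so the number of wire transitions equals the number of 1s transmitted, and the energy of a packet can be estimated at the source simply by counting its 1s. The signature step is an assumption for illustration only: the abstract does not say how Sig-NoC constructs its signatures, so this sketch uses a hypothetical table `SIGNATURES` and XORs each high-energy packet with whichever entry leaves the fewest 1s, in the spirit of bus-invert coding. `FLIT_BITS`, `SIGNATURES`, `THRESHOLD`, and all function names are hypothetical, and the cost of sending the signature index with the packet is ignored.

```python
from typing import List, Tuple

# Hypothetical parameters for illustration only; they are not taken from Sig-NoC.
FLIT_BITS = 16
SIGNATURES = [0x0000, 0xFFFF, 0xFF00, 0x00FF]  # index 0 means "sent unencoded"
THRESHOLD = FLIT_BITS // 2  # hypothetical: average 1s per flit that marks a high-energy packet


def ones(x: int) -> int:
    """Number of 1 bits in a flit."""
    return bin(x).count("1")


def transition_signal(flits: List[int], wire: int = 0) -> List[int]:
    """Transition signaling: each 1 bit toggles its wire, each 0 bit holds it."""
    levels = []
    for flit in flits:
        wire ^= flit
        levels.append(wire)
    return levels


def toggles(levels: List[int], wire: int = 0) -> int:
    """Count wire transitions, which drive dynamic link energy."""
    total = 0
    for level in levels:
        total += ones(wire ^ level)
        wire = level
    return total


def estimate_energy(flits: List[int]) -> int:
    """With transition signaling, transitions equal the number of 1s in the payload."""
    return sum(ones(f) for f in flits)


def encode_packet(flits: List[int]) -> Tuple[int, List[int]]:
    """At the source: XOR a high-energy packet with the signature leaving the fewest 1s."""
    if estimate_energy(flits) <= THRESHOLD * len(flits):
        return 0, list(flits)  # low-energy packet is sent as-is
    best = min(
        range(len(SIGNATURES)),
        key=lambda i: sum(ones(f ^ SIGNATURES[i]) for f in flits),
    )
    return best, [f ^ SIGNATURES[best] for f in flits]


def decode_packet(index: int, flits: List[int]) -> List[int]:
    """At the destination: XOR with the same signature restores the payload."""
    return [f ^ SIGNATURES[index] for f in flits]
```

For example, the packet `[0xFFF0, 0xFF0F, 0xF0FF]` carries 36 ones and therefore causes 36 transitions. XORing it with `0xFFFF` yields `[0x000F, 0x00F0, 0x0F00]`, which causes only 12, and `decode_packet` recovers the original flits at the destination. Because encoding and decoding happen only at the endpoints, routers forward the encoded flits unchanged.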

