
National Sun Yat-sen University Embedded System Laboratory

National Sun Yat-sen University Embedded System Laboratory. The Multi-threaded Optimization of Dynamic Binary Translation. Presenter: Ming-Shiun Yang. 2011, Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)


Presentation Transcript


  1. National Sun Yat-sen University Embedded System Laboratory. The Multi-threaded Optimization of Dynamic Binary Translation. Presenter: Ming-Shiun Yang. Paper: Jinxian Cui, Jianmin Pan, Zheng Shan, Xianan Liu, 2011 Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

  2. Abstract Dynamic binary translation, which lets an executable compiled for one platform run on another automatically, solves the problem of code migration. Current single-threaded dynamic binary translation systems leave little room to improve performance. Multi-core processors and multi-threaded programming are therefore exploited to obtain high performance. Based on the performance analysis and experimental results of a single-threaded dynamic binary translation system, this paper presents the framework of the MDT (Multi-Threaded Dynamic Binary Translation) system and the optimizations implemented on it. To achieve the speculative translation scheme, the T-Tree (Translation Tree) is built by the servant thread, which gives the direction

  3. Abstract (cont.) for getting the next PC. In addition, a new T-Cache management scheme is presented that combines the merits of the multi-level cache scheme, the LRU scheme, and the full flush scheme, and that proves to manage the translated blocks efficiently. The framework and optimizations of MDT are evaluated as a whole and in part across SPEC2006 in an Alpha multi-core environment. The results, compared with QEMU, demonstrate that speculative translation and the reduction of the T-Cache miss rate are effective.

  4. What’s the problem? • Single-threaded dynamic translation (QEMU) leaves little room to improve performance. • Table 1 and Figure 2 indicate that both the translation time and the execution time of TBs (translation blocks) matter. When the ratio is small, the translation time of TBs is more important. When the ratio is large, the execution time of TBs is more important. The ratio is approximately 40/1, which means … is the key performance factor.

  5. What’s the problem? (cont.) • QEMU's T-Cache flush mechanism: when the T-Cache is full, the whole T-Cache is flushed. • High cost of executing TBs: translate the TB to micro-operations; link the micro-operations together; map executable sections into host memory.
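
The full-flush policy described on slide 5 can be illustrated with a toy model. This is a minimal sketch, not QEMU's code: the class name `FullFlushTCache`, its byte-based `capacity`, and the sample addresses are assumptions made for illustration.

```python
class FullFlushTCache:
    """Toy translation cache that discards everything once it is full."""

    def __init__(self, capacity):
        self.capacity = capacity  # bytes available for translated code
        self.used = 0
        self.tbs = {}             # guest pc -> size of its translated block
        self.flushes = 0

    def lookup(self, pc):
        return pc in self.tbs

    def insert(self, pc, size):
        if size > self.capacity:
            raise ValueError("TB larger than the whole cache")
        if self.used + size > self.capacity:
            # Full: throw away every translated block, hot or cold.
            self.tbs.clear()
            self.used = 0
            self.flushes += 1
        self.tbs[pc] = size
        self.used += size


cache = FullFlushTCache(capacity=100)
for pc in (0x100, 0x110, 0x120):
    cache.insert(pc, 50)
```

After the third insert the cache has flushed once, so the block at `0x100` must be translated again if it is executed later, even if it was hot. That retranslation cost is what the slides identify as the weakness of flushing the whole T-Cache.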

  6. Related Work • [1] Binary translation introduction • [7] ISS introduction, used in [6] QEMU, a dynamic translator • [3,4,5] Dynamic binary translation systems • [11] Multi-threaded optimization for a dynamic translator • T-Cache management mechanism • Single-threaded dynamic binary translation • Using many servant threads for translation incurs a high thread-management cost • This paper: The Multi-threaded Optimization of Dynamic Binary Translation

  7. Proposed: MDT Framework • MDT (Multi-threaded Dynamic Translation) • Runs TB execution and TB translation in parallel

  8. Proposed: Translator • Speculative translation • Translate the TB that may be executed next • Goal: reduce the translation delay • To achieve speculative translation, the T-Tree (Translation Tree) is used to get the next PC address • Flow: execute a TB; translate the next TB (how to get the next PC address?); execute the next TB; …

  9. The Flow of Getting the Next PC • tsl_q: holds the TBs that have already been translated
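
The split between a main thread that executes TBs and a servant thread that translates ahead can be sketched as a producer/consumer pair. This is a minimal illustration under stated assumptions, not the paper's implementation: only the name `tsl_q` and the main/servant roles come from the slides. The toy control-flow table `CFG`, the `translate` stub, the `depth` limit, and the breadth-first walk that stands in for the T-Tree are all hypothetical, and the main thread blocks on the queue only to keep the example deterministic.

```python
import queue
import threading

# Hypothetical guest control flow: pc -> successor pcs (fall-through, branch target).
CFG = {
    0x100: [0x110, 0x140],
    0x110: [0x180],
    0x140: [0x180],
    0x180: [0x1C0],
    0x1C0: [],
}


def translate(pc):
    """Stand-in for real translation; returns a fake host-code label."""
    return f"host_code@{pc:#x}"


def servant(start_pc, tsl_q, depth):
    """Translate start_pc and its successors up to `depth` levels ahead."""
    frontier, seen = [start_pc], set()
    for _ in range(depth + 1):
        next_frontier = []
        for pc in frontier:
            if pc in seen:
                continue
            seen.add(pc)
            tsl_q.put((pc, translate(pc)))   # hand the translated TB over
            next_frontier.extend(CFG[pc])
        frontier = next_frontier
    tsl_q.put(None)                          # sentinel: servant is finished


def run(start_pc, path, depth=2):
    """Execute `path` on the main thread; return (speculative hits, misses)."""
    tsl_q = queue.Queue()
    worker = threading.Thread(target=servant, args=(start_pc, tsl_q, depth))
    worker.start()
    t_cache, servant_done, hits, misses = {}, False, 0, 0
    for pc in path:
        # Pull translated TBs until this pc shows up or the servant is done.
        while pc not in t_cache and not servant_done:
            item = tsl_q.get()
            if item is None:
                servant_done = True
            else:
                t_cache[item[0]] = item[1]
        if pc in t_cache:
            hits += 1
        else:
            t_cache[pc] = translate(pc)      # fall back to on-demand translation
            misses += 1
        # "Executing" the TB is a no-op in this toy model.
    worker.join()
    return hits, misses
```

Calling `run(0x100, [0x100, 0x140, 0x180, 0x1C0])` counts three speculative hits and one miss, because the servant only looks two levels ahead of the start address. A real translator would not block the main thread this way, and it would need a rule for choosing which successor to follow first, which the slides do not spell out.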

  10. Proposed: T-Cache Management Scheme • Multi-level cache scheme • Advantage: reduces the cache miss rate • LRU scheme • Advantage: easy, fast • Full flush scheme • Advantage: simple, low cost • The proposed T-Cache management scheme integrates the three schemes above.

  11. Proposed: T-Cache Management Scheme (cont.) • The T-Cache (4M), originally a single unit of full flush, is divided into 64 buffers, Buff[1] to Buff[64] (64k each) • Each buffer holds a chain of TBs and is a unit of full flush • When all buffers are full, an LRU / LRC replacement scheme selects the buffer to reuse

  12. The Flow Chart of T-Cache Management • Using LRU to get a new buffer
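
The buffer-based management on slides 10 to 12 can be sketched as follows. This is a hedged illustration, not the authors' design: the class `BufferedTCache`, its fields, and the choice to track recency per buffer with an `OrderedDict` are assumptions. The defaults of 64 buffers of 64 KB match the 4 MB T-Cache on the slide. The multi-level aspect and the LRC variant are not modelled because the slides do not describe them.

```python
from collections import OrderedDict


class BufferedTCache:
    """Toy T-Cache split into fixed-size buffers; a buffer is the flush unit."""

    def __init__(self, num_buffers=64, buffer_size=64 * 1024):
        self.num_buffers = num_buffers
        self.buffer_size = buffer_size
        self.buffers = OrderedDict()  # buffer id -> {pc: size}, oldest first
        self.where = {}               # pc -> id of the buffer holding its TB
        self.current = None           # id of the buffer being filled
        self.used = 0                 # bytes used in the current buffer
        self.next_id = 0
        self.flushed_tbs = 0

    def _new_buffer(self):
        if len(self.buffers) >= self.num_buffers:
            # All buffers are in use: flush only the least recently used one.
            _, victim = self.buffers.popitem(last=False)
            for pc in victim:
                del self.where[pc]
            self.flushed_tbs += len(victim)
        self.current = self.next_id
        self.next_id += 1
        self.buffers[self.current] = {}
        self.used = 0

    def lookup(self, pc):
        buf = self.where.get(pc)
        if buf is None:
            return False
        self.buffers.move_to_end(buf)  # mark the whole buffer as recently used
        return True

    def insert(self, pc, size):
        if size > self.buffer_size:
            raise ValueError("TB larger than one buffer")
        if self.current is None or self.used + size > self.buffer_size:
            self._new_buffer()
        self.buffers[self.current][pc] = size
        self.where[pc] = self.current
        self.used += size
        self.buffers.move_to_end(self.current)
```

With two buffers of 100 bytes, inserting four 50-byte TBs fills both buffers. If the first TB is then looked up and a fifth TB is inserted, only the second buffer is flushed, so the recently used TB survives. Under the full-flush policy all four TBs would have been discarded.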

  13. Before the Experiment • How much does MDT improve performance? • Compared with a single-threaded dynamic binary translation system (QEMU) • How much T-Cache flushing cost does the proposed T-Cache management mechanism save?

  14. Experimental Results • Host: Alpha / Red Hat Linux • Target: I386 / Red Hat Linux • Test cases: SPEC 2006 • Metrics shown: miss rate; speculative translation rate (hit rate); average translation times (ATT) • Because the proposed T-Cache management mechanism reduces the T-Cache miss rate, the ATT of MDT is lower than that of QEMU

  15. Conclusion • The proposed MDT parallelizes the execution and translation of TBs to improve performance. • Main thread • Executes TBs • Servant thread • Translates TBs (speculative translation) • New mechanism for managing the T-Cache

  16. My Comment • Modifying QEMU to support multi-threaded dynamic binary translation would be a large engineering effort. • The whole workflow needs to be modified. • Some techniques are not explained. • LRC scheme • The explanation of T-Tree generation is insufficient. • How to determine the right child
