Машиностроение
Приборостроение
Методическая периодика: проблемы и решения
Организация производства (промышленность)

Ключевые слова
Аннотации
Архив рубрик

регистрация
забыли пароль?

Другие журналы

электронный научно-технический журнал

ИНЖЕНЕРНЫЙ ВЕСТНИК

Издатель: Общероссийская общественная организация "Академия инженерных наук им. А.М. Прохорова".

Обзор методов решения задачи планирования параллельных алгоритмов

Инженерный вестник # 12, декабрь 2014
УДК: 519.6

Файл статьи:

Seliverstov_E.pdf (536.49Кб)

автор: Селиверстов Е. Ю.

В данном обзоре рассмотрены основные подходы к решению задачи планирования параллельных алгоритмов для многопроцессорных, распределенных и гетерогенных систем. Представлен обзор основных моделей параллельных программ, подходов к декомпозиции последовательных программ на задачи, аналитических и эвристических (на основе списков и кластеризации) методов решения задачи планирования. Отмечены некоторые особенности планирования для массивно-параллельных архитектур на основе графических процессоров и гетерогенных многоядерных систем, влияние конфликтов ресусов параллельных систем на эффективность решения задачи.

Список литературы
1. Kwok Y.-K. Efficient Algorithms for Scheduling and Mapping of Parallel Programs onto Parallel Architectures : Ph. D. thesis / Yu-Kwong Kwok ; The Hong Kong University of Science and Technology. — 1994.
2. Kessler C., rg Keller J. Models for parallel computing: Review and perspectives // Mitteilungen-Gesellschaft fu ̈r Informatik eV, Parallel-Algorithmen und Rechnerstrukturen. — 2007.
3. Sahni S., Thanvantri V. et al. Parallel computing: Performance metrics and models // Computer & Information Sciences Department University of Florida. — 1995.
4. Harris T. A survey of pram simulation techniques // ACM Computing Surveys (CSUR). — 1994. — Vol. 26, no. 2. — P. 187–206.
5. Gibbons P., Matias Y., Ramachandran V. Can a shared-memory model serve as a bridging model for parallel computation? // Theory of Computing Systems. — 1999. — Vol. 32, no. 3. — P. 327–359.
6. Akhter S., Roberts J. Multi-core programming. — Intel Press, 2006. — Vol. 33.
7. Valiant L. G. A bridging model for parallel computation // Commun. ACM. — 1990. — ugust. — Vol. 33. — P. 103–111.
8. Parhami B. Introduction to parallel processing: algorithms and architectures. — Springer, 1999. — Vol. 1.
9. Sinnen O. Task scheduling for parallel systems. — Wiley-Interscience, 2007. — Vol. 60.
10. Introduction to Parallel Computing (2nd Edition) / Ananth Grama, George Karypis, Vipin Kumar, Anshul Gupta. — 2 edition. — Addison Wesley, 2003.
11. Воеводин B. B., Воеводин Вл. В. Параллельные вычисления. — БХВ-Петербург, 2002.
12. Vinjamuri S., Prasanna V. Hierarchical dependency graphs: Abstraction and methodology for mapping systolic array designs to multicore processors // Parallel Computing Technologies. — 2009. — P. 284–298.
13. Hierarchical place trees: A portable abstraction for task parallelism and data movement / Y. Yan, J. Zhao, Y. Guo, V. Sarkar // Languages and Compilers for Parallel Computing. — 2010. — P. 172–187.
14. Cosnard M., Loi M. Automatic task graph generation techniques // System Sciences, 1995. Vol. II. Proceedings of the Twenty-Eighth Hawaii International Conference on / IEEE. — Vol. 2. — 1995. — P. 113–122.
15. Cosnard M., Jeannot E. Automatic parallelization techniques based on compact dag extraction and symbolic scheduling // Parallel Processing Letters.—2001.—Vol. 11, no. 01. — P. 151–168.
16. Khan A., McCreary C., Jones M. A comparison of multiprocessor scheduling heuristics // Parallel Processing, 1994. ICPP 1994. International Conference on / IEEE. — Vol. 2. — 1994. — P. 243–250.
17. Lint B., Agerwala T. Communication issues in the design and analysis of parallel algorithms // Software Engineering, IEEE Transactions on. — 1981. — no. 2. — P. 174– 188.
18. Foster I. Designing and building parallel programs. — Addison-Wesley Reading, 1995. — URL: http://www.mcs.anl.gov/ itf/dbpp/.
19. SarkarV.Partitioningandschedulingparallelprogramsformultiprocessors.—MITpress,1989.
20. Berman F., Snyder L. On mapping parallel algorithms into parallel architectures // Journal of Parallel and Distributed Computing. — 1987. — Vol. 4, no. 5. — P. 439–458.
21. Lee S., Aggarwal J. A mapping strategy for parallel processing // Computers, IEEE Transactions on. — 1987. — Vol. 100, no. 4. — P. 433–442.
22. Chen S., Eshaghian M., Wu Y. Mapping arbitrary non-uniform task graphs onto arbitrary non- uniform system graphs // Proceedings of the International Conference on Parallel Processing / Citeseer. — 1995.
23. El-Rewini H., Lewis T. G. Scheduling parallel program tasks onto arbitrary target machines // Journal of parallel and Distributed Computing. — 1990. — Vol. 9, no. 2. — P. 138–153.
24. Chen S., Eshaghian M. M. A fast recursive mapping algorithm // Concurrency: Practice and Experience. — 1995. — Vol. 7, no. 5. — P. 391–409.
25. El-Rewini H., Ali H. H. On considering communication in scheduling task graphs on parallel processors // Parallel Algorithm And Applications. — 1994. — Vol. 3, no. 3-4. — P. 177–191.
26. Sinnen O., Sousa L. A., Sandnes F. E. Toward a realistic task scheduling model // Parallel and Distributed Systems, IEEE Transactions on. — 2006. — Vol. 17, no. 3. — P. 263–275.
27. BhatP.B.,RaghavendraC.S.,PrasannaV.K.Efficientcollectivecommunicationindistributed heterogeneous systems // Journal of Parallel and Distributed Computing. — 2003. — Vol. 63, no. 3. — P. 251–263.
28. Static scheduling strategies for heterogeneous systems / Olivier Beaumont, Arnaud Legrand, Yves Robert et al. // ISCIS XVII, Seventeenth International Symposium On Computer and Information Sciences. — 2002. — P. 18–22.
29. Predicting inter-thread cache contention on a chip multi-processor architecture / Dhruba Chandra, Fei Guo, Seongbeom Kim, Yan Solihin // High-Performance Computer Architecture, 2005. HPCA-11. 11th International Symposium on / IEEE. — 2005. — P. 340–351.
30. Xia Y., Prasanna V. K. Collaborative scheduling of dag structured computations on multicore processors // Proceedings of the 7th ACM international conference on Computing frontiers / ACM. — 2010. — P. 63–72.
31. Blagodurov S., Zhuravlev S., Fedorova A. Contention-aware scheduling on multicore systems // ACM Transactions on Computer Systems (TOCS). — 2010. — Vol. 28, no. 4. — P. 8.
32. Chen X. E., Aamodt T. M. A first-order fine-grained multithreaded throughput model // IEEE International Symposium on High Performance Computer Architecture (HPCA 2009). — 2009. — . — P. 329–340.
33. Song F., Moore S., Dongarra J. Analytical modeling for affinitybased thread scheduling on multicore platforms // University of Tennessee, Computer Science Tech. Rep. UT-CS-08-626. — 2008.
34. A memory model for scientific algorithms on graphics processors / Naga K. Govindaraju, Scott Larsen, Jim Gray, Dinesh Manocha // in Proc. of the ACM/IEEE Conference on Supercomputing (SC’06. — ACM Press, 2006.
35. Hong S., Kim H. An analytical model for a GPU architecture with memory-level and thread- level parallelism awareness // Proceedings of the 36th annual international symposium on Computer architecture. — ISCA ’09. — New York, NY, USA : ACM, 2009. — P. 152–163.
36. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA / Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi et al. // Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming. — PPoPP ’08. — New York, NY, USA : ACM, 2008. — P. 73–82.
37. Dynamic warp formation and scheduling for efficient gpu control flow / W.W.L. Fung, I. Sham, G. Yuan, T.M. Aamodt // Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture / IEEE Computer Society. — 2007. — P. 407–420.
38. An adaptive performance modeling tool for gpu architectures / Sara S Baghsorkhi, Matthieu Delahaye, Sanjay J Patel et al. // ACM Sigplan Notices / ACM. — Vol. 45. — 2010. — P. 105–114.
39. Schaa D., Kaeli D. Exploring the multiple-gpu design space // Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on / IEEE. — 2009. — P. 1–12.
40. Choi J. W., Singh A., Vuduc R. W. Model-driven autotuning of sparse matrix-vector multiply on gpus // ACM Sigplan Notices / ACM. — Vol. 45. — 2010. — P. 115–126.
41. Program optimization space pruning for a multithreaded gpu / Shane Ryoo, Christopher I. Rodrigues, Sam S. Stone et al. // Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization. — CGO ’08. — New York, NY, USA : ACM, 2008. — P. 195–204.
42. Zhang Y., Owens J. A quantitative performance analysis model for GPU architectures // High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on / IEEE. — 2011. — P. 382–393.
43. El-Rewini H., Abd-El-Barr M. Advanced computer architecture and parallel processing.— Wiley. com, 2005. — Vol. 42.
44. Bokhari S. On the mapping problem // Computers, IEEE Transactions on. — 1981. — Vol. 100, no. 3. — P. 207–214.
45. Kwok Y.-K., Ahmad I. Static scheduling algorithms for allocating directed task graphs to multiprocessors // ACM Computing Surveys (CSUR). — 1999. — Vol. 31, no. 4. — P. 406–471.
46. Communication-aware allocation and scheduling framework for stream-oriented multi-processor systems-on-chip / Martino Ruggiero, Alessio Guerri, Davide Bertozzi et al. // Design, Automation and Test in Europe, 2006. DATE’06. Proceedings / IEEE.—Vol. 1.—2006.— P. 6–pp.
47. Chaudhary V., Aggarwal J. A generalized scheme for mapping parallel algorithms // Parallel and Distributed Systems, IEEE Transactions on. — 1993. — Vol. 4, no. 3. — P. 328–346.
48. Scheduling strategies for master-slave tasking on heterogeneous processor platforms / Cyril Banino, Olivier Beaumont, Larry Carter et al. // Parallel and Distributed Systems, IEEE Transactions on. — 2004. — Vol. 15, no. 4. — P. 319–330.
49. Chen J., John L. K. Efficient program scheduling for heterogeneous multi-core processors // Proceedings of the 46th Annual Design Automation Conference / ACM. — 2009. — P. 927–930.
50. Gerasoulis A., Yang T. A comparison of clustering heuristics for scheduling directed acyclic graphs on multiprocessors // Journal of Parallel and Distributed Computing. — 1992. — Vol. 16, no. 4. — P. 276–291.
51. Kim S., Browne J. A general approach to mapping of parallel computation upon multiprocessor architectures // International Conference on Parallel Processing. — Vol. 3. — 1988. — P. 8.
52. Kwok Y.-K., Ahmad I. Dynamic critical-path scheduling: An effective technique for allocating task graphs to multiprocessors // Parallel and Distributed Systems, IEEE Transactions on. — 1996. — Vol. 7, no. 5. — P. 506–521.
53. Maheswaran M., Siegel H. J. A dynamic matching and scheduling algorithm for heterogeneous computing systems // Heterogeneous Computing Workshop, 1998.(HCW 98) Proceedings. 1998 Seventh / IEEE. — 1998. — P. 57–69.
54. Hendrickson B., Kolda T. Graph partitioning models for parallel computing // Parallel computing. — 2000. — Vol. 26, no. 12. — P. 1519–1534.
55. Catalyurek U. V., Aykanat C. Hypergraph-partitioning-based decomposition for parallel sparse- matrix vector multiplication // Parallel and Distributed Systems, IEEE Transactions on. — 1999. — Vol. 10, no. 7. — P. 673–693.
56. Multilevel hypergraph partitioning: applications in vlsi domain / George Karypis, Rajat Aggarwal, Vipin Kumar, Shashi Shekhar // Very Large Scale Integration (VLSI) Systems, IEEE Transactions on. — 1999. — Vol. 7, no. 1. — P. 69–79.
57. Hendricksony B., Lelandz R., Van Driesschex R. Skewed graph partitioning // Eighth SIAM conference on parallel processing for scientific computing. — 1997.
58. Hendrickson B. Graph partitioning and parallel solvers: Has the emperor no clothes? // Solving Irregularly Structured Problems in Parallel. — 1998. — P. 218–225.
59. Schloegel K., Karypis G., Kumar V. Parallel static and dynamic multi-constraint graph partitioning // Concurrency and Computation: Practice and Experience.—2002.—Vol. 14, no. 3. — P. 219–240.
60. Zoltan data management services for parallel dynamic applications / Karen Devine, Erik Boman, Robert Heaphy et al. // Computing in Science & Engineering. — 2002. — Vol. 4, no. 2. — P. 90–96.
61. Partitioning and scheduling using graph decomposition / C McCreary, J Thompson, H Gill et al. // Department of Computer Science and Engineering CSE-93-06, Auburn University. — 1993.
62. Ahmad I., Dhodhi M., Ul-Mustafa R. Dps: Dynamic priority scheduling heuristic for heterogeneous computing systems // Computers and Digital Techniques, IEE Proceedings- / IET. — Vol. 145. — 1998. — P. 411–418.
63. Yang T., Gerasoulis A. Dsc: Scheduling parallel tasks on an unbounded number of processors // Parallel and Distributed Systems, IEEE Transactions on. — 1994. — Vol. 5, no. 9. — P. 951–967.
64. Kim D., Yi B.-G. A two-pass scheduling algorithm for parallel programs // Parallel Computing. — 1994. — Vol. 20, no. 6. — P. 869–885.
65. A selection theory and methodology for heterogeneous supercomputing / Song Chen, Mary Eshaghian, Ashfaq Khokhar, Muhammad Shaaban // Heterogeneous Processing.— 1993. — P. 15–22.
66. Cirou B., Jeannot E. Triplet: a clustering scheduling algorithm for heterogeneous systems // Parallel Processing Workshops, 2001. International Conference on / IEEE. — 2001. — P. 231– 236.
67. El-Rewini H., Ali H., Lewis T. Task scheduling in multiprocessing systems // Computer. —1995. — Vol. 28, no. 12. — P. 27–37.
68. Goldenberg M., Lu P., Schaeffer J. Trellisdag: A system for structured dag scheduling // Job Scheduling Strategies for Parallel Processing / Springer. — 2003. — P. 21–43.
69. Kale L. V., Krishnan S. Charm++: Parallel Programming with Message-Driven Objects // Parallel Programming using C++ / Ed. by Gregory V. Wilson, Paul Lu. — MIT Press, 1996. — P. 175–213.
70. Wu M.-Y., Shu W., Chen Y. A runtime system for dynamic dag programming // Parallel and Distributed Processing / Ed. by Jos ́e Rolim. — Springer Berlin Heidelberg, 2000. — Vol. 1800 of Lecture Notes in Computer Science. — P. 1192–1199.
71. Song F., YarKhan A., Dongarra J. Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems // High Performance Computing Networking, Storage and Analysis, Proceedings of the Conference on / IEEE. — 2009. — P. 1–11.
72. Trifunovic A., Knottenbelt W. J. Parkway 2.0: A parallel multilevel hypergraph partitioning tool // Computer and Information Sciences-ISCIS 2004. — Springer, 2004. — P. 789–800.
73. Yu J., Buyya R. A taxonomy of workflow management systems for grid computing // Journal of Grid Computing. — 2005. — Vol. 3, no. 3-4. — P. 171–200.
74. Pegasus: Mapping scientific workflows onto the grid / Ewa Deelman, James Blythe, Yolanda Gil et al. // Grid Computing / Springer. — 2004. — P. 11–20.
75. Evaluation of job-scheduling strategies for grid computing / Volker Hamscher, Uwe Schwiegelshohn, Achim Streit, Ramin Yahyapour // Grid Computing—GRID 2000.— Springer, 2000. — P. 191–202.
76. Yu J., Buyya R. A taxonomy of scientific workflow systems for grid computing // Sigmod Record. — 2005. — Vol. 34, no. 3. — P. 44–49.
77. Grewe D., O’Boyle M. A static task partitioning approach for heterogeneous systems using opencl // Compiler Construction / Springer. — 2011. — P. 286–305.
78. Diamos G. F., Yalamanchili S. Harmony: an execution model and runtime for heterogeneous many core systems // Proceedings of the 17th international symposium on High performance distributed computing / ACM. — 2008. — P. 197–200.
79. Starpu: A unified platform for task scheduling on heterogeneous multicore architectures / C. Augonnet, S. Thibault, R. Namyst, P.A. Wacrenier // Concurrency and Computation: Practice and Experience. — 2011. — Vol. 23, no. 2. — P. 187–198.
80. Gregg C., Brantley J., Hazelwood K. Contention-aware scheduling of parallel code for heterogeneous systems // 2nd USENIX Workshop on Hot Topics in Parallelism, HotPar, Berkeley, CA. — 2010.
81. Predictive runtime code scheduling for heterogeneous architectures / V ́ıctor J Jim ́enez, Llu ́ıs Vilanova, Isaac Gelado et al. // High Performance Embedded Architectures and Compilers. — Springer, 2009. — P. 19–33.
82. Becchi M., Crowley P. Dynamic thread assignment on heterogeneous multiprocessor architectures // Proceedings of the 3rd conference on Computing frontiers / ACM. — 2006. — P. 29–40.
83. Weng N., Wolf T. Profiling and mapping of parallel workloads on network processors // Proceedings of the 2005 ACM symposium on Applied computing / ACM. — 2005. — P. 890– 896.
84. Hass: a scheduler for heterogeneous multicore systems / Daniel Shelepov, Juan Carlos Saez Alcaide, Stacey Jeffery et al. // ACM SIGOPS Operating Systems Review. — 2009. — Vol. 43, no. 2. — P. 66–75.
85. Dague: A generic distributed dag engine for high performance computing / George Bosilca, Aurelien Bouteiller, Anthony Danalis et al. // Parallel Computing. — 2012. — Vol. 38, no. 1. — P. 37–51.
86. Szymanek R., Kuchcinski K. A constructive algorithm for memory-aware task assignment and scheduling // Proceedings of the ninth international symposium on Hardware/software codesign / ACM. — 2001. — P. 147–152.
87. Cab: Cache aware bi-tier task-stealing in multi-socket multi-core architecture / Quan Chen, Zhiyi Huang, Minyi Guo, Jingyu Zhou // Parallel Processing (ICPP), 2011 International Conference on / IEEE. — 2011. — P. 722–732.
88. Blumofe R. D., Leiserson C. E. Scheduling multithreaded computations by work stealing // Journal of the ACM (JACM). — 1999. — Vol. 46, no. 5. — P. 720–748.
89. Cilk: An efficient multithreaded runtime system / Robert D Blumofe, Christopher F Joerg, Bradley C Kuszmaul et al. — ACM, 1995. — Vol. 30.
90. Scheduling task parallelism on multi-socket multicore systems / Stephen L Olivier, Allan K Porterfield, Kyle B Wheeler, Jan F Prins // Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers / ACM. — 2011. — P. 49– 56.
91. Squillante M. S., Nelson R. D. Analysis of task migration in shared-memory multiprocessor scheduling. — ACM, 1991. — Vol. 19.
92. Work distribution methods on GPUs : Rep. / Department of Computer Science, University of North Carolina ; Executor: C. Lauterback, Q. Mo, D. Manocha : 2009.
93. Xia Y., Prasanna V., Li J. Hierarchical scheduling of dag structured computations on manycore processors with dynamic thread grouping // Job Scheduling Strategies for Parallel Processing / Springer. — 2010. — P. 154–174.
94. Sequoia: programming the memory hierarchy / Kayvon Fatahalian, Daniel Reiter Horn, Timothy J Knight et al. // Proceedings of the 2006 ACM/IEEE conference on Supercomputing / ACM. — 2006. — P. 83.