Другие журналы
|
электронный научно-технический журналИНЖЕНЕРНЫЙ ВЕСТНИКИздатель: Общероссийская общественная организация "Академия инженерных наук им. А.М. Прохорова".
Обзор методов решения задачи планирования параллельных алгоритмов
Инженерный вестник # 12, декабрь 2014 УДК: 519.6
Файл статьи:
Seliverstov_E.pdf
(536.49Кб)
В данном обзоре рассмотрены основные подходы к решению задачи планирования параллельных алгоритмов для многопроцессорных, распределенных и гетерогенных систем. Представлен обзор основных моделей параллельных программ, подходов к декомпозиции последовательных программ на задачи, аналитических и эвристических (на основе списков и кластеризации) методов решения задачи планирования. Отмечены некоторые особенности планирования для массивно-параллельных архитектур на основе графических процессоров и гетерогенных многоядерных систем, влияние конфликтов ресусов параллельных систем на эффективность решения задачи. Список литературы 1. Kwok Y.-K. Efficient Algorithms for Scheduling and Mapping of Parallel Programs onto Parallel Architectures : Ph. D. thesis / Yu-Kwong Kwok ; The Hong Kong University of Science and Technology. — 1994. 2. Kessler C., rg Keller J. Models for parallel computing: Review and perspectives // Mitteilungen-Gesellschaft fu ̈r Informatik eV, Parallel-Algorithmen und Rechnerstrukturen. — 2007. 3. Sahni S., Thanvantri V. et al. Parallel computing: Performance metrics and models // Computer & Information Sciences Department University of Florida. — 1995. 4. Harris T. A survey of pram simulation techniques // ACM Computing Surveys (CSUR). — 1994. — Vol. 26, no. 2. — P. 187–206. 5. Gibbons P., Matias Y., Ramachandran V. Can a shared-memory model serve as a bridging model for parallel computation? // Theory of Computing Systems. — 1999. — Vol. 32, no. 3. — P. 327–359. 6. Akhter S., Roberts J. Multi-core programming. — Intel Press, 2006. — Vol. 33. 7. Valiant L. G. A bridging model for parallel computation // Commun. ACM. — 1990. — ugust. — Vol. 33. — P. 103–111. 8. Parhami B. Introduction to parallel processing: algorithms and architectures. — Springer, 1999. — Vol. 1. 9. Sinnen O. Task scheduling for parallel systems. — Wiley-Interscience, 2007. — Vol. 60. 10. Introduction to Parallel Computing (2nd Edition) / Ananth Grama, George Karypis, Vipin Kumar, Anshul Gupta. — 2 edition. — Addison Wesley, 2003. 11. Воеводин B. B., Воеводин Вл. В. Параллельные вычисления. — БХВ-Петербург, 2002. 12. Vinjamuri S., Prasanna V. Hierarchical dependency graphs: Abstraction and methodology for mapping systolic array designs to multicore processors // Parallel Computing Technologies. — 2009. — P. 284–298. 13. Hierarchical place trees: A portable abstraction for task parallelism and data movement / Y. Yan, J. Zhao, Y. Guo, V. Sarkar // Languages and Compilers for Parallel Computing. — 2010. — P. 172–187. 14. Cosnard M., Loi M. Automatic task graph generation techniques // System Sciences, 1995. Vol. II. Proceedings of the Twenty-Eighth Hawaii International Conference on / IEEE. — Vol. 2. — 1995. — P. 113–122. 15. Cosnard M., Jeannot E. Automatic parallelization techniques based on compact dag extraction and symbolic scheduling // Parallel Processing Letters.—2001.—Vol. 11, no. 01. — P. 151–168. 16. Khan A., McCreary C., Jones M. A comparison of multiprocessor scheduling heuristics // Parallel Processing, 1994. ICPP 1994. International Conference on / IEEE. — Vol. 2. — 1994. — P. 243–250. 17. Lint B., Agerwala T. Communication issues in the design and analysis of parallel algorithms // Software Engineering, IEEE Transactions on. — 1981. — no. 2. — P. 174– 188. 18. Foster I. Designing and building parallel programs. — Addison-Wesley Reading, 1995. — URL: http://www.mcs.anl.gov/ itf/dbpp/. 19. SarkarV.Partitioningandschedulingparallelprogramsformultiprocessors.—MITpress,1989. 20. Berman F., Snyder L. On mapping parallel algorithms into parallel architectures // Journal of Parallel and Distributed Computing. — 1987. — Vol. 4, no. 5. — P. 439–458. 21. Lee S., Aggarwal J. A mapping strategy for parallel processing // Computers, IEEE Transactions on. — 1987. — Vol. 100, no. 4. — P. 433–442. 22. Chen S., Eshaghian M., Wu Y. Mapping arbitrary non-uniform task graphs onto arbitrary non- uniform system graphs // Proceedings of the International Conference on Parallel Processing / Citeseer. — 1995. 23. El-Rewini H., Lewis T. G. Scheduling parallel program tasks onto arbitrary target machines // Journal of parallel and Distributed Computing. — 1990. — Vol. 9, no. 2. — P. 138–153. 24. Chen S., Eshaghian M. M. A fast recursive mapping algorithm // Concurrency: Practice and Experience. — 1995. — Vol. 7, no. 5. — P. 391–409. 25. El-Rewini H., Ali H. H. On considering communication in scheduling task graphs on parallel processors // Parallel Algorithm And Applications. — 1994. — Vol. 3, no. 3-4. — P. 177–191. 26. Sinnen O., Sousa L. A., Sandnes F. E. Toward a realistic task scheduling model // Parallel and Distributed Systems, IEEE Transactions on. — 2006. — Vol. 17, no. 3. — P. 263–275. 27. BhatP.B.,RaghavendraC.S.,PrasannaV.K.Efficientcollectivecommunicationindistributed heterogeneous systems // Journal of Parallel and Distributed Computing. — 2003. — Vol. 63, no. 3. — P. 251–263. 28. Static scheduling strategies for heterogeneous systems / Olivier Beaumont, Arnaud Legrand, Yves Robert et al. // ISCIS XVII, Seventeenth International Symposium On Computer and Information Sciences. — 2002. — P. 18–22. 29. Predicting inter-thread cache contention on a chip multi-processor architecture / Dhruba Chandra, Fei Guo, Seongbeom Kim, Yan Solihin // High-Performance Computer Architecture, 2005. HPCA-11. 11th International Symposium on / IEEE. — 2005. — P. 340–351. 30. Xia Y., Prasanna V. K. Collaborative scheduling of dag structured computations on multicore processors // Proceedings of the 7th ACM international conference on Computing frontiers / ACM. — 2010. — P. 63–72. 31. Blagodurov S., Zhuravlev S., Fedorova A. Contention-aware scheduling on multicore systems // ACM Transactions on Computer Systems (TOCS). — 2010. — Vol. 28, no. 4. — P. 8. 32. Chen X. E., Aamodt T. M. A first-order fine-grained multithreaded throughput model // IEEE International Symposium on High Performance Computer Architecture (HPCA 2009). — 2009. — . — P. 329–340. 33. Song F., Moore S., Dongarra J. Analytical modeling for affinitybased thread scheduling on multicore platforms // University of Tennessee, Computer Science Tech. Rep. UT-CS-08-626. — 2008. 34. A memory model for scientific algorithms on graphics processors / Naga K. Govindaraju, Scott Larsen, Jim Gray, Dinesh Manocha // in Proc. of the ACM/IEEE Conference on Supercomputing (SC’06. — ACM Press, 2006. 35. Hong S., Kim H. An analytical model for a GPU architecture with memory-level and thread- level parallelism awareness // Proceedings of the 36th annual international symposium on Computer architecture. — ISCA ’09. — New York, NY, USA : ACM, 2009. — P. 152–163. 36. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA / Shane Ryoo, Christopher I. Rodrigues, Sara S. Baghsorkhi et al. // Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming. — PPoPP ’08. — New York, NY, USA : ACM, 2008. — P. 73–82. 37. Dynamic warp formation and scheduling for efficient gpu control flow / W.W.L. Fung, I. Sham, G. Yuan, T.M. Aamodt // Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture / IEEE Computer Society. — 2007. — P. 407–420. 38. An adaptive performance modeling tool for gpu architectures / Sara S Baghsorkhi, Matthieu Delahaye, Sanjay J Patel et al. // ACM Sigplan Notices / ACM. — Vol. 45. — 2010. — P. 105–114. 39. Schaa D., Kaeli D. Exploring the multiple-gpu design space // Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on / IEEE. — 2009. — P. 1–12. 40. Choi J. W., Singh A., Vuduc R. W. Model-driven autotuning of sparse matrix-vector multiply on gpus // ACM Sigplan Notices / ACM. — Vol. 45. — 2010. — P. 115–126. 41. Program optimization space pruning for a multithreaded gpu / Shane Ryoo, Christopher I. Rodrigues, Sam S. Stone et al. // Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization. — CGO ’08. — New York, NY, USA : ACM, 2008. — P. 195–204. 42. Zhang Y., Owens J. A quantitative performance analysis model for GPU architectures // High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on / IEEE. — 2011. — P. 382–393. 43. El-Rewini H., Abd-El-Barr M. Advanced computer architecture and parallel processing.— Wiley. com, 2005. — Vol. 42. 44. Bokhari S. On the mapping problem // Computers, IEEE Transactions on. — 1981. — Vol. 100, no. 3. — P. 207–214. 45. Kwok Y.-K., Ahmad I. Static scheduling algorithms for allocating directed task graphs to multiprocessors // ACM Computing Surveys (CSUR). — 1999. — Vol. 31, no. 4. — P. 406–471. 46. Communication-aware allocation and scheduling framework for stream-oriented multi-processor systems-on-chip / Martino Ruggiero, Alessio Guerri, Davide Bertozzi et al. // Design, Automation and Test in Europe, 2006. DATE’06. Proceedings / IEEE.—Vol. 1.—2006.— P. 6–pp. 47. Chaudhary V., Aggarwal J. A generalized scheme for mapping parallel algorithms // Parallel and Distributed Systems, IEEE Transactions on. — 1993. — Vol. 4, no. 3. — P. 328–346. 48. Scheduling strategies for master-slave tasking on heterogeneous processor platforms / Cyril Banino, Olivier Beaumont, Larry Carter et al. // Parallel and Distributed Systems, IEEE Transactions on. — 2004. — Vol. 15, no. 4. — P. 319–330. 49. Chen J., John L. K. Efficient program scheduling for heterogeneous multi-core processors // Proceedings of the 46th Annual Design Automation Conference / ACM. — 2009. — P. 927–930. 50. Gerasoulis A., Yang T. A comparison of clustering heuristics for scheduling directed acyclic graphs on multiprocessors // Journal of Parallel and Distributed Computing. — 1992. — Vol. 16, no. 4. — P. 276–291. 51. Kim S., Browne J. A general approach to mapping of parallel computation upon multiprocessor architectures // International Conference on Parallel Processing. — Vol. 3. — 1988. — P. 8. 52. Kwok Y.-K., Ahmad I. Dynamic critical-path scheduling: An effective technique for allocating task graphs to multiprocessors // Parallel and Distributed Systems, IEEE Transactions on. — 1996. — Vol. 7, no. 5. — P. 506–521. 53. Maheswaran M., Siegel H. J. A dynamic matching and scheduling algorithm for heterogeneous computing systems // Heterogeneous Computing Workshop, 1998.(HCW 98) Proceedings. 1998 Seventh / IEEE. — 1998. — P. 57–69. 54. Hendrickson B., Kolda T. Graph partitioning models for parallel computing // Parallel computing. — 2000. — Vol. 26, no. 12. — P. 1519–1534. 55. Catalyurek U. V., Aykanat C. Hypergraph-partitioning-based decomposition for parallel sparse- matrix vector multiplication // Parallel and Distributed Systems, IEEE Transactions on. — 1999. — Vol. 10, no. 7. — P. 673–693. 56. Multilevel hypergraph partitioning: applications in vlsi domain / George Karypis, Rajat Aggarwal, Vipin Kumar, Shashi Shekhar // Very Large Scale Integration (VLSI) Systems, IEEE Transactions on. — 1999. — Vol. 7, no. 1. — P. 69–79. 57. Hendricksony B., Lelandz R., Van Driesschex R. Skewed graph partitioning // Eighth SIAM conference on parallel processing for scientific computing. — 1997. 58. Hendrickson B. Graph partitioning and parallel solvers: Has the emperor no clothes? // Solving Irregularly Structured Problems in Parallel. — 1998. — P. 218–225. 59. Schloegel K., Karypis G., Kumar V. Parallel static and dynamic multi-constraint graph partitioning // Concurrency and Computation: Practice and Experience.—2002.—Vol. 14, no. 3. — P. 219–240. 60. Zoltan data management services for parallel dynamic applications / Karen Devine, Erik Boman, Robert Heaphy et al. // Computing in Science & Engineering. — 2002. — Vol. 4, no. 2. — P. 90–96. 61. Partitioning and scheduling using graph decomposition / C McCreary, J Thompson, H Gill et al. // Department of Computer Science and Engineering CSE-93-06, Auburn University. — 1993. 62. Ahmad I., Dhodhi M., Ul-Mustafa R. Dps: Dynamic priority scheduling heuristic for heterogeneous computing systems // Computers and Digital Techniques, IEE Proceedings- / IET. — Vol. 145. — 1998. — P. 411–418. 63. Yang T., Gerasoulis A. Dsc: Scheduling parallel tasks on an unbounded number of processors // Parallel and Distributed Systems, IEEE Transactions on. — 1994. — Vol. 5, no. 9. — P. 951–967. 64. Kim D., Yi B.-G. A two-pass scheduling algorithm for parallel programs // Parallel Computing. — 1994. — Vol. 20, no. 6. — P. 869–885. 65. A selection theory and methodology for heterogeneous supercomputing / Song Chen, Mary Eshaghian, Ashfaq Khokhar, Muhammad Shaaban // Heterogeneous Processing.— 1993. — P. 15–22. 66. Cirou B., Jeannot E. Triplet: a clustering scheduling algorithm for heterogeneous systems // Parallel Processing Workshops, 2001. International Conference on / IEEE. — 2001. — P. 231– 236. 67. El-Rewini H., Ali H., Lewis T. Task scheduling in multiprocessing systems // Computer. —1995. — Vol. 28, no. 12. — P. 27–37. 68. Goldenberg M., Lu P., Schaeffer J. Trellisdag: A system for structured dag scheduling // Job Scheduling Strategies for Parallel Processing / Springer. — 2003. — P. 21–43. 69. Kale L. V., Krishnan S. Charm++: Parallel Programming with Message-Driven Objects // Parallel Programming using C++ / Ed. by Gregory V. Wilson, Paul Lu. — MIT Press, 1996. — P. 175–213. 70. Wu M.-Y., Shu W., Chen Y. A runtime system for dynamic dag programming // Parallel and Distributed Processing / Ed. by Jos ́e Rolim. — Springer Berlin Heidelberg, 2000. — Vol. 1800 of Lecture Notes in Computer Science. — P. 1192–1199. 71. Song F., YarKhan A., Dongarra J. Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems // High Performance Computing Networking, Storage and Analysis, Proceedings of the Conference on / IEEE. — 2009. — P. 1–11. 72. Trifunovic A., Knottenbelt W. J. Parkway 2.0: A parallel multilevel hypergraph partitioning tool // Computer and Information Sciences-ISCIS 2004. — Springer, 2004. — P. 789–800. 73. Yu J., Buyya R. A taxonomy of workflow management systems for grid computing // Journal of Grid Computing. — 2005. — Vol. 3, no. 3-4. — P. 171–200. 74. Pegasus: Mapping scientific workflows onto the grid / Ewa Deelman, James Blythe, Yolanda Gil et al. // Grid Computing / Springer. — 2004. — P. 11–20. 75. Evaluation of job-scheduling strategies for grid computing / Volker Hamscher, Uwe Schwiegelshohn, Achim Streit, Ramin Yahyapour // Grid Computing—GRID 2000.— Springer, 2000. — P. 191–202. 76. Yu J., Buyya R. A taxonomy of scientific workflow systems for grid computing // Sigmod Record. — 2005. — Vol. 34, no. 3. — P. 44–49. 77. Grewe D., O’Boyle M. A static task partitioning approach for heterogeneous systems using opencl // Compiler Construction / Springer. — 2011. — P. 286–305. 78. Diamos G. F., Yalamanchili S. Harmony: an execution model and runtime for heterogeneous many core systems // Proceedings of the 17th international symposium on High performance distributed computing / ACM. — 2008. — P. 197–200. 79. Starpu: A unified platform for task scheduling on heterogeneous multicore architectures / C. Augonnet, S. Thibault, R. Namyst, P.A. Wacrenier // Concurrency and Computation: Practice and Experience. — 2011. — Vol. 23, no. 2. — P. 187–198. 80. Gregg C., Brantley J., Hazelwood K. Contention-aware scheduling of parallel code for heterogeneous systems // 2nd USENIX Workshop on Hot Topics in Parallelism, HotPar, Berkeley, CA. — 2010. 81. Predictive runtime code scheduling for heterogeneous architectures / V ́ıctor J Jim ́enez, Llu ́ıs Vilanova, Isaac Gelado et al. // High Performance Embedded Architectures and Compilers. — Springer, 2009. — P. 19–33. 82. Becchi M., Crowley P. Dynamic thread assignment on heterogeneous multiprocessor architectures // Proceedings of the 3rd conference on Computing frontiers / ACM. — 2006. — P. 29–40. 83. Weng N., Wolf T. Profiling and mapping of parallel workloads on network processors // Proceedings of the 2005 ACM symposium on Applied computing / ACM. — 2005. — P. 890– 896. 84. Hass: a scheduler for heterogeneous multicore systems / Daniel Shelepov, Juan Carlos Saez Alcaide, Stacey Jeffery et al. // ACM SIGOPS Operating Systems Review. — 2009. — Vol. 43, no. 2. — P. 66–75. 85. Dague: A generic distributed dag engine for high performance computing / George Bosilca, Aurelien Bouteiller, Anthony Danalis et al. // Parallel Computing. — 2012. — Vol. 38, no. 1. — P. 37–51. 86. Szymanek R., Kuchcinski K. A constructive algorithm for memory-aware task assignment and scheduling // Proceedings of the ninth international symposium on Hardware/software codesign / ACM. — 2001. — P. 147–152. 87. Cab: Cache aware bi-tier task-stealing in multi-socket multi-core architecture / Quan Chen, Zhiyi Huang, Minyi Guo, Jingyu Zhou // Parallel Processing (ICPP), 2011 International Conference on / IEEE. — 2011. — P. 722–732. 88. Blumofe R. D., Leiserson C. E. Scheduling multithreaded computations by work stealing // Journal of the ACM (JACM). — 1999. — Vol. 46, no. 5. — P. 720–748. 89. Cilk: An efficient multithreaded runtime system / Robert D Blumofe, Christopher F Joerg, Bradley C Kuszmaul et al. — ACM, 1995. — Vol. 30. 90. Scheduling task parallelism on multi-socket multicore systems / Stephen L Olivier, Allan K Porterfield, Kyle B Wheeler, Jan F Prins // Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers / ACM. — 2011. — P. 49– 56. 91. Squillante M. S., Nelson R. D. Analysis of task migration in shared-memory multiprocessor scheduling. — ACM, 1991. — Vol. 19. 92. Work distribution methods on GPUs : Rep. / Department of Computer Science, University of North Carolina ; Executor: C. Lauterback, Q. Mo, D. Manocha : 2009. 93. Xia Y., Prasanna V., Li J. Hierarchical scheduling of dag structured computations on manycore processors with dynamic thread grouping // Job Scheduling Strategies for Parallel Processing / Springer. — 2010. — P. 154–174. 94. Sequoia: programming the memory hierarchy / Kayvon Fatahalian, Daniel Reiter Horn, Timothy J Knight et al. // Proceedings of the 2006 ACM/IEEE conference on Supercomputing / ACM. — 2006. — P. 83. Публикации с ключевыми словами: планирование, параллельные алгоритмы, графические процессоры, отображение, гетерогенные системы Публикации со словами: планирование, параллельные алгоритмы, графические процессоры, отображение, гетерогенные системы Смотри также: Тематические рубрики: Поделиться:
|
|
|||||||||||||||||||
|