| # | Title | Journal | Year | Citations |
|---|---|---|---|---|
1 | A high-performance, portable implementation of the MPI message passing interface standard | Parallel Computing | 1996 | 1,639 |
2 | Automated empirical optimizations of software and the ATLAS project | Parallel Computing | 2001 | 928 |
3 | The ganglia distributed monitoring system: design, implementation, and experience | Parallel Computing | 2004 | 903 |
4 | Hybrid scheduling for the parallel solution of linear systems | Parallel Computing | 2006 | 805 |
5 | Robust taboo search for the quadratic assignment problem | Parallel Computing | 1991 | 726 |
6 | Parallel reactive molecular dynamics: Numerical methods and algorithmic techniques | Parallel Computing | 2012 | 716 |
7 | Genetic algorithms and neural networks: optimizing connections and connectivity | Parallel Computing | 1990 | 622 |
8 | The parallel genetic algorithm as function optimizer | Parallel Computing | 1991 | 569 |
9 | Data management and transfer in high-performance computational grid environments | Parallel Computing | 2002 | 467 |
10 | PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation | Parallel Computing | 2012 | 396 |
11 | Graph partitioning models for parallel computing | Parallel Computing | 2000 | 371 |
12 | A dynamic model and parallel tabu search heuristic for real-time ambulance relocation | Parallel Computing | 2001 | 360 |
13 | Evolution algorithms in combinatorial optimization | Parallel Computing | 1988 | 352 |
14 | A class of parallel tiled linear algebra algorithms for multicore architectures | Parallel Computing | 2009 | 327 |
15 | Swift: A language for distributed parallel scripting | Parallel Computing | 2011 | 319 |
16 | Parallel algorithms for hierarchical clustering | Parallel Computing | 1995 | 314 |
17 | Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming | Parallel Computing | 2004 | 311 |
18 | Particle Swarm based Data Mining Algorithms for classification tasks | Parallel Computing | 2004 | 309 |
19 | Towards dense linear algebra for hybrid GPU accelerated manycore systems | Parallel Computing | 2010 | 295 |
20 | SUPERB: A tool for semi-automatic MIMD/SIMD parallelization | Parallel Computing | 1988 | 290 |
21 | Optimization of sparse matrix–vector multiplication on emerging multicore platforms | Parallel Computing | 2009 | 276 |
22 | PT-Scotch: A tool for efficient parallel graph ordering | Parallel Computing | 2008 | 271 |
23 | Parallel recombinative simulated annealing: A genetic algorithm | Parallel Computing | 1995 | 261 |
24 | Symmetry in interconnection networks based on Cayley graphs of permutation groups: A survey | Parallel Computing | 1993 | 260 |
25 | Extensible component-based architecture for FLASH, a massively parallel, multiphysics simulation code | Parallel Computing | 2009 | 219 |
26 | BSPlib: The BSP programming library | Parallel Computing | 1998 | 218 |
27 | The PVM concurrent computing system: Evolution, experiences, and trends | Parallel Computing | 1994 | 207 |
28 | The communication challenge for MPP: Intel Paragon and Meiko CS-2 | Parallel Computing | 1994 | 201 |
29 | From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming | Parallel Computing | 2012 | 198 |
30 | A hybrid MPI–OpenMP scheme for scalable parallel pseudospectral computations for fluid turbulence | Parallel Computing | 2011 | 196 |
31 | DAGuE: A generic distributed DAG engine for High Performance Computing | Parallel Computing | 2012 | 196 |
32 | A hybrid multi-objective Particle Swarm Optimization for scientific workflow scheduling | Parallel Computing | 2017 | 194 |
33 | Parallel implementation of the TRANSIMS micro-simulation | Parallel Computing | 2001 | 193 |
34 | Parallel Tabu search heuristics for the dynamic multi-vehicle dial-a-ride problem | Parallel Computing | 2004 | 192 |
35 | Parallel GRASP with path-relinking for job shop scheduling | Parallel Computing | 2003 | 191 |
36 | Matrix algorithms on a hypercube I: Matrix multiplication | Parallel Computing | 1987 | 181 |
37 | Multiprocessor FFTs | Parallel Computing | 1987 | 179 |
38 | Component averaging: An efficient iterative parallel algorithm for large and sparse unstructured problems | Parallel Computing | 2001 | 175 |
39 | A parallel tabu search algorithm for solving the container loading problem | Parallel Computing | 2003 | 172 |
40 | FFT algorithms for vector computers | Parallel Computing | 1984 | 166 |
41 | MapReduce in MPI for Large-scale graph algorithms | Parallel Computing | 2011 | 162 |
42 | Distributed processing of very large datasets with DataCutter | Parallel Computing | 2001 | 161 |
43 | PaStiX: a high-performance parallel direct solver for sparse symmetric positive definite systems | Parallel Computing | 2002 | 160 |
44 | High performance computing using MPI and OpenMP on multi-core parallel systems | Parallel Computing | 2011 | 151 |
45 | Computational aspects of a code to study rotating turbulent convection in spherical shells | Parallel Computing | 1999 | 149 |
46 | Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations | Parallel Computing | 2011 | 147 |
47 | New advances in chemistry and materials science with CPMD and parallel computing | Parallel Computing | 2000 | 146 |
48 | Sparse matrix multiplication: The distributed block-compressed sparse row library | Parallel Computing | 2014 | 143 |
49 | Cost-efficient task scheduling for executing large programs in the cloud | Parallel Computing | 2013 | 141 |
50 | Monitors, messages, and clusters: The p4 parallel programming system | Parallel Computing | 1994 | 139 |
51 | List scheduling with and without communication delays | Parallel Computing | 1993 | 137 |
52 | High-performance parallel implicit CFD | Parallel Computing | 2001 | 137 |
53 | Parallel synchronous and asynchronous implementations of the auction algorithm | Parallel Computing | 1991 | 136 |
54 | Parallel heuristics for scalable community detection | Parallel Computing | 2015 | 136 |
55 | Probabilistic methods for centroidal Voronoi tessellations and their parallel implementations | Parallel Computing | 2002 | 135 |
56 | A single-program-multiple-data computational model for EPEX/FORTRAN | Parallel Computing | 1988 | 134 |
57 | The programming model of ASSIST, an environment for parallel and distributed portable applications | Parallel Computing | 2002 | 134 |
58 | Sub optimal scheduling in a grid using genetic algorithms | Parallel Computing | 2004 | 134 |
59 | A parallel hybrid banded system solver: the SPIKE algorithm | Parallel Computing | 2006 | 134 |
60 | Multiprocessor scheduling with communication delays | Parallel Computing | 1990 | 133 |
61 | Parallel graph component labelling with GPUs and CUDA | Parallel Computing | 2010 | 132 |
62 | Data communication in parallel architectures | Parallel Computing | 1989 | 129 |
63 | Efficient schemes for nearest neighbor load balancing | Parallel Computing | 1999 | 121 |
64 | A quadtree approach to domain decomposition for spatial interpolation in Grid computing environments | Parallel Computing | 2003 | 121 |
65 | Multilevel summation of electrostatic potentials using graphics processing units | Parallel Computing | 2009 | 118 |
66 | Parallel implementation of multifrontal schemes | Parallel Computing | 1986 | 117 |
67 | A parallel solver for large quadratic programs in training support vector machines | Parallel Computing | 2003 | 117 |
68 | Maximizing parallelism and minimizing synchronization with affine partitions | Parallel Computing | 1998 | 112 |
69 | A survey on resource allocation in high performance distributed computing systems | Parallel Computing | 2013 | 112 |
70 | Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm | Parallel Computing | 2014 | 107 |
71 | Computational solution of capacity planning models under uncertainty | Parallel Computing | 2000 | 106 |
72 | Cellular automata computations and secret key cryptography | Parallel Computing | 2004 | 106 |
73 | Scheduling for heterogeneous Systems using constrained critical paths | Parallel Computing | 2012 | 106 |
74 | A novel fault-tolerant scheduling algorithm for precedence constrained tasks in real-time heterogeneous systems | Parallel Computing | 2006 | 104 |
75 | A high performance, low complexity algorithm for compile-time task scheduling in heterogeneous systems | Parallel Computing | 2005 | 103 |
76 | Parallel image processing applications on a network of workstations | Parallel Computing | 1995 | 102 |
77 | Fault diagnosis for airplane engines using Bayesian networks and distributed particle swarm optimization | Parallel Computing | 2007 | 100 |
78 | Large tridiagonal and block tridiagonal linear systems on vector and parallel computers | Parallel Computing | 1987 | 99 |
79 | On the impact of the migration topology on the Island Model | Parallel Computing | 2010 | 99 |
80 | Parallel job scheduling for power constrained HPC systems | Parallel Computing | 2012 | 99 |
81 | Exploring weak scalability for FEM calculations on a GPU-enhanced cluster | Parallel Computing | 2007 | 98 |
82 | Parallel clustering algorithms | Parallel Computing | 1989 | 97 |
83 | Multitasking the conjugate gradient method on the CRAY X-MP/48 | Parallel Computing | 1987 | 96 |
84 | ScaffCC: Scalable compilation and analysis of quantum programs | Parallel Computing | 2015 | 96 |
85 | Parallel Gaussian elimination on an MIMD computer | Parallel Computing | 1988 | 92 |
86 | Performance of parallel processors | Parallel Computing | 1989 | 90 |
87 | Toward a better parallel performance metric | Parallel Computing | 1991 | 90 |
88 | The design of a standard message passing interface for distributed memory concurrent computers | Parallel Computing | 1994 | 90 |
89 | On the versatility of parallel sorting by regular sampling | Parallel Computing | 1993 | 89 |
90 | Optimizing noncontiguous accesses in MPI–IO | Parallel Computing | 2002 | 89 |
91 | Two-level dynamic scheduling in PARDISO: Improved scalability on shared memory multiprocessing systems | Parallel Computing | 2002 | 89 |
92 | Message-passing multi-cell molecular dynamics on the connection machine 5 | Parallel Computing | 1994 | 88 |
93 | Optimizing a conjugate gradient solver with non-blocking collective operations | Parallel Computing | 2007 | 88 |
94 | Distributed frameworks and parallel algorithms for processing large-scale geographic data | Parallel Computing | 2003 | 87 |
95 | A parallel multiphase flow code for the 3D simulation of explosive volcanic eruptions | Parallel Computing | 2007 | 85 |
96 | Gunrock | ACM Transactions on Parallel Computing | 2017 | 84 |
97 | Design and performance of a scalable parallel community climate model | Parallel Computing | 1995 | 83 |
98 | Optimizing data intensive GPGPU computations for DNA sequence alignment | Parallel Computing | 2009 | 83 |
99 | The REFINE multiprocessor — theoretical properties and algorithms | Parallel Computing | 1995 | 82 |
100 | Parallel optimisation algorithms for multilevel mesh partitioning | Parallel Computing | 2000 | 82 |