## Manuel E E Acacio

List of Publications by Year in descending order

Source: https://exaly.com/author-pdf/7629175/publications.pdf

Version: 2024-02-01

99 papers, 694 citations, 423 citing authors

| #  | Article                                                                                                                                                                               | IF           | CITATIONS |
|----|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|-----------|
| 1  | A Parallel Implementation of the 2D Wavelet Transform Using CUDA. , 2009, , .                                                                                                         |              | 62        |
| 2  | A two-level directory architecture for highly scalable cc-NUMA multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 2005, 16, 67-79.                               | 5.6          | 37        |
| 3  | A Direct Coherence Protocol for Many-Core Chip Multiprocessors. IEEE Transactions on Parallel and Distributed Systems, 2010, 21, 1779-1792.                                           | 5.6          | 32        |
| 4  | A new scalable directory architecture for large-scale multiprocessors. , 0, , .                                                                                                       |              | 30        |
| 5  | Heterogeneous Interconnects for Energy-Efficient Message Management in CMPs. IEEE Transactions on Computers, 2010, 59, 16-28.                                                         | 3.4          | 26        |
| 6  | DiCo-CMP: Efficient cache coherency in tiled CMP architectures. Parallel and Distributed Processing Symposium (IPDPS), Proceedings of the International Conference on, 2008, , .      | 1.0          | 25        |
| 7  | A Low Overhead Fault Tolerant Coherence Protocol for CMP Architectures. , 2007, , .                                                                                                   |              | 24        |
| 8  | GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs. , 2011, , .                                                                                                   |              | 23        |
| 9  | Heterogeneous NoC Design for Efficient Broadcast-based Coherence Protocol Support. , 2012, , .                                                                                        |              | 23        |
| 10 | Efficient Hardware Barrier Synchronization in Many-Core CMPs. IEEE Transactions on Parallel and Distributed Systems, 2012, 23, 1453-1466.                                             | 5.6          | 21        |
| 11 | An architecture for high-performance scalable shared-memory multiprocessors exploiting on-chip integration. IEEE Transactions on Parallel and Distributed Systems, 2004, 15, 755-768. | 5.6          | 17        |
| 12 | ZEBRA., 2011,,.                                                                                                                                                                       |              | 16        |
| 13 | An energy consumption characterization of on-chip interconnection networks for tiled CMP architectures. Journal of Supercomputing, 2008, 45, 341-364.                                 | 3.6          | 15        |
| 14 | The use of prediction for accelerating upgrade misses in cc-NUMA multiprocessors. , 0, , .                                                                                            |              | 14        |
| 15 | Sim-PowerCMP: A Detailed Simulator for Energy Consumption Analysis in Future Embedded CMP Architectures. , 2007, , .                                                                  |              | 14        |
| 16 | ASCIB., 2012,,.                                                                                                                                                                       |              | 14        |
| 17 | π-TM: Pessimistic invalidation for scalable lazy hardware transactional memory. , 2012, , .                                                                                           |              | 14        |
| 18 | A scalable organization for distributed directories. Journal of Systems Architecture, 2010, 56, 77-87.                                                                                | 4.3          | 13        |
| 19 | Characterizing Energy Consumption in Hardware Transactional Memory Systems. , 2010, , .                                                                          |     | 13        |
| 20 | A G-Line-Based Network for Fast and Efficient Barrier Synchronization in Many-Core CMPs., 2010, , .                                                              |     | 12        |
| 21 | Speculation-based conflict resolution in hardware transactional memory. , 2009, , .                                                                              |     | 11        |
| 22 | A Novel Lightweight Directory Architecture for Scalable Shared-Memory Multiprocessors. Lecture Notes in Computer Science, 2005, , 582-591.                       | 1.3 | 10        |
| 23 | Distance-aware round-robin mapping for large NUCA caches. , 2009, , .                                                                                            |     | 9         |
| 24 | Efficient and scalable barrier synchronization for many-core CMPs., 2010,,.                                                                                      |     | 9         |
| 25 | EMC <sup>2</sup> : Extending Magny-Cours coherence for large-scale servers., 2010,,.                                                                             |     | 9         |
| 26 | STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators. IEEE Computer Architecture Letters, 2021, 20, 122-125.                | 1.5 | 9         |
| 27 | An efficient cache design for scalable glueless shared-memory multiprocessors. , 2006, , .                                                                       |     | 8         |
| 28 | Characterization of Conflicts in Log-Based Transactional Memory (LogTM)., 2008,,.                                                                                |     | 8         |
| 29 | Stencil computations on heterogeneous platforms for the Jacobi method: GPUs versus Cell BE. Journal of Supercomputing, 2012, 62, 787-803.                        | 3.6 | 7         |
| 30 | Eager Beats Lazy: Improving Store Management in Eager Hardware Transactional Memory. IEEE Transactions on Parallel and Distributed Systems, 2013, 24, 2192-2201. | 5.6 | 7         |
| 31 | Parallel implementations of the 3D fast wavelet transform on a Raspberry Pi 2 cluster. Journal of Supercomputing, 2018, 74, 1765-1778.                           | 3.6 | 7         |
| 32 | An efficient implementation of a 3D wavelet transform based encoder on hyper-threading technology. Parallel Computing, 2007, 33, 54-72.                          | 2.1 | 6         |
| 33 | A fault-tolerant directory-based cache coherence protocol for CMP architectures. , 2008, , .                                                                     |     | 6         |
| 34 | Eager Meets Lazy: The Impact of Write-Buffering on Hardware Transactional Memory. , 2011, , .                                                                    |     | 6         |
| 35 | On the design of energy-efficient hardware transactional memory systems. Concurrency Computation Practice and Experience, 2013, 25, 862-880.                     | 2.2 | 6         |
| 36 | Direct Coherence: Bringing Together Performance and Scalability in Shared-Memory Multiprocessors. , 2007, , 147-160.                                             |     | 6         |
| 37 | Dealing with Traffic-Area Trade-Off in Direct Coherence Protocols for Many-Core CMPs. Lecture Notes in Computer Science, 2009, , 11-27.                                        | 1.3 | 6         |
| 38 | Efficient Message Management in Tiled CMP Architectures Using a Heterogeneous Interconnection Network., 2007,, 133-146.                                                        |     | 6         |
| 39 | MPI–Delphi: an MPI implementation for visual programming environments and heterogeneous computing. Future Generation Computer Systems, 2002, 18, 317-333.                      | 7.5 | 5         |
| 40 | CellStats: A Tool to Evaluate the Basic Synchronization and Communication Operations of the Cell BE. , 2008, , .                                                               |     | 5         |
| 41 | Selective dynamic serialization for reducing energy consumption in hardware transactional memory systems. Journal of Supercomputing, 2014, 68, 914-934.                        | 3.6 | 5         |
| 42 | To be silent or not: on the impact of evictions of clean data in cache-coherent multicores. Journal of Supercomputing, 2017, 73, 4428-4443.                                    | 3.6 | 5         |
| 43 | Evaluating IA-32 web servers through simics: a practical experience. Journal of Systems Architecture, 2005, 51, 251-264.                                                       | 4.3 | 4         |
| 44 | Way-combining directory., 2017,,.                                                                                                                                              |     | 4         |
| 45 | Photonic-based express coherence notifications for many-core CMPs. Journal of Parallel and Distributed Computing, 2018, 113, 179-194.                                          | 4.1 | 4         |
| 46 | Energy-Efficient Hardware Prefetching for CMPs Using Heterogeneous Interconnects. , 2010, , .                                                                                  |     | 3         |
| 47 | Dealing with Transient Faults in the Interconnection Network of CMPs at the Cache Coherence Level. IEEE Transactions on Parallel and Distributed Systems, 2010, 21, 1117-1131. | 5.6 | 3         |
| 48 | Pi-TM: Pessimistic Invalidation for Scalable Lazy Hardware Transactional Memory. , 2011, , .                                                                                   |     | 3         |
| 49 | Hardware transactional memory with software-defined conflicts. Transactions on Architecture and Code Optimization, 2012, 8, 1-20.                                              | 2.0 | 3         |
| 50 | Dynamic Serialization: Improving Energy Consumption in Eager-Eager Hardware Transactional Memory Systems., 2012,,.                                                             |     | 3         |
| 51 | Design of a collective communication infrastructure for barrier synchronization in cluster-based nanoscale MPSoCs., 2012,,.                                                    |     | 3         |
| 52 | Adaptive Selection of Cache Indexing Bits for Removing Conflict Misses. IEEE Transactions on Computers, 2014, , 1-1.                                                           | 3.4 | 3         |
| 53 | ZEBRA: Data-Centric Contention Management in Hardware Transactional Memory. IEEE Transactions on Parallel and Distributed Systems, 2014, 25, 1359-1369.                        | 5.6 | 3         |
| 54 | Early Experiences with Separate Caches for Private and Shared Data. , 2015, , .                                                                                                |     | 3         |
| 55 | Are distributed sharing codes a solution to the scalability problem of coherence directories in manycores? An evaluation study. Journal of Supercomputing, 2016, 72, 612-638. | 3.6 | 3         |
| 56 | InsideNet: A tool for characterizing convolutional neural networks. Future Generation Computer Systems, 2019, 100, 298-315.                                                   | 7.5 | 3         |
| 57 | Characterizing the Basic Synchronization and Communication Operations in Dual Cell-Based Blades. Lecture Notes in Computer Science, 2008, , 456-465.                          | 1.3 | 3         |
| 58 | Directory-Based Conflict Detection in Hardware Transactional Memory. Lecture Notes in Computer Science, 2008, , 541-554.                                                      | 1.3 | 3         |
| 59 | Memory Subsystem Characterization in a 16-Core Snoop-Based Chip-Multiprocessor Architecture. Lecture Notes in Computer Science, 2005, , 213-222.                              | 1.3 | 3         |
| 60 | Optimizing a 3D-FWT Video Encoder for SMPs and HyperThreading Architectures., 0,,.                                                                                            |     | 2         |
| 61 | Address Compression and Heterogeneous Interconnects for Energy-Efficient High-Performance in Tiled CMPs. , 2008, , .                                                          |     | 2         |
| 62 | An Experience of Early Initiation to Parallelism in the Computing Engineering Degree at the University of Murcia, Spain. , 2012, , .                                          |     | 2         |
| 63 | Extending Magny-Cours Cache Coherence. IEEE Transactions on Computers, 2012, 61, 593-606.                                                                                     | 3.4 | 2         |
| 64 | ECONO: Express coherence notifications for efficient cache coherency in many-core CMPs., 2013,,.                                                                              |     | 2         |
| 65 | DASC-DIR: a low-overhead coherence directory for many-core processors. Journal of Supercomputing, 2015, 71, 781-807.                                                          | 3.6 | 2         |
| 66 | Way Combination for an Adaptive and Scalable Coherence Directory. IEEE Transactions on Parallel and Distributed Systems, 2019, 30, 2608-2623.                                 | 5.6 | 2         |
| 67 | PfTouch: Concurrent page-fault handling for Intel restricted transactional memory. Journal of Parallel and Distributed Computing, 2020, 145, 111-123.                         | 4.1 | 2         |
| 68 | Concurrent Irrevocability in Best-Effort Hardware Transactional Memory. IEEE Transactions on Parallel and Distributed Systems, 2020, 31, 1301-1315.                           | 5.6 | 2         |
| 69 | Characterization of a List-Based Directory Cache Coherence Protocol for Manycore CMPs. Lecture Notes in Computer Science, 2014, , 254-265.                                    | 1.3 | 2         |
| 70 | A novel network fabric for efficient spatio-temporal reduction in flexible DNN accelerators. , 2021, , .                                                                      |     | 2         |
| 71 | P-EDR: An algorithm for parallel implementation of Parzen density estimation from uncertain observations. , 0, , .                                                            |     | 1         |
| 72 | On the Evaluation of Dense Chip-Multiprocessor Architectures. , 2006, , .                                                                                                     |     | 1         |
| 73 | Extending the TokenCMP Cache Coherence Protocol for Low Overhead Fault Tolerance in CMP Architectures. IEEE Transactions on Parallel and Distributed Systems, 2008, 19, 1044-1056.                       | 5.6 | 1         |
| 74 | The Impact of Non-coherent Buffers on Lazy Hardware Transactional Memory Systems. , 2011, , .                                                                                                            |     | 1         |
| 75 | Deploying Hardware Locks to Improve Performance and Energy Efficiency of Hardware Transactional Memory. Lecture Notes in Computer Science, 2013, , 220-231.                                              | 1.3 | 1         |
| 76 | Design of an efficient communication infrastructure for highly contended locks in many-core CMPs. Journal of Parallel and Distributed Computing, 2013, 73, 972-985.                                      | 4.1 | 1         |
| 77 | Efficient Eager Management of Conflicts for Scalable Hardware Transactional Memory. IEEE Transactions on Parallel and Distributed Systems, 2013, 24, 59-71.                                              | 5.6 | 1         |
| 78 | Optimization of a Linked Cache Coherence Protocol for Scalable Manycore Coherence. Lecture Notes in Computer Science, 2016, , 100-112.                                                                   | 1.3 | 1         |
| 79 | A dedicated private-shared cache design for scalable multiprocessors. Concurrency Computation Practice and Experience, 2017, 29, e3871.                                                                  | 2.2 | 1         |
| 80 | On the Parallelization of Stream Compaction on a Low-Cost SDC Cluster. Scientific Programming, 2018, 2018, 1-10.                                                                                         | 0.7 | 1         |
| 81 | SAWS: Simple and Adaptive Warp Scheduling for Improved Performance in Throughput Processors. , 2018, , .                                                                                                 |     | 1         |
| 82 | DeTraS: Delaying Stores for Friendly-Fire Mitigation in Hardware Transactional Memory. IEEE Transactions on Parallel and Distributed Systems, 2021, , 1-1.                                               | 5.6 | 1         |
| 83 | Fast and Efficient Synchronization and Communication Collective Primitives for Dual Cell-Based Blades. Lecture Notes in Computer Science, 2009, , 900-911.                                               | 1.3 | 1         |
| 84 | ITSLF: Inter-Thread Store-to-Load Forwarding in Simultaneous Multithreading. , 2021, , .                                                                                                                 |     | 1         |
| 85 | On the Evaluation of x86 Web Servers Using Simics: Limitations and Trade-Offs. Lecture Notes in Computer Science, 2004, , 541-544.                                                                       | 1.3 | 1         |
| 86 | Fault-Tolerant Cache Coherence Protocols for CMPs: Evaluation and Trade-Offs. Lecture Notes in Computer Science, 2008, , 555-568.                                                                        | 1.3 | 1         |
| 87 | Efficient Hardware-Supported Synchronization Mechanisms for Manycores., 2015,, 753-803.                                                                                                                  |     | 1         |
| 88 | Reducing the latency of L2 misses in shared-memory multiprocessors through on-chip directory integration. , 0, , .                                                                                       |     | 0         |
| 89 | Two proposals for the inclusion of directory information in the last-level private caches of glueless shared-memory multiprocessors. Journal of Parallel and Distributed Computing, 2008, 68, 1413-1424. | 4.1 | 0         |
| 90 | Characterizing the basic synchronization and communication operations in Dual Cell-based Blades through CellStats. Journal of Supercomputing, 2010, 53, 247-268.                                         | 3.6 | 0         |
| 91 | Exploiting address compression and heterogeneous interconnects for efficient message management in tiled CMPs. Journal of Systems Architecture, 2010, 56, 429-441.      | 4.3 | 0         |
| 92 | Using Heterogeneous Networks to Improve Energy Efficiency in Direct Coherence Protocols for Many-Core CMPs. , 2012, , .                                                 |     | 0         |
| 93 | Efficient DirOB Cache Coherency for Many-core CMPs. Procedia Computer Science, 2013, 18, 2545-2548.                                                                     | 2.0 | 0         |
| 94 | Fast and efficient commits for Lazy-Lazy hardware transactional memory. Journal of Supercomputing, 2015, 71, 4305-4326.                                                 | 3.6 | 0         |
| 95 | Foreword to the Special Issue on Processors, Interconnects, Storage, and Caches for Exascale Systems. Concurrency Computation Practice and Experience, 2019, 31, e5408. | 2.2 | 0         |
| 96 | Analysing software prefetching opportunities in hardware transactional memory. Journal of Supercomputing, 0, 1.                                                         | 3.6 | 0         |
| 97 | Towards Efficient Dynamic LLC Home Bank Mapping with NoC-Level Support. Lecture Notes in Computer Science, 2013, , 178-190.                                             | 1.3 | 0         |
| 98 | Hardware Approaches to Transactional Memory in Chip Multiprocessors. , 2015, , 805-835.                                                                                 |     | 0         |
| 99 | Analysis of the Interactions Between ILP and TLP With Hardware Transactional Memory. , 2022, , .                                                                        |     | 0         |