List of Publications by Year in descending order

Source: https://exaly.com/author-pdf/186771/publications.pdf Version: 2024-02-01



| #  | Article                                                                                                                                                                                        | IF  | CITATIONS |
|----|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|-----------|
| 1  | CASH-RF: A Compiler-Assisted Hierarchical Register File in GPUs. IEEE Embedded Systems Letters, 2022, 14, 187-190.                                                                             | 1.3 | 3         |
| 2  | FLIXR: Embedding Index into Flash Translation Layer in SSDs. IEEE Transactions on Computers, 2022, , 1-1.                                                                                      | 2.4 | 0         |
| 3  | PIMCaffe: Functional Evaluation of a Machine Learning Framework for In-Memory Neural Processing Unit. IEEE Access, 2021, 9, 96629-96640.                                                       | 2.6 | 4         |
| 4  | SPACE: Locality-Aware Processing in Heterogeneous Memory for Personalized Recommendations. , 2021, , .                                                                                         |     | 14        |
| 5  | Two-Stage In-Storage Processing and Scheduling for Pattern Matching Applications. IEEE Access, 2021, 9, 95702-95715.                                                                           | 2.6 | 0         |
| 6  | Check-In: In-Storage Checkpointing for Key-Value Store System Leveraging Flash-Based SSDs. , 2020, , .                                                                                         |     | 3         |
| 7  | Hi-End: Hierarchical, Endurance-Aware STT-MRAM-Based Register File for Energy-Efficient GPUs. IEEE Access, 2020, 8, 127768-127780.                                                             | 2.6 | 8         |
| 8  | BODCA: Heterogeneous CPU-GPU computing system with Bandwidth-Optimized DRAM cache design. , 2020, , .                                                                                          |     | 0         |
| 9  | Access Characteristic-based Cache Replacement Policy in an SSD. , 2020, , .                                                                                                                    |     | 3         |
| 10 | REACT: Scalable and High-Performance Regular Expression Pattern Matching Accelerator for In-Storage Processing. IEEE Transactions on Parallel and Distributed Systems, 2020, 31, 1137-1151. | 4.0 | 8         |
| 11 | Duplo: Lifting Redundant Memory Accesses of Deep Neural Networks for GPU Tensor Cores. , 2020, , .                                                                                             |     | 10        |
| 12 | Interaction Data Analysis for Personalized Recommendation System. , 2020, , .                                                                                                                  |     | 2         |
| 13 | Fast CU Depth Decision for HEVC Using Neural Networks. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29, 1462-1473.                                                    | 5.6 | 50        |
| 14 | OverCome: Coarse-Grained Instruction Commit with Handover Register Renaming. IEEE Transactions on Computers, 2019, 68, 1802-1816.                                                              | 2.4 | 1         |
| 15 | Adaptive Cooperation of Prefetching and Warp Scheduling on GPUs. IEEE Transactions on Computers, 2019, 68, 609-616.                                                                            | 2.4 | 6         |
| 16 | Contents-aware partitioning algorithm for parallel high efficiency video coding. Multimedia Tools and Applications, 2019, 78, 11427-11442.                                                     | 2.6 | 1         |
| 17 | WIR: Warp Instruction Reuse to Minimize Repeated Computations in GPUs. , 2018, , .                                                                                                             |     | 15        |
| 18 | Exploiting Pseudo-Quadtree Structure for Accelerating HEVC Spatial Resolution Downscaling Transcoder. IEEE Transactions on Multimedia, 2018, 20, 2262-2275.                                    | 5.2 | 3         |
| 19 | Efficient and reliable NAND flash channel for high-speed solid state drives. , 2018, , .                                                                                     |     | 3         |
| 20 | WASP: Selective Data Prefetching with Monitoring Runtime Warp Progress on GPUs. IEEE Transactions on Computers, 2018, 67, 1366-1373.                                         | 2.4 | 4         |
| 21 | FineReg: Fine-Grained Register File Management for Augmenting GPU Throughput. , 2018, , .                                                                                    |     | 9         |
| 22 | Simultaneous and Speculative Thread Migration for Improving Energy Efficiency of Heterogeneous Core Architectures. IEEE Transactions on Computers, 2018, 67, 498-512.        | 2.4 | 4         |
| 23 | Dynamic Resizing on Active Warps Scheduler to Hide Operation Stalls on GPUs. IEEE Transactions on Parallel and Distributed Systems, 2017, 28, 3142-3156.                     | 4.0 | 4         |
| 24 | Dynamic Load Balancing of Dispatch Scheduling for Solid State Disks. IEEE Transactions on Computers, 2017, 66, 1034-1047.                                                    | 2.4 | 7         |
| 25 | Access Pattern-Aware Cache Management for Improving Data Utilization in GPU. , 2017, , .                                                                                     |     | 44        |
| 26 | Improving Energy Efficiency of GPUs through Data Compression and Compressed Execution. IEEE Transactions on Computers, 2017, 66, 834-847.                                    | 2.4 | 16        |
| 27 | Characterizing convolutional neural network workloads on a detailed GPU simulator. , 2017, , .                                                                               |     | 0         |
| 28 | Measuring error-tolerance in SRAM architecture on hardware accelerated neural network. , 2016, , .                                                                           |     | 7         |
| 29 | Warped-preexecution: A GPU pre-execution approach for improving latency hiding. , 2016, , .                                                                                  |     | 29        |
| 30 | Warped-Slicer: Efficient Intra-SM Slicing through Dynamic Resource Partitioning for GPU Multiprogramming. , 2016, , .                                                        |     | 42        |
| 31 | Fairness-aware thread scheduling for multithreaded program using Intel® Software Guarded Extensions. , 2016, , .                                                             |     | 0         |
| 32 | Virtual Thread: Maximizing Thread-Level Parallelism beyond GPU Scheduling Limit. , 2016, , .                                                                                 |     | 33        |
| 33 | Exploiting Thread-Level Parallelism on HEVC by Employing a Reference Dependency Graph. IEEE Transactions on Circuits and Systems for Video Technology, 2016, 26, 736-749.    | 5.6 | 2         |
| 34 | Parallel GPU Architecture Simulation Framework Exploiting Architectural-Level Parallelism with Timing Error Prediction. IEEE Transactions on Computers, 2016, 65, 1253-1265. | 2.4 | 4         |
| 35 | Server side, play buffer based quality control for adaptive media streaming. Multimedia Tools and Applications, 2016, 75, 5397-5415.                                         | 2.6 | 5         |
| 36 | APRES. Computer Architecture News, 2016, 44, 191-203.                                                                                                                        | 2.5 | 20        |
| 37 | An accelerated separable median filter with sorting networks. , 2015, , .                                                                                                |     | 2         |
| 38 | Highly Secure Mobile Devices Assisted with Trusted Cloud Computing Environments. ETRI Journal, 2015, 37, 348-358.                                                        | 1.2 | 12        |
| 39 | Enhancing Software Dependability and Security with Hardware Supported Instruction Address Space Randomization. , 2015, , .                                               |     | 1         |
| 40 | A frequency scaling model for energy efficient DVFS designs based on circuit delay optimization. , 2015, , .                                                             |     | 2         |
| 41 | Contention-Free Fair Queuing for High-Speed Storage with RAID-0 Architecture. , 2015, , .                                                                                |     | 1         |
| 42 | True motion compensation with feature detection for frame rate up-conversion. , 2015, , .                                                                                |     | 4         |
| 43 | Network Variation and Fault Tolerant Performance Acceleration in Mobile Devices with Simultaneous Remote Execution. IEEE Transactions on Computers, 2015, 64, 2862-2874. | 2.4 | 1         |
| 44 | Integrity Protection for Big Data Processing with Dynamic Redundancy Computation. , 2015, , .                                                                            |     | 5         |
| 45 | Warped-compression. , 2015, , .                                                                                                                                          |     | 81        |
| 46 | Another Look at Secure Big Data Processing: Formal Framework and a Potential Approach. , 2015, , .                                                                       |     | 1         |
| 47 | A Performance-Energy Model to Evaluate Single Thread Execution Acceleration. IEEE Computer Architecture Letters, 2015, 14, 99-102.                                       | 1.0 | 4         |
| 48 | Hyper threading-aware Virtual Machine migration. , 2014, , .                                                                                                             |     | 1         |
| 49 | LUT based secure cloud computing — An implementation using FPGAs. , 2014, , .                                                                                            |     | 0         |
| 50 | A Malicious Pattern Detection Engine for Embedded Security Systems in the Internet of Things. Sensors, 2014, 14, 24188-24211.                                            | 2.1 | 87        |
| 51 | Workload synthesis: Generating benchmark workloads from statistical execution profile. , 2014, , .                                                                       |     | 2         |
| 52 | Multicore speedup models using frequency scaling with fixed power budget. , 2014, , .                                                                                    |     | 1         |
| 53 | DPM: Data Partitioning Method for pipelined MapReduce on GPU. , 2014, , .                                                                                                |     | 0         |
| 54 | Development of efficient VCPU pinning mechanism in Xen. , 2014, , .                                                                                                      |     | 1         |
| 55 | Accelerating HEVC transcoder by exploiting decoded quadtree. , 2014, , .                                                                                      |     | 8         |
| 56 | Maximizing DRAM performance using selective operating frequency boosting. , 2014, , .                                                                         |     | 0         |
| 57 | Architectural investigation of matrix data layout on multicore processors. Future Generation Computer Systems, 2014, 37, 64-75.                               | 4.9 | 3         |
| 58 | Boosting CUDA Applications with CPU–GPU Hybrid Computing. International Journal of Parallel Programming, 2014, 42, 384-404.                                   | 1.1 | 23        |
| 59 | Swarm Processor System: hardware process scheduler based energy efficient multi-core system. IEICE Electronics Express, 2014, 11, 20140424-20140424.          | 0.3 | 0         |
| 60 | Accelerating gesture recognition algorithm using coarse grained reconfigurable architectures. , 2014, , .                                                     |     | 0         |
| 61 | Mark-Sharing: A Parallel Garbage Collection Algorithm for Low Synchronization Overhead. , 2013, , .                                                           |     | 0         |
| 62 | Parallel GPU architecture simulation framework exploiting work allocation unit parallelism. , 2013, , .                                                       |     | 17        |
| 63 | A Low-Cost Standard Mode MPI Hardware Unit for Embedded MPSoC. IEICE Transactions on Information and Systems, 2011, E94-D, 1497-1501.                         | 0.4 | 6         |
| 64 | A Novel Sequential Tree Algorithm Based on Scoreboard for MPI Broadcast Communication. IEICE Transactions on Information and Systems, 2011, E94-D, 2523-2527. | 0.4 | 1         |
| 65 | Hardware implementation of a tessellation accelerator for the OpenVG standard. IEICE Electronics Express, 2010, 7, 440-446.                                   | 0.3 | 6         |