## Wonyong Sung

List of Publications by Year in descending order

Source: https://exaly.com/author-pdf/10763472/publications.pdf Version: 2024-02-01



| #  | Article                                                                                                                                                                                                       | IF  | CITATIONS |
|----|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|-----------|
| 1  | Fixed-point feedforward deep neural network design using weights +1, 0, and −1. , 2014, , .                                                                                                                   |     | 162       |
| 2  | Simulation-based word-length optimization method for fixed-point digital signal processing systems. IEEE Transactions on Signal Processing, 1995, 43, 3087-3090.                                              | 3.2 | 159       |
| 3  | Fixed point optimization of deep convolutional neural networks for object recognition. , 2015, , .                                                                                                            |     | 139       |
| 4  | Low-Power High-Throughput BCH Error Correction VLSI Design for Multi-Level Cell NAND Flash Memories. , 2006, , .                                                                                              |     | 121       |
| 5  | Combined word-length optimization and high-level synthesis of digital signal processing systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2001, 20, 921-930.            | 1.9 | 117       |
| 6  | Memory Access Optimized Implementation of Cyclic and Quasi-Cyclic LDPC Codes on a GPGPU. Journal of Signal Processing Systems, 2011, 64, 149-159.                                                             | 1.4 | 79        |
| 7  | Estimation of NAND Flash Memory Threshold Voltage Distribution for Optimum Soft-Decision Error Correction. IEEE Transactions on Signal Processing, 2013, 61, 440-449.                                         | 3.2 | 76        |
| 8  | VLSI Implementation of BCH Error Correction for Multilevel Cell NAND Flash Memory. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2010, 18, 843-847.                                       | 2.1 | 74        |
| 9  | A voice activity detector employing soft decision based noise spectrum adaptation. , 0, , .                                                                                                                   |     | 62        |
| 10 | AUTOSCALER for C: an optimizing floating-point to integer C program converter for fixed-point digital signal processors. IEEE Transactions on Circuits and Systems Part 2: Express Briefs, 2000, 47, 840-848. | 2.3 | 62        |
| 11 | FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks. , 2016, , .                                                                                                                           |     | 55        |
| 12 | Rate-0.96 LDPC Decoding VLSI for Soft-Decision Error Correction of NAND Flash Memory. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2014, 22, 1004-1015.                                  | 2.1 | 50        |
| 13 | FPGA based implementation of deep neural networks using on-chip memory only. , 2016, , .                                                                                                                      |     | 50        |
| 14 | Dynamic hand gesture recognition for wearable devices with low complexity recurrent neural networks. , 2016, , .                                                                                              |     | 39        |
| 15 | Efficient Software-Based Encoding and Decoding of BCH Codes. IEEE Transactions on Computers, 2009, 58, 878-889.                                                                                               | 2.4 | 36        |
| 16 | Character-level incremental speech recognition with recurrent neural networks. , 2016, , .                                                                                                                    |     | 32        |
| 17 | A high-speed layered min-sum LDPC decoder for error correction of NAND Flash memories. , 2011, , .                                                                                                            |     | 28        |
| 18 | A Real-Time FPGA-Based 20 000-Word Speech Recognizer With Optimized DRAM Access. IEEE Transactions on Circuits and Systems I: Regular Papers, 2010, 57, 2119-2131.                                            | 3.5 | 26        |
| 19 | Single stream parallelization of generalized LSTM-like RNNs on a GPU. , 2015, , .                                                                                       |     | 26        |
| 20 | Decision Directed Estimation of Threshold Voltage Distribution in NAND Flash Memory. IEEE Transactions on Signal Processing, 2014, 62, 919-927.                         | 3.2 | 25        |
| 21 | Fixed-point optimization utility for C and C++ based digital signal processing programs. , 0, , .                                                                       |     | 24        |
| 22 | Adaptive Threshold Technique for Bit-Flipping Decoding of Low-Density Parity-Check Codes. IEEE Communications Letters, 2010, 14, 857-859.                               | 2.5 | 24        |
| 23 | X1000 real-time phoneme recognition VLSI using feed-forward deep neural networks. , 2014, , .                                                                           |     | 23        |
| 24 | Fixed-point error analysis and word length optimization of 8×8 IDCT architectures. IEEE Transactions on Circuits and Systems for Video Technology, 1998, 8, 935-940.    | 5.6 | 22        |
| 25 | Performance of rate 0.96 (68254, 65536) EG-LDPC code for NAND Flash memory error correction. , 2012, , .                                                                |     | 22        |
| 26 | Load Balanced Resampling for Real-Time Particle Filtering on Graphics Processing Units. IEEE Transactions on Signal Processing, 2013, 61, 411-419.                      | 3.2 | 22        |
| 27 | An FPGA implementation of speech recognition with weighted finite state transducers. , 2010, , .                                                                        |     | 21        |
| 28 | Character-level language modeling with hierarchical recurrent neural networks. , 2017, , .                                                                              |     | 21        |
| 29 | Memory access pattern-aware DRAM performance model for multi-core systems. , 2011, , .                                                                                  |     | 19        |
| 30 | Fixed-point optimization of deep neural networks with adaptive step size retraining. , 2017, , .                                                                        |     | 18        |
| 31 | Fault tolerance analysis of digital feed-forward deep neural networks. , 2014, , .                                                                                      |     | 16        |
| 32 | OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system. , 2009, , .                                                              |     | 15        |
| 33 | H.264 decoder optimization exploiting SIMD instructions. , 0, , .                                                                                                       |     | 14        |
| 34 | VLSI design of a CORDIC-based derotator. , 0, , .                                                                                                                       |     | 13        |
| 35 | Massively parallel implementation of cyclic LDPC codes on a general purpose graphics processing unit. , 2009, , .                                                       |     | 12        |
| 36 | VLSI for 5000-word continuous speech recognition. , 2009, , .                                                                                                           |     | 12        |
| 37 | A floating-point to integer C converter with shift reduction for fixed-point digital signal processors. , 1999, , .                                                            |     | 11        |
| 38 | Least Squares Based Coupling Cancelation for MLC NAND Flash Memory with a Small Number of Voltage Sensing Operations. Journal of Signal Processing Systems, 2013, 71, 189-200. | 1.4 | 11        |
| 39 | Structured sparse ternary weight coding of deep neural networks for efficient hardware implementations. , 2017, , .                                                            |     | 11        |
| 40 | Fixed-Point Optimization of Transformer Neural Network. , 2020, , .                                                                                                            |     | 10        |
| 41 | An FPGA based SIMD processor with a vector memory unit. , 0, , .                                                                                                               |     | 8         |
| 42 | Scalable HMM based inference engine in large vocabulary continuous speech recognition. , 2009, , .                                                                             |     | 8         |
| 43 | VLSI Implementation of a High-Throughput Soft-Bit-Flipping Decoder for Geometric LDPC Codes. IEEE Transactions on Circuits and Systems I: Regular Papers, 2010, 57, 1083-1094. | 3.5 | 8         |
| 44 | Least squares based cell-to-cell interference cancelation technique for multi-level cell NAND flash memory. , 2012, , .                                                        |     | 8         |
| 45 | A CORDIC-based digital quadrature mixer: comparison with a ROM-based architecture. , 0, , .                                                                                    |     | 7         |
| 46 | SIMD processor based implementation of recursive filtering equations. , 2009, , .                                                                                              |     | 7         |
| 47 | Block-interleaving based parallel CRC computation for multi-processor systems. , 2010, , .                                                                                     |     | 7         |
| 48 | Reduced complexity Chase-Pyndiah decoding algorithm for turbo product codes. , 2011, , .                                                                                       |     | 7         |
| 49 | Memory Access Optimized VLSI for 5000-Word Continuous Speech Recognition. Journal of Signal Processing Systems, 2011, 63, 95-105.                                              | 1.4 | 7         |
| 50 | Memorization Capacity of Deep Neural Networks under Parameter Quantization. , 2019, , .                                                                                        |     | 7         |
| 51 | A parser processor for MPEG-2 audio and AC-3 decoding. , 0, , .                                                                                                                |     | 6         |
| 52 | Software optimization of MPEG audio layer-III for a 32 bit RISC processor. , 0, , .                                                                                            |     | 6         |
| 53 | Architectural Design and Implementation of an FPGA Softcore Based Speech Recognition System. , 2006, , .                                                                       |     | 6         |
| 54 | Low-power implementation of a high-throughput LDPC decoder for IEEE 802.11N standard. , 2009, , .                                                                              |     | 6         |
| 55 | Error performance and decoder hardware comparison between EG-LDPC and BCH codes. , 2010, , .                                                                                 |     | 6         |
| 56 | Flexible and Expandable Speech Recognition Hardware with Weighted Finite State Transducers. Journal of Signal Processing Systems, 2012, 66, 235-244.                         | 1.4 | 6         |
| 57 | Optimum wordlength determination of 8×8 IDCT architectures conforming to the IEEE standard specifications. , 0, , .                                                          |     | 5         |
| 58 | Fixed-point C compiler for TMS320C50 digital signal processor. , 0, , .                                                                                                      |     | 5         |
| 59 | VLSI Implementation of An Adaptive Equalizer for ATSC Digital TV Receivers. Journal of Signal Processing Systems, 2005, 40, 301-310.                                         | 1.0 | 5         |
| 60 | Performance Evaluation of an SIMD Architecture with a Multi-bank Vector Memory Unit. Signal Processing Systems Design and Implementation (SiPS), IEEE Workshop on, 2006, , . | 0.0 | 5         |
| 61 | Multi-core and SIMD architecture based implementation of recursive digital filtering algorithms. , 2010, , .                                                                 |     | 5         |
| 62 | H- and C-level WFST-based large vocabulary continuous speech recognition on Graphics Processing Units. , 2011, , .                                                           |     | 5         |
| 63 | A fast direction sequence generation method for CORDIC processors. , 0, , .                                                                                                  |     | 4         |
| 64 | An efficient Reed-Solomon decoder VLSI with erasure correction. , 0, , .                                                                                                     |     | 4         |
| 65 | Implementation of an H.264 motion estimation algorithm on a VLIW programmable digital signal processor. , 0, , .                                                             |     | 4         |
| 66 | Access-Pattern-Aware On-Chip Memory Allocation for SIMD Processors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2009, 28, 158-163.        | 1.9 | 4         |
| 67 | Multi-user real-time speech recognition with a GPU. , 2012, , .                                                                                                              |     | 4         |
| 68 | Optimized timed hardware software cosimulation without roll-back. , 0, , .                                                                                                   |     | 3         |
| 69 | A hardware software cosimulation backplane with automatic interface generation. , 0, , .                                                                                     |     | 3         |
| 70 | A Compiler-Friendly RISC-Based Digital Signal Processor Synthesis and Performance Evaluation. Journal of Signal Processing Systems, 2001, 27, 297-312.                       | 1.0 | 3         |
| 71 | Speaking partner: an ARM7-based multimedia handheld device. , 0, , .                                                                                                         |     | 3         |
| 72 | Optimization of power consumption for an ARM7-based multimedia handheld device. , 0, , .                                                                                     |     | 3         |
| 73 | Mobile CPU Based Optimization of Fast Likelihood Computation for Continuous Speech Recognition. , 2007, , .                                                             |     | 3         |
| 74 | Parallel implementation of an error diffusion halftoning algorithm with a general purpose graphics processing unit. , 2010, , .                                         |     | 3         |
| 75 | Optimal Output Quantization of Binary Input AWGN Channel for Belief-Propagation Decoding of LDPC Codes. , 2012, , .                                                     |     | 3         |
| 76 | Performance analysis of multi-bank DRAM with increased clock frequency. , 2012, , .                                                                                     |     | 3         |
| 77 | Soft-Decision Error Correction of NAND Flash Memory with a Turbo Product Code. Journal of Signal Processing Systems, 2013, 70, 235-247.                                 | 1.4 | 3         |
| 78 | Soft-decision decoding with cell to cell interference removed signal in NAND flash memory. , 2013, , .                                                                  |     | 3         |
| 79 | Low-Latency Lightweight Streaming Speech Recognition with 8-Bit Quantized Simple Gated Convolutional Neural Networks. , 2020, , .                                       |     | 3         |
| 80 | Fixed-point C language for digital signal processing. , 0, , .                                                                                                          |     | 2         |
| 81 | Finite wordlength effects analysis and wordlength optimization of a multiplier-adder based 8×8 2D-IDCT architecture. , 0, , .                                           |     | 2         |
| 82 | An efficient compiled simulation system for VLIW code verification. , 0, , .                                                                                            |     | 2         |
| 83 | A 2 way VLIW processor architecture for embedded multimedia applications. , 0, , .                                                                                      |     | 2         |
| 84 | A low resolution pulse position coding method for improved excitation modeling of speech transition. , 1999, , .                                                        |     | 2         |
| 85 | Implementation of speech recognition algorithm for a 32-bit CPU-based portable device. , 0, , .                                                                         |     | 2         |
| 86 | Implementation of a digital copier using TMS320C6414 VLIW DSP processor. , 0, , .                                                                                       |     | 2         |
| 87 | VLIW SIMD architecture based implementation of a multi-level dot diffusion algorithm. , 0, , .                                                                          |     | 2         |
| 88 | Algorithm and Software Optimization of Variable Block Size Motion Estimation for H.264/AVC on a VLIW–SIMD DSP. Journal of Signal Processing Systems, 2008, 51, 289-302. | 1.4 | 2         |
| 89 | VLSI implementation of a soft bit-flipping decoder for PG-LDPC codes. , 2009, , .                                                                                       |     | 2         |
| 90 | COPR: a cost-oriented recycling policy for flash translation layer. IEEE Transactions on Consumer Electronics, 2010, 56, 673-681.                                       | 3.0 | 2         |
| 91  | GPU based implementation of recursive digital filtering algorithms. , 2013, , .                                                                                            |     | 2         |
| 92  | Evaluation of block turbo codes for long-haul optical networks. , 2016, , .                                                                                                |     | 2         |
| 93  | High-throughput decoding of block turbo codes on graphics processing units. , 2017, , .                                                                                    |     | 2         |
| 94  | Quantization effects on the acquisition performance of direct-sequence spread-spectrum CDMA. , 0, , .                                                                      |     | 1         |
| 95  | Wordlength optimization of an MPEG-2 audio decoder. , 0, , .                                                                                                               |     | 1         |
| 96  | Fixed-point error analysis and wordlength optimization of a distributed arithmetic based 8×8 2D-IDCT architecture. , 0, , .                                                |     | 1         |
| 97  | Finite wordlength effects analysis and wordlength optimization of Dolby digital audio decoder. , 0, , .                                                                    |     | 1         |
| 98  | Variable dimensional algebraic CELP coding of prototype waveforms. , 0, , .                                                                                                |     | 1         |
| 99  | A multi-level block priority based instruction caching scheme for multimedia processors. , 0, , .                                                                          |     | 1         |
| 100 | VLSI implementation of an adaptive equalizer for ATSC digital TV receivers. , 0, , .                                                                                       |     | 1         |
| 101 | Implementation of a digital color copier using a VLIW SIMD architecture. , 0, , .                                                                                          |     | 1         |
| 102 | Memory Access Overhead Reduction for a Digital Color Copier Implementation Using a VLIW Digital Signal Processor. , 0, , .                                                 |     | 1         |
| 103 | Performance Optimization of a Multimedia Player on a Mobile CPU Platform. Signal Processing Systems Design and Implementation (SiPS), IEEE Workshop on, 2007, , .          | 0.0 | 1         |
| 104 | Fast Block Mode Decision for H.264/AVC on a Programmable Digital Signal Processor. Signal Processing Systems Design and Implementation (SiPS), IEEE Workshop on, 2007, , . | 0.0 | 1         |
| 105 | Software implementation of Chien search process for strong BCH codes. , 2008, , .                                                                                          |     | 1         |
| 106 | Parallel Computation of Adaptive Filtering Algorithms on Multi-Core Systems. Journal of Signal Processing Systems, 2012, 69, 253-265.                                      | 1.4 | 1         |
| 107 | Signal processing techniques for reliability improvement of sub-20nm NAND flash memory. , 2013, , .                                                                        |     | 1         |
| 108 | Direct and indirect measurement of inter-cell capacitance in NAND flash memory. , 2014, , .                                                                                |     |           |
| 109 | Learning separable fixed-point kernels for deep convolutional neural networks. , 2016, , .                                                                       |     | 1         |
| 110 | Workload-aware Automatic Parallelization for Multi-GPU DNN Training. , 2019, , .                                                                                 |     | 1         |
| 111 | Exploration of On-device End-to-End Acoustic Modeling with Neural Networks. , 2019, , .                                                                          |     | 1         |
| 112 | Feedback-directed memory disambiguation for embedded multimedia VLIW computing. , 0, , .                                                                         |     | 0         |
| 113 | An integrated hardware-software cosimulation environment for heterogeneous systems prototyping. , 0, , .                                                         |     | 0         |
| 114 | A full digital self-timed clock generation scheme. , 0, , .                                                                                                      |     | 0         |
| 115 | Adaptive threshold error diffusion technique for color inkjet printing. , 0, , .                                                                                 |     | 0         |
| 116 | Multiprocessor scheduling of a signal flow graph for workstation clusters. , 0, , .                                                                              |     | 0         |
| 117 | A block priority based instruction caching scheme for multimedia processors. , 0, , .                                                                            |     | 0         |
| 118 | A codebook shaping method for perceptual quality improvement of CELP coders. , 0, , .                                                                            |     | 0         |
| 119 | Memory Access Reduced Software Implementation of H.264/AVC Sub-pixel Motion Estimation Using Differential Data Encoding. , 2007, , .                             |     | 0         |
| 120 | Parallel computation of adaptive lattice filters. , 2011, , .                                                                                                    |     | 0         |
| 121 | Accelerating tetrahedral interpolation with data-level and Thread-Level Parallel optimization. , 2011, , .                                                       |     | 0         |
| 122 | A simulation-based study for DRAM power reduction strategies in GPGPUs. , 2012, , .                                                                              |     | 0         |
| 123 | Optimum quantization for signal processing and error correction in NAND flash memory. , 2013, , .                                                                |     | 0         |
| 124 | DRAM access reduction in GPUs by thread-block scheduling for overlapped data reuse. , 2013, , .                                                                  |     | 0         |
| 125 | Area-Efficient Parallel Syndrome Generators for Linear Block Codes. Journal of Signal Processing Systems, 2014, 77, 281-287.                                     | 1.4 | 0         |
| 126 | Low Energy Signal Processing Techniques for Reliability Improvement of High-Density NAND Flash Memory. Journal of Signal Processing Systems, 2015, 78, 63-71.    | 1.4 | 0         |
| 127 | Architecture exploration of a programmable neural network processor for embedded systems. , 2016, , .                                 |     | 0         |
| 128 | On-Device End-to-end Speech Recognition with Multi-Step Parallel RNNs. , 2018, , .                                                    |     | 0         |
| 129 | Compression of Deep Neural Networks with Structured Sparse Ternary Coding. Journal of Signal Processing Systems, 2019, 91, 1009-1019. | 1.4 | 0         |
| 130 | Optimization of Number Representations. , 2013, , 1303-1333.                                                                          |     | 0         |
| 131 | Optimization of Number Representations. , 2019, , 1141-1171.                                                                          |     | 0         |