## Wonyong Sung

## List of Publications by Year in Descending Order

Source: https://exaly.com/author-pdf/10763472/wonyong-sung-publications-by-year.pdf

Version: 2024-04-25

This document has been generated based on the publications and citations recorded by exaly.com. For the latest version of this publication list, visit the link given above.

The third column is the impact factor (IF) of the journal, and the fourth column is the number of citations of the article.

| papers | citations | h-index | g-index | ext. papers | ext. citations | avg. IF | L-index |
|--------|-----------|---------|---------|-------------|----------------|---------|---------|
| 88 | 1,265 | 18 | 33 | 134 | 1,604 | 2.6 | 4.88 |

| # | Paper | IF | Citations |
|----|-------|-----|-----------|
| 88 | Fixed-Point Optimization of Transformer Neural Network. **2020** | | 5 |
| 87 | Workload-aware Automatic Parallelization for Multi-GPU DNN Training. **2019** | | 1 |
| 86 | Memorization Capacity of Deep Neural Networks under Parameter Quantization. **2019** | | 3 |
| 85 | Optimization of Number Representations. **2019**, 1141-1171 | | |
| 84 | Compression of Deep Neural Networks with Structured Sparse Ternary Coding. *Journal of Signal Processing Systems*, **2019**, 91, 1009-1019 | 1.4 | |
| 83 | Fixed-point optimization of deep neural networks with adaptive step size retraining. **2017** | | 10 |
| 82 | Character-level language modeling with hierarchical recurrent neural networks. **2017** | | 12 |
| 81 | Structured sparse ternary weight coding of deep neural networks for efficient hardware implementations. **2017** | | 7 |
| 80 | High-throughput decoding of block turbo codes on graphics processing units. **2017** | | 1 |
| 79 | Dynamic hand gesture recognition for wearable devices with low complexity recurrent neural networks. **2016** | | 20 |
| 78 | Learning separable fixed-point kernels for deep convolutional neural networks. **2016** | | 1 |
| 77 | Character-level incremental speech recognition with recurrent neural networks. **2016** | | 13 |
| 76 | FPGA based implementation of deep neural networks using on-chip memory only. **2016** | | 35 |
| 75 | FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks. **2016** | | 44 |
| 74 | Fixed point optimization of deep convolutional neural networks for object recognition. **2015** | | 85 |
| 73 | Single stream parallelization of generalized LSTM-like RNNs on a GPU. **2015** | | 19 |
| 72 | Low Energy Signal Processing Techniques for Reliability Improvement of High-Density NAND Flash Memory. *Journal of Signal Processing Systems*, **2015**, 78, 63-71 | 1.4 | |
| 71 | Fault tolerance analysis of digital feed-forward deep neural networks. **2014** | | 11 |
| 70 | Fixed-point feedforward deep neural network design using weights +1, 0, and -1. **2014** | | 106 |
| 69 | Direct and indirect measurement of inter-cell capacitance in NAND flash memory. **2014** | | 1 |
| 68 | X1000 real-time phoneme recognition VLSI using feed-forward deep neural networks. **2014** | | 16 |
| 67 | Decision Directed Estimation of Threshold Voltage Distribution in NAND Flash Memory. *IEEE Transactions on Signal Processing*, **2014**, 62, 919-927 | 4.8 | 17 |
| 66 | Area-Efficient Parallel Syndrome Generators for Linear Block Codes. *Journal of Signal Processing Systems*, **2014**, 77, 281-287 | 1.4 | |
| 65 | *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, **2014**, 22, 1004-1015 | 2.6 | 38 |
| 64 | Least Squares Based Coupling Cancelation for MLC NAND Flash Memory with a Small Number of Voltage Sensing Operations. *Journal of Signal Processing Systems*, **2013**, 71, 189-200 | 1.4 | 9 |
| 63 | Estimation of NAND Flash Memory Threshold Voltage Distribution for Optimum Soft-Decision Error Correction. *IEEE Transactions on Signal Processing*, **2013**, 61, 440-449 | 4.8 | 61 |
| 62 | Signal processing techniques for reliability improvement of sub-20nm NAND flash memory. **2013** | | 1 |
| 61 | Soft-Decision Error Correction of NAND Flash Memory with a Turbo Product Code. *Journal of Signal Processing Systems*, **2013**, 70, 235-247 | 1.4 | 2 |
| 60 | GPU based implementation of recursive digital filtering algorithms. **2013** | | 1 |
| 59 | Load Balanced Resampling for Real-Time Particle Filtering on Graphics Processing Units. *IEEE Transactions on Signal Processing*, **2013**, 61, 411-419 | 4.8 | 16 |
| 58 | Soft-decision decoding with cell to cell interference removed signal in NAND flash memory. **2013** | | 3 |
| 57 | Optimization of Number Representations. **2013**, 1303-1333 | | |
| 56 | Flexible and Expandable Speech Recognition Hardware with Weighted Finite State Transducers. *Journal of Signal Processing Systems*, **2012**, 66, 235-244 | 1.4 | 4 |
| 55 | Parallel Computation of Adaptive Filtering Algorithms on Multi-Core Systems. *Journal of Signal Processing Systems*, **2012**, 69, 253-265 | 1.4 | 1 |
| 54 | Multi-user real-time speech recognition with a GPU. **2012** | | 2 |
| 53 | Performance of rate 0.96 (68254, 65536) EG-LDPC code for NAND Flash memory error correction. **2012** | | 17 |
| 52 | Least squares based cell-to-cell interference cancelation technique for multi-level cell NAND flash memory. **2012** | | 8 |
| 51 | A high-speed layered min-sum LDPC decoder for error correction of NAND Flash memories. **2011** | | 20 |
| 50 | Reduced complexity Chase-Pyndiah decoding algorithm for turbo product codes. **2011** | | 3 |
| 49 | Memory Access Optimized VLSI for 5000-Word Continuous Speech Recognition. *Journal of Signal Processing Systems*, **2011**, 63, 95-105 | 1.4 | 6 |
| 48 | Memory Access Optimized Implementation of Cyclic and Quasi-Cyclic LDPC Codes on a GPGPU. *Journal of Signal Processing Systems*, **2011**, 64, 149-159 | 1.4 | 48 |
| 47 | Memory access pattern-aware DRAM performance model for multi-core systems. **2011** | | 11 |
| 46 | H- and C-level WFST-based large vocabulary continuous speech recognition on Graphics Processing Units. **2011** | | 5 |
| 45 | An FPGA implementation of speech recognition with weighted finite state transducers. **2010** | | 12 |
| 44 | Multi-core and SIMD architecture based implementation of recursive digital filtering algorithms. **2010** | | 1 |
| 43 | Error performance and decoder hardware comparison between EG-LDPC and BCH codes. **2010** | | 6 |
| 42 | VLSI Implementation of BCH Error Correction for Multilevel Cell NAND Flash Memory. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, **2010**, 18, 843-847 | 2.6 | 58 |
| 41 | Adaptive Threshold Technique for Bit-Flipping Decoding of Low-Density Parity-Check Codes. *IEEE Communications Letters*, **2010**, 14, 857-859 | 3.8 | 18 |
| 40 | COPR: a cost-oriented recycling policy for flash translation layer. *IEEE Transactions on Consumer Electronics*, **2010**, 56, 673-681 | 4.8 | 2 |
| 39 | Block-interleaving based parallel CRC computation for multi-processor systems. **2010** | | 4 |
| 38 | VLSI Implementation of a High-Throughput Soft-Bit-Flipping Decoder for Geometric LDPC Codes. *IEEE Transactions on Circuits and Systems I: Regular Papers*, **2010**, 57, 1083-1094 | 3.9 | 8 |
| 37 | A Real-Time FPGA-Based 20 000-Word Speech Recognizer With Optimized DRAM Access. *IEEE Transactions on Circuits and Systems I: Regular Papers*, **2010**, 57, 2119-2131 | 3.9 | 20 |
| 36 | Optimization of Number Representations. **2010**, 707-738 | | |
| 35 | SIMD processor based implementation of recursive filtering equations. **2009** | | 3 |
| 34 | VLSI implementation of a soft bit-flipping decoder for PG-LDPC codes. **2009** | | 1 |
| 33 | Scalable HMM based inference engine in large vocabulary continuous speech recognition. **2009** | | 3 |
| 32 | Massively parallel implementation of cyclic LDPC codes on a general purpose graphics processing unit. **2009** | | 8 |
| 31 | VLSI for 5000-word continuous speech recognition. **2009** | | 11 |
| 30 | OpenMP-based parallel implementation of a continuous speech recognizer on a multi-core system. **2009** | | 8 |
| 29 | Low-power implementation of a high-throughput LDPC decoder for IEEE 802.11n standard. **2009** | | 5 |
| 28 | Efficient Software-Based Encoding and Decoding of BCH Codes. *IEEE Transactions on Computers*, **2009**, 58, 878-889 | 2.5 | 22 |
| 27 | Access-Pattern-Aware On-Chip Memory Allocation for SIMD Processors. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, **2009**, 28, 158-163 | 2.5 | 3 |
| 26 | Algorithm and Software Optimization of Variable Block Size Motion Estimation for H.264/AVC on a VLIW-SIMD DSP. *Journal of Signal Processing Systems*, **2008**, 51, 289-302 | 1.4 | 2 |
| 25 | Mobile CPU Based Optimization of Fast Likelihood Computation for Continuous Speech Recognition. **2007** | | 2 |
| 24 | Fast Block Mode Decision for H.264/AVC on a Programmable Digital Signal Processor. *Signal Processing Systems Design and Implementation (SiPS), IEEE Workshop on*, **2007** | | 1 |
| 23 | Architectural Design and Implementation of an FPGA Softcore Based Speech Recognition System. **2006** | | 3 |
| 22 | Performance Evaluation of an SIMD Architecture with a Multi-bank Vector Memory Unit. *Signal Processing Systems Design and Implementation (SiPS), IEEE Workshop on*, **2006** | | 5 |
| 21 | Low-Power High-Throughput BCH Error Correction VLSI Design for Multi-Level Cell NAND Flash Memories. **2006** | | 82 |
| 20 | VLSI Implementation of An Adaptive Equalizer for ATSC Digital TV Receivers. *Journal of Signal Processing Systems*, **2005**, 40, 301-310 | | 2 |
| 19 | A Compiler-Friendly RISC-Based Digital Signal Processor Synthesis and Performance Evaluation. *Journal of Signal Processing Systems*, **2001**, 27, 297-312 | | 2 |
| 18 | Combined word-length optimization and high-level synthesis of digital signal processing systems. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, **2001**, 20, 921-930 | 2.5 | 83 |
| 17 | AUTOSCALER for C: an optimizing floating-point to integer C program converter for fixed-point digital signal processors. *IEEE Transactions on Circuits and Systems Part 2: Express Briefs*, **2000**, 47, 840-848 | | 40 |
| 16 | A floating-point to integer C converter with shift reduction for fixed-point digital signal processors. **1999** | | 2 |
| 15 | *IEEE Transactions on Circuits and Systems for Video Technology*, **1998**, 8, 935-940 | 6.4 | 14 |
| 14 | Simulation-based word-length optimization method for fixed-point digital signal processing systems. *IEEE Transactions on Signal Processing*, **1995**, 43, 3087-3090 | 4.8 | 111 |
| 13 | Fixed-point C compiler for TMS320C50 digital signal processor | | 4 |
| 12 | Optimized timed hardware software cosimulation without roll-back | | 3 |
| 11 | A hardware software cosimulation backplane with automatic interface generation | | 1 |
| 10 | An efficient compiled simulation system for VLIW code verification | | 1 |
| 9 | H.264 decoder optimization exploiting SIMD instructions | | 3 |
| 8 | Memory access overhead reduction for a digital color copier implementation using a VLIW digital signal processor | | 1 |
| 7 | Software optimization of MPEG audio layer-III for a 32 bit RISC processor | | 3 |
| 6 | A voice activity detector employing soft decision based noise spectrum adaptation | | 13 |
| 5 | VLSI design of a CORDIC-based derotator | | 4 |
| 4 | Fixed-point optimization utility for C and C++ based digital signal processing programs | | 9 |
| 3 | Optimum wordlength determination of 8×8 IDCT architectures conforming to the IEEE standard specifications | | 2 |
| 2 | Finite wordlength effects analysis and wordlength optimization of a multiplier-adder based 8×8 2D-IDCT architecture | | 1 |
| 1 | A CORDIC-based digital quadrature mixer: comparison with a ROM-based architecture | | 5 |