YJ Choe

Research Scientist at Kakao
PhD Student (on leave) at Carnegie Mellon University

Discovery of Natural Language Concepts in Individual Units [OpenReview]

Seil Na, Yo Joong Choe, Dong-Hyun Lee, and Gunhee Kim
Accepted to the International Conference on Learning Representations (ICLR) 2019

Although deep convolutional networks have achieved improved performance in many natural language tasks, they have been treated as black boxes because they are difficult to interpret. In particular, little is known about how they represent language in their intermediate layers. In an attempt to understand the representations of deep convolutional networks trained on language tasks, we show that individual units are selectively responsive to specific morphemes, words, and phrases, rather than responding to arbitrary and uninterpretable patterns. In order to quantitatively analyze this intriguing phenomenon, we propose a concept alignment method based on how units respond to replicated text. We conduct analyses with different architectures on multiple datasets for classification and translation tasks and provide new insights into how deep models understand natural language.
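A toy sketch of the replicated-text idea, under loudly stated assumptions: the "unit" below is a hypothetical stand-in (a mean over hand-set per-token weights, not a trained CNN unit), the candidate concepts are supplied directly rather than extracted from a unit's top-activating sentences, and each candidate is repeated to a fixed length of 20 tokens. Candidates are then ranked by the unit's response to their replicated text.

```python
from typing import Callable, Sequence


def replicate(concept: Sequence[str], length: int) -> list[str]:
    """Repeat a concept's tokens until the input has exactly `length` tokens."""
    reps = -(-length // len(concept))  # ceiling division
    return (list(concept) * reps)[:length]


def align_concepts(
    unit: Callable[[list[str]], float],
    candidates: Sequence[Sequence[str]],
    length: int = 20,
    k: int = 3,
) -> list[tuple[float, tuple[str, ...]]]:
    """Rank candidate concepts by the unit's response to their replicated text."""
    scored = [(unit(replicate(c, length)), tuple(c)) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical unit: mean-pooled per-token weights (a width-1 "filter").
WEIGHTS = {"not": 1.0, "never": 0.8, "again": 0.1}


def toy_unit(tokens: list[str]) -> float:
    return sum(WEIGHTS.get(t, 0.0) for t in tokens) / len(tokens)


print(align_concepts(toy_unit, [["the"], ["not"], ["good"], ["never", "again"]], k=2))
```

Replicating the concept, rather than scoring it once inside an arbitrary sentence, isolates the unit's response to that concept alone; here the negation-weighted toy unit ranks "not" first and "never again" second.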

Local White Matter Architecture Defines Functional Brain Dynamics [arXiv] [Slides]

Yo Joong Choe, Sivaraman Balakrishnan, Aarti Singh, Jean M. Vettel, and Timothy Verstynen
In Proceedings of the IEEE Conference on Systems, Man, and Cybernetics (SMC) 2018
Franklin V. Taylor Memorial Award


Large bundles of myelinated axons, called white matter, anatomically connect disparate brain regions together and compose the structural core of the human connectome. We recently proposed a method of measuring the local integrity along the length of each white matter fascicle, termed the local connectome. If communication efficiency is fundamentally constrained by the integrity along the entire length of a white matter bundle, then variability in the functional dynamics of brain networks should be associated with variability in the local connectome. We test this prediction using two statistical approaches that are capable of handling the high dimensionality of the data. First, by performing statistical inference on distance-based correlations, we show that similarity in the local connectome between individuals is significantly correlated with similarity in their patterns of functional connectivity. Second, by employing variable selection using sparse canonical correlation analysis and cross-validation, we show that segments of the local connectome are predictive of certain patterns of functional brain dynamics. These results are consistent with the hypothesis that structural variability along axon bundles constrains communication between disparate brain regions.
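The abstract does not spell out the exact statistic. One common distance-based dependence measure is the distance correlation of Székely, Rizzo, and Bakirov (2007), paired with a permutation test; the sketch below assumes that choice and is not necessarily the paper's procedure. Rows of `X` and `Y` are individuals (for example, hypothetical vectorized local-connectome and functional-connectivity features).

```python
import numpy as np


def _centered_distances(Z):
    """Double-centered matrix of pairwise Euclidean distances between rows."""
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    return D - D.mean(axis=0) - D.mean(axis=1)[:, None] + D.mean()


def distance_correlation(X, Y):
    """Sample distance correlation between paired rows of X and Y (in [0, 1])."""
    A, B = _centered_distances(X), _centered_distances(Y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))


def permutation_test(X, Y, n_perm=999, seed=0):
    """Permutation p-value: shuffle which individual each row of Y belongs to."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y)
    observed = distance_correlation(X, Y)
    exceed = sum(
        distance_correlation(X, Y[rng.permutation(len(Y))]) >= observed
        for _ in range(n_perm)
    )
    return observed, (1 + exceed) / (1 + n_perm)
```

Because the statistic only depends on between-individual distances, it compares "how similar are two people's structural features" with "how similar are their functional features" without requiring the two feature spaces to have the same dimension.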

A Statistical Analysis of Neural Networks [PDF]

Final project for 10/36-702 Statistical Machine Learning

I wrote a brief review of known minimax rates and generalization error bounds for feedforward neural networks with nonlinear activation functions. The results suggest that (1) two-layer neural networks can avoid the curse of dimensionality and that (2) they are adaptive to an underlying sparse structure, if it exists. However, it is unclear whether these results generalize to deep neural networks.
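One classical result behind claim (1) is Barron's (1994) risk bound for one-hidden-layer sigmoidal networks, stated informally below with constants and regularity conditions omitted (the review may rely on different or sharper statements). Here \(f\) is the target with finite Fourier moment \(C_f = \int \lVert\omega\rVert\,\lvert\tilde f(\omega)\rvert\,d\omega\), \(n\) is the number of hidden units, \(N\) the sample size, and \(d\) the input dimension.

```latex
\mathbb{E}\,\lVert f - \hat f_{n,N} \rVert_2^2
  \;\le\; O\!\left(\frac{C_f^2}{n}\right) + O\!\left(\frac{n d}{N}\,\log N\right),
\qquad
n \asymp C_f \left(\frac{N}{d \log N}\right)^{1/2}
\;\Longrightarrow\;
\mathbb{E}\,\lVert f - \hat f \rVert_2^2
  = O\!\left(C_f \left(\frac{d}{N}\,\log N\right)^{1/2}\right).
```

The exponent of \(N\) is \(1/2\) for every \(d\), whereas classical nonparametric rates for \(s\)-smooth functions scale like \(N^{-2s/(2s+d)}\) and degrade as \(d\) grows.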

Sparse Additive Models with Shape Constraints [PDF] [Slides] [Code]

Advised by John Lafferty
Joint work with Sabyasachi Chatterjee and Min Xu
As part of the Chicago Theory Center CS REU, Summer 2014


We studied a new type of high-dimensional regression model that fits an additive model where each component is either convex, concave, or identically zero. This has led to a challenging and fascinating problem we call “convexity pattern selection,” which is to infer the correct sparsity and convexity pattern of \(p\) variables, among the \(3^p\) possible patterns. Other shape constraints, such as monotonicity, can also be used. These models extend the idea of sparse additive models (Ravikumar et al. 2009).
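To make the \(3^p\) search concrete, here is a brute-force sketch that is feasible only for very small \(p\) and is not the method developed in the project. Assumptions: a convex (concave) component is a free linear term plus a nonnegative (nonpositive) combination of hinge functions at 10 quantile knots; all components are fitted jointly by bounded least squares with SciPy's `lsq_linear`; and patterns are scored by residual sum of squares plus a hypothetical `penalty` per nonzero component.

```python
import itertools

import numpy as np
from scipy.optimize import lsq_linear

SHAPES = ("zero", "convex", "concave")


def hinge_basis(x, n_knots=10):
    """Hinge features (x - t)_+ at interior quantile knots t of x."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    return np.maximum(x[:, None] - knots[None, :], 0.0)


def fit_pattern(X, y, pattern, n_knots=10):
    """Least-squares additive fit for one fixed convexity pattern; returns RSS."""
    n = X.shape[0]
    cols, lower = [np.ones((n, 1))], [-np.inf]  # shared, unconstrained intercept
    for j, shape in enumerate(pattern):
        if shape == "zero":
            continue  # component j is identically zero
        x = X[:, j]
        H = hinge_basis(x, n_knots)
        sign = 1.0 if shape == "convex" else -1.0
        cols += [x[:, None], sign * H]  # free slope + sign-constrained hinges
        lower += [-np.inf] + [0.0] * H.shape[1]
    A = np.hstack(cols)
    res = lsq_linear(A, y, bounds=(np.array(lower), np.inf))
    resid = y - A @ res.x
    return float(resid @ resid)


def select_pattern(X, y, penalty=0.5, n_knots=10):
    """Exhaustive search over all 3**p patterns: minimize RSS + penalty * #nonzero."""
    best = None
    for pattern in itertools.product(SHAPES, repeat=X.shape[1]):
        nonzero = sum(s != "zero" for s in pattern)
        score = fit_pattern(X, y, pattern, n_knots) + penalty * nonzero
        if best is None or score < best[0]:
            best = (score, pattern)
    return best[1]


rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
y = X[:, 0] ** 2 - X[:, 1] ** 2 + 0.1 * rng.standard_normal(200)
print(select_pattern(X, y))
```

On this synthetic data the search should label the first variable convex, the second concave, and the third zero. The exhaustive loop over `itertools.product` is exactly what becomes intractable as \(p\) grows, which is why the selection problem is interesting.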

Deep Learning and Socioeconomic Inference [Blog]

Advised by James Evans
Joint work with Nathaniel Sauder and Zhongtian Dai
Knowledge Lab, Computation Institute, University of Chicago


Sociologists design and conduct extensive surveys to study factors behind high crime rates or low income levels in certain neighborhoods. Aiming to build an effective alternative to these costly and time-consuming methods, we studied data-driven methods that model the latent factors using neighborhood-level Google Street View images.

We implemented a prediction model using features extracted from an ImageNet-pretrained convolutional neural network (CNN) in Caffe, a fast deep learning framework for CNNs (Jia et al. 2014). We also collected survey data through Amazon Mechanical Turk, where workers were shown pairs of images and asked to compare their perceived safety and affluence.
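The page does not say how the pairwise answers were turned into per-image labels. One standard option, assumed here and not necessarily what the project did, is to fit a Bradley-Terry model, in which image \(i\) is preferred over image \(j\) with probability \(p_i / (p_i + p_j)\), using Hunter's (2004) MM updates. The `wins` matrix layout is a hypothetical input format.

```python
import numpy as np


def bradley_terry(wins, n_iter=500):
    """Fit Bradley-Terry strengths with MM updates (Hunter 2004).

    wins[i, j] = number of times item i was preferred over item j (zero diagonal).
    Assumes the "beats" graph is strongly connected; otherwise the
    maximum-likelihood estimate does not exist.
    """
    wins = np.asarray(wins, dtype=float)
    games = wins + wins.T          # n_ij: comparisons between i and j
    total_wins = wins.sum(axis=1)  # W_i
    p = np.ones(wins.shape[0])
    for _ in range(n_iter):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j); diagonal terms are 0
        denom = (games / (p[:, None] + p[None, :])).sum(axis=1)
        p = total_wins / denom
        p /= p.sum()               # strengths are only defined up to scale
    return p


# Hypothetical counts for three images compared on perceived safety.
wins = np.array([[0, 3, 4], [1, 0, 3], [0, 1, 0]])
print(bradley_terry(wins))
```

Under this assumption, the fitted strengths (or their logarithms) could serve as per-image regression targets for a model on the pretrained CNN features.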