Software & Datasets

Here is a list of open-source code and datasets from my projects in statistics, machine learning, and natural language processing over the years.

All public repositories can also be found on my GitHub page.

Python Packages on PyPI

comparecast: Package for comparing sequential forecasters using confidence sequences and e-processes. Code accompanying our OR'23 paper.
word2word: Easy-to-use word-to-word translations for 3,564 language pairs. Code accompanying our LREC'20 paper.

Selected Repositories on GitHub

CombiningEvidenceAcrossFiltrations: Python code accompanying our preprint on combining evidence across filtrations.
ComparingAbstainingClassifiers: Python code from our NeurIPS'23 paper.
irm-empirical-study: Python code and data from our ICML'20 workshop paper.
KorNLUDatasets: Korean NLI and STS datasets from our EMNLP-F'20 paper.
helo_word: Python code for grammatical error correction (GEC), accompanying our BEA'19 workshop paper.
deep_learning: An (old) from-scratch Python implementation of standard deep neural network architectures and learning algorithms, written as part of a deep learning course at CMU.