Repetition suppression. Details on my inverse scaling prize submission
In this post I provide some details on my submission to the inverse scaling prize, a contest focusing on finding important tasks where larger language models...
Assume you’d like to train a GPT-2-small-sized model (117M parameters). What is the optimal training set size? I’ll try to estimate that number following Trai...
TLDR: Naively applying RL to aligning language models (LMs) results in distribution collapse: turning an LM into a degenerate distribution putting all probab...
This blog post is a bunch of unstructured notes to my future self on setting up a virtual GPU cluster for machine learning research (i.e. running experiments...
The goal of this blog post is to present a concise implementation of the Gaussian Mixture Model (GMM) using einsum notation. Along the way, I will also descri...
Helmholtz machines are the predecessors of variational autoencoders (VAEs). They were first proposed by Dayan et al. in 1995 as a probabilistic model of patt...
In this blog post, I show how to implement triplet loss and quadruplet loss in PyTorch via tensor masking. The idea of triplet loss is to learn meaningful re...
Attention mechanisms revolutionized machine learning in applications ranging from NLP through computer vision to reinforcement learning. Attention is the key...
While vanilla linear regression predicts a maximum likelihood estimate of the target variable, Bayesian linear regression predicts a whole distribution over ...
Consider the problem of parsing an arithmetic expression, such as 4*(1+6)/3, into a binary expression tree. The problem would be quite easy with postfix nota...
The relation between syntax (how words are structured in a sentence) and semantics (how words contribute to the meaning of a sentence) is a long-standing ope...
In this blog post, I sketch out a summary of the NeurIPS 2019 conference as I experienced it. Obviously, the motifs I highlight are specific to my somewhat u...
What does it mean for a message to mean? In this blog post, I provide an accessible introduction to one formal framework developed for addressing this questi...