Sitemap

A list of all the posts and pages found on the site. For you robots out there is an XML version available for digesting as well.

Posts

Future Blog Post

less than 1 minute read

Published: January 01, 2199

This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published: August 14, 2015

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published: August 14, 2014

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published: August 14, 2013

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published: August 14, 2012

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

portfolio

Portfolio item number 1

Short description of portfolio item number 1

Portfolio item number 2

Short description of portfolio item number 2

publications

Deep Learning of Immune Cell Differentiation

Published in Proceedings of the National Academy of Sciences of the United States of America, 2020

Although we know many sequence-specific transcription factors (TFs), how the DNA sequence of cis-regulatory elements is decoded and orchestrated on the genome scale to determine immune cell differentiation is beyond our grasp. Leveraging a granular atlas of chromatin accessibility across 81 immune cell types, we asked if a convolutional neural network (CNN) could learn to infer cell type-specific chromatin accessibility solely from regulatory DNA sequences. With a tailored architecture and an ensemble approach to CNN parameter interpretation, we show that our trained network (“AI-TAC”) does so by rediscovering ab initio the binding motifs for known regulators and some unknown ones. Motifs whose importance is learned virtually as functionally important overlap strikingly well with positions determined by chromatin immunoprecipitation for several TFs. AI-TAC establishes a hierarchy of TFs and their interactions that drives lineage specification and also identifies stage-specific interactions, like Pax5/Ebf1 vs. Pax5/Prdm1, or the role of different NF-κB dimers in different cell types. AI-TAC assigns Spi1/Cebp and Pax5/Ebf1 as the drivers necessary for myeloid and B lineage fates, respectively, but no factors seemed as dominantly required for T cell differentiation, which may represent a fall-back pathway. Mouse-trained AI-TAC can parse human DNA, revealing a strikingly similar ranking of influential TFs and providing additional support that AI-TAC is a generalizable regulatory sequence decoder. Thus, deep learning can reveal the regulatory syntax predictive of the full differentiative complexity of the immune system.

Recommended citation: Maslova, A; Ramirez, R; Ma, K; Schmutz, H; Wang, C; Fox, C; Ng, B; Benoist, C; Mostafavi, S; The Immunological Genome Project. Deep Learning of Immune Cell Differentiation. Proceedings of the National Academy of Sciences of the United States of America, 2020
Download Paper

Study of the Edge of Stability in Deep Learning

Published in Master's Thesis, 2023

Optimization of deep neural networks has been studied extensively, but our understanding of it is still very limited. In particular, it is still unclear how to optimally set hyperparameters such as the step size for gradient descent when training neural networks. We explore the issue of tuning the step size for full batch gradient descent, examining a proposed phenomenon in the literature called the “edge of stability”. This refers to a phase of training for neural networks when the training loss non-monotonically decreases over time while the sharpness (the maximum eigenvalue of the Hessian matrix) oscillates around the threshold of precisely two divided by the step size. In this thesis, we start by providing the necessary background to understand the current state of the art in this area. We then perform various experiments on the sharpness, and study its trajectory over the course of training with full batch gradient descent. Unlike the vast majority of the previous literature, we focus on investigating the sharpness with respect to the various layers of neural networks individually. We observe that different layers of a neural network have differing behavior in their sharpness trajectories. We focus on fully connected networks and examine how varying hyperparameters of the network, such as the number of layers or hidden units per layer, impact the sharpness. Finally, we explore how various architecture choices impact step size selection when training with gradient descent. We see that changing just one parameter of a deep neural network, such as the non-linear activation functions used in each layer, can greatly impact which step sizes can lead to divergence during training. This motivates further investigation into how architecture choices affect hyperparameter selection.

Recommended citation: Fox, C. A Study of the Edge of Stability in Deep Learning. Master's Thesis, 2023
Download Paper

Upper and lower memory capacity bounds of transformers for next-token prediction

Published in Arxiv, 2024

Given a sequence of tokens, such as words, the task of next-token prediction is to predict the next-token conditional probability distribution. Decoder-only transformers have become effective models for this task, but their properties are still not fully understood. In particular, the largest number of distinct context sequences that a decoder-only transformer can interpolate next-token distributions for has not been established. To fill this gap, we prove upper and lower bounds on this number, which are equal up to a multiplicative constant. We prove these bounds in the general setting where next-token distributions can be arbitrary as well as the empirical setting where they are calculated from a finite number of document sequences. Our lower bounds are for one-layer transformers and our proofs highlight an important injectivity property satisfied by self-attention. Furthermore, we provide numerical evidence that the minimal number of parameters for memorization is sufficient for being able to train the model to the entropy lower bound.

Recommended citation: Madden, L; Fox, C; Thrampoulidis, C. Upper and lower memory capacity bounds of transformers for next-token prediction. arXiv preprint arXiv:2405.13718, 2024
Download Paper

talks

Talk 1 on Relevant Topic in Your Field

Published: March 01, 2012

This is a description of your talk, which is a markdown files that can be all markdown-ified like any other post. Yay markdown!

Tutorial 1 on Relevant Topic in Your Field

Published: March 01, 2013

More information here

Talk 2 on Relevant Topic in Your Field

Published: February 01, 2014

More information here

Conference Proceeding talk 3 on Relevant Topic in Your Field

Published: March 01, 2014

This is a description of your conference proceedings talk, note the different field in type. You can put anything in this field.

teaching

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2015

CPSC 110 - Computation, Programs, and Programming

Fundamental program and computation structures. Introductory programming skills. Computation as a tool for information processing, simulation and modelling, and interacting with the world.

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2016

CPSC 110 - Computation, Programs, and Programming

Fundamental program and computation structures. Introductory programming skills. Computation as a tool for information processing, simulation and modelling, and interacting with the world.

Teaching Assistant

Undergraduate Course, Department of Statistics, University of British Columbia, 2016

STAT 200 - Elementary Statistics for Applications

Classical, nonparametric, and robust inferences about means, variances, and analysis of variance, using computers. Emphasis on problem formulation, assumptions, and interpretation.

Teaching Assistant

Undergraduate Course, Department of Statistics, University of British Columbia, 2016

STAT 302 - Introduction to Probability

Basic notions of probability, random variables, expectation and conditional expectation, limit theorems.

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2017

CPSC 213 - Introduction to Computer Systems

Software architecture, operating systems, and I/O architectures. Relationships between application software, operating systems, and computing hardware; critical sections, deadlock avoidance, and performance; principles and operation of disks and networks.

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2017

CPSC 221 - Basic Algorithms and Data Structures

Design and analysis of basic algorithms and data structures; algorithm analysis methods, searching and sorting algorithms, basic data structures, graphs and concurrency.

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2018

CPSC 421 - Introduction to Theory of Computing

Characterizations of computability (using machines, languages and functions). Universality, equivalence and Church’s thesis. Unsolvable problems. Restricted models of computation. Finite automata, grammars and formal languages.

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2019

CPSC 406 - Computational Optimization

Formulation and analysis of algorithms for continuous and discrete optimization problems; linear, nonlinear, network, dynamic, and integer optimization; large-scale problems; software packages and their implementation; duality theory and sensitivity.

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2021

CPSC 302 - Numerical Computation for Algebraic Problems

Numerical techniques for basic mathematical processes involving no discretization, and their analysis. Solution of linear systems, including analysis of round-off errors; norms and condition number; introduction to iterative techniques in linear algebra, including eigenvalue problems; solution to nonlinear equations.

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2022

CPSC 340 - Machine Learning and Data Mining

Models of algorithms for dimensionality reduction, nonlinear regression, classification, clustering and unsupervised learning; applications to computer graphics, computer games, bio-informatics, information retrieval, e-commerce, databases, computer vision and artificial intelligence.

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2022

CPSC 406 - Computational Optimization

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2023

CPSC 213 - Introduction to Computer Systems

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2023

CPSC 340 - Machine Learning and Data Mining

Head Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2024

CPSC 340 - Machine Learning and Data Mining

Curtis Fox

Sitemap

Pages

Posts

portfolio

publications

talks

teaching

CPSC 110 - Computation, Programs, and Programming

CPSC 110 - Computation, Programs, and Programming

STAT 200 - Elementary Statistics for Applications

STAT 302 - Introduction to Probability

CPSC 213 - Introduction to Computer Systems

CPSC 221 - Basic Algorithms and Data Structures

CPSC 421 - Introduction to Theory of Computing

CPSC 406 - Computational Optimization

CPSC 302 - Numerical Computation for Algebraic Problems

CPSC 340 - Machine Learning and Data Mining

CPSC 406 - Computational Optimization

CPSC 213 - Introduction to Computer Systems

CPSC 340 - Machine Learning and Data Mining

CPSC 340 - Machine Learning and Data Mining