Sitemap

A list of all the posts and pages found on the site. For you robots out there is an XML version available for digesting as well.

Posts

Future Blog Post

less than 1 minute read

Published: January 01, 2199

This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published: August 14, 2015

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published: August 14, 2014

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published: August 14, 2013

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published: August 14, 2012

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

portfolio

Portfolio item number 1

Short description of portfolio item number 1

Portfolio item number 2

Short description of portfolio item number 2

publications

Deep Learning of Immune Cell Differentiation

Published in Proceedings of the National Academy of Sciences of the United States of America, 2020

Although we know many sequence-specific transcription factors (TFs), how the DNA sequence of cis-regulatory elements is decoded and orchestrated on the genome scale to determine immune cell differentiation is beyond our grasp. Leveraging a granular atlas of chromatin accessibility across 81 immune cell types, we asked if a convolutional neural network (CNN) could learn to infer cell type-specific chromatin accessibility solely from regulatory DNA sequences. With a tailored architecture and an ensemble approach to CNN parameter interpretation, we show that our trained network (“AI-TAC”) does so by rediscovering ab initio the binding motifs for known regulators and some unknown ones. Motifs whose importance is learned virtually as functionally important overlap strikingly well with positions determined by chromatin immunoprecipitation for several TFs. AI-TAC establishes a hierarchy of TFs and their interactions that drives lineage specification and also identifies stage-specific interactions, like Pax5/Ebf1 vs. Pax5/Prdm1, or the role of different NF-κB dimers in different cell types. AI-TAC assigns Spi1/Cebp and Pax5/Ebf1 as the drivers necessary for myeloid and B lineage fates, respectively, but no factors seemed as dominantly required for T cell differentiation, which may represent a fall-back pathway. Mouse-trained AI-TAC can parse human DNA, revealing a strikingly similar ranking of influential TFs and providing additional support that AI-TAC is a generalizable regulatory sequence decoder. Thus, deep learning can reveal the regulatory syntax predictive of the full differentiative complexity of the immune system.

Recommended citation: A. Maslova, R. N. Ramirez, K. Ma, H. Schmutz, C. Wang, C. Fox, B. Ng, C. Benoist, S. Mostafavi, Immunological Genome Project. "Deep Learning of Immune Cell Differentiation". Proceedings of the National Academy of Sciences of the United States of America, 2020
Download Paper

Study of the Edge of Stability in Deep Learning

Published in Master's Thesis, 2023

Optimization of deep neural networks has been studied extensively, but our understanding of it is still very limited. In particular, it is still unclear how to optimally set hyperparameters such as the step size for gradient descent when training neural networks. We explore the issue of tuning the step size for full batch gradient descent, examining a proposed phenomenon in the literature called the “edge of stability”. This refers to a phase of training for neural networks when the training loss non-monotonically decreases over time while the sharpness (the maximum eigenvalue of the Hessian matrix) oscillates around the threshold of precisely two divided by the step size. In this thesis, we start by providing the necessary background to understand the current state of the art in this area. We then perform various experiments on the sharpness, and study its trajectory over the course of training with full batch gradient descent. Unlike the vast majority of the previous literature, we focus on investigating the sharpness with respect to the various layers of neural networks individually. We observe that different layers of a neural network have differing behavior in their sharpness trajectories. We focus on fully connected networks and examine how varying hyperparameters of the network, such as the number of layers or hidden units per layer, impact the sharpness. Finally, we explore how various architecture choices impact step size selection when training with gradient descent. We see that changing just one parameter of a deep neural network, such as the non-linear activation functions used in each layer, can greatly impact which step sizes can lead to divergence during training. This motivates further investigation into how architecture choices affect hyperparameter selection.

Recommended citation: C. Fox. "A Study of the Edge of Stability in Deep Learning". Masters Thesis, 2023
Download Paper

Nonmonotone Line Searches Operate at the Edge of Stability

Published in NeurIPS OPT Workshop, 2024

The traditional convergence analysis of Gradient Descent (GD) assumes the step size to be bounded from above by twice the reciprocal of the sharpness, i.e., the largest eigenvalue of the Hessian of the objective function. However, recent numerical observations on neural networks have shown that GD also converges with larger step sizes. In this case, GD may enter into the so-called edge of stability phase in which the objective function decreases faster than with smaller steps, but nonmonotonically. Interestingly enough, this same behaviour was already observed roughly 40 years ago when the first nonmonotone line search was proposed. These methods are designed to accept larger steps than their monotone counterparts (e.g., Armijo) as they do not impose a decrease in the objective function, while still being provably convergent. In this paper, we show that nonmonotone line searches are able to operate at the edge of stability regime right from the start of the training. Moreover, we design a new resetting technique that speeds up training and yields flatter solutions, by keeping GD at the edge of stability without requiring hyperparameter-tuning or prior knowledge of problem-dependent constants. To conclude, we observe that the large steps yielded by our method seem to mimic the behavior of the well-known Barzilai-Borwein method.

Recommended citation: C. Fox, L. Galli, M. Schmidt, H. Rauhut. "Nonmonotone Line Searches Operate at the Edge of Stability". NeurIPS OPT Workshop, 2024
Download Paper

Glocal Smoothness: Line Search can really help!

Published in Arxiv, 2025

Iteration complexities for first-order optimization algorithms are typically stated in terms of a global Lipschitz constant of the gradient, and near-optimal results are achieved using fixed step sizes. But many objective functions that arise in practice have regions with small Lipschitz constants where larger step sizes can be used. Many local Lipschitz assumptions have been proposed, which have lead to results showing that adaptive step sizes and/or line searches yield improved convergence rates over fixed step sizes. However, these faster rates tend to depend on the iterates of the algorithm, which makes it difficult to compare the iteration complexities of different methods. We consider a simple characterization of global and local (“glocal”) smoothness that only depends on properties of the function. This allows upper bounds on iteration complexities in terms of iterate-independent constants and enables us to compare iteration complexities between algorithms. Under this assumption it is straightforward to show the advantages of line searches over fixed step sizes, and that in some settings, gradient descent with line search has a better iteration complexity than accelerated methods with fixed step sizes. We further show that glocal smoothness can lead to improved complexities for the Polyak and AdGD step sizes, as well other algorithms including coordinate optimization, stochastic gradient methods, accelerated gradient methods, and non-linear conjugate gradient methods.

Recommended citation: C. Fox, A. Mishkin, S. Vaswani, M. Schmidt. "Glocal Smoothness: Line Search can really help!". arXiv preprint arXiv:2506.12648, 2025
Download Paper

Upper and lower memory capacity bounds of transformers for next-token prediction

Published in IEEE Transactions on Information Theory, 2025

Given a sequence of tokens, such as words, the task of next-token prediction is to predict the next-token conditional probability distribution. Decoder-only transformers have become effective models for this task, but their properties are still not fully understood. In particular, the largest number of distinct context sequences that a decoder-only transformer can interpolate next-token distributions for has not been established. To fill this gap, we prove upper and lower bounds on this number, which are equal up to a multiplicative constant. We prove these bounds in the general setting where next-token distributions can be arbitrary as well as the empirical setting where they are calculated from a finite number of document sequences. Our lower bounds are for one-layer multi-head decoder-only transformers and our proofs highlight an important injectivity property satisfied by self-attention. Furthermore, we provide numerical evidence that the minimal number of parameters for memorization is sufficient for being able to train the model to the entropy lower bound.

Recommended citation: L. Madden, C. Fox, C. Thrampoulidis. "Next-token Prediction Capacity: General Upper Bounds and a Lower Bound for Transformers". IEEE Transactions on Information Theory, 2025
Download Paper

DRBench: A Realistic Benchmark for Enterprise Deep Research

Published in ICLR, 2026

We introduce DRBench, a benchmark for evaluating AI agents on complex, open-ended deep research tasks in enterprise settings. Unlike prior benchmarks that focus on simple questions or web-only queries, DRBench evaluates agents on multi-step queries (for example, “What changes should we make to our product roadmap to ensure compliance with this standard?”) that require identifying supporting facts from both the public web and private company knowledge base. Each task is grounded in realistic user personas and enterprise context, spanning a heterogeneous search space that includes productivity software, cloud file systems, emails, chat conversations, and the open web. Tasks are generated through a carefully designed synthesis pipeline with human-in-the-loop verification, and agents are evaluated on their ability to recall relevant insights, maintain factual accuracy, and produce coherent, well-structured reports. We release 100 deep research tasks across 10 domains, such as Sales, Cybersecurity, and Compliance. We demonstrate the effectiveness of DRBench by evaluating diverse DR agents across open- and closed-source models (such as GPT, Llama, and Qwen) and DR strategies, highlighting their strengths, weaknesses, and the critical path for advancing enterprise deep research.

Recommended citation: A. Abaskohi, T. Chen, M. Muñoz-Mármol, C. Fox, A. V. Ramesh, É. Marcotte, X. H. Lù, N. Chapados, S. Gella, C. Pal, A. Drouin, I. H. Laradji. "DRBench: A Realistic Benchmark for Enterprise Deep Research". ICLR, 2026
Download Paper

talks

Talk 1 on Relevant Topic in Your Field

Published: March 01, 2012

This is a description of your talk, which is a markdown files that can be all markdown-ified like any other post. Yay markdown!

Tutorial 1 on Relevant Topic in Your Field

Published: March 01, 2013

More information here

Talk 2 on Relevant Topic in Your Field

Published: February 01, 2014

More information here

Conference Proceeding talk 3 on Relevant Topic in Your Field

Published: March 01, 2014

This is a description of your conference proceedings talk, note the different field in type. You can put anything in this field.

teaching

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2015

CPSC 110 - Computation, Programs, and Programming

Fundamental program and computation structures. Introductory programming skills. Computation as a tool for information processing, simulation and modelling, and interacting with the world.

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2016

CPSC 110 - Computation, Programs, and Programming

Fundamental program and computation structures. Introductory programming skills. Computation as a tool for information processing, simulation and modelling, and interacting with the world.

Teaching Assistant

Undergraduate Course, Department of Statistics, University of British Columbia, 2016

STAT 200 - Elementary Statistics for Applications

Classical, nonparametric, and robust inferences about means, variances, and analysis of variance, using computers. Emphasis on problem formulation, assumptions, and interpretation.

Teaching Assistant

Undergraduate Course, Department of Statistics, University of British Columbia, 2016

STAT 302 - Introduction to Probability

Basic notions of probability, random variables, expectation and conditional expectation, limit theorems.

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2017

CPSC 213 - Introduction to Computer Systems

Software architecture, operating systems, and I/O architectures. Relationships between application software, operating systems, and computing hardware; critical sections, deadlock avoidance, and performance; principles and operation of disks and networks.

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2017

CPSC 221 - Basic Algorithms and Data Structures

Design and analysis of basic algorithms and data structures; algorithm analysis methods, searching and sorting algorithms, basic data structures, graphs and concurrency.

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2018

CPSC 421 - Introduction to Theory of Computing

Characterizations of computability (using machines, languages and functions). Universality, equivalence and Church’s thesis. Unsolvable problems. Restricted models of computation. Finite automata, grammars and formal languages.

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2019

CPSC 406 - Computational Optimization

Formulation and analysis of algorithms for continuous and discrete optimization problems; linear, nonlinear, network, dynamic, and integer optimization; large-scale problems; software packages and their implementation; duality theory and sensitivity.

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2021

CPSC 302 - Numerical Computation for Algebraic Problems

Numerical techniques for basic mathematical processes involving no discretization, and their analysis. Solution of linear systems, including analysis of round-off errors; norms and condition number; introduction to iterative techniques in linear algebra, including eigenvalue problems; solution to nonlinear equations.

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2022

CPSC 340 - Machine Learning and Data Mining

Models of algorithms for dimensionality reduction, nonlinear regression, classification, clustering and unsupervised learning; applications to computer graphics, computer games, bio-informatics, information retrieval, e-commerce, databases, computer vision and artificial intelligence.

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2022

CPSC 406 - Computational Optimization

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2023

CPSC 213 - Introduction to Computer Systems

Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2023

CPSC 340 - Machine Learning and Data Mining

Head Teaching Assistant

Undergraduate Course, Department of Computer Science, University of British Columbia, 2024

CPSC 340 - Machine Learning and Data Mining

Curtis Fox

Sitemap

Pages

Posts

portfolio

publications

talks

teaching

CPSC 110 - Computation, Programs, and Programming

CPSC 110 - Computation, Programs, and Programming

STAT 200 - Elementary Statistics for Applications

STAT 302 - Introduction to Probability

CPSC 213 - Introduction to Computer Systems

CPSC 221 - Basic Algorithms and Data Structures

CPSC 421 - Introduction to Theory of Computing

CPSC 406 - Computational Optimization

CPSC 302 - Numerical Computation for Algebraic Problems

CPSC 340 - Machine Learning and Data Mining

CPSC 406 - Computational Optimization

CPSC 213 - Introduction to Computer Systems

CPSC 340 - Machine Learning and Data Mining

CPSC 340 - Machine Learning and Data Mining