This year’s International Conference on Machine Learning (ICML) is hosted virtually, which makes it far cheaper to attend and helps involve researchers who are not necessarily in ML. All paper presentations are pre-recorded, and each paper also gets two live Zoom sessions for Q&A.

By day 3 I’ve already learned so much from the various papers, tutorials, panel discussions and mentoring sessions. In this series I decided to share some of my notes on the papers I found interesting. This list is by no means a fair selection after exhaustively perusing all 1,086 accepted papers: the papers are ordered randomly and are certainly biased towards my own interests. So here we go:

1. Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere

[paper][presentation][code]

Contrastive representation learning has been one of my favorite topics recently. I experimented with contrastive losses (see here and here) and found they work like a charm for learning representations that are useful for other tasks, without supervision.

In this paper, the authors provide an elegant geometric interpretation of contrastive objectives. They decompose the contrastive loss into two quantities that characterize the geometry of the learned representation space:

  1. Alignment (closeness): are embeddings from positive pairs close to each other?
  2. Uniformity: are samples projected into the embedding space uniformly scattered?

The authors found that representations learned by optimizing the contrastive objective indeed have these two properties, compared to representations learned with a supervised objective. The two quantities can also be used directly as loss functions for a neural network to optimize, which achieves a similar effect to using the contrastive loss. They also showed that both alignment and uniformity are required for learning a representation that works well for supervised tasks.
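To make this concrete, below is a minimal PyTorch sketch of the two quantities as losses: alignment as the mean (squared) distance between positive pairs, and uniformity as the log of the average pairwise Gaussian potential. I believe this mirrors the reference implementation released with the paper, but the default values of `alpha` and `t` below are assumptions on my part.

```python
import torch
import torch.nn.functional as F

def align_loss(x, y, alpha=2):
    # x, y: L2-normalized embeddings of positive pairs, shape (N, D).
    # Lower = positive pairs sit closer together on the hypersphere.
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniform_loss(x, t=2):
    # x: L2-normalized embeddings, shape (N, D).
    # Log of the average pairwise Gaussian potential;
    # lower = embeddings are spread more uniformly on the hypersphere.
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()

# Usage sketch: optimize these directly instead of a contrastive loss, e.g.
# z1 = F.normalize(encoder(aug1(batch)), dim=1)
# z2 = F.normalize(encoder(aug2(batch)), dim=1)
# loss = align_loss(z1, z2) + 0.5 * (uniform_loss(z1) + uniform_loss(z2))
```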

2. Continuous Graph Neural Networks

[paper][presentation]

The authors of this paper propose a method to address the diminishing performance of discrete graph neural nets (GNNs) as more propagation layers are stacked in the forward pass. It is known that GNNs suffer from the over-smoothing problem when there are too many GNN layers: repeated Laplacian smoothing tends to make the propagated representations of nodes with the same degree converge to each other, overshadowing the unique features of individual nodes.

The authors devised Continuous GNN (CGNN), inspired by Neural ODEs, to model continuous dynamics of node representations. Their empirical results show that CGNN beats graph convolutional networks (GCN) and graph attention networks (GAT) on standard benchmarks (Cora, Citeseer, PubMed, etc.) for the semi-supervised node classification task.
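To build some intuition, here is a toy sketch in the spirit of CGNN rather than the authors’ exact formulation: node states evolve under an ODE of the form dX/dt = (Â − I)X + X₀ (with Â the normalized adjacency and X₀ the encoded input features), integrated with a crude explicit Euler scheme instead of a proper ODE solver.

```python
import torch

def normalized_adjacency(adj):
    # Symmetrically normalized adjacency with self-loops: D^{-1/2}(A + I)D^{-1/2}
    adj = adj + torch.eye(adj.size(0))
    deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
    return deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]

def continuous_propagation(adj, x0, t_end=1.0, steps=20):
    # Toy continuous message passing: dX/dt = (A_hat - I) X + X0.
    # The source term X0 keeps re-injecting the original node features,
    # which is what counteracts over-smoothing as integration time grows.
    a_hat = normalized_adjacency(adj)
    eye = torch.eye(adj.size(0))
    x, dt = x0.clone(), t_end / steps
    for _ in range(steps):
        x = x + dt * ((a_hat - eye) @ x + x0)
    return x

# adj: (N, N) binary adjacency matrix, x0: (N, D) encoded node features
# h = continuous_propagation(adj, x0, t_end=2.0)
```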

Although the derivation of CGNN is quite difficult to follow, I do find this paper to be an important advance in GNNs, as it enables “deeper” GNNs with many message-passing steps without losing performance, while preserving long-range dependencies between nodes on a graph. For more about GCN, read my previous post here.

3. On Variational Learning of Controllable Representations for Text without Supervision

[paper][presentation]

Why can’t you do style transfer for text?

In this paper, the authors explain why the variational autoencoder (VAE) doesn’t work well for text generation through latent-space manipulation. Using topological analysis, they found that the latent space (z) learned by a VAE on text data has many more “holes” than the latent space learned on image data.

To mitigate this effect, they developed CP-VAE (constrain the posterior), which adds two terms to the VAE loss: 1) encourage the learned latent space to have an orthogonal basis (I guess this is inspired by PCA?); and 2) make the latent space more “filled” via a structural reconstruction loss similar to a contrastive loss. With these additional terms in the loss function, the authors demonstrated that the latent space of CP-VAE is more “filled” and can perform “style transfer” on text. Well, sort of: I would argue the example the authors provided also changes the content of the input sentence a bit.
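I haven’t dug into the paper’s implementation, but the orthogonal-basis constraint is easy to picture. Here is a hedged sketch of what such a regularizer could look like; `basis` is a hypothetical learnable (K, D) matrix spanning the constrained part of the latent space, and the weight `lam_orth` is made up for illustration.

```python
import torch

def orthogonality_penalty(basis):
    # basis: (K, D) learnable matrix whose rows are meant to form an
    # orthonormal basis of a subspace of the latent space.
    # Penalize deviation of the Gram matrix from the identity.
    gram = basis @ basis.t()
    return ((gram - torch.eye(basis.size(0))) ** 2).sum()

# Illustrative only -- added on top of the usual VAE objective:
# loss = recon_loss + kl_loss + lam_orth * orthogonality_penalty(basis)
```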

#machine-learning #icml-2020 #deep-learning
