Transformers are Efficient Compilers, Provably

Xiyu Zhai, Runlong Zhou, Liao Zhang, Simon S. Du

We propose Cybertron as a proof vehicle for transformers’ expressive ability and show that, for a compilation task, transformers need only a number of parameters logarithmic in the input length, while any recurrent neural network needs at least a linear number of parameters.
