Just as "Hello world!" becomes discrete tokens for text processing, a photograph gets chopped into image patches and a song becomes a sequence of audio codes.
Well written, though you could tack this onto the title: "Where Do We Go From Here?"
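
For readers who want to see what "chopped into image patches" could mean in practice, here is a minimal sketch in plain Python. The function name `split_into_patches`, the 4x4 toy image, and the patch size of 2 are all assumptions made up for illustration; the original passage does not specify any particular patching scheme.

```python
def split_into_patches(image, patch_size):
    """Split a 2D grid (list of rows) into non-overlapping square patches.

    Patches are returned in row-major order, each flattened to a 1D list.
    Assumes height and width are both multiples of patch_size.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Collect the pixels of one patch, row by row.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A toy 4x4 grayscale "photograph" with pixel values 0..15 (hypothetical data).
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(toy_image, patch_size=2)
print(patches)
```

Each flattened patch plays roughly the role a token plays for text: the image becomes a short sequence of fixed-size pieces that a model can process in order.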