GPT-3 is a marvel of engineering due to its breathtaking scale. It contains 175 billion parameters (the weights in the connections between the “neurons” or units of the network) distributed over 96 layers. It produces embeddings in a vector space with 12,288 dimensions. And it was trained on hundreds of billions of words representing a significant subset of the Internet—including the entirety of English Wikipedia, countless books, and a dizzying number of web pages. Training the final model alone is estimated to have cost around $5 million. – Nautilus
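To see how the 175-billion figure follows from the architecture numbers in the excerpt, here is a rough back-of-the-envelope sketch. It uses the standard approximation of about 12·d_model² weights per transformer block (4·d² for the attention projections plus 8·d² for the feed-forward network) and assumes a BPE vocabulary of roughly 50,000 tokens for the embedding matrix; the vocabulary size is an assumption on my part, not something stated in the quote.

```python
# Back-of-the-envelope parameter count for a GPT-3-sized decoder-only transformer.
# Figures from the excerpt: 96 layers, 12,288-dimensional embeddings.
# The ~50k vocabulary is an assumption (GPT-2/3-style BPE), not stated in the quote.

n_layers = 96          # transformer blocks
d_model = 12_288       # embedding / hidden dimension
vocab_size = 50_257    # assumed BPE vocabulary size

# Each block: ~4*d^2 weights for the Q/K/V/output projections
# plus ~8*d^2 for the feed-forward network (d -> 4d -> d).
per_block = 12 * d_model ** 2

block_params = n_layers * per_block          # ~173.9 billion
embedding_params = vocab_size * d_model      # ~0.6 billion (token embedding matrix)

total = block_params + embedding_params
print(f"blocks:     {block_params / 1e9:.1f}B")
print(f"embeddings: {embedding_params / 1e9:.1f}B")
print(f"total:      {total / 1e9:.1f}B")     # ~174.6B, close to the quoted 175B
```

The sketch ignores biases, layer-norm scales, and positional embeddings, which together contribute comparatively few parameters; even so, it lands within about one percent of the quoted 175 billion, which is why the parameter count is usually described as a direct consequence of the depth (96 layers) and width (12,288 dimensions) of the network.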