GPT-4: A Powerful Yet Imperfect Language Model

OpenAI's new GPT-4 language model showcases remarkable advances in AI, with multimodality and scan-and-solve capabilities pushing the frontier of machine understanding. However, as GPT-4 gains impressive skills, it also inherits familiar flaws. This newsletter explores what GPT-4 can do, its technical architecture, and why we should temper our enthusiasm with realism.
Technically, GPT-4 is a large transformer network trained on vast datasets of text and images and fine-tuned with human feedback. Its multimodal architecture processes both text and image inputs, which lets it generate descriptive captions, answer questions about pictures, and more. The scan-and-solve feature tackles worksheets and diagrams, summarizing data or solving problems. Benchmark scores show GPT-4 outperforming GPT-3.5 on language-understanding and common-sense reasoning tasks.
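GPT-4's internal details are not public, but the transformer building block the paragraph refers to can be sketched in a few lines. The following is an illustrative toy example of scaled dot-product attention, the core operation of transformer networks; the dimensions and inputs are made-up values, not anything from GPT-4 itself.

```python
# Toy sketch of scaled dot-product attention, the core transformer
# operation. GPT-4's real architecture and sizes are not public;
# seq_len and d_model below are illustrative values only.
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Compute softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
q = rng.standard_normal((seq_len, d_model))
k = rng.standard_normal((seq_len, d_model))
v = rng.standard_normal((seq_len, d_model))
out = attention(q, k, v)
print(out.shape)  # (4, 8): one attended vector per input position
```

In a full model, many such attention heads are stacked with feed-forward layers and trained on the datasets described above; this sketch only shows the single operation that mixes information across positions.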
While remarkable, GPT-4 remains narrow and limited. It excels at specific tasks but cannot match human intelligence—it cannot reason about the world, reflect on its mistakes, or understand complex ideas. It also echoes biases and makes errors, occasionally "hallucinating" facts or spewing nonsense. As GPT-4 is applied to real-world problems, ensuring it's monitored and used safely is vital.
Availability
OpenAI is offering GPT-4 in two ways:
- ChatGPT Plus, a paid subscription service for interactive demos
- A developer API for integrating GPT-4 into products and services
Products and organizations already using GPT-4 include Duolingo, Khan Academy, and Microsoft's Bing.
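For developers, access goes through OpenAI's chat completions endpoint. The sketch below, assuming the official `openai` Python package and an `OPENAI_API_KEY` environment variable, shows the general shape of a request; the prompt and the `build_request` helper are illustrative, not part of OpenAI's API.

```python
# Sketch of calling GPT-4 via OpenAI's chat completions endpoint.
# Assumes `pip install openai` and an OPENAI_API_KEY environment variable;
# the helper function and prompt text here are illustrative only.
import os

def build_request(prompt: str) -> dict:
    """Assemble the payload shape the chat completions endpoint expects."""
    return {
        "model": "gpt-4",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }

if __name__ == "__main__" and os.getenv("OPENAI_API_KEY"):
    # Network call: only runs when an API key is configured.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        **build_request("Summarize GPT-4's multimodal features in one sentence.")
    )
    print(response.choices[0].message.content)
```

Swapping the model name is all it takes to move between GPT-3.5 and GPT-4 in an existing integration, which is part of why the API route has been adopted so quickly.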
Overall, GPT-4 signifies continuing progress in AI, but it is not human-level machine intelligence. Rather than hype GPT-4 as an unqualified breakthrough, we should consider its technical merits in context. GPT-4 may transform technologies and services, but keeping aspirations for AI grounded in scientific reality is key. The future of AI will depend on guidance from researchers and users, not just models chasing benchmark scores.