GPT-4o Unveiled: Siri Faces Unprecedented Challenge

In February of this year, Apple suddenly abandoned its long-prepared car manufacturing project, dissolving its research and development team, which sent shockwaves through the industry.

Although Apple did not explicitly reveal the reasons for this decision, a widely circulated explanation is that the company decided to pivot toward artificial intelligence. On this account, Apple sees AI as a bigger opportunity than car manufacturing, and fears that if it does not enter the AI market now, the gap with its competitors will only widen, leaving it at a growing disadvantage.

Many readers might wonder:

Is Apple’s approach overly cautious and pessimistic? Are its concerns too severe? Are the speed, prospects, and potential of artificial intelligence being exaggerated? Is AI really progressing fast enough to disrupt and dominate traditional applications?

⬆️ Hello GPT-4o (Image Credit: OpenAI)

I can now answer these questions with confidence: Apple’s concerns are real, and the threat it fears is not years, months, or even days away. It is happening right now. What Apple fears most has already occurred.

On May 13th, OpenAI officially released its latest LLM, the GPT-4o model, where the “o” stands for “omni,” meaning “all” or “universal.”

Human-machine interaction with GPT-4o is simpler and more natural: the model accepts any combination of text, audio, and images as input and generates any combination of text, audio, and images as output.

Before GPT-4o, users could already talk to ChatGPT through voice mode, but the latency was relatively high: an average of 2.8 seconds with GPT-3.5 and 5.4 seconds with GPT-4.

⬆️ Text Evaluation among GPT-4o, GPT-4T, GPT-4, Claude 3 Opus, Gemini Pro 1.5, Gemini Ultra 1.0, and Llama3 400b (Image Credit: Internet)

Why is the latency high? Because the process is carried out by three independent models, each handling one part of the work. The first model transcribes the user’s speech (audio) into text. The second, GPT-3.5 or GPT-4, receives the text, reasons about it, and produces an answer as text. The third converts that text back into audio.
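The three-step process described above can be sketched in a few lines of Python. This is only an illustration: the stage functions are hypothetical stand-ins rather than OpenAI’s actual models or API, and the per-stage latencies are invented numbers chosen so that they sum to the 2.8-second average quoted for GPT-3.5. The point is that the stages run one after another, so their delays add up, and that anything not captured in the transcript (tone, background noise) is lost after the first step.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    run: Callable[[str], str]
    latency_ms: int  # invented figure, for illustration only


def transcribe(audio: str) -> str:
    # Hypothetical speech-to-text step: only the words survive,
    # tone and background sounds are dropped here.
    return audio.replace("audio:", "", 1)


def answer(text: str) -> str:
    # Hypothetical stand-in for the text-only language model.
    return f"reply to '{text}'"


def synthesize(text: str) -> str:
    # Hypothetical text-to-speech step.
    return "audio:" + text


def run_pipeline(stages: List[Stage], payload: str) -> Tuple[str, int]:
    """Run the stages in order and add up their latencies."""
    total_ms = 0
    for stage in stages:
        payload = stage.run(payload)
        total_ms += stage.latency_ms
    return payload, total_ms


# Assumed split of the delay across the three models (not from OpenAI).
CASCADE = [
    Stage("speech-to-text", transcribe, 500),
    Stage("language model", answer, 1800),
    Stage("text-to-speech", synthesize, 500),
]

if __name__ == "__main__":
    reply, total_ms = run_pipeline(CASCADE, "audio:hello")
    print(reply, f"(total wait: {total_ms} ms)")
```

Because each stage must finish before the next one starts, no single stage can be sped up enough to hide the cost of the other two.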

This traditional method has many disadvantages, the most obvious being the long wait time between a user speaking and the AI responding, leading to a poor user experience. It also struggles to recognize and analyze the user’s tone, multiple speakers, or background noise, and cannot produce laughter, singing, or convey emotions.

GPT-4o was developed to address these shortcomings. OpenAI trained a single new model end to end across text, vision, and audio, so that all inputs and outputs are processed by the same neural network. This significantly reduces response times and improves the overall experience.

⬆️ Significantly reduced response times with GPT-4o (Image Credit: Internet)

So, how much can the response time be reduced? This is the key question. The answer is that GPT-4o can respond to audio input in as little as 232 milliseconds, with an average response time of 320 milliseconds.
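To put these figures side by side with the earlier voice mode, a quick calculation helps. The snippet below uses only the averages quoted in this article (2.8 seconds, 5.4 seconds, and 320 milliseconds); the variable and function names are my own.

```python
# Average voice-mode latencies quoted in this article, in seconds.
OLD_VOICE_MODE_S = {"GPT-3.5": 2.8, "GPT-4": 5.4}
GPT4O_AVG_S = 0.320


def speedup(old_s: float, new_s: float) -> float:
    """How many times shorter the new wait is compared with the old one."""
    return old_s / new_s


if __name__ == "__main__":
    for model, old_s in OLD_VOICE_MODE_S.items():
        ratio = speedup(old_s, GPT4O_AVG_S)
        print(f"{model} voice mode vs GPT-4o: about {ratio:.0f}x longer wait")
```

In other words, the average wait drops to roughly one ninth of the GPT-3.5 voice mode figure and roughly one seventeenth of the GPT-4 figure.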

As for availability, OpenAI has stated that GPT-4o’s text and image capabilities are available in ChatGPT starting today, including for free users. The new voice mode will launch for ChatGPT Plus subscribers in the coming weeks, which suggests that this feature will not be free, at least initially.

GPT-4o’s voice mode targets competitors such as Apple’s Siri and Microsoft’s Cortana, as well as similar Chinese assistants such as Xiao Ai, Xiao Yi, and Tmall Genie. These products face an attack from a higher league altogether: the gap in capability is comparable to that between an elementary school student and a high school student.

⬆️ Apple Siri (Image Credit: Internet)

It must also be emphasized that this is not all OpenAI aims to achieve. The company reportedly plans to release standalone ChatGPT desktop clients for Mac, Windows, and Linux, and to introduce an AI-integrated search engine that would challenge Google directly.

For years, Google has been the dominant leader in the search engine market. Although many competitors have tried to challenge its position, including Microsoft, none have managed to disrupt its dominance. However, the emergence of OpenAI and artificial intelligence could change the current landscape.

This is not an exaggeration: according to reports, Microsoft’s Bing search engine gained 40 million users in the year after it integrated Copilot, which suggests that the strategy works. Google may therefore soon face its most formidable competitor yet in search.


Overall, Apple’s concerns are not exaggerated; they are well founded. In the foreseeable future, artificial intelligence will be integrated into many traditional applications and ecosystems. OpenAI (and Microsoft) currently hold a significant first-mover advantage and continue to evolve rapidly, which keeps them ahead.

If GPT-4o’s voice mode does prove overwhelmingly better than Siri and becomes the preferred voice assistant for Apple users, replacing Siri altogether, it would be a significant blow to Apple.

The speed of AI development and its impact on traditional applications and ecosystems therefore far exceed most people’s expectations, and the competition is already fierce. Of course, Apple is unlikely to sit idly by and will take measures in response. The real contest has only just begun, and I will share more updates as they occur. Please stay tuned.


Disclaimer: This article is created by the original author. The content of the article represents their personal opinions. Our reposting is for sharing and discussion purposes only and does not imply our endorsement or agreement. If you have any objections, please contact us through the provided channels.
